ECE 3485 Paper Reference List
SURVEY ARTICLES:
AUTHOR: Stok, L.
TITLE: Data path synthesis
SOURCE: Integration, The VLSI Journal, vol.18, no.1, p. 1-71
YEAR: Dec. 1994
ABSTRACT: This paper reviews all the phases in data path synthesis:
register allocation, storage grouping, module allocation and
interconnect allocation. In addition, a new phase for the
storage value insertion is introduced. For each of these
phases a formal problem description is given. Restrictions
on the data path allocation phases are presented, which
delimit the problems to cases which can be solved by
polynomial algorithms. For the general cases, heuristics are
provided which have appeared to be effective in the
literature. Special data path architectures may require
special algorithms to make use of their features. Throughout
the paper architectural constraints are described and
effective algorithms for them derived. To construct an
effective data path allocation system, a scheme has to be
defined. The scheme determines which subproblems are solved
in what order and which constraints are taken into account
in each phase. The data flow graph and schedule and their
match with the data path architecture have a major impact on
the development of a scheme. This paper will point out the
trade-offs that have to be made when developing such a
scheme. This paper provides a reference to most of the data
path allocation algorithms published in the scope of high-
level synthesis. Most of the techniques are explained in
considerable detail and various examples are given. The
paper comments on the applicability of most of the
algorithms for particular data path allocation problems
(74 Refs.)
1996
----
AUTHOR: Ernst, R.; Henkel, J.; Benner, Th.; Ye, W.; Holtmann, U.;
Herrmann, D.; Trawny, M.
TITLE: The COSYMA environment for hardware/software cosynthesis of
small embedded systems
SOURCE: Microprocessors and Microsystems, vol.20, no.3, p. 159-66
YEAR: May 1996
ABSTRACT: COSYMA is a platform for the investigation of
hardware/software cosynthesis of small embedded systems.
Target architecture is currently limited to processor-
coprocessor configurations executing a single process or a
system of communicating processes which are statically
scheduled. Many aspects of cosynthesis such as automatic
hardware/software partitioning, efficient hardware/software
communication, timing and hardware overhead estimation and
analysis, interdependence of different cosynthesis phases,
data representation, etc., can successfully be investigated
in this manageable domain. COSYMA covers the complete design
flow from an input language similar to C down to netlist and
object code. Current focus is on high performance data
dominated systems, but first steps to incorporate control
dominated subtasks can be presented. Using a specific high-
level synthesis tool, the results show a considerable
speedup of the resulting processor-coprocessor system even
compared to modern RISC processors which is typically
limited by memory bandwidth (38 Refs.)
AUTHOR: Hong-Shin Jun; Sun-Young Hwang
TITLE: Automatic synthesis of dynamically configured pipelines
supporting variable data initiation intervals
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.4, no.2, p. 279-85
YEAR: June 1996
ABSTRACT: The authors propose a new approach for synthesizing the
dynamically configured pipelines supporting variable data
initiation intervals (DIIs). Compared to the previous
research where the pipeline synthesis is confined to those
with fixed DIIs, the proposed system allows powerful design
space exploration by removing the constraints of fixed DIIs.
The proposed algorithm tries to optimize the area of
pipeline structures by fully utilizing hardware resources to
which abstract operations in high-level design descriptions
are assigned, while meeting the given timing constraints in
the clock cycle time, number of stages, and data initiation
sequence. Experimental results on benchmarks show that new
design points, efficient in speed and in area, can be found
by removing the restriction of fixed DIIs in the synthesis
of pipeline structures (7 Refs.)
AUTHOR: Chaudhuri, S.; Walker, R.A.
TITLE: Computing lower bounds on functional units before scheduling
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.4, no.2, p. 273-9
YEAR: June 1996
ABSTRACT: The authors present a new polynomial-time algorithm for
computing lower bounds on the number of functional units
(FUs) of each type required to schedule a data flow graph in
a specified number of control steps. A formal approach is
presented that is guaranteed to find the tightest possible
bounds that can be found by relaxing either the precedence
constraints or integrality constraints on the scheduling
problem. This tight, yet fairly efficient, bounding method
can be used to estimate FU area, to generate resource
constraints for reducing the search space, or in conjunction
with exact techniques for efficient optimal design space
exploration (19 Refs.)
AUTHOR: Jha, P.K.; Dutt, N.D.
TITLE: High-level library mapping for arithmetic components
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.4, no.2, p. 157-69
YEAR: June 1996
ABSTRACT: We describe high-level library mapping (HLLM), a technique
that permits reuse of complex RT-level databook components
(specifically ALUs). HLLM can be used to couple existing
databook libraries, module generators and custom-designed
components with the output of architectural or behavioral
synthesis. In this paper, we define the problem of high-
level library mapping, present some algorithmic formulations
for HLLM of ALUs, and demonstrate the versatility of our
approach on a variety of libraries. We also compare HLLM
against the traditional mapping approach using logic
synthesis. Our experiments show that HLLM for ALUs
outperforms logic synthesis in area, delay, and runtime,
indicating that HLLM is a promising approach for reuse of
datapath components in architectural design and high-level
synthesis (23 Refs.)
AUTHOR: Minjoong Rim; Jain, R.
TITLE: Valid transformations: a new class of loop transformations
for high-level synthesis and pipelined scheduling
applications
SOURCE: IEEE Transactions on Parallel and Distributed Systems,
vol.7, no.4, p. 399-410
YEAR: April 1996
ABSTRACT: In this paper we present a new class of loop optimizing
transformations called valid transformations, which are
suitable for fine-grain parallelization applications such as
high-level synthesis of VLSI designs or compilers for super-
scalar or VLIW machines. This class of transformations are
different from existing ones in that valid transformations
can be illegal. Nevertheless, if a transformation is valid,
the transformed loop has a feasible pipeline schedule. We
present an example valid transformation called loop
expansion which can help produce cost-performance efficient
designs and explore a larger design space for a satisfactory
design. Several examples are used to demonstrate the
efficacy of the proposed technique (53 Refs.)
AUTHOR: Mecha, H.; Fernandez, M.; Tirade, F.; Septien, J.; Motes,
D.; Olcoz, K.
TITLE: A method for area estimation of data-path in high level
synthesis
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.15, no.2, p. 258-65
YEAR: Feb. 1996
ABSTRACT: This paper describes a new method to estimate the area of
data paths generated during a High Level Synthesis (HLS)
process, when the information concerning the circuit is not
yet complete. Our method is more accurate and considers more
factors than those used by other HLS systems of which we are
aware. Our main concern is the interconnection area, often
neglected by HLS systems, which has a strong influence on
the final circuit area being optimized, as well as a high
dependency on the technology used and on the circuit area
itself. Predicting the area of a design layout with accuracy
is important because it allows one to foresee whether the
design will satisfy the area constraints, and will lend the
allocator towards the best design among several
possibilities with guarantees. Our estimations of the final
standard-cell layout area are similar, or even more
accurate, than those obtained following methods used by low-
level design systems, which have much more information
available. Due to the performance penalty their relatively
high complexity will produce, these methods are unusable in
an HLS system exploring a wide design space. Our estimation,
on the contrary, has a low complexity and can be repeated
time and again as the HLS design space is searched
(21 Refs.)
AUTHOR: Yang, J.C.-Y.; de Micheli, G.; Damiani, M.
TITLE: Scheduling and control generation with environmental
constraints based on automata representations
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.15, no.2, p. 166-83
YEAR: Feb. 1996
ABSTRACT: We introduce a framework for synthesis of behavioral models
in which design information is represented using an
automaton model. This model offers the advantage of
supporting different constraints (e.g., timing, resource,
synchronization, etc.) with a uniform formalism. The set of
all feasible execution traces (schedules) is constructed and
traversed using efficient BDD-based implicit state-traversal
techniques. As an application example of this formalism, we
present a novel scheduling/control-generation algorithm
under environmental constraints where both the design and
constraints are represented using automata. We present an
algorithm that generates a minimum-latency schedule and a
control unit representation. This approach is able to
exploit degrees of freedom among interacting components of a
multimodule system during scheduling, and is well suited for
system-level design, where component encapsulation and
interfacing are important (26 Refs.)
AUTHOR: Potkonjak, M.; Srivastava, M.B.; Chandrakasan, A.P.
TITLE: Multiple constant multiplications: efficient and versatile
framework and algorithms for exploring common subexpression
elimination
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.15, no.2, p. 151-65
YEAR: Feb. 1996
ABSTRACT: Many applications in DSP, telecommunications, graphics, and
control have computations that either involve a large number
of multiplications of one variable with several constants,
or can easily be transformed to that form. A proper
optimization of this part of the computation, which we call
the multiple constant multiplication (MCM) problem, often
results in a significant improvement in several key design
metrics, such as throughput, area, and power. However, until
now little attention has been paid to the MCM problem. After
defining the MCM problem, we introduce an effective problem
formulation for solving it where first the minimum number of
shifts that are needed is computed, and then the number of
additions is minimized using common subexpression
elimination. The algorithm for common subexpression
elimination is based on an iterative pairwise matching
heuristic. The power of the MCM approach is augmented by
preprocessing the computation structure with a new scaling
transformation that reduces the number of shifts and
additions. An efficient branch and bound algorithm for
applying the scaling transformation has also been developed.
The flexibility of the MCM problem formulation enables the
application of the iterative pairwise matching algorithm to
several other important and common high level synthesis
tasks, such as the minimization of the number of operations
in constant matrix-vector multiplications, linear
transforms, and single and multiple polynomial evaluations.
All applications are illustrated by a number of benchmarks
(66 Refs.)
AUTHOR: Ti-Yen Yen; Wolf, W.
TITLE: An efficient graph algorithm for FSM scheduling
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.4, no.1, p. 98-112
YEAR: March 1996
ABSTRACT: This paper presents a new algorithm for scheduling control-
dominated designs during high-level synthesis. Our algorithm
can schedule systems with arbitrary control flow, including
conditional branches and multiple loops. It can handle both
upper bound and lower bound timing constraints. The timing
constraints can cross basic block boundaries, span different
iterations of a loop, and form interlocking cycles in the
control flow. A scheduling problem is described by the
behavior finite-state machine model, an automaton model for
the behavioral specification and synthesis of control-
dominated systems. We optimize the performance of the
produced digital circuit implementation by minimizing the
execution time of each state transition in the state
transition graph. The finite-state machines (FSM) scheduling
algorithm is based on previous work on cylindrical layout
compaction; we extend that work to handle upper bound
constraints, allow multiple loops, and not require an
initial feasible solution. Experimental results for examples
derived from real designs and benchmark descriptions
demonstrate that the algorithm can handle complex
combinations of constraints very efficiently (30 Refs.)
AUTHOR: Prihozhy, A.
TITLE: Net scheduling in high-level synthesis
SOURCE: IEEE Design & Test of Computers, vol.13, no.1, p. 26-35
YEAR: Spring 1996
ABSTRACT: A new net scheduling and allocation model generates net
schedules that minimize either execution time or resources.
The author tested the model within a VHDL-based high-level
synthesis system called Ahiles (21 Refs.)
AUTHOR: Radivojevic, I.; Brewer, F.
TITLE: A new symbolic technique for control-dependent scheduling
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.15, no.1, p. 45-57
YEAR: Jan. 1996
ABSTRACT: This paper describes an exact symbolic formulation of
control-dependent, resource-constrained scheduling. The
technique provides a closed-form solution set in which all
satisfying schedules are encapsulated in a compressed OBDD-
based representation. This solution format greatly increases
the flexibility of the synthesis task by enabling
incremental incorporation of additional constraints and by
supporting solution space exploration without the need for
rescheduling. The technique provides a systematic treatment
of speculative operation execution in arbitrary forward-
branching control/data paths. An iterative construction
method is presented along with benchmark results. The
experiments demonstrate the ability of the proposed
technique to efficiently exploit parallelism not explicitly
specified in the input description (48 Refs.)
AUTHOR: Sait, S.M.; Elleithy, K.M.; Masud-Ul-Hasan
TITLE: Formal synthesis of VLSI layouts from algorithmic
specifications
SOURCE: Computer Systems Science and Engineering, vol.11, no.2,
p. 67-81
YEAR: March 1996
ABSTRACT: Due to advances in VLSI technology, it is possible to
implement complex digital systems on a single chip. However,
modeling such large and complex systems at the structural
level is tedious and error-prone. This fact has motivated
the development of several high-level synthesis systems. The
process consists of translating the abstract behavioural
representation generally at the algorithmic level into a
structure-realizable representation. In this paper, we
present a formal approach for high-level synthesis. This
formal high-level synthesis system uses mu -recursive
algorithms to model the behaviour to be synthesized. These
algorithms can be mathematically verified for correctness
before being subjected to the task of translation. As a case
study, the modelling and synthesis of VLSI layouts for
matrix-matrix multipliers is discussed (37 Refs.)
AUTHOR: Chen, C.H.
TITLE: Data path synthesis in digital electronics. I. Memory
allocation
SOURCE: IEEE Transactions on Aerospace and Electronic Systems,
vol.32, no.1, p. 2-15
YEAR: Jan. 1996
ABSTRACT: A data path consists of memory elements (i.e., registers),
data operators (i.e. ALUs) and interconnection units (i.e.
buses) to control the data transfers in the digital system.
Many approaches to hardware allocation for data path
synthesis have been proposed in the literature; however,
only single-port memory is considered for register
allocation and no efficient synthesis approach for multiport
memory synthesis. A novel design methodology for data path
synthesis using multiport memories is proposed which can be
applied to hardware allocation algorithms or to already
synthesized data path as a postprocessor to achieve a better
design. Illustrations of applying this method to different
synthesis examples are presented. Results and improvements
over previous techniques are demonstrated. Experiments on
benchmarks show very promising results (22 Refs.)
AUTHOR: Chen, C.H.
TITLE: Data path synthesis in digital electronics. II. Bus
synthesis
SOURCE: IEEE Transactions on Aerospace and Electronic Systems,
vol.32, no.1, p. 16-33
YEAR: Jan. 1996
ABSTRACT: For pt. I see ibid., vol. 32, no. 1, p. 1-15 (1996). Common
buses are an extremely efficient structure for achieving
area minimization so that the bus-oriented interconnection
of registers and data operators plays an important role in
data path synthesis. The overriding design goal is
efficiently allocating the minimum number of buses and
gating elements (i.e. multiplexers) for achieving
communication between the data path elements. New efficient
algorithms for the automated allocation of buses in data
paths have been developed. The entire allocation process can
be formulated as a graph partitioning problem. This
formulation readily lends itself to the use of a varieties
of heuristics for solving the allocation problem We present
efficient algorithms which provide excellent solutions to
this formulation of the allocation problem The operation of
the algorithms is clearly demonstrated using detailed
examples (35 Refs.)
*******************************************************************************
1995
----
AUTHOR: Hirakawa, Y.; Yoshida, M.; Harashima, K.; Fukunaga, K.
TITLE: A method of data path allocation by pattern matching on the
data flow graph
SOURCE: VLSI Signal Processing, VIII (Cat. No.95TH8067), p. xv+582,
254-63
YEAR: 1995
ABSTRACT: Allocation and scheduling are important tasks in high level
synthesis. Allocation is an assignment of operations and
variables into registers and functional units and
interconnects these circuit units, while ensuring the given
circuit conditions. Since the result of allocation has
direct effects upon the size of a chip, design of allocation
is strongly required to make use of hardware (i.e.
registers, functional units, buses etc.) efficiently to
minimize the size of a chip. In this paper, we propose an
allocation method which minimizes the measure of area
concerning the interconnection of circuit units, based on
the dependence of operations given by the data flow graph
(DFG) (4 Refs.)
AUTHOR: Diguet, J.P.; Sentieys, O.; Philippe, J.L.; Martin, E.
TITLE: Probabilistic resource estimation for pipeline architecture
SOURCE: VLSI Signal Processing, VIII (Cat. No.95TH8067), p. xv+582,
217-26
YEAR: 1995
ABSTRACT: This paper presents a new approach to resource estimation in
high level synthesis. Given a set of operators and a data
flow graph specification, we apply a probability based
method to compute the probable numbers of operators,
registers, bus and operator connections for each time step
or algorithm latency. Combined with statistical metrics, we
obtain a quick and accurate estimation module that includes
real precedence constraints. The aim of this work is to
provide ASICs designers with a guidance tool for a better
use of high level synthesis. Algorithm properties like dated
resource concurrence and operator link statistics are
computed to guide algorithmic, transformations and hardware
selection (13 Refs.)
AUTHOR: Ito, R.; Parhi, K.K.
TITLE: Register minimization in cost-optimal synthesis of DSP
architectures
SOURCE: VLSI Signal Processing, VIII (Cat. No.95TH8067), p. xv+582,
207-16
YEAR: 1995
ABSTRACT: In this paper we propose a generalized technique to count
the number of registers supporting overlapped scheduling and
a general digit-serial data format. This technique is
integrated into an integer linear programming model which
minimizes the cost of registers as well as the cost of
processors and data format converters to synthesize a cost-
optimal architecture for a given digital signal processing
algorithm. It is shown that by including the cost of
registers in the synthesis task as proposed in this paper
leads to up to 12.8% savings in the total cost of the
synthesized architecture when compared with synthesis
performed without including the register cost in the total
cost (16 Refs.)
AUTHOR: Munch, M.; Wehn, N.; Glesner, M.
TITLE: Optimum simultaneous placement and binding for bit-slice
architectures
SOURCE: Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
South Pacific Design Automation Conference. IFIP
International Conference on Computer Hardware Description
Languages and their Applications. IFIP Interntional
Conference on Very Large Scale Integration (IEEE Cat.
No.95TH8102), p. xxxii+860, 735-40
YEAR: 1995
ABSTRACT: Traditionally, the problems of binding and placement in the
synthesis of digital circuits have been formulated and
solved separately. However, placement and binding strongly
interact and design decisions taken during these phases
determine the interconnect structure. As feature sizes
continue to decrease, the delay caused by signal propagation
through interconnect more and more dominate the overall
system performance. Hence, optimizing interconnect becomes
increasingly important. In this paper, we propose for the
first time an analytical approach to capture the placement
and binding problems in a single, unified Mixed Integer
Linear Programming (MILP) model which also allows minimizing
the overall interconnect structure. Such a model can serve
as a starting point for deriving efficient heuristics in
that it captures all information required for a
comprehensive analysis of the problem. The target
architecture is a linear bit-slice datapath, however, the
model can be easily extended to handle two-dimensional
datapath models (23 Refs.)
AUTHOR: Rajan, S.P.
TITLE: Correctness of transformations in high level synthesis
SOURCE: Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
South Pacific Design Automation Conference. IFIP
International Conference on Computer Hardware Description
Languages and their Applications. IFIP Interntional
Conference on Very Large Scale Integration (IEEE Cat.
No.95TH8102), p. xxxii+860, 597-603
YEAR: 1995
ABSTRACT: This paper presents a formal approach to address the
correctness of transformations in high-level synthesis. The
novelty of the work is that a small set of properties that
capture a general notion of refinement of control/data-flow
graphs used in an industrial synthesis framework have been
given, and the properties are independent of the underlying
behaviour model. We have mechanized the specification and
verification of several optimization and refinement
transformations used in industrial hardware design. This
work has enabled to find and rectify errors in the
transformations. Further, the work has led to generalization
of transformations typically used in high-level synthesis
(22 Refs.)
AUTHOR: Hartenstein, R.W.; Kress, R.
TITLE: A datapath synthesis system for the reconfigurable datapath
architecture
SOURCE: Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
South Pacific Design Automation Conference. IFIP
International Conference on Computer Hardware Description
Languages and their Applications. IFIP Interntional
Conference on Very Large Scale Integration (IEEE Cat.
No.95TH8102), p. xxxii+860, 479-84
YEAR: 1995
ABSTRACT: A datapath synthesis system (DPSS) for the reconfigurable
datapath architecture (rDPA) is presented. The DPSS allows
automatic mapping of high level descriptions onto the rDPA
without manual interaction. The required algorithms of this
synthesis system are described in detail. Optimization
techniques like loop folding or loop unrolling are sketched.
The rDPA is scalable to arbitrarily large arrays and
reconfigurable to be adaptable to the computational problem.
Fine grained parallelism is achieved by using simple
reconfigurable processing elements which are called datapath
units (DPUs). The rDPA can be used as a reconfigurable ALU
for bus oriented systems as well as for rapid prototyping of
high speed datapaths (7 Refs.)
AUTHOR: Masuda, A.; Imai, H.; Hansen, J.P.; Sekine, M.
TITLE: Search space reduction in high level synthesis by use of an
initial circuit
SOURCE: Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
South Pacific Design Automation Conference. IFIP
International Conference on Computer Hardware Description
Languages and their Applications. IFIP Interntional
Conference on Very Large Scale Integration (IEEE Cat.
No.95TH8102), p. xxxii+860, 471-7
YEAR: 1995
ABSTRACT: Most existing high-level synthesis (HLS) systems attempt to
generate a circuit from a behavioral description "out of the
void", using the entire design space as the search domain.
Because of the vastness of the search space, it is
impossible to do more than a coarse grain search, often
resulting in inefficient designs. This approach ignores the
designer`s knowledge of the general structure of the circuit
to be synthesized. In this paper, we describe the HLS system
SIDER (Synthesis by Initial Design Extension and
Refinement). SIDER utilizes designer knowledge about the
design space in the form of an initial circuit. By limiting
search to the neighborhood of this initial circuit, much
finer grain search can be performed yielding a higher
quality design. The effectiveness of the SIDER approach is
shown by HLS of a 300 line C description of 27 instructions
from a MC6502 CPU (11 Refs.)
AUTHOR: Chapman, R.
TITLE: Timed dependence flow graphs, an intermediate form for
verified high-level synthesis
SOURCE: Proceedings Eighth Annual IEEE International ASIC Conference
and Exhibit (Cat. No.95TH8087), p. xv+422, 109-12
YEAR: 1995
ABSTRACT: We present timed dependence flow graphs, an intermediate
form for high-level synthesis from specifications written in
behavioral hardware description languages. Timed dependence
flow graphs express data, control, resource access, and
timing dependences, and call be constructed in linear time
from behavioral VHDL descriptions. We also present a formal
execution semantics for timed dependence flow graphs, which
we are using in verification of our high-level synthesis
tools (11 Refs.)
AUTHOR: Hae-Dong Lee; Sun-Young Hwang
TITLE: A scheduling algorithm for multiport memory minimization in
datapath synthesis
SOURCE: Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
South Pacific Design Automation Conference. IFIP
International Conference on Computer Hardware Description
Languages and their Applications. IFIP Interntional
Conference on Very Large Scale Integration (IEEE Cat.
No.95TH8102), p. xxxii+860, 93-100
YEAR: 1995
ABSTRACT: In this paper, we present a new scheduling algorithm that
generates area-efficient register transfer level datapaths
with multiport memories. The proposed scheduling algorithm
assigns an operation to a specific control step such that
maximal sharing of functional units can be achieved with
minimal number of memory ports, while satisfying given
constraints. We propose a measure of multiport memory cost,
MAV (Multiple Access Variable) which is defined as a
variable accessed at several control steps, and overall
memory cost is reduced by equally distributing the MAVs
throughout all the control steps. When compared with
previous approaches for several benchmarks available from
the literature, the proposed algorithm generates the
datapaths with less memory modules and interconnection
structures by reflecting the memory cost in the scheduling
process (13 Refs.)
AUTHOR: Heijligers, M.J.M.; Cluitmans, L.J.M.; Jess, J.A.G.
TITLE: High-level synthesis scheduling and allocation using genetic
algorithms
SOURCE: Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
South Pacific Design Automation Conference. IFIP
International Conference on Computer Hardware Description
Languages and their Applications. IFIP Interntional
Conference on Very Large Scale Integration (IEEE Cat.
No.95TH8102), p. xxxii+860, 61-6
YEAR: 1995
ABSTRACT: In this article a scheduling method is presented which is
capable of allocating supplementary resources during
scheduling. This makes it very suitable in synthesis
strategies based on lower bound estimations techniques. The
method is based on genetic algorithms. Special coding
techniques and analysis methods are used to improve the
runtime and quality of the results. The scheduler can easily
be extended to cover other architectural issues and (for
example) provides ways to make trade-offs between functional
unit allocation and register allocation. Experiments and
comparisons show high quality results and fast run times
that outperform results produced by other heuristic
scheduling methods (23 Refs.)
AUTHOR: Schonfeld, M.; Franzen, J.; Schwiegershausen, M.; Pirsch,
P.; Vehlies, U.; Munzner, A.
TITLE: The LISA design environment for the synthesis of array
processors including memories for the data transfer and
fault tolerance by reconfiguration and coding techniques
SOURCE: Journal of VLSI Signal Processing, vol.11, no.1-2, p. 51-74
YEAR: Oct.-Nov. 1995
ABSTRACT: The LISA design environment transforms computation extensive
digital signal processing algorithms into array processor
architectures. It supports the complete design flow from
algorithmic specification in a high-level programming
language to circuit description at the gate level. From the
input description a graph representation is derived by
symbolic execution and further mapped onto different
architectures. Netlists in different formats can be
extracted using the integrated synthesis of arithmetic
building blocks. For the adaptation of the architectures to
external interfaces LISA includes the synthesis of
intermediate memories consisting of multiport RAMs or
register circuits. Another part of LISA allows the
application of reconfiguration and coding techniques at
different design levels to incorporate fault tolerance into
both, array processors and intermediate memories. By the
homogeneous representation in LISA all parts of the design
process are handled in the same environment. Thus, different
architectures of array processors including data transfer,
control, and fault tolerance can be compared with respect to
area, computation time, and reliability (37 Refs.)
AUTHOR: Gebotys, C.H.
TITLE: An optimal methodology for synthesis of DSP multichip
architectures
SOURCE: Journal of VLSI Signal Processing, vol.11, no.1-2, p. 9-19
YEAR: Oct.-Nov. 1995
ABSTRACT: An optimization approach to the synthesis of multichip DSP
architectures is presented. This research is important for
industry since it is well known that these early design
decisions have a significant impact on the final VLSI
implementation. A mathematical programming approach to
simultaneously scheduling, partitioning (into multiple
chips) and allocating minimum hardware (functional units on
each chip) for the DSP application is formulated.
Throughput, input/output timing, and latency constraints are
supported along with interchip communication delays. By
using polyhedral theory, the optimal solution to the integer
programming problem can be obtained in fast cpu times.
Results show that one can synthesize optimal two-chip, three-
chip and four-chip architectures for a realistic industrial
DSP application in reasonable cpu times. This research
breaks new ground by simultaneously partitioning, scheduling
and allocating multichip DSP architectures with optimal area
in fast cpu times (21 Refs.)
AUTHOR: Eisenbiegler, D.; Kumar, R.
TITLE: Formally embedding existing high level synthesis algorithms
SOURCE: Correct Hardware Design and Verification Methods. IFIP WG
10.5 Advanced Research Working Conference, CHARME `95.
Proceedings, p. viii+342, 71-83
YEAR: 1995
ABSTRACT: This paper introduces a general scheme for formally
embedding high level synthesis by formulating its basic
steps as transformations within higher order logic. A
functional representation of a data flow graph is
successively refined by means of generic logical
transformations. Algorithms that are based on logical
transformations guarantee "correctness by design". They not
only construct an implementation but also derive the proof
for its formal correctness, on the fly. An extra post-
synthesis-verification step becomes obsolete. The logical
transformations presented in this paper form a framework for
formally embedding existing high-level-synthesis procedures
(17 Refs.)
AUTHOR: Hong-Shin Jun; Sun-Young Hwang
TITLE: Design of a synthesis system for pipeline structures with
variable data initiation intervals
SOURCE: Journal of Microelectronic Systems Integration, vol.3, no.3,
p. 189-203
YEAR: Sept. 1995
ABSTRACT: In this paper, we propose a novel approach for synthesizing
the dynamically configured pipelines supporting variable
data initiation intervals (DIIs). Compared to the previous
researches in which the pipeline synthesis is confined to
those with fixed DIIs, the proposed system allows powerful
design space exploration by removing the restriction of
fixed DIIs. The proposed algorithm tries to optimize the
area of pipeline by fully utilizing hardware resources to
which abstract operations in high-level design descriptions
are assigned, while meeting the given time constraints in
the data initiation intervals, the clock cycle time and the
number of stages (15 Refs.)
AUTHOR: Kim, T.; Liu, C.L.
TITLE: A new approach to the multiport memory allocation problem in
data path synthesis
SOURCE: Integration, The VLSI Journal, vol.19, no.3, p. 133-60
YEAR: Nov. 1995
ABSTRACT: A new approach to the problem of allocation of multiport
memory modules for data storage in data path synthesis is
presented. Previous approaches solve the allocation problem
in two separate steps: (i) grouping the variables (or
registers) to form memory modules and (ii) determining the
interconnections between the memory modules and functional
units. Yet, there is no easy way to predict the result of
step (ii) during step (i). In our approach, we place primary
importance on the cost of interconnections. Consequently, we
try to minimize the cost of interconnections first and then
to group the variables to form memory modules later. For a
number of benchmark problems, it was demonstrated that this
approach is quite effective and produces very good solutions
to the allocation problem for multiport-memory-based
architecture (20 Refs.)
AUTHOR: Potkonjak, M.; Wolf, W.
TITLE: Cost optimization in ASIC implementation of periodic hard-
real time systems using behavioral synthesis techniques
SOURCE: 1995 IEEE/ACM International Conference on Computer-Aided
Design. Digest of Technical Papers (Cat. No.95CB35859),
p. xxviii+743, 446-51
YEAR: 1995
ABSTRACT: Modern applications are often defined as sets of several
computational tasks. This paper presents a synthesis
algorithm for ASIC implementations which realize multiple
computational tasks under hard real-time deadlines. The
algorithm analyzes constraints imposed by task sharing as
well as the traditional datapath synthesis criteria. In
particular we demonstrated an efficient technique to combine
rate-monotonic scheduling, a widely used hard real-time
systems scheduling discipline, with estimations and
scheduling and allocation algorithms. Matching the number of
bits in tasks assigned to the same processor was the most
important factor in obtaining good designs. We have
demonstrated the effectiveness of our algorithms on several
multiple-task examples (21 Refs.)
AUTHOR: Wing Hang Wong; Jain, R.
TITLE: PARAS: System-level concurrent partitioning and scheduling
SOURCE: 1995 IEEE/ACM International Conference on Computer-Aided
Design. Digest of Technical Papers (Cat. No.95CB35859),
p. xxviii+743, 440-5
YEAR: 1995
ABSTRACT: Partitioning for the ASIC designs is examined and the
interaction between high-level synthesis and partitioning is
studied and incorporated in the solution. Four algorithms
(called PARAS) which can exploit this interaction by solving
the scheduling and partitioning problems concurrently are
presented. PARAS maximizes the overall performance of the
final design and considers different chip configurations and
communication structures. Experiments, conducted with
specifications ranging in size from few to hundreds of
operations, demonstrate the success of this approach
(15 Refs.)
AUTHOR: Kropf, T.; Schneider, K.; Kumar, R.
TITLE: A formal framework for high level synthesis
SOURCE: Theorem Provers in Circuit Design. Theory, Practice and
Experience. Second International Conference, TPCD '94.
Proceedings, p. viii+303, 223-38
YEAR: 1995
ABSTRACT: In this paper, we propose a new approach to formal synthesis
which focuses on the generation of verification-friendly
circuits. Starting from a high-level implementation
description, which may result from the application of usual
scheduling and allocation algorithms, hardware is
automatically synthesized. The target architecture is based
on handshake processes, modules which communicate by a
simple synchronizing handshake protocol. The circuits result
from the application of only a few basic operations like
synchronization, sequential execution or iteration of base
handshake processes. Each process is guided by an abstract
theorem that is used to derive proof obligations, to be
justified after synthesis. Automation has been achieved to
the extend that only those "relevant" proof obligations
remain to be proven manually, e.g. theorems for data-
dependent loops and lemmata about the used data types. The
process-oriented implementation language is enriched by loop
invariants. If those are given prior to the synthesis
process and the underlying data types are only Booleans,
i.e. finite-length bitvectors, then the complete synthesis
and verification process runs automatically (17 Refs.)
AUTHOR: Shih-Hsu Huang; Yu-Chin Hsu; Yen-Jen Oyang
TITLE: A new scheduling algorithm for synthesizing the control
blocks of control-dominated circuits
SOURCE: Microprocessing & Microprogramming, vol.41, no.7, p. 501-19
YEAR: Nov. 1995
ABSTRACT: This paper describes a new scheduling algorithm for
automatic synthesis of the control blocks of control-
dominated circuits. The proposed scheduling algorithm is
distinctive in its approach to partition a control/data flow
graph (CDFG) into an equivalent state transition graph. It
works on the CDFG to exploit operation relocation, chaining,
duplication, and unification. The optimization goal is to
schedule each execution path as fast as possible. Benchmark
data shows that this approach achieved better results over
the previous ones in terms of the speedup of the circuit and
the number of states and transitions (18 Refs.)
AUTHOR: Jong, C.C.; Lam, Y.Y.H.
TITLE: Using time zones for data path allocation in high-level
synthesis of digital systems
SOURCE: International Journal of Electronics, vol.79, no.5, p. 627-
40
YEAR: Nov. 1995
ABSTRACT: The paper describes the data path allocation in high-level
synthesis of digital systems using a new approach named the
time-zone approach. Using the time-zone approach, where time
steps are partitioned into zones, the register allocation
and the module allocation for a given scheduled data flow
graph (DFG) are performed in the same phase. The number of
registers required to store the values in the DFG is
minimized, and at the same time the interconnections between
the registers and the modules (as well as the number of
multiplexers required) are optimized. The experimental
results obtained from testing several published benchmarks
show that the allocation results are improved in terms of
reduction of interconnections and the number of multiplexers
and registers (15 Refs.)
AUTHOR: Septien, J.; Mozos, D.; Tirado, J.F.; Hermida, R.;
Fernandez, M.; Mecha, H.
TITLE: FIDIAS: an integral approach to high-level synthesis
SOURCE: IEE Proceedings-Circuits, Devices and Systems, vol.142,
no.4, p. 227-35
YEAR: Aug. 1995
ABSTRACT: An integral approach to the problem of high-level synthesis
of digital architectures as implemented in the FIDIAS
synthesis system is described. The synthesis task is split
into different subtasks, the interrelation between them is
no longer ignored as happens in most systems. One of the
system's outstanding features is the control of global
system behaviour performed by an expert system that deals
with such interrelations and aims it towards the required
goals. The most important subtasks are explained in detail.
Among these subtasks, cycle-time estimation, operation
scheduling, allocation search algorithm and area estimation
are found. Heuristics for intelligent bounding of the design
space during allocation are shown. Also presented is one of
the most novel features of FIDIAS, which is the use of
heuristics to guide the search performed by the allocation
algorithm through the accessible design space (28 Refs.)
AUTHOR: Hae-Dong Lee; Sun-Young Hwang
TITLE: Design of an area-efficient allocation algorithm for
datapaths with multiport memories
SOURCE: Journal of Microelectronic Systems Integration, vol.3, no.1,
p. 3-17
YEAR: March 1995
ABSTRACT: In this paper, we present a heuristic algorithm that
generates area-efficient datapaths with multiport memories.
The proposed algorithm assigns variables to multiport memory
modules, so that overall cost can be reduced. In the
process, efforts are made to reduce the number of memory
modules, the number of elements in each memory module, and
the number of interconnections in the generated datapaths.
Interconnection cost is further reduced in a number of
multiplexer inputs by exchanging connections between memory
modules and FUs performing commutative operations in the
interconnection binding phase. Experimental results show the
effectiveness of the proposed algorithm. When compared with
previous approaches for several benchmarks available from
the literature, the proposed algorithm generates the
datapaths with fewer registers in memory modules and less
interconnection cost (17 Refs.)
AUTHOR: Ohm, S.Y.; Kurdahi, F.J.; Dutt, N.; Xu, M.
TITLE: A comprehensive estimation technique for high-level
synthesis
SOURCE: Proceedings of the Eighth International Symposium on System
Synthesis (IEEE Cat. No.95TH8050), p. xiii+175, 122-7
YEAR: 1995
ABSTRACT: We present an integrated approach aimed at predicting layout
area needed to implement a behavioral description for a
given performance goal. Our approach is novel because: (1)
it accounts for all types of RT level components (FUs,
buses, registers), (2) it is highly flexible, allowing the
designer to tradeoff one type of resource with another and
considers dependencies between these different types, (3) it
is vertically integrated to include provably accurate
physical level estimators, and hence provides realistic
accounting of layout effects, and (4) it uses a timing model
with finer granularity, accounting for various delays in RTL
datapaths. We demonstrate our technique on a variety of HLS
benchmarks and show that efficient and effective design
space exploration can be accomplished using this technique
(21 Refs.)
AUTHOR: Musoll, E.; Cortadella, J.
TITLE: Scheduling and resource binding for low power
SOURCE: Proceedings of the Eighth International Symposium on System
Synthesis (IEEE Cat. No.95TH8050), p. xiii+175, 104-9
YEAR: 1995
ABSTRACT: Decisions taken at the earliest steps of the design process
may have a significant impact on the characteristics of the
final implementation. This paper illustrates how power
consumption issues can be tackled during the scheduling and
resource-binding steps of high-level synthesis. Algorithms
for these steps targeting at low-power data-paths and
trading off, in some cases, speed and area for low power are
presented. The algorithms focus on reducing the activity of
the functional units (adders, multipliers) by minimizing the
transitions of their input operands. The power consumption
of the functional units accounts for a large fraction of the
overall data-path power budget (21 Refs.)
TEM: 172
AUTHOR: Chaudhuri, S.; Blythe, S.A.; Walker, R.A.
TITLE: An exact methodology for scheduling in a 3D design space
SOURCE: Proceedings of the Eighth International Symposium on System
Synthesis (IEEE Cat. No.95TH8050), p. xiii+175, 78-83
YEAR: 1995
ABSTRACT: This paper describes an exact solution methodology,
implemented in Rensselaer's Voyager design space exploration
system, for solving the scheduling problem in a 3-
dimensional (3D) design space: the usual 2D design space
(which trades off area and schedule length), plus a third
dimension representing clock length. Unlike design space
exploration methodologies which rely on bounds or estimates,
this methodology is guaranteed to find the globally optimal
solution to the 3D scheduling problem. Furthermore, this
methodology efficiently prunes the search space, eliminating
provably inferior design points through: a careful selection
of candidate clock lengths; and tight bounds on the number
of functional units of each type or on the schedule length
(27 Refs.)
AUTHOR: Economakos, G.; Papakonstantinou, G.; Tsanakas, P.
TITLE: An attribute grammar approach to high-level automated
hardware synthesis
SOURCE: Information and Software Technology, vol.37, no.9, p. 493-
502
YEAR: Sept. 1995
ABSTRACT: Attribute grammars have been used extensively in every phase
of traditional compiler construction. Since some of these
phases have also been used in automated hardware synthesis
(hardware compilation), attribute grammars can be
effectively adopted to handle the two major tasks of high
level hardware synthesis, operation scheduling and hardware
allocation, implementing various algorithms. The paper
presents an attribute grammar driven scheduling system, as a
more abstract way of handling the whole high level hardware
synthesis task, while maintaining the desired functionality
by the utilization of existing and well tested tools and
techniques transferred from traditional compiler
construction (32 Refs.)
AUTHOR: Al-Sukhni, H.F.; Youssef, H.; Sait, S.M.; Benten, M.S.T.
TITLE: Loop based scheduling for high level synthesis
SOURCE: Conference Proceedings of the 1995 IEEE Fourteenth Annual
International Phoenix Conference on Computers and
Communications (Cat. No.95CH35751), p. xvii+742, 76-81
YEAR: 1995
ABSTRACT: This paper describes a new loop based scheduling algorithm.
The algorithm aims at reducing the runtime processing
complexity of path based scheduling techniques. It
partitions the control flow graph of the input specification
into subgraphs before scheduling the different paths of each
subgraph. Benchmark tests as well as simulation results on
the scheduling algorithm indicate that the proposed
algorithm results in sizeable reduction in runtime (9 Refs.)
AUTHOR: Rahmouni, M.; Jerraya, A.A.
TITLE: PPS: a pipeline path-based scheduler
SOURCE: Proceedings. The European Design and Test Conference. ED&TC
1995 (Cat. No.95TH8058), p. xxvii+611, 557-61
YEAR: 1995
ABSTRACT: This paper presents a scheduling algorithm that improves on
other approaches when dealing with the synthesis of control-
flow dominated behavioral descriptions. It achieves this
through the use of a constraint-driven path-based scheduling
algorithm. The suboptimality of the original path-based
algorithms when dealing with loops is overcome through a new
technique for pipelining different loop iterations during
execution path generation. Results show that the algorithm
always generates the fastest solution in terms of clock
cycles (9 Refs.)
AUTHOR: Holtmann, U.; Ernst, R.
TITLE: Combining MBP-speculative computation and loop pipelining in
high-level synthesis
SOURCE: Proceedings. The European Design and Test Conference. ED&TC
1995 (Cat. No.95TH8058), p. xxvii+611, 550-6
YEAR: 1995
ABSTRACT: Frequent control dependencies caused by IF- and loop-
statements limit the parallelism usable in High-Level
Synthesis. Loop pipelining is a powerful way to increase
parallelism, but is often limited by these control
dependencies. Multiple branch prediction (MBP-SC) applies
loop pipelining and speculative computation to the most
probable path and serves other paths during the restore
phase (prediction error correction). In this paper we
combine MBP-SC and loop pipelining and give a scheduling
algorithm. Further MBP-SC improvement comes from parallel
branch execution. The results show a considerable speedup
compared to previous approaches (12 Refs.)
AUTHOR: Minjoong Rim; Yaw Fann; Rajiv Jain
TITLE: Global scheduling with code-motions for high-level synthesis
applications
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.3, no.3, p. 379-92
YEAR: Sept. 1995
ABSTRACT: In this paper, we present a global scheduling technique for
synthesis applications. The algorithm accepts a
specification containing conditional branches and while-loop
constructs and schedules it for a given set of resources.
The algorithm performs several types of code motions across
different basic blocks and trades off cost with performance.
Several real-life examples taken from Numerical Recipes in C
are used to demonstrate the efficacy of the approach. The
results indicate that code-motions are very important for
achieving significant speed-ups for synthesis applications
(57 Refs.)
AUTHOR: Kucukcakar, K.; Parker, A.C.
TITLE: A methodology and design tools to support system-level VLSI
design
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.3, no.3, p. 355-69
YEAR: Sept. 1995
ABSTRACT: System-level design involves making major design decisions
without having accurate information on the eventual system
characteristics. This paper presents a novel constraint-
driven methodology to support system-level design. The
software assists a designer or a tool in partitioning
behavioral specifications onto multiple VLSI chips and in
system design while satisfying hard constraints such as
individual chip areas, chip pin counts, system throughput
(inverse of system initiation interval) and system latency
(delay). The software uses search and estimation techniques
to perform comprehensive design-space exploration and
evaluates partitions supplied by the user or by other
synthesis software. The technique determines what design
characteristics each partition must possess in order to
satisfy area, pin, throughput and latency constraints. The
paper also includes results of extensive experiments with
the methodology (24 Refs.)
AUTHOR: Sharma, A.; Jain, R.
TITLE: InSyn: integrated scheduling for DSP applications
SOURCE: IEEE Transactions on Signal Processing, vol.43, no.8,
p. 1966-77
YEAR: Aug. 1995
ABSTRACT: We present the InSyn algorithm for high-level synthesis of
DSP applications. InSyn combines allocation and scheduling
of functional, storage, and interconnect units into a single
phase and uses the following unique optimizations. (i) The
concept of register states (free, busy, and undecided) is
used for optimizing registers in a partial schedule where
lifetimes of data values are not yet available. (ii)
Reusable data values and broadcast are used to alleviate bus
contention. (iii) InSyn can alternate between performance-
guided and resource-guided measures. For example, InSyn can
forgo its priority in favor of completing partially
evaluated paths when the availability of allocated registers
becomes low. (iv) InSyn ran selectively increase execution
time of noncritical operations to alleviate bus contention.
(V) InSyn can optimize and trade off distinct (functional
units, interconnect, and registers) resource sets
concurrently leading to more area-delay efficient designs.
(vi) InSyn utilizes estimation tools towards resource
allocation, design space pruning, and evaluation of
synthesized designs. The experiments show that the features
incorporated in inSyn result in very good designs (26 Refs.)
AUTHOR: Potkonjak, M.; Dey, S.; Roy, R.K.
TITLE: Behavioral synthesis of area-efficient testable designs
using interaction between hardware sharing and partial scan
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.14, no.9, p. 1141-54
YEAR: Sept. 1995
ABSTRACT: We introduce BETS, a behavioral test synthesis system, for
the synthesis of high-throughput, area-efficient testable
designs. While hardware sharing is a powerful technique to
achieve area efficiency, it may adversely affect the
testability of the synthesized design by introducing new
loops. Besides CDFG loops, hardware sharing introduces three
other types of loops: assignment loops, sequential false
loops, and register files cliques. We provide a
comprehensive analysis and a formal grammar characterization
of the sources of loops in the data path during behavioral
synthesis. Partial scan is a cost-effective technique for
sequential circuit testing. Hardware sharing of scan
registers can be used to minimize the number of scan
registers required to synthesize data paths with minimal
number of loops. The scan registers can be shared amongst
several variables of the CDFG, to break not only the loops
in the CDFG, but also the very loops introduced in the data
path by hardware sharing. A new random walk based algorithm
is proposed to break all CDFG loops using a minimal number
of scan registers. The subsequent scheduling and assignment
phase avoids formation of loops in the data path by reusing
the scan registers, while ensuring high resource
utilization. The experimental results demonstrate the
effectiveness of the new technique to synthesize easily
testable data paths, with nominal hardware overhead, while
maintaining the performance of the designs. The partial scan
overhead incurred by the technique is significantly less
than that of a gate-level partial scan approach (44 Refs.)
AUTHOR: Verhaegh, W.F.J.; Lippens, P.E.R.; Aarts, E.H.L.; Korst,
J.H.M.; van Meerbergen, J.L.; van der Werf, A.
TITLE: Improved force-directed scheduling in high-throughput
digital signal processing
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.14, no.8, p. 945-60
YEAR: Aug. 1995
ABSTRACT: This paper discusses improved force-directed scheduling and
its application in the design of high-throughput DSP
systems, such as real-time video VLSL circuits. We present a
mathematical justification of the technique of force-
directed scheduling, introduced by Paulin and Knight (1989),
and we show how the algorithm can be used to find cost-
effective time assignments and resource allocations,
allowing trade-offs between processing units and memories.
Furthermore, we present modifications that improve the
effectiveness and the efficiency of the algorithm. The
significance of the improvements is illustrated by an
empirical performance analysis based on a number of problem
instances (30 Refs.)
AUTHOR: Dhodhi, M.K.; Hielscher, F.H.; Storer, R.H.; Bhasker, J.
TITLE: Datapath synthesis using a problem-space genetic algorithm
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.14, no.8, p. 934-44
YEAR: Aug. 1995
ABSTRACT: This paper presents a new approach to datapath synthesis
based on a problem-space genetic algorithm (PSGA). The
proposed technique performs concurrent scheduling and
allocation of functional units, registers, and multiplexers
with the objective of finding both a schedule and an
allocation which minimizes the cost function of the hardware
resources and the total time of execution. The problem-space
genetic algorithm based datapath synthesis system (PSGA-
Synth) combines a standard genetic algorithm with a known
heuristic to search the large design space in an intelligent
manner. PSGA-Synth handles multicycle functional units,
structural pipelining, conditional code and loops, and
provides a mechanism to specify lower and upper bounds on
the number of control steps. The PSGA-Synth was tested on a
set of problems selected from the literature, as well as
larger problems created by us, with promising results. PSGA-
Synth not only finds the best known results for all the test
problems examined in a relatively small amount of CPU time,
but also has the ability to efficiently handle large
problems (21 Refs.)
AUTHOR: Kawaguchi, T.; Todaka, T.
TITLE: Operation scheduling by annealed neural networks
SOURCE: IEICE Transactions on Fundamentals of Electronics,
Communications and Computer Sciences, vol.E78-A, no.6,
p. 656-63
YEAR: June 1995
ABSTRACT: The operation scheduling is an important subtask in the
automatic synthesis of digital systems. Many greedy
heuristics have been proposed for the operation scheduling,
but they cannot find the globally best schedule. In this
paper we present an algorithm to construct near optimal
schedules. The algorithm combines characteristics of
simulated annealing and neural networks. The neural network
used in our scheduling algorithm is similar to that proposed
by Hellstrom et al. (1992). However, while the problems of
these neural networks have a single type of constraint, the
problem considered in this paper has three types of
constraints. As a result, the energy function of the
proposed neural network is given by the weighted sum of
three energy functions. To minimize the weighted sum of two
or more energy functions, conventional methods try to find a
good set of weights using a trial and error method. Our
algorithm takes a different approach than these methods.
Results of the experiments show that the proposed algorithm
can be used as an alternative heuristic for solving the
operation scheduling problem. In addition, the proposed
algorithm can exploit the inherent parallelism of the neural
network (13 Refs.)
AUTHOR: Wilson, T.C.; Mukherjee, N.; Garg, M.K.; Banerji, D.K.
TITLE: An ILP solution for optimum scheduling, module and register
allocation, and operation binding in datapath synthesis
SOURCE: VLSI Design, vol.3, no.1, p. 21-36
YEAR: 1995
ABSTRACT: We present an integrated and optimal solution to the
problems of operator scheduling, module and register
allocation, and operator binding in datapath synthesis. The
solution is based on an integer linear programming (ILP)
model that minimizes a weighted sum of module area and total
execution time under very general assumptions of module
capabilities. In particular, a module may execute an
arbitrary combination of operations, possibly using
different numbers of control steps for different operations.
Furthermore, operations may be implemented by a variety of
modules, possibly requiring different numbers of control
steps depending on the modules chosen. This generality in
the complexity and mixture of modules is unique to our
system and leads to an optimum selection of modules to meet
specified design constraints. Significant extensions include
the ability to incorporate pipelined functional units and
operator chaining in an integrated manner. Straightforward
extension to multi-block synthesis is discussed briefly but
the details are omitted due to space considerations
(13 Refs.)
AUTHOR: Duncan, A.A.; Hendry, D.C.
TITLE: High-level synthesis of DSP datapaths by global optimisation
of variable lifetimes
SOURCE: IEE Proceedings-Computers and Digital Techniques, vol.142,
no.3, p. 215-24
YEAR: May 1995
ABSTRACT: COBRA (Column-Oriented Butted Regular Architecture) is a
behavioural high-level synthesis tool for datapath-dominated
applications. It globally optimises the synthesised datapath
by performing the scheduling and allocation tasks
simultaneously. COBRA uses a bit-sliced target architecture
and layout style which, when compared with conventional
approaches, significantly reduces the area of the final
datapaths. The synthesis problem is formulated as an
optimisation problem on the configuration of variable
lifetimes when mapped into a 3D 'datapath space'. The
configuration of the data in the datapath space implies the
structure required to achieve the data configuration and
hence the datapath. Simulated annealing is used to optimise
the solution. A description is given of the target
architecture, the mapping of the input description into the
datapath space, the optimisation of the data configuration
in the datapath space, and the post-processing operations.
Results for a number of examples are presented (44 Refs.)
AUTHOR: Shih-Hsu Huang; Cheng-Tsung Hwang; Yu-Chin Hsu; Yen-Jen
Oyang
TITLE: A new approach to schedule operations across nested-ifs and
nested-loops
SOURCE: Microprocessing & Microprogramming, vol.41, no.1, p. 37-52
YEAR: April 1995
ABSTRACT: This paper presents a new global scheduling algorithm for
automatic synthesis of the control blocks of special-purpose
microprocessors. The main distinction of the proposed
algorithm is that it exploits the inheritances of structured
programs. The optimization goal is to maximize the speedup
of the processor and minimize the size of the control block.
If compared with existing global scheduling algorithms such
as trace scheduling, tree compaction, and percolation
scheduling, the proposed algorithm consistently achieves
better results in terms of the speedup of the processor and
the size of the control block (12 Refs.)
AUTHOR: Parhi, K.K.
TITLE: High-level algorithm and architecture transformations for
DSP synthesis
SOURCE: Journal of VLSI Signal Processing, vol.9, no.1-2, p. 121-43
YEAR: Jan. 1995
ABSTRACT: This survey paper reviews numerous high-level transformation
techniques which can be applied at the algorithm or the
architecture level to improve the performance of digital
signal and image processing architectures and circuits
implemented using VLSI technology. Successful design of VLSI
signal and image processors requires careful selection of
algorithms, architectures, implementation styles, and
synthesis techniques. High-level transformations can play an
important role in reducing silicon area or power at the same
speed or in increasing the speed for same area. These
transformations can also increase the suitability of an
algorithm for a particular architectural style. The
transformation techniques reviewed in this paper include
pipelining, parallel processing, retiming, unfolding,
folding, look-ahead, relaxed look-ahead, associativity,
distributivity, and reduction in strength (94 Refs.)
AUTHOR: Van Meerbergen, J.L.; Lippens, P.E.R.; Verhaegh, W.E.J.; Van
der Werf, A.
TITLE: PHIDEO: high-level synthesis for high throughput
applications
SOURCE: Journal of VLSI Signal Processing, vol.9, no.1-2, p. 89-104
YEAR: Jan. 1995
ABSTRACT: The paper describes a new approach to high-level synthesis
for high throughput applications. Such applications are
typically found in real-time video systems such as HDTV. The
method is capable of dealing with hierarchical flow graphs
containing loops with manifest boundaries and linear index
expressions. The algorithm is based on the model of periodic
operations which allows optimizations across loop
boundaries. Processing units and storage units are minimized
simultaneously. The algorithm is implemented in the PHIDEO
system. The major parts of this system are the processing
unit synthesis, the scheduler and the memory synthesis
including address generation (37 Refs.)
AUTHOR: Verbauwhede, I.; Rabaey, J.M.
TITLE: Synthesis for real time systems: solutions and challenges
SOURCE: Journal of VLSI Signal Processing, vol.9, no.1-2, p. 67-88
YEAR: Jan. 1995
ABSTRACT: A real time system typically combines a variety of
implementation technologies and hardware architectures.
Deciding how to partition the system and selecting an
architectural technology for the sub-systems is by no means
a trivial task. These architectural decisions have to be
made at the early stages in the design process, when the
impact of the decisions is unclear and can only be
quantified using some primitive measures. the authors
present their vision on how a next generation of design
environment can aid the designer in this decision process.
They first identify the problems of designing a
heterogeneous real time system by walking through the design
process of a complex speech recognition system. Based on
this analysis, they propose a system design methodology
build on top of current synthesis tools. Today, DSP
synthesis tools are application and/or architecture
specific, covering subparts of the application once the
partitioning is made. To make them useful in the proposed
methodology, a unified view on the underlying architecture
assumptions is needed. Secondly, good decision making
requires an "as-good-as-possible" estimation of the
implications of the decision. Therefore, it is important
that current manual estimation be enlarged by high level
estimation and performance analysis tools. The HYPERSPACE
environment, which is currently under development,
therefore, consists of three complementary components: a set
of architecture specific compilers, a set of estimation and
performance analysis tools and an architecture selection and
partitioning framework, steered by the designer (54 Refs.)
AUTHOR: Goossens, G.; Lanneer, D.; Pauwels, M.; Depuydt, F.;
Schoofs, K.; Kifli, A.; Cornero, M.; Petroni, P.; Catthoor,
F.; de Man, H.
TITLE: Integration of medium-throughput signal processing
algorithms on flexible instruction-set architectures
SOURCE: Journal of VLSI Signal Processing, vol.9, no.1-2, p. 49-65
YEAR: Jan. 1995
ABSTRACT: Integrated circuits in telecommunications and consumer
electronics are rapidly evolving towards single chip
solutions. New IC architectures are emerging, which combine
instruction-set processor cores with customised hardware.
The paper describes a high-level synthesis system for
integration of real-time signal processing systems on such
processor cores. The compiler supports a flexible
architectural model. It can handle certain types of
incompletely specified architectures, and offers
capabilities for retargetable compilation and architectural
exploration. Results for a realistic application from the
domain of audio processing indicate the feasibility and
power of the presented approach (43 Refs.)
AUTHOR: Ching-Yi Wang; Parhi, K.K.
TITLE: High-level DSP synthesis using concurrent transformations,
scheduling, and allocation
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.14, no.3, p. 274-95
YEAR: March 1995
ABSTRACT: This paper addresses high-level synthesis methodologies for
dedicated digital signal processing (DSP) architectures used
in the iterative Loop-based Minnesota Architecture Synthesis
(MARS) design system. We present a novel concurrent
scheduling and resource allocation algorithm which exploits
inter-iteration and intra-iteration precedence constraints.
The novel algorithm implicitly performs algorithmic
transformations, such as pipelining and retiming, on the
data-flow graphs during the scheduling process to produce
solutions which are as good as those previously published
and which executes in less time. MARS is capable of
producing optimal and near-optimal schedules in fractions of
seconds. Previous synthesis systems have focused on DSP
algorithms which have single or lumped delays in the
recursive loops. In contrast, MARS is capable of generating
valid architectures for algorithms which have randomly
distributed delays. MARS exploits these delays to produce
more efficient architectures and allows our system to be
more general. We are able to synthesize architectures which
meet the iteration bound of any algorithm by unfolding,
retiming, and pipelining the original data-flow graph
(49 Refs.)
AUTHOR: Ohm, S.Y.; Kurdahi, F.J.; Jhon, C.S.
TITLE: An optimal scheduling approach using lower bound in high-
level synthesis
SOURCE: IEICE Transactions on Information and Systems, vol.E78-D,
no.3, p. 231-6
YEAR: March 1995
ABSTRACT: The paper describes an optimal scheduling approach which
finds the scheduling result of the minimum functional unit
cost under the given timing constraint. In this method, a
well-defined search space is constructed incrementally and
traversed in a branch-and-bound manner. During the
traversal, tighter lower bounds are estimated and utilized
coupled with the upper bound on the optimal solution in
pruning the search space effectively. This method is
extended to support multi-cycling operations, operation
chaining, pipelined functional units, and pipelined data
paths. Experimental results on some benchmarks show the
efficiency of the proposed approach (9 Refs.)
AUTHOR: Radivojevic, I.P.; Brewer, F.
TITLE: Symbolic scheduling techniques
SOURCE: IEICE Transactions on Information and Systems, vol.E78-D,
no.3, p. 224-30
YEAR: March 1995
ABSTRACT: Describes an exact symbolic formulation of resource-
constrained scheduling which allows speculative operation
execution in arbitrary forward-branching control/data paths.
The technique provides a closed-form solution set in which
all satisfying schedules are encapsulated in a compressed
OBDD (ordered binary decision diagram) based representation.
An iterative construction method is presented, along with
benchmark results. The experiments demonstrate the ability
of the proposed technique to efficiently extract parallelism
not explicitly specified in the input description (17 Refs.)
AUTHOR: Srivastava, M.B.; Potkonjak, M.
TITLE: Optimum and heuristic transformation techniques for
simultaneous optimization of latency and throughput
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.3, no.1, p. 2-19
YEAR: March 1995
ABSTRACT: Although throughput alone can be arbitrarily improved for
several classes of systems using previously published
techniques, none of those approaches are effective when
latency constraints, which are increasingly important in
embedded DSP systems, are considered. After formally
establishing the relationship between latency and throughput
in general computation, we explore the effect of pipelining
on latency, and establish necessary and sufficient
conditions under which pipelining does not alter latency.
Many systems are either linear, or have subsystems that are
linear. For such cases we have used a state-space based
approach that treats various transformations in an
integrated fashion, and answers analytically whether it is
possible to simultaneously meet any given combination of
constraints on latency and throughput, The analytic approach
is constructive in nature, and produces a complete
implementation when feasibility conditions are fulfilled. We
also present a suboptimal but hardware efficient heuristic
approach for the special case of initially-relaxed single-
input single-output linear time-invariant computations. A
novel software platform consisting of a high-level synthesis
system coupled to a symbolic algebra system was used to
implement the proposed algorithm transformations. Instead of
optimizing to improve throughput and latency, our
transformations can also be used to increase the
implementation efficiency while achieving the same latency
and throughput as the original design (23 Refs.)
AUTHOR: Ahmad, I.; Dhodhi, M.K.; Chen, C.Y.R.
TITLE: Integrated scheduling, allocation and module selection for
design-space exploration in high-level synthesis
SOURCE: IEE Proceedings-Computers and Digital Techniques, vol.142,
no.1, p. 65-71
YEAR: Jan. 1995
ABSTRACT: High-level synthesis consists of many interdependent tasks
such as scheduling, allocation and binding. To make
efficient use of time and area, functional unit allocation
must be performed using a library of modules which contains
a variety of module types with identical functionality, but
different area and delay characteristics. The synthesis
technique presented in the paper simultaneously performs
scheduling, allocation and module selection, using problem-
space genetic algorithm (PSGA) to produce area and
performance optimised designs. The PSGA-based system uses an
intelligent design-space exploration technique by combining
a genetic algorithm with a simple and fast problem-specific
heuristic to search a large design space effectively and
efficiently. The efficient exploration of design-space is
essential to design cost-effective architectures for
problems of VLSI/ULSI complexity. The PSGA method offers
several advantages such as the versatility, simplicity,
objective independence and the computational advantages for
problems of large size over other existing techniques. The
proposed synthesis system handles multicycle functional
units, chaining, conditional constructs, loops and
structural pipelining. Experiments on benchmarks show very
promising results (15 Refs.)
AUTHOR: Karkowski, I.
TITLE: Architectural synthesis with possibilistic programming
SOURCE: Proceedings of the Twenty-Eighth Hawaii International
Conference on System Sciences, p. 5 vol.
(x+361+xv+762+xv+600+xx+1042+x+362), 14-22 vol.1
YEAR: 1995
ABSTRACT: The knowledge about available resources during high-level
synthesis is usually imprecise. Previous methods seem to
have ignored this fact, possibly to avoid an increase in
the, already high, computational complexity. In this paper
an approach based on so called "possibilistic" programming,
a kind of fuzzy mathematical programming, is presented.
Using this method we can improve existing mathematical
programming methods for the architectural synthesis while
keeping their good properties. Not only architectures which
optimize the most possible value of the cost function can be
generated, but more importantly, also the tradeoff between
this goal and reducing the probability of obtaining worse
solution and enhancing probability of obtaining a better
solution is controlled. At the same time, an increase in the
computational complexity of the algorithms is avoided. To
show the validity of the approach an application to
simultaneous scheduling, selection and allocation of
functional units is described. The approach has been
implemented in a system called FOAS. Experimental results
confirm the advantages of the proposed methodology
(16 Refs.)
AUTHOR: Chandrakasan, A.P.; Potkonjak, M.; Mehra, R.; Rabaey, J.;
Brodersen, R.W.
TITLE: Optimizing power using transformations
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.14, no.1, p. 12-31
YEAR: Jan. 1995
ABSTRACT: The increasing demand for portable computing has elevated
power consumption to be one of the most critical design
parameters. A high-level synthesis system, HYPER-LP, is
presented for minimizing power consumption in application
specific datapath intensive CMOS circuits using a variety of
architectural and computational transformations. The
synthesis environment consists of high-level estimation of
power consumption, a library of transformation primitives,
and heuristic/probabilistic optimization search mechanisms
for fast and efficient scanning of the design space.
Examples with varying degree of computational complexity and
structures are optimized and synthesized using the HYPER-LP
system. The results indicate that more than an order of
magnitude reduction in power can be achieved over current-
day design methodologies while maintaining the system
throughput; in some cases this can be accomplished while
preserving or reducing the implementation area (38 Refs.)
*******************************************************************************
1994
----
AUTHOR: Potkonjak, M.; Rabaey, J.
TITLE: Area-time high level synthesis laws: theory and practice
SOURCE: VLSI Signal Processing VII (Cat. No.94TH8008), p. xii+511,
53-62
YEAR: 1994
ABSTRACT: We introduce three AT DSP high level synthesis laws that
relate different components of the area of ASIC
implementation cost, namely foreground memory, execution
units, and interconnect to the sampling period (available
time). The laws state that: A=const, AT=const, and AT/sup
2/=const for the area of registers, execution units, and
interconnect respectively. We validate the AT laws using
case studies and statistical analysis of synthesis results
of 80 real life designs. Several applications of the AT laws
for development of high level synthesis tools are presented.
Use of the AT high level synthesis laws as an effective
method for encapsulation of high level synthesis knowledge
is also studied, The effectiveness of the AT laws
applications is documented on numerous designs (15 Refs.)
AUTHOR: Silc, J.
TITLE: Scheduling strategies in high-level synthesis
SOURCE: Informatica, vol.18, no.1, p. 71-9
YEAR: April 1994
ABSTRACT: This paper describes the objectives of high-level synthesis.
It concentrates on operation scheduling strategies and the
interaction with resource allocation. Some transformational
and iterative/constructive scheduling algorithms are
described. Moreover, a new scheduling/allocation approach is
presented and compared with other known algorithms. Finally,
some open problems of high-level synthesis are given
(51 Refs.)
AUTHOR: Silc, J.
TITLE: Scheduling strategies in high-level synthesis
SOURCE: Informatica, vol.18, no.1, p. 71-9
YEAR: April 1994
ABSTRACT: This paper describes the objectives of high-level synthesis.
It concentrates on operation scheduling strategies and the
interaction with resource allocation. Some transformational
and iterative/constructive scheduling algorithms are
described. Moreover, a new scheduling/allocation approach is
presented and compared with other known algorithms. Finally,
some open problems of high-level synthesis are given
(51 Refs.)
AUTHOR: Tabuenca, P.; Sanchez, P.; Villar, E.
TITLE: ULSB: a fast scheduling and binding algorithm for
unrestricted libraries inside a RT space exploration
strategy
SOURCE: Proceedings of the 20th EUROMICRO Conference. EUROMICRO 94.
System Architecture and Integration, p. xxi+720, 200-7
YEAR: 1994
ABSTRACT: This paper presents a method for solving scheduling problem
in high-level synthesis. The main advantage of this method
over those proposed up to now, lies in its capacity to
handle unrestricted libraries. A library of this kind
includes combinational and sequential multi-cycle functional
units, multi-functional operational units, pipeline modules
as well as different execution and different latency times
for the same operation type in different modules. The
computing time of this algorithm is quite low and gives very
good results, so it has powerful applications in RT space
exploration. The tool FIRES (Fast Intelligent RT Exploration
System) is also presented (11 Refs.)
AUTHOR: Grass, W.; Mutz, M.; Tiedemann, W.-D.
TITLE: High level synthesis based on formal methods
SOURCE: Proceedings of the 20th EUROMICRO Conference. EUROMICRO 94.
System Architecture and Integration, p. xxi+720, 83-91
YEAR: 1994
ABSTRACT: High-level synthesis, the task of automatically transforming
behavioural descriptions at the algorithmic level into
register transfer level structures, is conquered today
almost always by use of heuristics. This prevents giving a
guarantee either for the design optimality or the
correctness, although these are the most essential criteria.
By applying formal methods, both issues can be formally
proven. This paper presents 3 aspects of HLS, that have been
solved using formal methods, namely (1) the formal
verification that those hardware modules allocated to
realize a particular RT level operation are actually doing
so with respect to functionality and timing, (2) an exact
solution of the scheduling problem that is yet able to cope
with more practical constraints than any heuristical
procedure currently known and (3) the formally based
synthesis of communication-dominated hardware, an important
application area that is largely unnoticed by today's HLS
systems (20 Refs.)
AUTHOR: ShaoJun Wei; Leroy, J.; Crappe, R.; Nancy, T.
TITLE: Mobility reduction based scheduling with allocation emphasis
in datapath design
SOURCE: Proceedings of the 20th EUROMICRO Conference. EUROMICRO 94.
System Architecture and Integration, p. xxi+720, 12-18
YEAR: 1994
ABSTRACT: Operation scheduling is one of the most important steps in
datapath design. The object of this paper is to propose a
new idea of scheduling which may potentially lead to an
optimal result. Different from the algorithms proposed in
the past, which try to find the control step that is
suitable for a selected operation, the new idea tries to
find the control step that is not suitable for the
operation. An operation is scheduled when all the unsuitable
steps are found. Some allocation problems such as functional
unit sharing and connection minimisation, are combined into
the scheduling procedure. Thus, the allocation procedure is
simplified and design quality increases. The experiment show
that the optimality of scheduling is distinctly improved
comparing to those of related works published before
(8 Refs.)
AUTHOR: Yamada, A.; Yamazaki, T.; Ishiura, N.; Shirakawa, I.; Kambe,
T.
TITLE: Datapath scheduling for behavioral description with
conditional branches
SOURCE: IEICE Transactions on Fundamentals of Electronics,
Communications and Computer Sciences, vol.E77-A, no.12,
p. 1999-2009
YEAR: Dec. 1994
ABSTRACT: A new approach is described for the datapath scheduling of
behavioral descriptions containing nested conditional
branches of arbitrary structures. This paper first
investigates such a complex scheduling mechanism, and
formulates an optimal scheduling problem as a 0-1 integer
programming problem such that given a prescribed number of
control steps, the total cost of functional units can be
minimized. In this formulation, each constraint is expressed
in the form of a Boolean function, which is set equal to 1
or 0 according to whether the constraint is satisfied or
not, respectively, and a satisfiability problem is defined
by the product of the Boolean functions. A procedure is then
described, which intends to seek an optimal solution by
means of a branch-and-bound method on a binary decision
diagram representing the satisfiability problem.
Experimental results are also shown, which demonstrate that
our approach is of more practical use than the existing
methods (16 Refs.)
AUTHOR: Stok, L.
TITLE: Data path synthesis
SOURCE: Integration, The VLSI Journal, vol.18, no.1, p. 1-71
YEAR: Dec. 1994
ABSTRACT: This paper reviews all the phases in data path synthesis:
register allocation, storage grouping, module allocation and
interconnect allocation. In addition, a new phase for the
storage value insertion is introduced. For each of these
phases a formal problem description is given. Restrictions
on the data path allocation phases are presented, which
delimit the problems to cases which can be solved by
polynomial algorithms. For the general cases, heuristics are
provided which have appeared to be effective in the
literature. Special data path architectures may require
special algorithms to make use of their features. Throughout
the paper architectural constraints are described and
effective algorithms for them derived. To construct an
effective data path allocation system, a scheme has to be
defined. The scheme determines which subproblems are solved
in what order and which constraints are taken into account
in each phase. The data flow graph and schedule and their
match with the data path architecture have a major impact on
the development of a scheme. This paper will point out the
trade-offs that have to be made when developing such a
scheme. This paper provides a reference to most of the data
path allocation algorithms published in the scope of high-
level synthesis. Most of the techniques are explained in
considerable detail and various examples are given. The
paper comments on the applicability of most of the
algorithms for particular data path allocation problems
(74 Refs.)
AUTHOR: Jansen, K.
TITLE: On the complexity of allocation problems in high-level
synthesis
SOURCE: Integration, The VLSI Journal, vol.17, no.3, p. 241-52
YEAR: Nov. 1994
ABSTRACT: In this paper, we study the problem of processor allocation.
Given a scheduled flow graph with a constant number of
conditional branches, the problem is to find an assignment
of the operations in the flow graph to a minimum number of
processors. We show that this problem is NP-complete even if
the flow graph has at most one conditional branch. Finally,
we show that another related problem in high-level synthesis
(mentioned as an open problem on the 24th Design Automation
Conference), the so-called register allocation problem, is
also NP-complete (21 Refs.)
AUTHOR: Chaudhuri, S.; Walker, R.A.; Mitchell, J.E.
TITLE: Analyzing and exploiting the structure of the constraints in
the ILP approach to the scheduling problem
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.2, no.4, p. 456-71
YEAR: Dec. 1994
ABSTRACT: In integer linear programming (ILP), formulating a "good"
model is of crucial importance to solving that model. In
this paper, we begin with a mathematical analysis of the
structure of the assignment, timing, and resource
constraints in high-level synthesis, and then evaluate the
structure of the scheduling polytope described by these
constraints. We then show how the structure of the
constraints can be exploited to develop a well-structured
ILP formulation, which can serve as a solid theoretical
foundation for future improvement. As a start in that
direction, we also present two methods to further tighten
the formulation. The contribution of this paper is twofold:
1) it provides the first in-depth formal analysis of the
structure of the constraints, and it shows how to exploit
that structure in a well-designed ILP formulation, and 2) it
shows how to further improve a well-structured formulation
by adding new valid inequalities (32 Refs.)
AUTHOR: Aloqeely, M.; Chen, C.Y.R.
TITLE: Sequencer-based data path synthesis of regular iterative
algorithms
SOURCE: 31st Design Automation Conference. 31st DAC Proceedings 1994
(IEEE Cat. No.94CH3408-2), p. xxviii+739, 155-60
YEAR: 1994
ABSTRACT: In many applications, especially signal processing and
matrix computations, algorithms are in a highly regular
iterative form; access patterns for most variables are
highly regular and uniform. Instead of always storing the
values of variables back to and retrieving them from memory
or register files, it will be much more efficient and cost
effective to let those variables intelligently "stay" or
"flow" in the data path for future use. In this paper, low
cost and simple structured sequencers which are best
exemplified by hardware stacks and queues are introduced in
the data path for efficiently implementing such a novel
concept. Various algorithms are developed to map variables
to sequencers and to integrate sequencers into conventional
high-level synthesis procedures. Experimental results show
very encouraging improvement in the performance of designs
as well as significant reduction in hardware cost (16 Refs.)
AUTHOR: Wilson, T.C.; Grewal, G.W.; Banerji, D.K.
TITLE: An ILP solution for simultaneous scheduling, allocation, and
binding in multiple block synthesis
SOURCE: Proceedings IEEE International Conference on Computer
Design: VLSI in Computers and Processors (Cat.
No.94CH35712), p. xvii+639, 581-6
YEAR: 1994
ABSTRACT: Presents a novel approach to the high-level synthesis
problems of scheduling, allocation, and binding for
multiblock behavioral descriptions. Our design tool, JOSHUA,
uses an integer linear programming (ILP) formulation to
solve the three interdependent subproblems simultaneously
and optimally. The system allows the designer to minimize
time, area, and the number of microwords for the entire
design, or for specific segments of the design. A diverse
module library provides a selection of modules that can
perform a specific operation in differing amounts of time
(control steps). A novel feature is the ability to select an
implementation for part of an algorithm from among a set of
implementation alternatives. The system can also handle the
issues of path frequencies, loops, parallel threads of
execution, and register allocation (16 Refs.)
AUTHOR: Landwehr, B.; Marwedel, P.; Domer, R.
TITLE: OSCAR: optimum simultaneous scheduling, allocation and
resource binding based on integer programming
SOURCE: Proceedings EURO-DAC '94 with EURO-VHDL '94 (IEEE Cat.
No.94CH35704), p. xix+697, 90-5
YEAR: 1994
ABSTRACT: The paper presents an approach to high-level synthesis which
is based upon a 0/1 integer programming model. In contrast
to other approaches, this model allows solving all three
subtasks of high-level synthesis (scheduling, allocation and
binding) simultaneously. As a result, designs which are
optimal with respect to the cost function are generated. The
model is able to exploit large component libraries with
multiplier-functional units and complex components such as
multiplier-accumulators. Furthermore, the model is capable
of handling mixed speeds and chaining in its general form
(14 Refs.)
AUTHOR: Ali, S.; Sait, S.M.; Benten, M.S.T.
TITLE: GSA: scheduling and allocation using genetic algorithm
SOURCE: Proceedings EURO-DAC '94 with EURO-VHDL '94 (IEEE Cat.
No.94CH35704), p. xix+697, 84-9
YEAR: 1994
ABSTRACT: The paper describes a unique approach to the scheduling and
allocation problem in high-level synthesis using genetic
algorithms (GA). This approach is different from a previous
attempt using GA (N. Wehn et al., 1990), in many respects.
Its contributions include: a new chromosomal representation
for scheduling and two subproblems of allocation; and two
novel crossover operators to generate legal schedules. The
approach has been tested on various benchmarks and results
are compared with those obtained by other approaches such as
simulated evolution, tabu search, HAL, SALSA II, STAR, etc
(11 Refs.)
AUTHOR: Antola, A.; Distante, F.
TITLE: High level synthesis: node activation synthesis for acyclic
and cyclic data flow graphs
SOURCE: Journal of Microelectronic Systems Integration, vol.2, no.1,
p. 13-22
YEAR: March 1994
ABSTRACT: In the process of High Level Synthesis, the behavioral
description of a device is usually translated into a graph
based representation originating a Control Flow Graph and
Data Flow Graph or a single Control Data Flow Graph. In the
proposed approach to high level synthesis, the graph domain,
represented by the DFG or CDFG of the behavioral
description, is exploited to analyze design alternatives and
to evaluate architectural performances. In this paper, we
propose a methodology and a set of algorithms generating an
optimal pipelined design of the graph-based architecture.
General DFGs are considered and the problem of cyclic DFGs
is discussed (13 Refs.)
AUTHOR: Springer, D.L.; Thomas, D.E.
TITLE: Exploiting the special structure of conflict and
compatibility graphs in high-level synthesis
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.13, no.7, p. 843-56
YEAR: July 1994
ABSTRACT: Coloring of conflict graphs and clique partitioning of
compatibility graphs have been used in high-level synthesis
to map operators, values, and data transfers onto shared
resources. However, finding a minimum sized coloring or
clique partition is NP hard. One method to overcome this
complexity is to identify special types of graphs that can
be colored or clique partitioned in polynomial time.
Existing high-level synthesis systems have exploited two
special types of conflict graphs-interval and circular-arc
graphs. However, they have provided no insight into why and
how frequently these graphs occur. This paper will
investigate the features of behavioral representations and
synthesis algorithms that give rise to special conflict and
compatibility graphs. We will identify two additional types
of graphs useful for high-level synthesis-chordal and
comparability graphs-and demonstrate their use in an
existing high-level synthesis system (25 Refs.)
AUTHOR: Arato, P.; Beres, I.; Rucinski, A.; Davis, R.; Torbert, R.
TITLE: A high-level datapath synthesis method for pipelined
structures
SOURCE: Microelectronics Journal, vol.25, no.3, p. 237-47
YEAR: May 1994
ABSTRACT: This paper presents a model and a method for the high-level
datapath synthesis of pipelined ASIC architectures, starting
with a behavioural description of the system consisting of
theoretical operational units with arbitrary operation
duration. The method calculates the minimal number of
buffers to be inserted, optimally selects the number of
types of additional operational units, and determines the
minimal number of required copies. The aim of the procedure
is to ensure a latency which can be given in advance, and
could not be achieved without additional buffers and extra
copies of the operational units. The method provides a
solution to the resource allocation problem by establishing
a compatibility relation between the concurrent operations.
The constraints for the types of processors to be applied
can vary, depending upon the hardware resources (20 Refs.)
AUTHOR: Rabaey, J.M.; Potkonjak, M.
TITLE: Estimating implementation bounds for real time DSP
application specific circuits
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.13, no.6, p. 669-83
YEAR: June 1994
ABSTRACT: This paper discusses techniques for estimating
implementation bounds on computational resources and their
role in the high-level synthesis process. Accurate
estimations can be extremely useful in a multitude of
synthesis operations, such as algorithm and architecture
selection, design space search, module selection,
transformations, allocation, assignment, and scheduling.
Several techniques to efficiently estimate sharp minimum and
maximum bounds on the resource requirements of a hardware
implementation are discussed. The performance of the
algorithms as well as their applications is analyzed using
an extensive benchmark set. The proposed techniques have
been implemented in the HYPER synthesis system (63 Refs.)
AUTHOR: Rim, M.; Mujumdar, A.; Jain, R.; de Leone, R.
TITLE: Optimal and heuristic algorithms for solving the binding
problem
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.2, no.2, p. 211-25
YEAR: June 1994
ABSTRACT: In this paper we present an optimal and a heuristic approach
to solve the binding problem which occurs in high-level
synthesis of digital systems. The optimal approach is based
on an integer linear programming formulation. Given that
such an approach is not practical for large problems, we
then derive a heuristic from the ILP formulation which
produces very good solutions in order of seconds. The
heuristic is based on a network flow model and also
considers floorplanning during the design process to
minimize the interconnection area (60 Refs.)
AUTHOR: Seawright, A.; Brewer, F.
TITLE: Clairvoyant: a synthesis system for production-based
specification
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.2, no.2, p. 172-85
YEAR: June 1994
ABSTRACT: This paper describes a new high-level synthesis system based
on the hierarchical production based specification (PBS).
Advantages of this form of specification are that the
designer does not describe the control flow in terms of
explicit states or control variables, and that the designer
does not describe a particular form of implementation. The
production-based specification also separates the
specification of the control aspects and data-flow aspects
of the design. The control is implicitly described via the
production hierarchy, while the data-flow is described as
action computations. This approach is a hardware analog of
popular software engineering techniques. The Clairvoyant
system automatically constructs a controlling machine from
the PBS and this process is not impacted by the possibly
exponentially larger deterministic state space of the
designs. The encodings generated by the constructions
compare favorably to encodings derived using graph-based
state encoding techniques in terms of logic complexity and
logic depth. These construction techniques utilize recent
advances in BDD techniques (32 Refs.)
AUTHOR: Dowsing, R.; Elliott, R.; Marshall, I.
TITLE: Automated technique for high-level circuit synthesis from
temporal logic specifications
SOURCE: IEE Proceedings-Computers and Digital Techniques, vol.141,
no.3, p. 145-52
YEAR: May 1994
ABSTRACT: A general-purpose strategy for the synthesis of digital
circuits from high-level behavioural specifications
expressed in the temporal-logic language Tempura is
described. This strategy has been implemented as a synthesis
tool called AST, and the application of AST to part of the
specification for an error-encoder circuit is examined
(13 Refs.)
AUTHOR: Rouzeyre, B.; Dupont, D.; Sagnes, G.
TITLE: Component selection, scheduling and control schemes for high
level synthesis
SOURCE: Proceedings. The European Design and Test Conference. EDAC,
The European Conference on Design Automation. ETC European
Test Conference. EUROASIC, The European Event in ASIC Design
(Cat. No.94TH0634-6), p. xxvii+676, 482-9
YEAR: 1994
ABSTRACT: This paper emphasizes on the performance optimization
problem of automatically synthesized circuits while
respecting area constraints. This optimization is performed
at the earliest step of synthesis, i.e. during the
scheduling of the behavioral specification of the circuit.
It is mainly based on an adequate determination of the type
of component which implements each operation in the
specification. An algorithm performing concurrently
component selection and scheduling is presented. It gives
the designer facilities for design-space exploration, i.e.
time versus area, since components are taken from a user
specified library. The special features of this algorithm
are: several component types can implement an operation and
one component type can implement several operations; the
delays are associated to pairs (component type, operation
type); both the usual synchronous control scheme and a
control scheme allowing to tune independently the delay of
every control step (referred as adjusted control) are
discussed (26 Refs.)
AUTHOR: Dalkilic, M.E.; Pitchumani, V.
TITLE: Optimal operation scheduling using resource lower bound
estimations
SOURCE: Proceedings. The European Design and Test Conference. EDAC,
The European Conference on Design Automation. ETC European
Test Conference. EUROASIC, The European Event in ASIC Design
(Cat. No.94TH0634-6), p. xxvii+676, 319-24
YEAR: 1994
ABSTRACT: Presents an accurate resource lower bound estimation
technique which leads to an efficient, and optimal solution
of time-constrained as well as hardware resource-constrained
scheduling problems in high-level synthesis. A given time or
hardware constrained scheduling problem is transformed into
a cost ordered sequence of feasible scheduling problems
where a solution to this new problem is guaranteed to be an
optimal solution to the original problem. Efficiency of the
approach is demonstrated on large high-level synthesis
benchmarks like the elliptical wave filter and the discrete
cosine transform (10 Refs.)
AUTHOR: Amellal, S.; Kaminska, B.
TITLE: Functional synthesis of digital systems with TASS
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.13, no.5, p. 537-52
YEAR: May 1994
ABSTRACT: Synthesizing a digital system from a functional description
is a complex process requiring the solution of various
different problems. TASS (Tabu Search Synthesis System) is a
functional synthesis system made up of interdependent
modules based on new and more efficient algorithms. First, a
control and data flow graph model for system representation
is developed and presented. This model generates a single
graph representing both the data and control flows of a VHDL
behavioral description. A new representation of conditional
branches and a mutual exclusion testing procedure offering
optimized resource sharing and critical path reduction
possibilities have been developed. This graph model is an
environment used for various synthesis needs starting from
high-level transformations to FSM synthesis for controller
implementation. A new mathematical formulation of the
scheduling problem is developed using a new approach based
on penalty weights. This approach avoids the inflexibility
of the ILP formulations developed in related works where the
functional unit performing each type of operation is fixed
prior to scheduling. The Tabu Search technique, which has
been effective in finding optimal solutions for many types
of large and difficult combinatorial optimization problems,
has been adapted for this purpose. This technique, which
performs an intelligent and fast solution space exploration,
combined with an effective mathematical formulation makes
the scheduling algorithm presented here very powerful
(38 Refs.)
AUTHOR: Gebotys, C.H.
TITLE: An optimization approach to the synthesis of multichip
architectures
SOURCE: IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol.2, no.1, p. 11-20
YEAR: March 1994
ABSTRACT: An optimization approach to the high level synthesis of VLSI
multichip architectures is presented in this paper. This
research is important for industry since it is well known
that these early high level decisions have the greatest
impact on the final VLSI implementation. Optimal application-
specific architectures are synthesized here to minimize
latency given constraints on chip area, I/O pin count and
interchip communication delays. A mathematical integer
programming (IP) model for simultaneously partitioning,
scheduling, and allocating hardware (functional units, I/O
pins, and interchip busses) is formulated. By exploiting the
problem structure, using polyhedral theory, the size of the
search space is decreased and a new variable selection
strategy is introduced based on the branch and bound
algorithm. Multichip optimal architectures for several
examples are synthesized in practical cpu times. Execution
times are comparable to previous heuristic approaches,
however there are significant improvements in optimal
schedules and allocations of multichips. This research
breaks new ground by 1) simultaneously partitioning,
scheduling, and allocating in practical cpu times, 2)
guaranteeing globally optimal architectures for multichip
systems for a specific objective function, and 3) supporting
interchip communication delay, interchip bus allocation, and
other complex interface constraints (24 Refs.)
AUTHOR: Rim, M.; Jain, R.
TITLE: Lower-bound performance estimation for the high-level
synthesis scheduling problem
SOURCE: IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol.13, no.4, p. 451-8
YEAR: April 1994
ABSTRACT: A given behavioral specification can be implemented on a
large number of register-transfer level designs. Instead of
producing several designs and selecting the best one,
synthesis systems may use estimation to reduce the design
space. In this paper, we present a new technique for
computing a lower-bound completion time for non-pipelined
resource-constrained scheduling problem. Given a data flow
graph, a set of resources, resource delays and a clock
cycle, we derive a lower-bound on the completion time of a