ECE 3485 Paper Reference List



SURVEY ARTICLES:

AUTHOR:	Stok, L.
TITLE:	Data path synthesis
SOURCE:	Integration, The VLSI Journal, vol.18, no.1, p. 1-71
YEAR:	Dec. 1994
ABSTRACT: This paper reviews all the phases in data path synthesis:
	register allocation, storage grouping, module allocation and
	interconnect allocation. In addition, a new phase for the
	storage value insertion is introduced. For each of these
	phases a formal problem description is given. Restrictions
	on the data path allocation phases are presented, which
	delimit the problems to cases which can be solved by
	polynomial algorithms. For the general cases, heuristics are
	provided which have appeared to be effective in the
	literature. Special data path architectures may require
	special algorithms to make use of their features. Throughout
	the paper architectural constraints are described and
	effective algorithms for them derived. To construct an
	effective data path allocation system, a scheme has to be
	defined. The scheme determines which subproblems are solved
	in what order and which constraints are taken into account
	in each phase. The data flow graph and schedule and their
	match with the data path architecture have a major impact on
	the development of a scheme. This paper will point out the
	trade-offs that have to be made when developing such a
	scheme. This paper provides a reference to most of the data
	path allocation algorithms published in the scope of high-
	level synthesis. Most of the techniques are explained in
	considerable detail and various examples are given. The
	paper comments on the applicability of most of the
	algorithms for particular data path allocation problems
	(74 Refs.)



1996
----

AUTHOR:	Ernst, R.; Henkel, J.; Benner, Th.; Ye, W.; Holtmann, U.;
	Herrmann, D.; Trawny, M.
TITLE:	The COSYMA environment for hardware/software cosynthesis of
	small embedded systems
SOURCE:	Microprocessors and Microsystems, vol.20, no.3, p. 159-66
YEAR:	May 1996
ABSTRACT: COSYMA is a platform for the investigation of
	hardware/software cosynthesis of small embedded systems.
	Target architecture is currently limited to processor-
	coprocessor configurations executing a single process or a
	system of communicating processes which are statically
	scheduled. Many aspects of cosynthesis such as automatic
	hardware/software partitioning, efficient hardware/software
	communication, timing and hardware overhead estimation and
	analysis, interdependence of different cosynthesis phases,
	data representation, etc., can successfully be investigated
	in this manageable domain. COSYMA covers the complete design
	flow from an input language similar to C down to netlist and
	object code. Current focus is on high performance data
	dominated systems, but first steps to incorporate control
	dominated subtasks can be presented. Using a specific high-
	level synthesis tool, the results show a considerable
	speedup of the resulting processor-coprocessor system even
	compared to modern RISC processors which is typically
	limited by memory bandwidth (38 Refs.)

AUTHOR:	Hong-Shin Jun; Sun-Young Hwang
TITLE:	Automatic synthesis of dynamically configured pipelines
	supporting variable data initiation intervals
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.4, no.2, p. 279-85
YEAR:	June 1996
ABSTRACT: The authors propose a new approach for synthesizing the
	dynamically configured pipelines supporting variable data
	initiation intervals (DIIs). Compared to the previous
	research where the pipeline synthesis is confined to those
	with fixed DIIs, the proposed system allows powerful design
	space exploration by removing the constraints of fixed DIIs.
	The proposed algorithm tries to optimize the area of
	pipeline structures by fully utilizing hardware resources to
	which abstract operations in high-level design descriptions
	are assigned, while meeting the given timing constraints in
	the clock cycle time, number of stages, and data initiation
	sequence. Experimental results on benchmarks show that new
	design points, efficient in speed and in area, can be found
	by removing the restriction of fixed DIIs in the synthesis
	of pipeline structures (7 Refs.)

AUTHOR:	Chaudhuri, S.; Walker, R.A.
TITLE:	Computing lower bounds on functional units before scheduling
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.4, no.2, p. 273-9
YEAR:	June 1996
ABSTRACT: The authors present a new polynomial-time algorithm for
	computing lower bounds on the number of functional units
	(FUs) of each type required to schedule a data flow graph in
	a specified number of control steps. A formal approach is
	presented that is guaranteed to find the tightest possible
	bounds that can be found by relaxing either the precedence
	constraints or integrality constraints on the scheduling
	problem. This tight, yet fairly efficient, bounding method
	can be used to estimate FU area, to generate resource
	constraints for reducing the search space, or in conjunction
	with exact techniques for efficient optimal design space
	exploration (19 Refs.)

AUTHOR:	Jha, P.K.; Dutt, N.D.
TITLE:	High-level library mapping for arithmetic components
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.4, no.2, p. 157-69
YEAR:	June 1996
ABSTRACT: We describe high-level library mapping (HLLM), a technique
	that permits reuse of complex RT-level databook components
	(specifically ALUs). HLLM can be used to couple existing
	databook libraries, module generators and custom-designed
	components with the output of architectural or behavioral
	synthesis. In this paper, we define the problem of high-
	level library mapping, present some algorithmic formulations
	for HLLM of ALUs, and demonstrate the versatility of our
	approach on a variety of libraries. We also compare HLLM
	against the traditional mapping approach using logic
	synthesis. Our experiments show that HLLM for ALUs
	outperforms logic synthesis in area, delay, and runtime,
	indicating that HLLM is a promising approach for reuse of
	datapath components in architectural design and high-level
	synthesis (23 Refs.)

AUTHOR:	Minjoong Rim; Jain, R.
TITLE:	Valid transformations: a new class of loop transformations
	for high-level synthesis and pipelined scheduling
	applications
SOURCE:	IEEE Transactions on Parallel and Distributed Systems,
	vol.7, no.4, p. 399-410
YEAR:	April 1996
ABSTRACT: In this paper we present a new class of loop optimizing
	transformations called valid transformations, which are
	suitable for fine-grain parallelization applications such as
	high-level synthesis of VLSI designs or compilers for super-
	scalar or VLIW machines. This class of transformations are
	different from existing ones in that valid transformations
	can be illegal. Nevertheless, if a transformation is valid,
	the transformed loop has a feasible pipeline schedule. We
	present an example valid transformation called loop
	expansion which can help produce cost-performance efficient
	designs and explore a larger design space for a satisfactory
	design. Several examples are used to demonstrate the
	efficacy of the proposed technique (53 Refs.)

AUTHOR:	Mecha, H.; Fernandez, M.; Tirade, F.; Septien, J.; Motes,
	D.; Olcoz, K.
TITLE:	A method for area estimation of data-path in high level
	synthesis
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.15, no.2, p. 258-65
YEAR:	Feb. 1996
ABSTRACT: This paper describes a new method to estimate the area of
	data paths generated during a High Level Synthesis (HLS)
	process, when the information concerning the circuit is not
	yet complete. Our method is more accurate and considers more
	factors than those used by other HLS systems of which we are
	aware. Our main concern is the interconnection area, often
	neglected by HLS systems, which has a strong influence on
	the final circuit area being optimized, as well as a high
	dependency on the technology used and on the circuit area
	itself. Predicting the area of a design layout with accuracy
	is important because it allows one to foresee whether the
	design will satisfy the area constraints, and will lend the
	allocator towards the best design among several
	possibilities with guarantees. Our estimations of the final
	standard-cell layout area are similar, or even more
	accurate, than those obtained following methods used by low-
	level design systems, which have much more information
	available. Due to the performance penalty their relatively
	high complexity will produce, these methods are unusable in
	an HLS system exploring a wide design space. Our estimation,
	on the contrary, has a low complexity and can be repeated
	time and again as the HLS design space is searched
	(21 Refs.)

AUTHOR:	Yang, J.C.-Y.; de Micheli, G.; Damiani, M.
TITLE:	Scheduling and control generation with environmental
	constraints based on automata representations
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.15, no.2, p. 166-83
YEAR:	Feb. 1996
ABSTRACT: We introduce a framework for synthesis of behavioral models
	in which design information is represented using an
	automaton model. This model offers the advantage of
	supporting different constraints (e.g., timing, resource,
	synchronization, etc.) with a uniform formalism. The set of
	all feasible execution traces (schedules) is constructed and
	traversed using efficient BDD-based implicit state-traversal
	techniques. As an application example of this formalism, we
	present a novel scheduling/control-generation algorithm
	under environmental constraints where both the design and
	constraints are represented using automata. We present an
	algorithm that generates a minimum-latency schedule and a
	control unit representation. This approach is able to
	exploit degrees of freedom among interacting components of a
	multimodule system during scheduling, and is well suited for
	system-level design, where component encapsulation and
	interfacing are important (26 Refs.)

AUTHOR:	Potkonjak, M.; Srivastava, M.B.; Chandrakasan, A.P.
TITLE:	Multiple constant multiplications: efficient and versatile
	framework and algorithms for exploring common subexpression
	elimination
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.15, no.2, p. 151-65
YEAR:	Feb. 1996
ABSTRACT: Many applications in DSP, telecommunications, graphics, and
	control have computations that either involve a large number
	of multiplications of one variable with several constants,
	or can easily be transformed to that form. A proper
	optimization of this part of the computation, which we call
	the multiple constant multiplication (MCM) problem, often
	results in a significant improvement in several key design
	metrics, such as throughput, area, and power. However, until
	now little attention has been paid to the MCM problem. After
	defining the MCM problem, we introduce an effective problem
	formulation for solving it where first the minimum number of
	shifts that are needed is computed, and then the number of
	additions is minimized using common subexpression
	elimination. The algorithm for common subexpression
	elimination is based on an iterative pairwise matching
	heuristic. The power of the MCM approach is augmented by
	preprocessing the computation structure with a new scaling
	transformation that reduces the number of shifts and
	additions. An efficient branch and bound algorithm for
	applying the scaling transformation has also been developed.
	The flexibility of the MCM problem formulation enables the
	application of the iterative pairwise matching algorithm to
	several other important and common high level synthesis
	tasks, such as the minimization of the number of operations
	in constant matrix-vector multiplications, linear
	transforms, and single and multiple polynomial evaluations.
	All applications are illustrated by a number of benchmarks
	(66 Refs.)

AUTHOR:	Ti-Yen Yen; Wolf, W.
TITLE:	An efficient graph algorithm for FSM scheduling
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.4, no.1, p. 98-112
YEAR:	March 1996
ABSTRACT: This paper presents a new algorithm for scheduling control-
	dominated designs during high-level synthesis. Our algorithm
	can schedule systems with arbitrary control flow, including
	conditional branches and multiple loops. It can handle both
	upper bound and lower bound timing constraints. The timing
	constraints can cross basic block boundaries, span different
	iterations of a loop, and form interlocking cycles in the
	control flow. A scheduling problem is described by the
	behavior finite-state machine model, an automaton model for
	the behavioral specification and synthesis of control-
	dominated systems. We optimize the performance of the
	produced digital circuit implementation by minimizing the
	execution time of each state transition in the state
	transition graph. The finite-state machines (FSM) scheduling
	algorithm is based on previous work on cylindrical layout
	compaction; we extend that work to handle upper bound
	constraints, allow multiple loops, and not require an
	initial feasible solution. Experimental results for examples
	derived from real designs and benchmark descriptions
	demonstrate that the algorithm can handle complex
	combinations of constraints very efficiently (30 Refs.)

AUTHOR:	Prihozhy, A.
TITLE:	Net scheduling in high-level synthesis
SOURCE:	IEEE Design & Test of Computers, vol.13, no.1, p. 26-35
YEAR:	Spring 1996
ABSTRACT: A new net scheduling and allocation model generates net
	schedules that minimize either execution time or resources.
	The author tested the model within a VHDL-based high-level
	synthesis system called Ahiles (21 Refs.)

AUTHOR:	Radivojevic, I.; Brewer, F.
TITLE:	A new symbolic technique for control-dependent scheduling
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.15, no.1, p. 45-57
YEAR:	Jan. 1996
ABSTRACT: This paper describes an exact symbolic formulation of
	control-dependent, resource-constrained scheduling. The
	technique provides a closed-form solution set in which all
	satisfying schedules are encapsulated in a compressed OBDD-
	based representation. This solution format greatly increases
	the flexibility of the synthesis task by enabling
	incremental incorporation of additional constraints and by
	supporting solution space exploration without the need for
	rescheduling. The technique provides a systematic treatment
	of speculative operation execution in arbitrary forward-
	branching control/data paths. An iterative construction
	method is presented along with benchmark results. The
	experiments demonstrate the ability of the proposed
	technique to efficiently exploit parallelism not explicitly
	specified in the input description (48 Refs.)

AUTHOR:	Sait, S.M.; Elleithy, K.M.; Masud-Ul-Hasan
TITLE:	Formal synthesis of VLSI layouts from algorithmic
	specifications
SOURCE:	Computer Systems Science and Engineering, vol.11, no.2,
	p. 67-81
YEAR:	March 1996
ABSTRACT: Due to advances in VLSI technology, it is possible to
	implement complex digital systems on a single chip. However,
	modeling such large and complex systems at the structural
	level is tedious and error-prone. This fact has motivated
	the development of several high-level synthesis systems. The
	process consists of translating the abstract behavioural
	representation generally at the algorithmic level into a
	structure-realizable representation. In this paper, we
	present a formal approach for high-level synthesis. This
	formal high-level synthesis system uses mu -recursive
	algorithms to model the behaviour to be synthesized. These
	algorithms can be mathematically verified for correctness
	before being subjected to the task of translation. As a case
	study, the modelling and synthesis of VLSI layouts for
	matrix-matrix multipliers is discussed (37 Refs.)

AUTHOR:	Chen, C.H.
TITLE:	Data path synthesis in digital electronics. I. Memory
	allocation
SOURCE:	IEEE Transactions on Aerospace and Electronic Systems,
	vol.32, no.1, p. 2-15
YEAR:	Jan. 1996
ABSTRACT: A data path consists of memory elements (i.e., registers),
	data operators (i.e. ALUs) and interconnection units (i.e.
	buses) to control the data transfers in the digital system.
	Many approaches to hardware allocation for data path
	synthesis have been proposed in the literature; however,
	only single-port memory is considered for register
	allocation and no efficient synthesis approach for multiport
	memory synthesis. A novel design methodology for data path
	synthesis using multiport memories is proposed which can be
	applied to hardware allocation algorithms or to already
	synthesized data path as a postprocessor to achieve a better
	design. Illustrations of applying this method to different
	synthesis examples are presented. Results and improvements
	over previous techniques are demonstrated. Experiments on
	benchmarks show very promising results (22 Refs.)

AUTHOR:	Chen, C.H.
TITLE:	Data path synthesis in digital electronics. II. Bus
	synthesis
SOURCE:	IEEE Transactions on Aerospace and Electronic Systems,
	vol.32, no.1, p. 16-33
YEAR:	Jan. 1996
ABSTRACT: For pt. I see ibid., vol. 32, no. 1, p. 1-15 (1996). Common
	buses are an extremely efficient structure for achieving
	area minimization so that the bus-oriented interconnection
	of registers and data operators plays an important role in
	data path synthesis. The overriding design goal is
	efficiently allocating the minimum number of buses and
	gating elements (i.e. multiplexers) for achieving
	communication between the data path elements. New efficient
	algorithms for the automated allocation of buses in data
	paths have been developed. The entire allocation process can
	be formulated as a graph partitioning problem. This
	formulation readily lends itself to the use of a varieties
	of heuristics for solving the allocation problem We present
	efficient algorithms which provide excellent solutions to
	this formulation of the allocation problem The operation of
	the algorithms is clearly demonstrated using detailed
	examples (35 Refs.)

*******************************************************************************

1995
----


AUTHOR:	Hirakawa, Y.; Yoshida, M.; Harashima, K.; Fukunaga, K.
TITLE:	A method of data path allocation by pattern matching on the
	data flow graph
SOURCE:	VLSI Signal Processing, VIII (Cat. No.95TH8067), p. xv+582,
	254-63
YEAR:	1995
ABSTRACT: Allocation and scheduling are important tasks in high level
	synthesis. Allocation is an assignment of operations and
	variables into registers and functional units and
	interconnects these circuit units, while ensuring the given
	circuit conditions. Since the result of allocation has
	direct effects upon the size of a chip, design of allocation
	is strongly required to make use of hardware (i.e.
	registers, functional units, buses etc.) efficiently to
	minimize the size of a chip. In this paper, we propose an
	allocation method which minimizes the measure of area
	concerning the interconnection of circuit units, based on
	the dependence of operations given by the data flow graph
	(DFG) (4 Refs.)

AUTHOR:	Diguet, J.P.; Sentieys, O.; Philippe, J.L.; Martin, E.
TITLE:	Probabilistic resource estimation for pipeline architecture
SOURCE:	VLSI Signal Processing, VIII (Cat. No.95TH8067), p. xv+582,
	217-26
YEAR:	1995
ABSTRACT: This paper presents a new approach to resource estimation in
	high level synthesis. Given a set of operators and a data
	flow graph specification, we apply a probability based
	method to compute the probable numbers of operators,
	registers, bus and operator connections for each time step
	or algorithm latency. Combined with statistical metrics, we
	obtain a quick and accurate estimation module that includes
	real precedence constraints. The aim of this work is to
	provide ASICs designers with a guidance tool for a better
	use of high level synthesis. Algorithm properties like dated
	resource concurrence and operator link statistics are
	computed to guide algorithmic, transformations and hardware
	selection (13 Refs.)

AUTHOR:	Ito, R.; Parhi, K.K.
TITLE:	Register minimization in cost-optimal synthesis of DSP
	architectures
SOURCE:	VLSI Signal Processing, VIII (Cat. No.95TH8067), p. xv+582,
	207-16
YEAR:	1995
ABSTRACT: In this paper we propose a generalized technique to count
	the number of registers supporting overlapped scheduling and
	a general digit-serial data format. This technique is
	integrated into an integer linear programming model which
	minimizes the cost of registers as well as the cost of
	processors and data format converters to synthesize a cost-
	optimal architecture for a given digital signal processing
	algorithm. It is shown that by including the cost of
	registers in the synthesis task as proposed in this paper
	leads to up to 12.8% savings in the total cost of the
	synthesized architecture when compared with synthesis
	performed without including the register cost in the total
	cost (16 Refs.)

AUTHOR:	Munch, M.; Wehn, N.; Glesner, M.
TITLE:	Optimum simultaneous placement and binding for bit-slice
	architectures
SOURCE:	Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
	South Pacific Design Automation Conference. IFIP
	International Conference on Computer Hardware Description
	Languages and their Applications. IFIP Interntional
	Conference on Very Large Scale Integration (IEEE Cat.
	No.95TH8102), p. xxxii+860, 735-40
YEAR:	1995
ABSTRACT: Traditionally, the problems of binding and placement in the
	synthesis of digital circuits have been formulated and
	solved separately. However, placement and binding strongly
	interact and design decisions taken during these phases
	determine the interconnect structure. As feature sizes
	continue to decrease, the delay caused by signal propagation
	through interconnect more and more dominate the overall
	system performance. Hence, optimizing interconnect becomes
	increasingly important. In this paper, we propose for the
	first time an analytical approach to capture the placement
	and binding problems in a single, unified Mixed Integer
	Linear Programming (MILP) model which also allows minimizing
	the overall interconnect structure. Such a model can serve
	as a starting point for deriving efficient heuristics in
	that it captures all information required for a
	comprehensive analysis of the problem. The target
	architecture is a linear bit-slice datapath, however, the
	model can be easily extended to handle two-dimensional
	datapath models (23 Refs.)

AUTHOR:	Rajan, S.P.
TITLE:	Correctness of transformations in high level synthesis
SOURCE:	Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
	South Pacific Design Automation Conference. IFIP
	International Conference on Computer Hardware Description
	Languages and their Applications. IFIP Interntional
	Conference on Very Large Scale Integration (IEEE Cat.
	No.95TH8102), p. xxxii+860, 597-603
YEAR:	1995
ABSTRACT: This paper presents a formal approach to address the
	correctness of transformations in high-level synthesis. The
	novelty of the work is that a small set of properties that
	capture a general notion of refinement of control/data-flow
	graphs used in an industrial synthesis framework have been
	given, and the properties are independent of the underlying
	behaviour model. We have mechanized the specification and
	verification of several optimization and refinement
	transformations used in industrial hardware design. This
	work has enabled to find and rectify errors in the
	transformations. Further, the work has led to generalization
	of transformations typically used in high-level synthesis
	(22 Refs.)

AUTHOR:	Hartenstein, R.W.; Kress, R.
TITLE:	A datapath synthesis system for the reconfigurable datapath
	architecture
SOURCE:	Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
	South Pacific Design Automation Conference. IFIP
	International Conference on Computer Hardware Description
	Languages and their Applications. IFIP Interntional
	Conference on Very Large Scale Integration (IEEE Cat.
	No.95TH8102), p. xxxii+860, 479-84
YEAR:	1995
ABSTRACT: A datapath synthesis system (DPSS) for the reconfigurable
	datapath architecture (rDPA) is presented. The DPSS allows
	automatic mapping of high level descriptions onto the rDPA
	without manual interaction. The required algorithms of this
	synthesis system are described in detail. Optimization
	techniques like loop folding or loop unrolling are sketched.
	The rDPA is scalable to arbitrarily large arrays and
	reconfigurable to be adaptable to the computational problem.
	Fine grained parallelism is achieved by using simple
	reconfigurable processing elements which are called datapath
	units (DPUs). The rDPA can be used as a reconfigurable ALU
	for bus oriented systems as well as for rapid prototyping of
	high speed datapaths (7 Refs.)

AUTHOR:	Masuda, A.; Imai, H.; Hansen, J.P.; Sekine, M.
TITLE:	Search space reduction in high level synthesis by use of an
	initial circuit
SOURCE:	Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
	South Pacific Design Automation Conference. IFIP
	International Conference on Computer Hardware Description
	Languages and their Applications. IFIP Interntional
	Conference on Very Large Scale Integration (IEEE Cat.
	No.95TH8102), p. xxxii+860, 471-7
YEAR:	1995
ABSTRACT: Most existing high-level synthesis (HLS) systems attempt to
	generate a circuit from a behavioral description "out of the
	void", using the entire design space as the search domain.
	Because of the vastness of the search space, it is
	impossible to do more than a coarse grain search, often
	resulting in inefficient designs. This approach ignores the
	designer`s knowledge of the general structure of the circuit
	to be synthesized. In this paper, we describe the HLS system
	SIDER (Synthesis by Initial Design Extension and
	Refinement). SIDER utilizes designer knowledge about the
	design space in the form of an initial circuit. By limiting
	search to the neighborhood of this initial circuit, much
	finer grain search can be performed yielding a higher
	quality design. The effectiveness of the SIDER approach is
	shown by HLS of a 300 line C description of 27 instructions
	from a MC6502 CPU (11 Refs.)

AUTHOR:	Chapman, R.
TITLE:	Timed dependence flow graphs, an intermediate form for
	verified high-level synthesis
SOURCE:	Proceedings Eighth Annual IEEE International ASIC Conference
	and Exhibit (Cat. No.95TH8087), p. xv+422, 109-12
YEAR:	1995
ABSTRACT: We present timed dependence flow graphs, an intermediate
	form for high-level synthesis from specifications written in
	behavioral hardware description languages. Timed dependence
	flow graphs express data, control, resource access, and
	timing dependences, and call be constructed in linear time
	from behavioral VHDL descriptions. We also present a formal
	execution semantics for timed dependence flow graphs, which
	we are using in verification of our high-level synthesis
	tools (11 Refs.)

AUTHOR:	Hae-Dong Lee; Sun-Young Hwang
TITLE:	A scheduling algorithm for multiport memory minimization in
	datapath synthesis
SOURCE:	Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
	South Pacific Design Automation Conference. IFIP
	International Conference on Computer Hardware Description
	Languages and their Applications. IFIP Interntional
	Conference on Very Large Scale Integration (IEEE Cat.
	No.95TH8102), p. xxxii+860, 93-100
YEAR:	1995
ABSTRACT: In this paper, we present a new scheduling algorithm that
	generates area-efficient register transfer level datapaths
	with multiport memories. The proposed scheduling algorithm
	assigns an operation to a specific control step such that
	maximal sharing of functional units can be achieved with
	minimal number of memory ports, while satisfying given
	constraints. We propose a measure of multiport memory cost,
	MAV (Multiple Access Variable) which is defined as a
	variable accessed at several control steps, and overall
	memory cost is reduced by equally distributing the MAVs
	throughout all the control steps. When compared with
	previous approaches for several benchmarks available from
	the literature, the proposed algorithm generates the
	datapaths with less memory modules and interconnection
	structures by reflecting the memory cost in the scheduling
	process (13 Refs.)

AUTHOR:	Heijligers, M.J.M.; Cluitmans, L.J.M.; Jess, J.A.G.
TITLE:	High-level synthesis scheduling and allocation using genetic
	algorithms
SOURCE:	Proceedings of the ASP-DAC`95/CHDL`95/VLSI`95. Asia and
	South Pacific Design Automation Conference. IFIP
	International Conference on Computer Hardware Description
	Languages and their Applications. IFIP Interntional
	Conference on Very Large Scale Integration (IEEE Cat.
	No.95TH8102), p. xxxii+860, 61-6
YEAR:	1995
ABSTRACT: In this article a scheduling method is presented which is
	capable of allocating supplementary resources during
	scheduling. This makes it very suitable in synthesis
	strategies based on lower bound estimations techniques. The
	method is based on genetic algorithms. Special coding
	techniques and analysis methods are used to improve the
	runtime and quality of the results. The scheduler can easily
	be extended to cover other architectural issues and (for
	example) provides ways to make trade-offs between functional
	unit allocation and register allocation. Experiments and
	comparisons show high quality results and fast run times
	that outperform results produced by other heuristic
	scheduling methods (23 Refs.)



AUTHOR:	Schonfeld, M.; Franzen, J.; Schwiegershausen, M.; Pirsch,
	P.; Vehlies, U.; Munzner, A.
TITLE:	The LISA design environment for the synthesis of array
	processors including memories for the data transfer and
	fault tolerance by reconfiguration and coding techniques
SOURCE:	Journal of VLSI Signal Processing, vol.11, no.1-2, p. 51-74
YEAR:	Oct.-Nov. 1995
ABSTRACT: The LISA design environment transforms computation extensive
	digital signal processing algorithms into array processor
	architectures. It supports the complete design flow from
	algorithmic specification in a high-level programming
	language to circuit description at the gate level. From the
	input description a graph representation is derived by
	symbolic execution and further mapped onto different
	architectures. Netlists in different formats can be
	extracted using the integrated synthesis of arithmetic
	building blocks. For the adaptation of the architectures to
	external interfaces LISA includes the synthesis of
	intermediate memories consisting of multiport RAMs or
	register circuits. Another part of LISA allows the
	application of reconfiguration and coding techniques at
	different design levels to incorporate fault tolerance into
	both, array processors and intermediate memories. By the
	homogeneous representation in LISA all parts of the design
	process are handled in the same environment. Thus, different
	architectures of array processors including data transfer,
	control, and fault tolerance can be compared with respect to
	area, computation time, and reliability (37 Refs.)

AUTHOR:	Gebotys, C.H.
TITLE:	An optimal methodology for synthesis of DSP multichip
	architectures
SOURCE:	Journal of VLSI Signal Processing, vol.11, no.1-2, p. 9-19
YEAR:	Oct.-Nov. 1995
ABSTRACT: An optimization approach to the synthesis of multichip DSP
	architectures is presented. This research is important for
	industry since it is well known that these early design
	decisions have a significant impact on the final VLSI
	implementation. A mathematical programming approach to
	simultaneously scheduling, partitioning (into multiple
	chips) and allocating minimum hardware (functional units on
	each chip) for the DSP application is formulated.
	Throughput, input/output timing, and latency constraints are
	supported along with interchip communication delays. By
	using polyhedral theory, the optimal solution to the integer
	programming problem can be obtained in fast cpu times.
	Results show that one can synthesize optimal two-chip, three-
	chip and four-chip architectures for a realistic industrial
	DSP application in reasonable cpu times. This research
	breaks new ground by simultaneously partitioning, scheduling
	and allocating multichip DSP architectures with optimal area
	in fast cpu times (21 Refs.)

AUTHOR:	Eisenbiegler, D.; Kumar, R.
TITLE:	Formally embedding existing high level synthesis algorithms
SOURCE:	Correct Hardware Design and Verification Methods. IFIP WG
	10.5 Advanced Research Working Conference, CHARME `95.
	Proceedings, p. viii+342, 71-83
YEAR:	1995
ABSTRACT: This paper introduces a general scheme for formally
	embedding high level synthesis by formulating its basic
	steps as transformations within higher order logic. A
	functional representation of a data flow graph is
	successively refined by means of generic logical
	transformations. Algorithms that are based on logical
	transformations guarantee "correctness by design". They not
	only construct an implementation but also derive the proof
	for its formal correctness, on the fly. An extra post-
	synthesis-verification step becomes obsolete. The logical
	transformations presented in this paper form a framework for
	formally embedding existing high-level-synthesis procedures
	(17 Refs.)

AUTHOR:	Hong-Shin Jun; Sun-Young Hwang
TITLE:	Design of a synthesis system for pipeline structures with
	variable data initiation intervals
SOURCE:	Journal of Microelectronic Systems Integration, vol.3, no.3,
	p. 189-203
YEAR:	Sept. 1995
ABSTRACT: In this paper, we propose a novel approach for synthesizing
	the dynamically configured pipelines supporting variable
	data initiation intervals (DIIs). Compared to the previous
	researches in which the pipeline synthesis is confined to
	those with fixed DIIs, the proposed system allows powerful
	design space exploration by removing the restriction of
	fixed DIIs. The proposed algorithm tries to optimize the
	area of pipeline by fully utilizing hardware resources to
	which abstract operations in high-level design descriptions
	are assigned, while meeting the given time constraints in
	the data initiation intervals, the clock cycle time and the
	number of stages (15 Refs.)

AUTHOR:	Kim, T.; Liu, C.L.
TITLE:	A new approach to the multiport memory allocation problem in
	data path synthesis
SOURCE:	Integration, The VLSI Journal, vol.19, no.3, p. 133-60
YEAR:	Nov. 1995
ABSTRACT: A new approach to the problem of allocation of multiport
	memory modules for data storage in data path synthesis is
	presented. Previous approaches solve the allocation problem
	in two separate steps: (i) grouping the variables (or
	registers) to form memory modules and (ii) determining the
	interconnections between the memory modules and functional
	units. Yet, there is no easy way to predict the result of
	step (ii) during step (i). In our approach, we place primary
	importance on the cost of interconnections. Consequently, we
	try to minimize the cost of interconnections first and then
	to group the variables to form memory modules later. For a
	number of benchmark problems, it was demonstrated that this
	approach is quite effective and produces very good solutions
	to the allocation problem for multiport-memory-based
	architecture (20 Refs.)

AUTHOR:	Potkonjak, M.; Wolf, W.
TITLE:	Cost optimization in ASIC implementation of periodic hard-
	real time systems using behavioral synthesis techniques
SOURCE:	1995 IEEE/ACM International Conference on Computer-Aided
	Design. Digest of Technical Papers (Cat. No.95CB35859),
	p. xxviii+743, 446-51
YEAR:	1995
ABSTRACT: Modern applications are often defined as sets of several
	computational tasks. This paper presents a synthesis
	algorithm for ASIC implementations which realize multiple
	computational tasks under hard real-time deadlines. The
	algorithm analyzes constraints imposed by task sharing as
	well as the traditional datapath synthesis criteria. In
	particular we demonstrated an efficient technique to combine
	rate-monotonic scheduling, a widely used hard real-time
	systems scheduling discipline, with estimations and
	scheduling and allocation algorithms. Matching the number of
	bits in tasks assigned to the same processor was the most
	important factor in obtaining good designs. We have
	demonstrated the effectiveness of our algorithms on several
	multiple-task examples (21 Refs.)

AUTHOR:	Wing Hang Wong; Jain, R.
TITLE:	PARAS: System-level concurrent partitioning and scheduling
SOURCE:	1995 IEEE/ACM International Conference on Computer-Aided
	Design. Digest of Technical Papers (Cat. No.95CB35859),
	p. xxviii+743, 440-5
YEAR:	1995
ABSTRACT: Partitioning for the ASIC designs is examined and the
	interaction between high-level synthesis and partitioning is
	studied and incorporated in the solution. Four algorithms
	(called PARAS) which can exploit this interaction by solving
	the scheduling and partitioning problems concurrently are
	presented. PARAS maximizes the overall performance of the
	final design and considers different chip configurations and
	communication structures. Experiments, conducted with
	specifications ranging in size from few to hundreds of
	operations, demonstrate the success of this approach
	(15 Refs.)

AUTHOR:	Kropf, T.; Schneider, K.; Kumar, R.
TITLE:	A formal framework for high level synthesis
SOURCE:	Theorem Provers in Circuit Design. Theory, Practice and
	Experience. Second International Conference, TPCD '94.
	Proceedings, p. viii+303, 223-38
YEAR:	1995
ABSTRACT: In this paper, we propose a new approach to formal synthesis
	which focuses on the generation of verification-friendly
	circuits. Starting from a high-level implementation
	description, which may result from the application of usual
	scheduling and allocation algorithms, hardware is
	automatically synthesized. The target architecture is based
	on handshake processes, modules which communicate by a
	simple synchronizing handshake protocol. The circuits result
	from the application of only a few basic operations like
	synchronization, sequential execution or iteration of base
	handshake processes. Each process is guided by an abstract
	theorem that is used to derive proof obligations, to be
	justified after synthesis. Automation has been achieved to
	the extend that only those "relevant" proof obligations
	remain to be proven manually, e.g. theorems for data-
	dependent loops and lemmata about the used data types. The
	process-oriented implementation language is enriched by loop
	invariants. If those are given prior to the synthesis
	process and the underlying data types are only Booleans,
	i.e. finite-length bitvectors, then the complete synthesis
	and verification process runs automatically (17 Refs.)

AUTHOR:	Shih-Hsu Huang; Yu-Chin Hsu; Yen-Jen Oyang
TITLE:	A new scheduling algorithm for synthesizing the control
	blocks of control-dominated circuits
SOURCE:	Microprocessing & Microprogramming, vol.41, no.7, p. 501-19
YEAR:	Nov. 1995
ABSTRACT: This paper describes a new scheduling algorithm for
	automatic synthesis of the control blocks of control-
	dominated circuits. The proposed scheduling algorithm is
	distinctive in its approach to partition a control/data flow
	graph (CDFG) into an equivalent state transition graph. It
	works on the CDFG to exploit operation relocation, chaining,
	duplication, and unification. The optimization goal is to
	schedule each execution path as fast as possible. Benchmark
	data shows that this approach achieved better results over
	the previous ones in terms of the speedup of the circuit and
	the number of states and transitions (18 Refs.)

AUTHOR:	Jong, C.C.; Lam, Y.Y.H.
TITLE:	Using time zones for data path allocation in high-level
	synthesis of digital systems
SOURCE:	International Journal of Electronics, vol.79, no.5, p. 627-
	40
YEAR:	Nov. 1995
ABSTRACT: The paper describes the data path allocation in high-level
	synthesis of digital systems using a new approach named the
	time-zone approach. Using the time-zone approach, where time
	steps are partitioned into zones, the register allocation
	and the module allocation for a given scheduled data flow
	graph (DFG) are performed in the same phase. The number of
	registers required to store the values in the DFG is
	minimized, and at the same time the interconnections between
	the registers and the modules (as well as the number of
	multiplexers required) are optimized. The experimental
	results obtained from testing several published benchmarks
	show that the allocation results are improved in terms of
	reduction of interconnections and the number of multiplexers
	and registers (15 Refs.)

AUTHOR:	Septien, J.; Mozos, D.; Tirado, J.F.; Hermida, R.;
	Fernandez, M.; Mecha, H.
TITLE:	FIDIAS: an integral approach to high-level synthesis
SOURCE:	IEE Proceedings-Circuits, Devices and Systems, vol.142,
	no.4, p. 227-35
YEAR:	Aug. 1995
ABSTRACT: An integral approach to the problem of high-level synthesis
	of digital architectures as implemented in the FIDIAS
	synthesis system is described. The synthesis task is split
	into different subtasks, the interrelation between them is
	no longer ignored as happens in most systems. One of the
	system's outstanding features is the control of global
	system behaviour performed by an expert system that deals
	with such interrelations and aims it towards the required
	goals. The most important subtasks are explained in detail.
	Among these subtasks, cycle-time estimation, operation
	scheduling, allocation search algorithm and area estimation
	are found. Heuristics for intelligent bounding of the design
	space during allocation are shown. Also presented is one of
	the most novel features of FIDIAS, which is the use of
	heuristics to guide the search performed by the allocation
	algorithm through the accessible design space (28 Refs.)

AUTHOR:	Hae-Dong Lee; Sun-Young Hwang
TITLE:	Design of an area-efficient allocation algorithm for
	datapaths with multiport memories
SOURCE:	Journal of Microelectronic Systems Integration, vol.3, no.1,
	p. 3-17
YEAR:	March 1995
ABSTRACT: In this paper, we present a heuristic algorithm that
	generates area-efficient datapaths with multiport memories.
	The proposed algorithm assigns variables to multiport memory
	modules, so that overall cost can be reduced. In the
	process, efforts are made to reduce the number of memory
	modules, the number of elements in each memory module, and
	the number of interconnections in the generated datapaths.
	Interconnection cost is further reduced in a number of
	multiplexer inputs by exchanging connections between memory
	modules and FUs performing commutative operations in the
	interconnection binding phase. Experimental results show the
	effectiveness of the proposed algorithm. When compared with
	previous approaches for several benchmarks available from
	the literature, the proposed algorithm generates the
	datapaths with fewer registers in memory modules and less
	interconnection cost (17 Refs.)

AUTHOR:	Ohm, S.Y.; Kurdahi, F.J.; Dutt, N.; Xu, M.
TITLE:	A comprehensive estimation technique for high-level
	synthesis
SOURCE:	Proceedings of the Eighth International Symposium on System
	Synthesis (IEEE Cat. No.95TH8050), p. xiii+175, 122-7
YEAR:	1995
ABSTRACT: We present an integrated approach aimed at predicting layout
	area needed to implement a behavioral description for a
	given performance goal. Our approach is novel because: (1)
	it accounts for all types of RT level components (FUs,
	buses, registers), (2) it is highly flexible, allowing the
	designer to tradeoff one type of resource with another and
	considers dependencies between these different types, (3) it
	is vertically integrated to include provably accurate
	physical level estimators, and hence provides realistic
	accounting of layout effects, and (4) it uses a timing model
	with finer granularity, accounting for various delays in RTL
	datapaths. We demonstrate our technique on a variety of HLS
	benchmarks and show that efficient and effective design
	space exploration can be accomplished using this technique
	(21 Refs.)

AUTHOR:	Musoll, E.; Cortadella, J.
TITLE:	Scheduling and resource binding for low power
SOURCE:	Proceedings of the Eighth International Symposium on System
	Synthesis (IEEE Cat. No.95TH8050), p. xiii+175, 104-9
YEAR:	1995
ABSTRACT: Decisions taken at the earliest steps of the design process
	may have a significant impact on the characteristics of the
	final implementation. This paper illustrates how power
	consumption issues can be tackled during the scheduling and
	resource-binding steps of high-level synthesis. Algorithms
	for these steps targeting at low-power data-paths and
	trading off, in some cases, speed and area for low power are
	presented. The algorithms focus on reducing the activity of
	the functional units (adders, multipliers) by minimizing the
	transitions of their input operands. The power consumption
	of the functional units accounts for a large fraction of the
	overall data-path power budget (21 Refs.)

TEM:	172
AUTHOR:	Chaudhuri, S.; Blythe, S.A.; Walker, R.A.
TITLE:	An exact methodology for scheduling in a 3D design space
SOURCE:	Proceedings of the Eighth International Symposium on System
	Synthesis (IEEE Cat. No.95TH8050), p. xiii+175, 78-83
YEAR:	1995
ABSTRACT: This paper describes an exact solution methodology,
	implemented in Rensselaer's Voyager design space exploration
	system, for solving the scheduling problem in a 3-
	dimensional (3D) design space: the usual 2D design space
	(which trades off area and schedule length), plus a third
	dimension representing clock length. Unlike design space
	exploration methodologies which rely on bounds or estimates,
	this methodology is guaranteed to find the globally optimal
	solution to the 3D scheduling problem. Furthermore, this
	methodology efficiently prunes the search space, eliminating
	provably inferior design points through: a careful selection
	of candidate clock lengths; and tight bounds on the number
	of functional units of each type or on the schedule length
	(27 Refs.)

AUTHOR:	Economakos, G.; Papakonstantinou, G.; Tsanakas, P.
TITLE:	An attribute grammar approach to high-level automated
	hardware synthesis
SOURCE:	Information and Software Technology, vol.37, no.9, p. 493-
	502
YEAR:	Sept. 1995
ABSTRACT: Attribute grammars have been used extensively in every phase
	of traditional compiler construction. Since some of these
	phases have also been used in automated hardware synthesis
	(hardware compilation), attribute grammars can be
	effectively adopted to handle the two major tasks of high
	level hardware synthesis, operation scheduling and hardware
	allocation, implementing various algorithms. The paper
	presents an attribute grammar driven scheduling system, as a
	more abstract way of handling the whole high level hardware
	synthesis task, while maintaining the desired functionality
	by the utilization of existing and well tested tools and
	techniques transferred from traditional compiler
	construction (32 Refs.)

AUTHOR:	Al-Sukhni, H.F.; Youssef, H.; Sait, S.M.; Benten, M.S.T.
TITLE:	Loop based scheduling for high level synthesis
SOURCE:	Conference Proceedings of the 1995 IEEE Fourteenth Annual
	International Phoenix Conference on Computers and
	Communications (Cat. No.95CH35751), p. xvii+742, 76-81
YEAR:	1995
ABSTRACT: This paper describes a new loop based scheduling algorithm.
	The algorithm aims at reducing the runtime processing
	complexity of path based scheduling techniques. It
	partitions the control flow graph of the input specification
	into subgraphs before scheduling the different paths of each
	subgraph. Benchmark tests as well as simulation results on
	the scheduling algorithm indicate that the proposed
	algorithm results in sizeable reduction in runtime (9 Refs.)

AUTHOR:	Rahmouni, M.; Jerraya, A.A.
TITLE:	PPS: a pipeline path-based scheduler
SOURCE:	Proceedings. The European Design and Test Conference. ED&TC
	1995 (Cat. No.95TH8058), p. xxvii+611, 557-61
YEAR:	1995
ABSTRACT: This paper presents a scheduling algorithm that improves on
	other approaches when dealing with the synthesis of control-
	flow dominated behavioral descriptions. It achieves this
	through the use of a constraint-driven path-based scheduling
	algorithm. The suboptimality of the original path-based
	algorithms when dealing with loops is overcome through a new
	technique for pipelining different loop iterations during
	execution path generation. Results show that the algorithm
	always generates the fastest solution in terms of clock
	cycles (9 Refs.)

AUTHOR:	Holtmann, U.; Ernst, R.
TITLE:	Combining MBP-speculative computation and loop pipelining in
	high-level synthesis
SOURCE:	Proceedings. The European Design and Test Conference. ED&TC
	1995 (Cat. No.95TH8058), p. xxvii+611, 550-6
YEAR:	1995
ABSTRACT: Frequent control dependencies caused by IF- and loop-
	statements limit the parallelism usable in High-Level
	Synthesis. Loop pipelining is a powerful way to increase
	parallelism, but is often limited by these control
	dependencies. Multiple branch prediction (MBP-SC) applies
	loop pipelining and speculative computation to the most
	probable path and serves other paths during the restore
	phase (prediction error correction). In this paper we
	combine MBP-SC and loop pipelining and give a scheduling
	algorithm. Further MBP-SC improvement comes from parallel
	branch execution. The results show a considerable speedup
	compared to previous approaches (12 Refs.)

AUTHOR:	Minjoong Rim; Yaw Fann; Rajiv Jain
TITLE:	Global scheduling with code-motions for high-level synthesis
	applications
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.3, no.3, p. 379-92
YEAR:	Sept. 1995
ABSTRACT: In this paper, we present a global scheduling technique for
	synthesis applications. The algorithm accepts a
	specification containing conditional branches and while-loop
	constructs and schedules it for a given set of resources.
	The algorithm performs several types of code motions across
	different basic blocks and trades off cost with performance.
	Several real-life examples taken from Numerical Recipes in C
	are used to demonstrate the efficacy of the approach. The
	results indicate that code-motions are very important for
	achieving significant speed-ups for synthesis applications
	(57 Refs.)

AUTHOR:	Kucukcakar, K.; Parker, A.C.
TITLE:	A methodology and design tools to support system-level VLSI
	design
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.3, no.3, p. 355-69
YEAR:	Sept. 1995
ABSTRACT: System-level design involves making major design decisions
	without having accurate information on the eventual system
	characteristics. This paper presents a novel constraint-
	driven methodology to support system-level design. The
	software assists a designer or a tool in partitioning
	behavioral specifications onto multiple VLSI chips and in
	system design while satisfying hard constraints such as
	individual chip areas, chip pin counts, system throughput
	(inverse of system initiation interval) and system latency
	(delay). The software uses search and estimation techniques
	to perform comprehensive design-space exploration and
	evaluates partitions supplied by the user or by other
	synthesis software. The technique determines what design
	characteristics each partition must possess in order to
	satisfy area, pin, throughput and latency constraints. The
	paper also includes results of extensive experiments with
	the methodology (24 Refs.)

AUTHOR:	Sharma, A.; Jain, R.
TITLE:	InSyn: integrated scheduling for DSP applications
SOURCE:	IEEE Transactions on Signal Processing, vol.43, no.8,
	p. 1966-77
YEAR:	Aug. 1995
ABSTRACT: We present the InSyn algorithm for high-level synthesis of
	DSP applications. InSyn combines allocation and scheduling
	of functional, storage, and interconnect units into a single
	phase and uses the following unique optimizations. (i) The
	concept of register states (free, busy, and undecided) is
	used for optimizing registers in a partial schedule where
	lifetimes of data values are not yet available. (ii)
	Reusable data values and broadcast are used to alleviate bus
	contention. (iii) InSyn can alternate between performance-
	guided and resource-guided measures. For example, InSyn can
	forgo its priority in favor of completing partially
	evaluated paths when the availability of allocated registers
	becomes low. (iv) InSyn ran selectively increase execution
	time of noncritical operations to alleviate bus contention.
	(V) InSyn can optimize and trade off distinct (functional
	units, interconnect, and registers) resource sets
	concurrently leading to more area-delay efficient designs.
	(vi) InSyn utilizes estimation tools towards resource
	allocation, design space pruning, and evaluation of
	synthesized designs. The experiments show that the features
	incorporated in inSyn result in very good designs (26 Refs.)

AUTHOR:	Potkonjak, M.; Dey, S.; Roy, R.K.
TITLE:	Behavioral synthesis of area-efficient testable designs
	using interaction between hardware sharing and partial scan
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.14, no.9, p. 1141-54
YEAR:	Sept. 1995
ABSTRACT: We introduce BETS, a behavioral test synthesis system, for
	the synthesis of high-throughput, area-efficient testable
	designs. While hardware sharing is a powerful technique to
	achieve area efficiency, it may adversely affect the
	testability of the synthesized design by introducing new
	loops. Besides CDFG loops, hardware sharing introduces three
	other types of loops: assignment loops, sequential false
	loops, and register files cliques. We provide a
	comprehensive analysis and a formal grammar characterization
	of the sources of loops in the data path during behavioral
	synthesis. Partial scan is a cost-effective technique for
	sequential circuit testing. Hardware sharing of scan
	registers can be used to minimize the number of scan
	registers required to synthesize data paths with minimal
	number of loops. The scan registers can be shared amongst
	several variables of the CDFG, to break not only the loops
	in the CDFG, but also the very loops introduced in the data
	path by hardware sharing. A new random walk based algorithm
	is proposed to break all CDFG loops using a minimal number
	of scan registers. The subsequent scheduling and assignment
	phase avoids formation of loops in the data path by reusing
	the scan registers, while ensuring high resource
	utilization. The experimental results demonstrate the
	effectiveness of the new technique to synthesize easily
	testable data paths, with nominal hardware overhead, while
	maintaining the performance of the designs. The partial scan
	overhead incurred by the technique is significantly less
	than that of a gate-level partial scan approach (44 Refs.)

AUTHOR:	Verhaegh, W.F.J.; Lippens, P.E.R.; Aarts, E.H.L.; Korst,
	J.H.M.; van Meerbergen, J.L.; van der Werf, A.
TITLE:	Improved force-directed scheduling in high-throughput
	digital signal processing
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.14, no.8, p. 945-60
YEAR:	Aug. 1995
ABSTRACT: This paper discusses improved force-directed scheduling and
	its application in the design of high-throughput DSP
	systems, such as real-time video VLSL circuits. We present a
	mathematical justification of the technique of force-
	directed scheduling, introduced by Paulin and Knight (1989),
	and we show how the algorithm can be used to find cost-
	effective time assignments and resource allocations,
	allowing trade-offs between processing units and memories.
	Furthermore, we present modifications that improve the
	effectiveness and the efficiency of the algorithm. The
	significance of the improvements is illustrated by an
	empirical performance analysis based on a number of problem
	instances (30 Refs.)

AUTHOR:	Dhodhi, M.K.; Hielscher, F.H.; Storer, R.H.; Bhasker, J.
TITLE:	Datapath synthesis using a problem-space genetic algorithm
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.14, no.8, p. 934-44
YEAR:	Aug. 1995
ABSTRACT: This paper presents a new approach to datapath synthesis
	based on a problem-space genetic algorithm (PSGA). The
	proposed technique performs concurrent scheduling and
	allocation of functional units, registers, and multiplexers
	with the objective of finding both a schedule and an
	allocation which minimizes the cost function of the hardware
	resources and the total time of execution. The problem-space
	genetic algorithm based datapath synthesis system (PSGA-
	Synth) combines a standard genetic algorithm with a known
	heuristic to search the large design space in an intelligent
	manner. PSGA-Synth handles multicycle functional units,
	structural pipelining, conditional code and loops, and
	provides a mechanism to specify lower and upper bounds on
	the number of control steps. The PSGA-Synth was tested on a
	set of problems selected from the literature, as well as
	larger problems created by us, with promising results. PSGA-
	Synth not only finds the best known results for all the test
	problems examined in a relatively small amount of CPU time,
	but also has the ability to efficiently handle large
	problems (21 Refs.)

AUTHOR:	Kawaguchi, T.; Todaka, T.
TITLE:	Operation scheduling by annealed neural networks
SOURCE:	IEICE Transactions on Fundamentals of Electronics,
	Communications and Computer Sciences, vol.E78-A, no.6,
	p. 656-63
YEAR:	June 1995
ABSTRACT: The operation scheduling is an important subtask in the
	automatic synthesis of digital systems. Many greedy
	heuristics have been proposed for the operation scheduling,
	but they cannot find the globally best schedule. In this
	paper we present an algorithm to construct near optimal
	schedules. The algorithm combines characteristics of
	simulated annealing and neural networks. The neural network
	used in our scheduling algorithm is similar to that proposed
	by Hellstrom et al. (1992). However, while the problems of
	these neural networks have a single type of constraint, the
	problem considered in this paper has three types of
	constraints. As a result, the energy function of the
	proposed neural network is given by the weighted sum of
	three energy functions. To minimize the weighted sum of two
	or more energy functions, conventional methods try to find a
	good set of weights using a trial and error method. Our
	algorithm takes a different approach than these methods.
	Results of the experiments show that the proposed algorithm
	can be used as an alternative heuristic for solving the
	operation scheduling problem. In addition, the proposed
	algorithm can exploit the inherent parallelism of the neural
	network (13 Refs.)

AUTHOR:	Wilson, T.C.; Mukherjee, N.; Garg, M.K.; Banerji, D.K.
TITLE:	An ILP solution for optimum scheduling, module and register
	allocation, and operation binding in datapath synthesis
SOURCE:	VLSI Design, vol.3, no.1, p. 21-36
YEAR:	1995
ABSTRACT: We present an integrated and optimal solution to the
	problems of operator scheduling, module and register
	allocation, and operator binding in datapath synthesis. The
	solution is based on an integer linear programming (ILP)
	model that minimizes a weighted sum of module area and total
	execution time under very general assumptions of module
	capabilities. In particular, a module may execute an
	arbitrary combination of operations, possibly using
	different numbers of control steps for different operations.
	Furthermore, operations may be implemented by a variety of
	modules, possibly requiring different numbers of control
	steps depending on the modules chosen. This generality in
	the complexity and mixture of modules is unique to our
	system and leads to an optimum selection of modules to meet
	specified design constraints. Significant extensions include
	the ability to incorporate pipelined functional units and
	operator chaining in an integrated manner. Straightforward
	extension to multi-block synthesis is discussed briefly but
	the details are omitted due to space considerations
	(13 Refs.)


AUTHOR:	Duncan, A.A.; Hendry, D.C.
TITLE:	High-level synthesis of DSP datapaths by global optimisation
	of variable lifetimes
SOURCE:	IEE Proceedings-Computers and Digital Techniques, vol.142,
	no.3, p. 215-24
YEAR:	May 1995
ABSTRACT: COBRA (Column-Oriented Butted Regular Architecture) is a
	behavioural high-level synthesis tool for datapath-dominated
	applications. It globally optimises the synthesised datapath
	by performing the scheduling and allocation tasks
	simultaneously. COBRA uses a bit-sliced target architecture
	and layout style which, when compared with conventional
	approaches, significantly reduces the area of the final
	datapaths. The synthesis problem is formulated as an
	optimisation problem on the configuration of variable
	lifetimes when mapped into a 3D 'datapath space'. The
	configuration of the data in the datapath space implies the
	structure required to achieve the data configuration and
	hence the datapath. Simulated annealing is used to optimise
	the solution. A description is given of the target
	architecture, the mapping of the input description into the
	datapath space, the optimisation of the data configuration
	in the datapath space, and the post-processing operations.
	Results for a number of examples are presented (44 Refs.)

AUTHOR:	Shih-Hsu Huang; Cheng-Tsung Hwang; Yu-Chin Hsu; Yen-Jen
	Oyang
TITLE:	A new approach to schedule operations across nested-ifs and
	nested-loops
SOURCE:	Microprocessing & Microprogramming, vol.41, no.1, p. 37-52
YEAR:	April 1995
ABSTRACT: This paper presents a new global scheduling algorithm for
	automatic synthesis of the control blocks of special-purpose
	microprocessors. The main distinction of the proposed
	algorithm is that it exploits the inheritances of structured
	programs. The optimization goal is to maximize the speedup
	of the processor and minimize the size of the control block.
	If compared with existing global scheduling algorithms such
	as trace scheduling, tree compaction, and percolation
	scheduling, the proposed algorithm consistently achieves
	better results in terms of the speedup of the processor and
	the size of the control block (12 Refs.)

AUTHOR:	Parhi, K.K.
TITLE:	High-level algorithm and architecture transformations for
	DSP synthesis
SOURCE:	Journal of VLSI Signal Processing, vol.9, no.1-2, p. 121-43
YEAR:	Jan. 1995
ABSTRACT: This survey paper reviews numerous high-level transformation
	techniques which can be applied at the algorithm or the
	architecture level to improve the performance of digital
	signal and image processing architectures and circuits
	implemented using VLSI technology. Successful design of VLSI
	signal and image processors requires careful selection of
	algorithms, architectures, implementation styles, and
	synthesis techniques. High-level transformations can play an
	important role in reducing silicon area or power at the same
	speed or in increasing the speed for same area. These
	transformations can also increase the suitability of an
	algorithm for a particular architectural style. The
	transformation techniques reviewed in this paper include
	pipelining, parallel processing, retiming, unfolding,
	folding, look-ahead, relaxed look-ahead, associativity,
	distributivity, and reduction in strength (94 Refs.)

AUTHOR:	Van Meerbergen, J.L.; Lippens, P.E.R.; Verhaegh, W.E.J.; Van
	der Werf, A.
TITLE:	PHIDEO: high-level synthesis for high throughput
	applications
SOURCE:	Journal of VLSI Signal Processing, vol.9, no.1-2, p. 89-104
YEAR:	Jan. 1995
ABSTRACT: The paper describes a new approach to high-level synthesis
	for high throughput applications. Such applications are
	typically found in real-time video systems such as HDTV. The
	method is capable of dealing with hierarchical flow graphs
	containing loops with manifest boundaries and linear index
	expressions. The algorithm is based on the model of periodic
	operations which allows optimizations across loop
	boundaries. Processing units and storage units are minimized
	simultaneously. The algorithm is implemented in the PHIDEO
	system. The major parts of this system are the processing
	unit synthesis, the scheduler and the memory synthesis
	including address generation (37 Refs.)

AUTHOR:	Verbauwhede, I.; Rabaey, J.M.
TITLE:	Synthesis for real time systems: solutions and challenges
SOURCE:	Journal of VLSI Signal Processing, vol.9, no.1-2, p. 67-88
YEAR:	Jan. 1995
ABSTRACT: A real time system typically combines a variety of
	implementation technologies and hardware architectures.
	Deciding how to partition the system and selecting an
	architectural technology for the sub-systems is by no means
	a trivial task. These architectural decisions have to be
	made at the early stages in the design process, when the
	impact of the decisions is unclear and can only be
	quantified using some primitive measures. the authors
	present their vision on how a next generation of design
	environment can aid the designer in this decision process.
	They first identify the problems of designing a
	heterogeneous real time system by walking through the design
	process of a complex speech recognition system. Based on
	this analysis, they propose a system design methodology
	build on top of current synthesis tools. Today, DSP
	synthesis tools are application and/or architecture
	specific, covering subparts of the application once the
	partitioning is made. To make them useful in the proposed
	methodology, a unified view on the underlying architecture
	assumptions is needed. Secondly, good decision making
	requires an "as-good-as-possible" estimation of the
	implications of the decision. Therefore, it is important
	that current manual estimation be enlarged by high level
	estimation and performance analysis tools. The HYPERSPACE
	environment, which is currently under development,
	therefore, consists of three complementary components: a set
	of architecture specific compilers, a set of estimation and
	performance analysis tools and an architecture selection and
	partitioning framework, steered by the designer (54 Refs.)

AUTHOR:	Goossens, G.; Lanneer, D.; Pauwels, M.; Depuydt, F.;
	Schoofs, K.; Kifli, A.; Cornero, M.; Petroni, P.; Catthoor,
	F.; de Man, H.
TITLE:	Integration of medium-throughput signal processing
	algorithms on flexible instruction-set architectures
SOURCE:	Journal of VLSI Signal Processing, vol.9, no.1-2, p. 49-65
YEAR:	Jan. 1995
ABSTRACT: Integrated circuits in telecommunications and consumer
	electronics are rapidly evolving towards single chip
	solutions. New IC architectures are emerging, which combine
	instruction-set processor cores with customised hardware.
	The paper describes a high-level synthesis system for
	integration of real-time signal processing systems on such
	processor cores. The compiler supports a flexible
	architectural model. It can handle certain types of
	incompletely specified architectures, and offers
	capabilities for retargetable compilation and architectural
	exploration. Results for a realistic application from the
	domain of audio processing indicate the feasibility and
	power of the presented approach (43 Refs.)

AUTHOR:	Ching-Yi Wang; Parhi, K.K.
TITLE:	High-level DSP synthesis using concurrent transformations,
	scheduling, and allocation
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.14, no.3, p. 274-95
YEAR:	March 1995
ABSTRACT: This paper addresses high-level synthesis methodologies for
	dedicated digital signal processing (DSP) architectures used
	in the iterative Loop-based Minnesota Architecture Synthesis
	(MARS) design system. We present a novel concurrent
	scheduling and resource allocation algorithm which exploits
	inter-iteration and intra-iteration precedence constraints.
	The novel algorithm implicitly performs algorithmic
	transformations, such as pipelining and retiming, on the
	data-flow graphs during the scheduling process to produce
	solutions which are as good as those previously published
	and which executes in less time. MARS is capable of
	producing optimal and near-optimal schedules in fractions of
	seconds. Previous synthesis systems have focused on DSP
	algorithms which have single or lumped delays in the
	recursive loops. In contrast, MARS is capable of generating
	valid architectures for algorithms which have randomly
	distributed delays. MARS exploits these delays to produce
	more efficient architectures and allows our system to be
	more general. We are able to synthesize architectures which
	meet the iteration bound of any algorithm by unfolding,
	retiming, and pipelining the original data-flow graph
	(49 Refs.)

AUTHOR:	Ohm, S.Y.; Kurdahi, F.J.; Jhon, C.S.
TITLE:	An optimal scheduling approach using lower bound in high-
	level synthesis
SOURCE:	IEICE Transactions on Information and Systems, vol.E78-D,
	no.3, p. 231-6
YEAR:	March 1995
ABSTRACT: The paper describes an optimal scheduling approach which
	finds the scheduling result of the minimum functional unit
	cost under the given timing constraint. In this method, a
	well-defined search space is constructed incrementally and
	traversed in a branch-and-bound manner. During the
	traversal, tighter lower bounds are estimated and utilized
	coupled with the upper bound on the optimal solution in
	pruning the search space effectively. This method is
	extended to support multi-cycling operations, operation
	chaining, pipelined functional units, and pipelined data
	paths. Experimental results on some benchmarks show the
	efficiency of the proposed approach (9 Refs.)

AUTHOR:	Radivojevic, I.P.; Brewer, F.
TITLE:	Symbolic scheduling techniques
SOURCE:	IEICE Transactions on Information and Systems, vol.E78-D,
	no.3, p. 224-30
YEAR:	March 1995
ABSTRACT: Describes an exact symbolic formulation of resource-
	constrained scheduling which allows speculative operation
	execution in arbitrary forward-branching control/data paths.
	The technique provides a closed-form solution set in which
	all satisfying schedules are encapsulated in a compressed
	OBDD (ordered binary decision diagram) based representation.
	An iterative construction method is presented, along with
	benchmark results. The experiments demonstrate the ability
	of the proposed technique to efficiently extract parallelism
	not explicitly specified in the input description (17 Refs.)

AUTHOR:	Srivastava, M.B.; Potkonjak, M.
TITLE:	Optimum and heuristic transformation techniques for
	simultaneous optimization of latency and throughput
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.3, no.1, p. 2-19
YEAR:	March 1995
ABSTRACT: Although throughput alone can be arbitrarily improved for
	several classes of systems using previously published
	techniques, none of those approaches are effective when
	latency constraints, which are increasingly important in
	embedded DSP systems, are considered. After formally
	establishing the relationship between latency and throughput
	in general computation, we explore the effect of pipelining
	on latency, and establish necessary and sufficient
	conditions under which pipelining does not alter latency.
	Many systems are either linear, or have subsystems that are
	linear. For such cases we have used a state-space based
	approach that treats various transformations in an
	integrated fashion, and answers analytically whether it is
	possible to simultaneously meet any given combination of
	constraints on latency and throughput, The analytic approach
	is constructive in nature, and produces a complete
	implementation when feasibility conditions are fulfilled. We
	also present a suboptimal but hardware efficient heuristic
	approach for the special case of initially-relaxed single-
	input single-output linear time-invariant computations. A
	novel software platform consisting of a high-level synthesis
	system coupled to a symbolic algebra system was used to
	implement the proposed algorithm transformations. Instead of
	optimizing to improve throughput and latency, our
	transformations can also be used to increase the
	implementation efficiency while achieving the same latency
	and throughput as the original design (23 Refs.)

AUTHOR:	Ahmad, I.; Dhodhi, M.K.; Chen, C.Y.R.
TITLE:	Integrated scheduling, allocation and module selection for
	design-space exploration in high-level synthesis
SOURCE:	IEE Proceedings-Computers and Digital Techniques, vol.142,
	no.1, p. 65-71
YEAR:	Jan. 1995
ABSTRACT: High-level synthesis consists of many interdependent tasks
	such as scheduling, allocation and binding. To make
	efficient use of time and area, functional unit allocation
	must be performed using a library of modules which contains
	a variety of module types with identical functionality, but
	different area and delay characteristics. The synthesis
	technique presented in the paper simultaneously performs
	scheduling, allocation and module selection, using problem-
	space genetic algorithm (PSGA) to produce area and
	performance optimised designs. The PSGA-based system uses an
	intelligent design-space exploration technique by combining
	a genetic algorithm with a simple and fast problem-specific
	heuristic to search a large design space effectively and
	efficiently. The efficient exploration of design-space is
	essential to design cost-effective architectures for
	problems of VLSI/ULSI complexity. The PSGA method offers
	several advantages such as the versatility, simplicity,
	objective independence and the computational advantages for
	problems of large size over other existing techniques. The
	proposed synthesis system handles multicycle functional
	units, chaining, conditional constructs, loops and
	structural pipelining. Experiments on benchmarks show very
	promising results (15 Refs.)

AUTHOR:	Karkowski, I.
TITLE:	Architectural synthesis with possibilistic programming
SOURCE:	Proceedings of the Twenty-Eighth Hawaii International
	Conference on System Sciences, p. 5 vol.
	(x+361+xv+762+xv+600+xx+1042+x+362), 14-22 vol.1
YEAR:	1995
ABSTRACT: The knowledge about available resources during high-level
	synthesis is usually imprecise. Previous methods seem to
	have ignored this fact, possibly to avoid an increase in
	the, already high, computational complexity. In this paper
	an approach based on so called "possibilistic" programming,
	a kind of fuzzy mathematical programming, is presented.
	Using this method we can improve existing mathematical
	programming methods for the architectural synthesis while
	keeping their good properties. Not only architectures which
	optimize the most possible value of the cost function can be
	generated, but more importantly, also the tradeoff between
	this goal and reducing the probability of obtaining worse
	solution and enhancing probability of obtaining a better
	solution is controlled. At the same time, an increase in the
	computational complexity of the algorithms is avoided. To
	show the validity of the approach an application to
	simultaneous scheduling, selection and allocation of
	functional units is described. The approach has been
	implemented in a system called FOAS. Experimental results
	confirm the advantages of the proposed methodology
	(16 Refs.)

AUTHOR:	Chandrakasan, A.P.; Potkonjak, M.; Mehra, R.; Rabaey, J.;
	Brodersen, R.W.
TITLE:	Optimizing power using transformations
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.14, no.1, p. 12-31
YEAR:	Jan. 1995
ABSTRACT: The increasing demand for portable computing has elevated
	power consumption to be one of the most critical design
	parameters. A high-level synthesis system, HYPER-LP, is
	presented for minimizing power consumption in application
	specific datapath intensive CMOS circuits using a variety of
	architectural and computational transformations. The
	synthesis environment consists of high-level estimation of
	power consumption, a library of transformation primitives,
	and heuristic/probabilistic optimization search mechanisms
	for fast and efficient scanning of the design space.
	Examples with varying degree of computational complexity and
	structures are optimized and synthesized using the HYPER-LP
	system. The results indicate that more than an order of
	magnitude reduction in power can be achieved over current-
	day design methodologies while maintaining the system
	throughput; in some cases this can be accomplished while
	preserving or reducing the implementation area (38 Refs.)

*******************************************************************************

1994
----

AUTHOR:	Potkonjak, M.; Rabaey, J.
TITLE:	Area-time high level synthesis laws: theory and practice
SOURCE:	VLSI Signal Processing VII (Cat. No.94TH8008), p. xii+511,
	53-62
YEAR:	1994
ABSTRACT: We introduce three AT DSP high level synthesis laws that
	relate different components of the area of ASIC
	implementation cost, namely foreground memory, execution
	units, and interconnect to the sampling period (available
	time). The laws state that: A=const, AT=const, and AT/sup
	2/=const for the area of registers, execution units, and
	interconnect respectively. We validate the AT laws using
	case studies and statistical analysis of synthesis results
	of 80 real life designs. Several applications of the AT laws
	for development of high level synthesis tools are presented.
	Use of the AT high level synthesis laws as an effective
	method for encapsulation of high level synthesis knowledge
	is also studied, The effectiveness of the AT laws
	applications is documented on numerous designs (15 Refs.)

AUTHOR:	Silc, J.
TITLE:	Scheduling strategies in high-level synthesis
SOURCE:	Informatica, vol.18, no.1, p. 71-9
YEAR:	April 1994
ABSTRACT: This paper describes the objectives of high-level synthesis.
	It concentrates on operation scheduling strategies and the
	interaction with resource allocation. Some transformational
	and iterative/constructive scheduling algorithms are
	described. Moreover, a new scheduling/allocation approach is
	presented and compared with other known algorithms. Finally,
	some open problems of high-level synthesis are given
	(51 Refs.)

AUTHOR:	Silc, J.
TITLE:	Scheduling strategies in high-level synthesis
SOURCE:	Informatica, vol.18, no.1, p. 71-9
YEAR:	April 1994
ABSTRACT: This paper describes the objectives of high-level synthesis.
	It concentrates on operation scheduling strategies and the
	interaction with resource allocation. Some transformational
	and iterative/constructive scheduling algorithms are
	described. Moreover, a new scheduling/allocation approach is
	presented and compared with other known algorithms. Finally,
	some open problems of high-level synthesis are given
	(51 Refs.)

AUTHOR:	Tabuenca, P.; Sanchez, P.; Villar, E.
TITLE:	ULSB: a fast scheduling and binding algorithm for
	unrestricted libraries inside a RT space exploration
	strategy
SOURCE:	Proceedings of the 20th EUROMICRO Conference. EUROMICRO 94.
	System Architecture and Integration, p. xxi+720, 200-7
YEAR:	1994
ABSTRACT: This paper presents a method for solving scheduling problem
	in high-level synthesis. The main advantage of this method
	over those proposed up to now, lies in its capacity to
	handle unrestricted libraries. A library of this kind
	includes combinational and sequential multi-cycle functional
	units, multi-functional operational units, pipeline modules
	as well as different execution and different latency times
	for the same operation type in different modules. The
	computing time of this algorithm is quite low and gives very
	good results, so it has powerful applications in RT space
	exploration. The tool FIRES (Fast Intelligent RT Exploration
	System) is also presented (11 Refs.)

AUTHOR:	Grass, W.; Mutz, M.; Tiedemann, W.-D.
TITLE:	High level synthesis based on formal methods
SOURCE:	Proceedings of the 20th EUROMICRO Conference. EUROMICRO 94.
	System Architecture and Integration, p. xxi+720, 83-91
YEAR:	1994
ABSTRACT: High-level synthesis, the task of automatically transforming
	behavioural descriptions at the algorithmic level into
	register transfer level structures, is conquered today
	almost always by use of heuristics. This prevents giving a
	guarantee either for the design optimality or the
	correctness, although these are the most essential criteria.
	By applying formal methods, both issues can be formally
	proven. This paper presents 3 aspects of HLS, that have been
	solved using formal methods, namely (1) the formal
	verification that those hardware modules allocated to
	realize a particular RT level operation are actually doing
	so with respect to functionality and timing, (2) an exact
	solution of the scheduling problem that is yet able to cope
	with more practical constraints than any heuristical
	procedure currently known and (3) the formally based
	synthesis of communication-dominated hardware, an important
	application area that is largely unnoticed by today's HLS
	systems (20 Refs.)

AUTHOR:	ShaoJun Wei; Leroy, J.; Crappe, R.; Nancy, T.
TITLE:	Mobility reduction based scheduling with allocation emphasis
	in datapath design
SOURCE:	Proceedings of the 20th EUROMICRO Conference. EUROMICRO 94.
	System Architecture and Integration, p. xxi+720, 12-18
YEAR:	1994
ABSTRACT: Operation scheduling is one of the most important steps in
	datapath design. The object of this paper is to propose a
	new idea of scheduling which may potentially lead to an
	optimal result. Different from the algorithms proposed in
	the past, which try to find the control step that is
	suitable for a selected operation, the new idea tries to
	find the control step that is not suitable for the
	operation. An operation is scheduled when all the unsuitable
	steps are found. Some allocation problems such as functional
	unit sharing and connection minimisation, are combined into
	the scheduling procedure. Thus, the allocation procedure is
	simplified and design quality increases. The experiment show
	that the optimality of scheduling is distinctly improved
	comparing to those of related works published before
	(8 Refs.)

AUTHOR:	Yamada, A.; Yamazaki, T.; Ishiura, N.; Shirakawa, I.; Kambe,
	T.
TITLE:	Datapath scheduling for behavioral description with
	conditional branches
SOURCE:	IEICE Transactions on Fundamentals of Electronics,
	Communications and Computer Sciences, vol.E77-A, no.12,
	p. 1999-2009
YEAR:	Dec. 1994
ABSTRACT: A new approach is described for the datapath scheduling of
	behavioral descriptions containing nested conditional
	branches of arbitrary structures. This paper first
	investigates such a complex scheduling mechanism, and
	formulates an optimal scheduling problem as a 0-1 integer
	programming problem such that given a prescribed number of
	control steps, the total cost of functional units can be
	minimized. In this formulation, each constraint is expressed
	in the form of a Boolean function, which is set equal to 1
	or 0 according to whether the constraint is satisfied or
	not, respectively, and a satisfiability problem is defined
	by the product of the Boolean functions. A procedure is then
	described, which intends to seek an optimal solution by
	means of a branch-and-bound method on a binary decision
	diagram representing the satisfiability problem.
	Experimental results are also shown, which demonstrate that
	our approach is of more practical use than the existing
	methods (16 Refs.)

AUTHOR:	Stok, L.
TITLE:	Data path synthesis
SOURCE:	Integration, The VLSI Journal, vol.18, no.1, p. 1-71
YEAR:	Dec. 1994
ABSTRACT: This paper reviews all the phases in data path synthesis:
	register allocation, storage grouping, module allocation and
	interconnect allocation. In addition, a new phase for the
	storage value insertion is introduced. For each of these
	phases a formal problem description is given. Restrictions
	on the data path allocation phases are presented, which
	delimit the problems to cases which can be solved by
	polynomial algorithms. For the general cases, heuristics are
	provided which have appeared to be effective in the
	literature. Special data path architectures may require
	special algorithms to make use of their features. Throughout
	the paper architectural constraints are described and
	effective algorithms for them derived. To construct an
	effective data path allocation system, a scheme has to be
	defined. The scheme determines which subproblems are solved
	in what order and which constraints are taken into account
	in each phase. The data flow graph and schedule and their
	match with the data path architecture have a major impact on
	the development of a scheme. This paper will point out the
	trade-offs that have to be made when developing such a
	scheme. This paper provides a reference to most of the data
	path allocation algorithms published in the scope of high-
	level synthesis. Most of the techniques are explained in
	considerable detail and various examples are given. The
	paper comments on the applicability of most of the
	algorithms for particular data path allocation problems
	(74 Refs.)

AUTHOR:	Jansen, K.
TITLE:	On the complexity of allocation problems in high-level
	synthesis
SOURCE:	Integration, The VLSI Journal, vol.17, no.3, p. 241-52
YEAR:	Nov. 1994
ABSTRACT: In this paper, we study the problem of processor allocation.
	Given a scheduled flow graph with a constant number of
	conditional branches, the problem is to find an assignment
	of the operations in the flow graph to a minimum number of
	processors. We show that this problem is NP-complete even if
	the flow graph has at most one conditional branch. Finally,
	we show that another related problem in high-level synthesis
	(mentioned as an open problem on the 24th Design Automation
	Conference), the so-called register allocation problem, is
	also NP-complete (21 Refs.)

AUTHOR:	Chaudhuri, S.; Walker, R.A.; Mitchell, J.E.
TITLE:	Analyzing and exploiting the structure of the constraints in
	the ILP approach to the scheduling problem
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.2, no.4, p. 456-71
YEAR:	Dec. 1994
ABSTRACT: In integer linear programming (ILP), formulating a "good"
	model is of crucial importance to solving that model. In
	this paper, we begin with a mathematical analysis of the
	structure of the assignment, timing, and resource
	constraints in high-level synthesis, and then evaluate the
	structure of the scheduling polytope described by these
	constraints. We then show how the structure of the
	constraints can be exploited to develop a well-structured
	ILP formulation, which can serve as a solid theoretical
	foundation for future improvement. As a start in that
	direction, we also present two methods to further tighten
	the formulation. The contribution of this paper is twofold:
	1) it provides the first in-depth formal analysis of the
	structure of the constraints, and it shows how to exploit
	that structure in a well-designed ILP formulation, and 2) it
	shows how to further improve a well-structured formulation
	by adding new valid inequalities (32 Refs.)

AUTHOR:	Aloqeely, M.; Chen, C.Y.R.
TITLE:	Sequencer-based data path synthesis of regular iterative
	algorithms
SOURCE:	31st Design Automation Conference. 31st DAC Proceedings 1994
	(IEEE Cat. No.94CH3408-2), p. xxviii+739, 155-60
YEAR:	1994
ABSTRACT: In many applications, especially signal processing and
	matrix computations, algorithms are in a highly regular
	iterative form; access patterns for most variables are
	highly regular and uniform. Instead of always storing the
	values of variables back to and retrieving them from memory
	or register files, it will be much more efficient and cost
	effective to let those variables intelligently "stay" or
	"flow" in the data path for future use. In this paper, low
	cost and simple structured sequencers which are best
	exemplified by hardware stacks and queues are introduced in
	the data path for efficiently implementing such a novel
	concept. Various algorithms are developed to map variables
	to sequencers and to integrate sequencers into conventional
	high-level synthesis procedures. Experimental results show
	very encouraging improvement in the performance of designs
	as well as significant reduction in hardware cost (16 Refs.)

AUTHOR:	Wilson, T.C.; Grewal, G.W.; Banerji, D.K.
TITLE:	An ILP solution for simultaneous scheduling, allocation, and
	binding in multiple block synthesis
SOURCE:	Proceedings IEEE International Conference on Computer
	Design: VLSI in Computers and Processors (Cat.
	No.94CH35712), p. xvii+639, 581-6
YEAR:	1994
ABSTRACT: Presents a novel approach to the high-level synthesis
	problems of scheduling, allocation, and binding for
	multiblock behavioral descriptions. Our design tool, JOSHUA,
	uses an integer linear programming (ILP) formulation to
	solve the three interdependent subproblems simultaneously
	and optimally. The system allows the designer to minimize
	time, area, and the number of microwords for the entire
	design, or for specific segments of the design. A diverse
	module library provides a selection of modules that can
	perform a specific operation in differing amounts of time
	(control steps). A novel feature is the ability to select an
	implementation for part of an algorithm from among a set of
	implementation alternatives. The system can also handle the
	issues of path frequencies, loops, parallel threads of
	execution, and register allocation (16 Refs.)

AUTHOR:	Landwehr, B.; Marwedel, P.; Domer, R.
TITLE:	OSCAR: optimum simultaneous scheduling, allocation and
	resource binding based on integer programming
SOURCE:	Proceedings EURO-DAC '94 with EURO-VHDL '94 (IEEE Cat.
	No.94CH35704), p. xix+697, 90-5
YEAR:	1994
ABSTRACT: The paper presents an approach to high-level synthesis which
	is based upon a 0/1 integer programming model. In contrast
	to other approaches, this model allows solving all three
	subtasks of high-level synthesis (scheduling, allocation and
	binding) simultaneously. As a result, designs which are
	optimal with respect to the cost function are generated. The
	model is able to exploit large component libraries with
	multiplier-functional units and complex components such as
	multiplier-accumulators. Furthermore, the model is capable
	of handling mixed speeds and chaining in its general form
	(14 Refs.)

AUTHOR:	Ali, S.; Sait, S.M.; Benten, M.S.T.
TITLE:	GSA: scheduling and allocation using genetic algorithm
SOURCE:	Proceedings EURO-DAC '94 with EURO-VHDL '94 (IEEE Cat.
	No.94CH35704), p. xix+697, 84-9
YEAR:	1994
ABSTRACT: The paper describes a unique approach to the scheduling and
	allocation problem in high-level synthesis using genetic
	algorithms (GA). This approach is different from a previous
	attempt using GA (N. Wehn et al., 1990), in many respects.
	Its contributions include: a new chromosomal representation
	for scheduling and two subproblems of allocation; and two
	novel crossover operators to generate legal schedules. The
	approach has been tested on various benchmarks and results
	are compared with those obtained by other approaches such as
	simulated evolution, tabu search, HAL, SALSA II, STAR, etc
	(11 Refs.)

AUTHOR:	Antola, A.; Distante, F.
TITLE:	High level synthesis: node activation synthesis for acyclic
	and cyclic data flow graphs
SOURCE:	Journal of Microelectronic Systems Integration, vol.2, no.1,
	p. 13-22
YEAR:	March 1994
ABSTRACT: In the process of High Level Synthesis, the behavioral
	description of a device is usually translated into a graph
	based representation originating a Control Flow Graph and
	Data Flow Graph or a single Control Data Flow Graph. In the
	proposed approach to high level synthesis, the graph domain,
	represented by the DFG or CDFG of the behavioral
	description, is exploited to analyze design alternatives and
	to evaluate architectural performances. In this paper, we
	propose a methodology and a set of algorithms generating an
	optimal pipelined design of the graph-based architecture.
	General DFGs are considered and the problem of cyclic DFGs
	is discussed (13 Refs.)

AUTHOR:	Springer, D.L.; Thomas, D.E.
TITLE:	Exploiting the special structure of conflict and
	compatibility graphs in high-level synthesis
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.13, no.7, p. 843-56
YEAR:	July 1994
ABSTRACT: Coloring of conflict graphs and clique partitioning of
	compatibility graphs have been used in high-level synthesis
	to map operators, values, and data transfers onto shared
	resources. However, finding a minimum sized coloring or
	clique partition is NP hard. One method to overcome this
	complexity is to identify special types of graphs that can
	be colored or clique partitioned in polynomial time.
	Existing high-level synthesis systems have exploited two
	special types of conflict graphs-interval and circular-arc
	graphs. However, they have provided no insight into why and
	how frequently these graphs occur. This paper will
	investigate the features of behavioral representations and
	synthesis algorithms that give rise to special conflict and
	compatibility graphs. We will identify two additional types
	of graphs useful for high-level synthesis-chordal and
	comparability graphs-and demonstrate their use in an
	existing high-level synthesis system (25 Refs.)

AUTHOR:	Arato, P.; Beres, I.; Rucinski, A.; Davis, R.; Torbert, R.
TITLE:	A high-level datapath synthesis method for pipelined
	structures
SOURCE:	Microelectronics Journal, vol.25, no.3, p. 237-47
YEAR:	May 1994
ABSTRACT: This paper presents a model and a method for the high-level
	datapath synthesis of pipelined ASIC architectures, starting
	with a behavioural description of the system consisting of
	theoretical operational units with arbitrary operation
	duration. The method calculates the minimal number of
	buffers to be inserted, optimally selects the number of
	types of additional operational units, and determines the
	minimal number of required copies. The aim of the procedure
	is to ensure a latency which can be given in advance, and
	could not be achieved without additional buffers and extra
	copies of the operational units. The method provides a
	solution to the resource allocation problem by establishing
	a compatibility relation between the concurrent operations.
	The constraints for the types of processors to be applied
	can vary, depending upon the hardware resources (20 Refs.)

AUTHOR:	Rabaey, J.M.; Potkonjak, M.
TITLE:	Estimating implementation bounds for real time DSP
	application specific circuits
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.13, no.6, p. 669-83
YEAR:	June 1994
ABSTRACT: This paper discusses techniques for estimating
	implementation bounds on computational resources and their
	role in the high-level synthesis process. Accurate
	estimations can be extremely useful in a multitude of
	synthesis operations, such as algorithm and architecture
	selection, design space search, module selection,
	transformations, allocation, assignment, and scheduling.
	Several techniques to efficiently estimate sharp minimum and
	maximum bounds on the resource requirements of a hardware
	implementation are discussed. The performance of the
	algorithms as well as their applications is analyzed using
	an extensive benchmark set. The proposed techniques have
	been implemented in the HYPER synthesis system (63 Refs.)

AUTHOR:	Rim, M.; Mujumdar, A.; Jain, R.; de Leone, R.
TITLE:	Optimal and heuristic algorithms for solving the binding
	problem
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.2, no.2, p. 211-25
YEAR:	June 1994
ABSTRACT: In this paper we present an optimal and a heuristic approach
	to solve the binding problem which occurs in high-level
	synthesis of digital systems. The optimal approach is based
	on an integer linear programming formulation. Given that
	such an approach is not practical for large problems, we
	then derive a heuristic from the ILP formulation which
	produces very good solutions in order of seconds. The
	heuristic is based on a network flow model and also
	considers floorplanning during the design process to
	minimize the interconnection area (60 Refs.)

AUTHOR:	Seawright, A.; Brewer, F.
TITLE:	Clairvoyant: a synthesis system for production-based
	specification
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.2, no.2, p. 172-85
YEAR:	June 1994
ABSTRACT: This paper describes a new high-level synthesis system based
	on the hierarchical production based specification (PBS).
	Advantages of this form of specification are that the
	designer does not describe the control flow in terms of
	explicit states or control variables, and that the designer
	does not describe a particular form of implementation. The
	production-based specification also separates the
	specification of the control aspects and data-flow aspects
	of the design. The control is implicitly described via the
	production hierarchy, while the data-flow is described as
	action computations. This approach is a hardware analog of
	popular software engineering techniques. The Clairvoyant
	system automatically constructs a controlling machine from
	the PBS and this process is not impacted by the possibly
	exponentially larger deterministic state space of the
	designs. The encodings generated by the constructions
	compare favorably to encodings derived using graph-based
	state encoding techniques in terms of logic complexity and
	logic depth. These construction techniques utilize recent
	advances in BDD techniques (32 Refs.)

AUTHOR:	Dowsing, R.; Elliott, R.; Marshall, I.
TITLE:	Automated technique for high-level circuit synthesis from
	temporal logic specifications
SOURCE:	IEE Proceedings-Computers and Digital Techniques, vol.141,
	no.3, p. 145-52
YEAR:	May 1994
ABSTRACT: A general-purpose strategy for the synthesis of digital
	circuits from high-level behavioural specifications
	expressed in the temporal-logic language Tempura is
	described. This strategy has been implemented as a synthesis
	tool called AST, and the application of AST to part of the
	specification for an error-encoder circuit is examined
	(13 Refs.)

AUTHOR:	Rouzeyre, B.; Dupont, D.; Sagnes, G.
TITLE:	Component selection, scheduling and control schemes for high
	level synthesis
SOURCE:	Proceedings. The European Design and Test Conference. EDAC,
	The European Conference on Design Automation. ETC European
	Test Conference. EUROASIC, The European Event in ASIC Design
	(Cat. No.94TH0634-6), p. xxvii+676, 482-9
YEAR:	1994
ABSTRACT: This paper emphasizes on the performance optimization
	problem of automatically synthesized circuits while
	respecting area constraints. This optimization is performed
	at the earliest step of synthesis, i.e. during the
	scheduling of the behavioral specification of the circuit.
	It is mainly based on an adequate determination of the type
	of component which implements each operation in the
	specification. An algorithm performing concurrently
	component selection and scheduling is presented. It gives
	the designer facilities for design-space exploration, i.e.
	time versus area, since components are taken from a user
	specified library. The special features of this algorithm
	are: several component types can implement an operation and
	one component type can implement several operations; the
	delays are associated to pairs (component type, operation
	type); both the usual synchronous control scheme and a
	control scheme allowing to tune independently the delay of
	every control step (referred as adjusted control) are
	discussed (26 Refs.)

AUTHOR:	Dalkilic, M.E.; Pitchumani, V.
TITLE:	Optimal operation scheduling using resource lower bound
	estimations
SOURCE:	Proceedings. The European Design and Test Conference. EDAC,
	The European Conference on Design Automation. ETC European
	Test Conference. EUROASIC, The European Event in ASIC Design
	(Cat. No.94TH0634-6), p. xxvii+676, 319-24
YEAR:	1994
ABSTRACT: Presents an accurate resource lower bound estimation
	technique which leads to an efficient, and optimal solution
	of time-constrained as well as hardware resource-constrained
	scheduling problems in high-level synthesis. A given time or
	hardware constrained scheduling problem is transformed into
	a cost ordered sequence of feasible scheduling problems
	where a solution to this new problem is guaranteed to be an
	optimal solution to the original problem. Efficiency of the
	approach is demonstrated on large high-level synthesis
	benchmarks like the elliptical wave filter and the discrete
	cosine transform (10 Refs.)

AUTHOR:	Amellal, S.; Kaminska, B.
TITLE:	Functional synthesis of digital systems with TASS
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.13, no.5, p. 537-52
YEAR:	May 1994
ABSTRACT: Synthesizing a digital system from a functional description
	is a complex process requiring the solution of various
	different problems. TASS (Tabu Search Synthesis System) is a
	functional synthesis system made up of interdependent
	modules based on new and more efficient algorithms. First, a
	control and data flow graph model for system representation
	is developed and presented. This model generates a single
	graph representing both the data and control flows of a VHDL
	behavioral description. A new representation of conditional
	branches and a mutual exclusion testing procedure offering
	optimized resource sharing and critical path reduction
	possibilities have been developed. This graph model is an
	environment used for various synthesis needs starting from
	high-level transformations to FSM synthesis for controller
	implementation. A new mathematical formulation of the
	scheduling problem is developed using a new approach based
	on penalty weights. This approach avoids the inflexibility
	of the ILP formulations developed in related works where the
	functional unit performing each type of operation is fixed
	prior to scheduling. The Tabu Search technique, which has
	been effective in finding optimal solutions for many types
	of large and difficult combinatorial optimization problems,
	has been adapted for this purpose. This technique, which
	performs an intelligent and fast solution space exploration,
	combined with an effective mathematical formulation makes
	the scheduling algorithm presented here very powerful
	(38 Refs.)

AUTHOR:	Gebotys, C.H.
TITLE:	An optimization approach to the synthesis of multichip
	architectures
SOURCE:	IEEE Transactions on Very Large Scale Integration (VLSI)
	Systems, vol.2, no.1, p. 11-20
YEAR:	March 1994
ABSTRACT: An optimization approach to the high level synthesis of VLSI
	multichip architectures is presented in this paper. This
	research is important for industry since it is well known
	that these early high level decisions have the greatest
	impact on the final VLSI implementation. Optimal application-
	specific architectures are synthesized here to minimize
	latency given constraints on chip area, I/O pin count and
	interchip communication delays. A mathematical integer
	programming (IP) model for simultaneously partitioning,
	scheduling, and allocating hardware (functional units, I/O
	pins, and interchip busses) is formulated. By exploiting the
	problem structure, using polyhedral theory, the size of the
	search space is decreased and a new variable selection
	strategy is introduced based on the branch and bound
	algorithm. Multichip optimal architectures for several
	examples are synthesized in practical cpu times. Execution
	times are comparable to previous heuristic approaches,
	however there are significant improvements in optimal
	schedules and allocations of multichips. This research
	breaks new ground by 1) simultaneously partitioning,
	scheduling, and allocating in practical cpu times, 2)
	guaranteeing globally optimal architectures for multichip
	systems for a specific objective function, and 3) supporting
	interchip communication delay, interchip bus allocation, and
	other complex interface constraints (24 Refs.)

AUTHOR:	Rim, M.; Jain, R.
TITLE:	Lower-bound performance estimation for the high-level
	synthesis scheduling problem
SOURCE:	IEEE Transactions on Computer-Aided Design of Integrated
	Circuits and Systems, vol.13, no.4, p. 451-8
YEAR:	April 1994
ABSTRACT: A given behavioral specification can be implemented on a
	large number of register-transfer level designs. Instead of
	producing several designs and selecting the best one,
	synthesis systems may use estimation to reduce the design
	space. In this paper, we present a new technique for
	computing a lower-bound completion time for non-pipelined
	resource-constrained scheduling problem. Given a data flow
	graph, a set of resources, resource delays and a clock
	cycle, we derive a lower-bound on the completion time of a