Function approximation for large-scale reinforcement learning

We are developing new techniques to solve large-scale, high-dimensional machine learning problems, such as multi-agent optimization. We use sparse distributed memory to reduce the size of the value tables that must be stored during the learning process. We show that adaptive prototype generation, Kanerva coding, and fuzzy function approximation can be used to improve the performance of the solver.
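As an illustrative sketch only (not the project's implementation), the core idea behind Kanerva coding can be shown in a few lines: binary prototype vectors replace a full state table, a state activates every prototype within a Hamming-distance radius, and the value estimate is the sum of the active prototypes' weights, trained with a delta rule. All names and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: 6-bit binary states, 20 random prototypes,
# activation radius of 1 bit.
STATE_BITS, N_PROTOTYPES, RADIUS = 6, 20, 1

prototypes = rng.integers(0, 2, size=(N_PROTOTYPES, STATE_BITS))
weights = np.zeros(N_PROTOTYPES)

def active(state):
    """Indices of prototypes within Hamming distance RADIUS of state."""
    dists = np.abs(prototypes - state).sum(axis=1)
    return np.flatnonzero(dists <= RADIUS)

def value(state):
    """Approximate value: sum of the weights of the active prototypes."""
    return weights[active(state)].sum()

def update(state, target, alpha=0.1):
    """Delta-rule update spread evenly over the active prototypes."""
    idx = active(state)
    if idx.size:
        weights[idx] += alpha * (target - value(state)) / idx.size

s = prototypes[0].copy()      # a state guaranteed to activate a prototype
for _ in range(50):
    update(s, target=1.0)
print(round(value(s), 3))     # prints 0.995
```

With 20 prototypes the memory footprint is fixed regardless of how many of the 2^6 states are visited, which is the point of the technique at scale.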

Computational infrastructure for seamless, inter-site Grid computing

We are developing and implementing a new Grid software layer that allows users to deploy parallel MPI applications across hosts and clusters located at different computing sites. Using message relaying and dynamic scheduling, the Grid layer allows the user to view the resources as if they were all located within the same cluster. This approach has been applied to parallelize a tomographic mammography application. (Joint work with D. Kaeli, CenSSIS, and T. Wu at Mass General Hospital)

  • J. Zhang, W. Meleis, and D. Kaeli, "Adaptive execution of distributed MPI applications on the Grid", submitted to Symposium on Principles and Practice of Parallel Programming, 2006.
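As a toy illustration of the relaying idea (not the Grid layer's actual code): when ranks at two sites cannot open direct connections, a relay forwards tagged messages across the site boundary. Here in-process queues and a thread stand in for cross-site sockets.

```python
import queue
import threading

# Hypothetical setup: site_a_out holds messages leaving site A,
# site_b_in is the delivery queue at site B.
site_a_out = queue.Queue()
site_b_in = queue.Queue()

def relay():
    """Forward each (dest_rank, payload) message across the site boundary."""
    while True:
        msg = site_a_out.get()
        if msg is None:          # shutdown sentinel
            break
        site_b_in.put(msg)

t = threading.Thread(target=relay)
t.start()
site_a_out.put((3, "halo-exchange data"))
site_a_out.put(None)
t.join()
received = site_b_in.get()
print(received)                  # (3, 'halo-exchange data')
```

Because the relay preserves the (destination rank, payload) interface, the application sees one flat set of ranks even though the transport hops through an intermediary.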

Applications of Combinatorial Optimization to Switching, Testing, Reconfigurable Computing, and Cache Layout (Joint work with M. Leeser, D. Kaeli, F. Lombardi and Z. Navabi).

Microprocessor-aware scheduling algorithms for modern compilers

We have developed high-performance algorithms that maximize instruction-level parallelism in the presence of scheduling restrictions, including complex resource constraints, branch delay slots, functional units that are not fully pipelined, and limits on the number of available registers. Problems of interest have included scheduling for superblocks and hyperblocks, developing tight lower bounds on schedule length, backtracking acyclic schedulers, and combined scheduling and register allocation with spill code. (Joint work with S. Abraham and A. Eichenberger)
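A minimal sketch of the baseline these algorithms improve on: greedy list scheduling of a dependence DAG under an issue-width resource constraint. The DAG, latencies, and issue width below are hypothetical examples, not data from the work described above.

```python
# Hypothetical machine: at most ISSUE_WIDTH operations issue per cycle.
ISSUE_WIDTH = 2

# op -> (latency, predecessors); a small illustrative dependence DAG.
dag = {
    "a": (2, []),
    "b": (1, []),
    "c": (1, ["a"]),
    "d": (1, ["a", "b"]),
    "e": (1, ["c", "d"]),
}

def list_schedule(dag):
    """Assign each op the earliest cycle its deps and issue slots allow."""
    start, issued = {}, {}
    for op in dag:  # dict order is assumed to be a valid topological order
        latency, preds = dag[op]
        ready = max((start[p] + dag[p][0] for p in preds), default=0)
        cycle = ready
        while issued.get(cycle, 0) >= ISSUE_WIDTH:
            cycle += 1          # resource conflict: slide to a free slot
        start[op] = cycle
        issued[cycle] = issued.get(cycle, 0) + 1
    return start

sched = list_schedule(dag)
length = max(c + dag[o][0] for o, c in sched.items())
print(sched, length)  # schedule length 4
```

Backtracking schedulers revisit these greedy choices, and register limits add constraints this sketch ignores, but the cycle-by-cycle slot accounting is the common core.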

Lower bounds on schedule length

We developed tight lower bounds on schedule length that guide compilers in making accurate scheduling decisions. We performed a comprehensive study of the tightness of lower bounds on basic block execution time, and developed and evaluated new bounds. We developed the first tight lower bounds on superblock execution time that specifically account for the dependence and resource conflicts between groups of branches, gave a fast implementation suitable for inclusion in a production compiler, and derived the tightest known lower bound on weighted completion time scheduling. (Joint work with A. Eichenberger)
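For context, two classic bounds that any such work starts from can be computed directly: the latency-weighted critical path, and the resource bound ceil(|ops| / issue width). This is an illustrative sketch on a made-up DAG; the superblock bounds described above are tighter than either.

```python
import math

ISSUE_WIDTH = 2

# op -> (latency, predecessors); a small illustrative dependence DAG.
dag = {
    "a": (2, []),
    "b": (1, []),
    "c": (1, ["a"]),
    "d": (1, ["a", "b"]),
    "e": (1, ["c", "d"]),
}

def critical_path_bound(dag):
    """Longest latency-weighted path: no schedule can finish sooner."""
    finish = {}
    for op in dag:  # dict order is assumed to be a valid topological order
        latency, preds = dag[op]
        finish[op] = latency + max((finish[p] for p in preds), default=0)
    return max(finish.values())

def resource_bound(dag):
    """With W issue slots per cycle, |ops| operations need ceil(|ops|/W) cycles."""
    return math.ceil(len(dag) / ISSUE_WIDTH)

lb = max(critical_path_bound(dag), resource_bound(dag))
print(lb)  # prints 4
```

The gap between such a bound and the best schedule a heuristic finds is exactly what tighter bounds shrink, letting the compiler prove a schedule optimal or stop searching early.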

Parallel and Scalable Processing Systems and Programming Toolsets

We developed a methodology for increasing the performance of four classes of finite difference computational applications: finite difference time domain (FDTD), finite difference frequency domain (FDFD), Steepest Descent Fast Multipole Method (SDFMM), and Semi-analytic Mode Matching (SAMM). By defining processor and communication models using synthetic benchmarks for a parallel architecture, and by performing detailed performance analysis of a target application, we developed parallelization strategies for each class of applications. These strategies were tailored both to the characteristics of each application and to the computational environment of the underlying parallel architectures. (Joint work with D. Kaeli, C. Rappaport and T. Wu)
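The communication pattern underlying finite-difference parallelizations of this kind can be sketched in one dimension: split the grid into subdomains with one-cell halos and exchange the halo cells each step. The stencil below is a generic explicit diffusion update standing in for the FDTD/FDFD kernels; grid size, step count, and coefficient are hypothetical.

```python
import numpy as np

N, STEPS, ALPHA = 64, 50, 0.25

def step(u):
    """One explicit finite-difference update; endpoints held fixed."""
    v = u.copy()
    v[1:-1] = u[1:-1] + ALPHA * (u[2:] - 2 * u[1:-1] + u[:-2])
    return v

u = np.zeros(N)
u[N // 2] = 1.0                    # point initial condition
serial = u.copy()

# Two subdomains with a one-cell halo overlap at the seam.
left = u[: N // 2 + 1].copy()      # owns cells 0..31, halo = cell 32
right = u[N // 2 - 1 :].copy()     # owns cells 32..63, halo = cell 31

for _ in range(STEPS):
    serial = step(serial)
    left, right = step(left), step(right)
    left[-1], right[0] = right[1], left[-2]   # halo exchange at the seam

parallel = np.concatenate([left[:-1], right[1:]])
print(bool(np.allclose(serial, parallel)))    # prints True
```

Because each subdomain only needs its neighbor's boundary cells, the same decomposition scales to many processors with communication proportional to the subdomain surface rather than its volume, which is what the per-application strategies exploit.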