Function approximation for large-scale reinforcement learning

We are developing new techniques to solve large-scale, high-dimensional machine learning problems, such as multi-agent optimization. We use sparse distributed memory to reduce the size of the value tables that must be stored during the learning process. We show that adaptive prototype generation, Kanerva coding, and fuzzy function approximation can be used to improve the performance of the solver.
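As an illustrative sketch only (not the project's implementation), the core idea behind Kanerva coding can be shown in a few lines: binary prototype vectors replace a full state table, a state activates every prototype within a Hamming-distance radius, and the value estimate is the sum of the active prototypes' weights, trained with a delta rule. All names and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: 6-bit binary states, 20 random prototypes,
# activation radius of 1 bit.
STATE_BITS, N_PROTOTYPES, RADIUS = 6, 20, 1

prototypes = rng.integers(0, 2, size=(N_PROTOTYPES, STATE_BITS))
weights = np.zeros(N_PROTOTYPES)

def active(state):
    """Indices of prototypes within Hamming distance RADIUS of state."""
    dists = np.abs(prototypes - state).sum(axis=1)
    return np.flatnonzero(dists <= RADIUS)

def value(state):
    """Approximate value: sum of the weights of the active prototypes."""
    return weights[active(state)].sum()

def update(state, target, alpha=0.1):
    """Delta-rule update spread evenly over the active prototypes."""
    idx = active(state)
    if idx.size:
        weights[idx] += alpha * (target - value(state)) / idx.size

s = prototypes[0].copy()      # a state guaranteed to activate a prototype
for _ in range(50):
    update(s, target=1.0)
print(round(value(s), 3))     # prints 0.995
```

With 20 prototypes the memory footprint is fixed regardless of how many of the 2^6 states are visited, which is the point of the technique at scale.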

Computational infrastructure for seamless, inter-site Grid computing

We are developing and implementing a new Grid software layer that allows users to deploy parallel MPI applications across hosts and clusters located at different computing sites. Using message relaying and dynamic scheduling, the Grid layer allows the user to view the resources as if they were all located within the same cluster. This approach has been applied to parallelize a tomographic mammography application. (Joint work with D. Kaeli, CenSSIS, and T. Wu at Mass General Hospital)

  • J. Zhang, W. Meleis, and D. Kaeli, "Adaptive execution of distributed MPI applications on the Grid", submitted to Symposium on Principles and Practice of Parallel Programming, 2006.
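As a toy illustration of the relaying idea (not the Grid layer's actual code): when ranks at two sites cannot open direct connections, a relay forwards tagged messages across the site boundary. Here in-process queues and a thread stand in for cross-site sockets.

```python
import queue
import threading

# Hypothetical setup: site_a_out holds messages leaving site A,
# site_b_in is the delivery queue at site B.
site_a_out = queue.Queue()
site_b_in = queue.Queue()

def relay():
    """Forward each (dest_rank, payload) message across the site boundary."""
    while True:
        msg = site_a_out.get()
        if msg is None:          # shutdown sentinel
            break
        site_b_in.put(msg)

t = threading.Thread(target=relay)
t.start()
site_a_out.put((3, "halo-exchange data"))
site_a_out.put(None)
t.join()
received = site_b_in.get()
print(received)                  # (3, 'halo-exchange data')
```

Because the relay preserves the (destination rank, payload) interface, the application sees one flat set of ranks even though the transport hops through an intermediary.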

Applications of Combinatorial Optimization to Switching, Testing, Reconfigurable Computing, and Cache Layout (Joint work with M. Leeser, D. Kaeli, F. Lombardi and Z. Navabi).

Microprocessor-aware scheduling algorithms for modern compilers

We have developed high-performance algorithms that maximize instruction-level parallelism in the presence of scheduling restrictions, including complex resource constraints, branch delay slots, functional units that are not fully pipelined, and limits on the number of available registers. Problems of interest have included scheduling for superblocks and hyperblocks, developing tight lower bounds on schedule length, backtracking acyclic schedulers, and combined scheduling and register allocation with spill code. (Joint work with S. Abraham and A. Eichenberger)
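A minimal sketch of the baseline these algorithms improve on: greedy list scheduling of a dependence DAG under an issue-width resource constraint. The DAG, latencies, and issue width below are hypothetical examples, not data from the work described above.

```python
# Hypothetical machine: at most ISSUE_WIDTH operations issue per cycle.
ISSUE_WIDTH = 2

# op -> (latency, predecessors); a small illustrative dependence DAG.
dag = {
    "a": (2, []),
    "b": (1, []),
    "c": (1, ["a"]),
    "d": (1, ["a", "b"]),
    "e": (1, ["c", "d"]),
}

def list_schedule(dag):
    """Assign each op the earliest cycle its deps and issue slots allow."""
    start, issued = {}, {}
    for op in dag:  # dict order is assumed to be a valid topological order
        latency, preds = dag[op]
        ready = max((start[p] + dag[p][0] for p in preds), default=0)
        cycle = ready
        while issued.get(cycle, 0) >= ISSUE_WIDTH:
            cycle += 1          # resource conflict: slide to a free slot
        start[op] = cycle
        issued[cycle] = issued.get(cycle, 0) + 1
    return start

sched = list_schedule(dag)
length = max(c + dag[o][0] for o, c in sched.items())
print(sched, length)  # schedule length 4
```

Backtracking schedulers revisit these greedy choices, and register limits add constraints this sketch ignores, but the cycle-by-cycle slot accounting is the common core.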

Lower bounds on schedule length

We developed tight lower bounds on schedule length that guide compilers in making accurate scheduling decisions. We performed a comprehensive study of the tightness of lower bounds on basic block execution time, and developed and evaluated new bounds. We developed the first tight lower bounds on superblock execution time that specifically account for the dependence and resource conflicts between groups of branches, gave a fast implementation suitable for inclusion in a production compiler, and derived the tightest known lower bound on weighted completion time scheduling. (Joint work with A. Eichenberger)
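For context, two classic bounds that any such work starts from can be computed directly: the latency-weighted critical path, and the resource bound ceil(|ops| / issue width). This is an illustrative sketch on a made-up DAG; the superblock bounds described above are tighter than either.

```python
import math

ISSUE_WIDTH = 2

# op -> (latency, predecessors); a small illustrative dependence DAG.
dag = {
    "a": (2, []),
    "b": (1, []),
    "c": (1, ["a"]),
    "d": (1, ["a", "b"]),
    "e": (1, ["c", "d"]),
}

def critical_path_bound(dag):
    """Longest latency-weighted path: no schedule can finish sooner."""
    finish = {}
    for op in dag:  # dict order is assumed to be a valid topological order
        latency, preds = dag[op]
        finish[op] = latency + max((finish[p] for p in preds), default=0)
    return max(finish.values())

def resource_bound(dag):
    """With W issue slots per cycle, |ops| operations need ceil(|ops|/W) cycles."""
    return math.ceil(len(dag) / ISSUE_WIDTH)

lb = max(critical_path_bound(dag), resource_bound(dag))
print(lb)  # prints 4
```

The gap between such a bound and the best schedule a heuristic finds is exactly what tighter bounds shrink, letting the compiler prove a schedule optimal or stop searching early.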

Parallel and Scalable Processing Systems and Programming Toolsets

We developed a methodology for increasing the performance of four classes of finite difference computational applications: finite difference time domain (FDTD), finite difference frequency domain (FDFD), Steepest Descent Fast Multipole Method (SDFMM), and Semi-analytic Mode Matching (SAMM). By defining processor and communication models using synthetic benchmarks for a parallel architecture, and by performing detailed performance analysis of a target application, we developed parallelization strategies for each class of applications. These strategies were tailored both to the characteristics of each application and to the computational environment of the underlying parallel architectures. (Joint work with D. Kaeli, C. Rappaport and T. Wu)
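The communication pattern underlying finite-difference parallelizations of this kind can be sketched in one dimension: split the grid into subdomains with one-cell halos and exchange the halo cells each step. The stencil below is a generic explicit diffusion update standing in for the FDTD/FDFD kernels; grid size, step count, and coefficient are hypothetical.

```python
import numpy as np

N, STEPS, ALPHA = 64, 50, 0.25

def step(u):
    """One explicit finite-difference update; endpoints held fixed."""
    v = u.copy()
    v[1:-1] = u[1:-1] + ALPHA * (u[2:] - 2 * u[1:-1] + u[:-2])
    return v

u = np.zeros(N)
u[N // 2] = 1.0                    # point initial condition
serial = u.copy()

# Two subdomains with a one-cell halo overlap at the seam.
left = u[: N // 2 + 1].copy()      # owns cells 0..31, halo = cell 32
right = u[N // 2 - 1 :].copy()     # owns cells 32..63, halo = cell 31

for _ in range(STEPS):
    serial = step(serial)
    left, right = step(left), step(right)
    left[-1], right[0] = right[1], left[-2]   # halo exchange at the seam

parallel = np.concatenate([left[:-1], right[1:]])
print(bool(np.allclose(serial, parallel)))    # prints True
```

Because each subdomain only needs its neighbor's boundary cells, the same decomposition scales to many processors with communication proportional to the subdomain surface rather than its volume, which is what the per-application strategies exploit.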