
Recent NUCAR Journal Papers and Books:

2009

AGAMOS: A Graph-Based Approach to Modulo Scheduling for Clustered Microarchitecture, IEEE Transactions on Computers, 2009, to appear.

2007

Soft Error Susceptibility Analysis of SRAM-Based FPGAs in High-Performance Information Systems, IEEE Transactions on Nuclear Science (TNS), Dec. 2007

Characterization of File IO Activity for SPEC CPU2006, Special Issue of ACM SIGARCH Computer Architecture News: SPEC CPU2006 Analysis, 2007

Towards the Development of an Error Checker for Radiotherapy Treatment Plans: A Preliminary Study, Physics in Medicine and Biology, 2007

2006

Addressing a Workload Characterization Study to the Design of Consistency Protocols, Journal of Supercomputing, 2006.

Reducing Data Cache Susceptibility to Soft Errors, IEEE Transactions on Dependable and Secure Computing, 2006.

An Adjustable Linear Time Parallel Algorithm for Maximum Weight Bipartite Matching, Information Processing Letters, 2006.

Profile-guided File Partitioning on Beowulf Clusters, Journal of Cluster Computing, Special Issue on Parallel I/O, 2006.

2005

Speculative Execution in High Performance Computer Architectures, CRC Press, Chapman and Hall, D. Kaeli and P. Yew, editors, ISBN-1-58488-447-9, 2005.

2004

Removing Communications in Clustered Microarchitectures Through Instruction Replication, ACM Transactions on Architecture and Code Optimization, Vol. 1, No. 2, June 2004, pp. 127-151.

A Finite State Model for Respiratory Motion Analysis in Image-guided Radiation Therapy, Journal of Physics in Medicine and Biology, 49(23), 2004, pp. 5357-5372.

An Object-Oriented Parallel Library, Journal of High Performance Computing and Networking, Vol. 1, Issue 1/2/3, 2004, pp. 85-90.

2003

Levo - A Scalable Processor With High IPC, Journal of Instruction Level Parallelism, August 2003.

A Database System to Advance Subsurface Sensing and Imaging, Journal of Subsurface Sensing Technologies and Applications, October 2003, pp. 395-408.

2002

Profile-Based Characterization and Tuning for Subsurface Sensing and Imaging Applications, International Journal of Systems, Science and Technology, September 2002, pp. 40-55.

Electromagnetics Computations Using the MPI Parallel Implementation of the Steepest Descent Fast Multipole Method (SDFMM), ACES Journal, Vol. 17, No. 2, July 2002, pp. 112-122.

2001

Introduction to the Special Issue on High Performance Memory Systems, IEEE Transactions on Computers, Vol. 50, No. 11, November 2001, pp. 1103-1105.

2000

Welcome to the Opportunities of Binary Translation, IEEE Computer Magazine, March 2000, pp. 40-46.

1999

Analysis of Temporal-based Program Behavior for Improved Instruction Cache Performance, IEEE Transactions on Computers, Vol. 10, No. 2, February 1999, pp. 168-175.

Indirect Branch Prediction Using Data Compression Techniques, Journal of Instruction Level Parallelism, Vol. 1, 1999.

Cache Line Coloring Using Real and Estimated Profiles, Digital Technical Journal Special Issue on Tools and Languages, February 1999.

Branch-directed and Pointer-based Data Cache Prefetching, Journal of Systems Architecture: Special Issue on Microprocessor Architecture, Vol. 45, 1999, pp. 1047-1073.

1998

Tracing and Characterization of NT-based System Workloads, Digital Technical Journal Special Issue on Tools and Languages, Vol. 10, No. 1, December 1998, pp. 6-21.

VLSI Design in the 3rd Dimension, Integration: the VLSI Journal, Vol. 25/1, September 1998, pp. 1-16.

1997

Performance Analysis of a CC-NUMA Prototype, IBM Journal of Research and Development, Vol. 41, No. 3, May 1997, pp. 205-214.

Improving the Accuracy of History-Based Branch Prediction, IEEE Transactions on Computers, Vol. 46, No. 4, April 1997, pp. 469-472.

Three Dimensional Circuits Using Transferred Films, IEEE Circuits and Devices Magazine, November 1997, pp. 27-30. Reprints available by request.


Recent NUCAR Conference Papers:


2009

Multi GPU Implementation of Iterative Tomographic Reconstruction Algorithms, IEEE International Symposium on Biomedical Imaging, Jun. 2009, to appear

Software Transactional Memory for Multicore Embedded Systems, Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, June 2009, to appear

Exploring the Multiple GPU Design Space, 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS-09), Best Paper Award, May 2009, to appear

Eliminating Microarchitectural Dependency from Architectural Vulnerability, International Symposium on High-Performance Computer Architecture (HPCA-15), Feb 2009

2008

Performance Prediction in Multi-GPU Execution, NVISION, August 2008

A Field Analysis of System-Level Effects of Soft Errors Occurring in Microprocessors used in Information Systems, International Test Conference, November 2008

A Field Analysis of Soft Errors Occurring in Microprocessors used in Information Systems, North Atlantic Test Conference, May 2008.

Interactive Deformable Registration Visualization and Analysis of 4D Computed Tomography, Proceedings of the 1st International Conference on Medical Biometrics, Jan. 2008

2007

Heterogeneous Clustered VLIW Microarchitectures, Proceedings of the 5th IEEE International Symposium on Code Generation and Optimization, March 2007

Case Study: Soft Error Rate Analysis in Storage Systems, Proceedings of the 25th IEEE VLSI Test Symposium, May 2007

Characterizing the Relationship Between ILU-based Preconditioners and the Storage Hierarchy, Proceedings of the International Conference on Preconditioning Techniques for Large Sparse Matrix Problems in Scientific and Industrial Applications, 2007

Exploring Novel Parallelization Technologies for 3-D Imaging Applications, In 19th International Symposium on Computer Architecture and High Performance Computing, Oct. 2007

External Memory Page Remapping for Embedded Multimedia Systems, Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, June 2007.

2006

Acceleration of Maximum Likelihood Estimation for Tomosynthesis Mammography, Proceedings of the International Conference on Parallel and Distributed Systems, July 2006

Performance Characterization of SPEC CPU2006 Integer Benchmarks on the x86-64 Architecture, Proceedings of the IEEE Symposium on Workload Characterization, invited paper, October 2006

Vulnerability Analysis of L2 Cache Elements to Single Event Upsets, Proceedings of Design and Test in Europe (DATE), 2006.

2005

Balancing Performance and Reliability in the Memory Hierarchy, International Symposium on Performance Analysis of Systems and Software (ISPASS-05), 2005.

Load Balancing using Grid-based Peer-to-Peer Parallel I/O, Proceedings of the IEEE Cluster Computing Conference, September 2005.

Power Aware External Bus Arbitration for System-on-a-Chip Embedded Systems, Proceedings of the International Conference on High Performance Embedded Architectures and Compilers, November 2005.

Exploiting Temporal Locality in Drowsy Cache Policies, Proceedings of IEEE Computing Frontiers, 2005, pp. 371-377.

Demystifying On-the-Fly Spill Code, Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI), 2005, pp. 180-189.

A Multinomial Clustering Model for Fast Simulation of Computer Architecture Designs, Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2005, pp. 808-813.

Subsequence Matching on Structured Time Series Data, Proceedings of ACM Conference on the Management of Data (SIGMOD), 2005, pp. 682-693.

2004

Developing Energy-Aware Strategies for the Blackfin Processor, Proceedings of the 2006 High Performance Embedded Computing Conference, MIT Lincoln Labs, Sept. 2004.

2002

Realizing High IPC Using Time-Tagged Resource-Flow Computing, Proceedings of Europar 2002, Springer-Verlag, August 2002, pp. 490-499.

Path-based Hardware Loop Prediction, 4th International Conference on Control, Virtual Instrumentation and Digital Systems, Mexico City, Mexico, August 2002, pp. 29-38.

Exploiting Pseudo-schedules to Guide Data Dependence Graph Partitioning, Proceedings of IEEE Parallel Architectures and Compilation Techniques, Sept. 2002, pp. 281-290.

Register Pressure-Based Modulo Scheduling for Clustered VLIW Architectures, Proceedings of Jornadas de Concurrencia, June 2002, pp. 1-10.

Localized Message Passing Structures for High Speed Ethernet Packet Switching, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, June 2002, pp. 1551-1557.

2001

Analysis of Dynamic Loops (in Spanish), 3rd International Conference on Control, Virtual Instrumentation and Digital Systems, Mexico City, Mexico, August 2001, pp. 93-106.

2000

Using Cache Line Coloring to Perform Aggressive Procedure Inlining, in ACM SIGARCH News, 28(1), March 2000, pp. 62-71.

Accurate Simulation and Evaluation of Code Reordering, in the Proceedings of the IEEE International Symposium on the Performance Analysis of Systems and Software, Austin, TX, April 2000.

1999

Parameter Value Characterization of Windows NT-based Applications, Workload Characterization: Methodology and Case Studies, IEEE Computer Society, 1999, pp. 142-149.

1998

Predicting Indirect Branches via Data Compression, Proc. of the 31st International Symposium on Microarchitecture, Dallas, TX, December 1998, pp. 272-281.

Temporal-Based Procedure Reordering for Improved Instruction Cache Performance, Proc. of the 4th International Conference on High Performance Computer Architecture, Las Vegas, NV, February 1998, pp. 244-253.

Operating System Impact on Trace-Driven Simulation, in the Proceedings of the 31st Simulation Symposium, Boston, MA, April 1998, pp. 76-82.

1997

Efficient Procedure Mapping Using Cache Line Coloring, in the Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, June 1997, pp. 171-182.

Analytic Models of Workload Behavior and Pipeline Performance, in the Proceedings of IEEE MASCOTS, Haifa, Israel, January 1997, pp. 91-96.


Recent NUCAR Workshop Papers:


2009

The Effect of Input Data on Program Vulnerability, IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE-5), Mar 2009

Accelerating Phase Unwrapping and Affine Transformations for Optical Quadrature Microscopy using CUDA, 2nd Workshop on General Purpose Computation on GPUs (GPGPU2), Mar. 2009

Architecture-Aware Optimization Targeting Multithreaded Stream Computing, 2nd Workshop on General Purpose Computation on GPUs (GPGPU2), Mar. 2009

2008

Field Failure Analysis of Microprocessors used in Information Systems, Workshop on Resilience Assessment and Dependability Benchmarking, June 2008.

Performance Evaluation of Virtual Appliances, First International Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT 08), April, 2008.

Resource-Conscious Optimization of Cryptographic Algorithms on an Embedded Architecture, Proceedings of the ACM Workshop on Optimizations for DSP and Embedded Systems, April 2008

An M/G/1 Queue Model for Multiple Application on Storage Area Networks, Proceedings of the 11th Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW-11), February 2008.

A Taxonomy to Enable Error Correction and Recovery in Software, Workshop on Quality-Aware Design (W-QUAD) in conjunction with the 35th International Symposium on Computer Architecture (ISCA-35), June 2008

Quantifying Software Vulnerability, 1st Workshop on Radiation Effects and Fault Tolerance at Nanometer Technologies (WREFT-1) in conjunction with Computing Frontiers, May 2008

2007

Stream Image Processing on a Dual-Core Embedded System, Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 7th International Workshop (SAMOS 2007), July 2007

A Code Layout Framework for Embedded Processors with Configurable Memory Hierarchy, Proceedings of the 5th Workshop on Optimizations for DSP and Embedded Systems, March 2007

Stream Programming on the Blackfin Architecture, Proceedings of the 4th Boston Area Computer Architecture Workshop, January 2007

Characterizing the Relationship Between Sparse Matrix Preconditioners and the Storage Hierarchy, Proceedings of the 4th Boston Area Computer Architecture Workshop, January 2007

Performance Characterization of SPEC CPU2006 Integer Benchmarks, Proceedings of the 4th Boston Area Computer Architecture Workshop, January 2007

Instruction-Level Energy Estimation, Proceedings of the 4th Boston Area Computer Architecture Workshop, January 2007

Case Study: Soft Error Rate Analysis in Storage Systems, Proceedings of the 4th Boston Area Computer Architecture Workshop, January 2007

Use of an Embedded Configurable Memory for Stream Image Processing, Proceedings of the 5th Workshop on Optimizations for DSP and Embedded Systems, March 2007

Reliability in the Shadow of Long-Stall Instructions, 3rd Annual Workshop on Silicon Errors in Logic - System Effects (SELSE-3), April 2007

2006

Hunting Trojan Horses, Proceedings of the Workshop on Architecture and System Support for Improving Software Dependability, Oct 2006

Experiences with the Blackfin Architecture for an Embedded Systems Lab, Proceedings of the Workshop on Computer Architecture Education, July 2006

2005

A Benchmark Suite for Behavior-Based Security Mechanisms, Proceedings of the Workshop on Software Security Assurance Tools, Techniques and Metrics, November 2005.

ASM: An Application Security Monitor, Proceedings of the Workshop on Binary Instrumentation and Applications, September 2005, pp. 31-36.

Reliability Tradeoffs in Design of Cache Memories, 1st Workshop on Architectural Reliability (WAR) in conjunction with the International Symposium on Microarchitecture (MICRO-38), 2005.

2002

Realizing High IPC Through a Scalable Memory-Latency Tolerant Multipath Microarchitecture, Presented at MEDEA Workshop, Charlottesville, VA, September 2002.

2001

Runtime Predictability of Loops, IEEE 4th Annual Workshop on Workload Characterization, Austin, TX, December 2001, pp. 91-98.

Profile-guided Tuning of Heap-based Memory Access, 2nd Workshop on Memory Performance Issues, Goteborg, Sweden, July 2001

1999

Studying the Performance of the FX!32 Binary Translation System, in the Proceedings of the 1st Workshop on Binary Translation, Newport Beach, CA, Oct. 1999.

A Study of Dynamic Branch Predication for SHARC DSP's, in the Proceedings of the 2nd International Workshop on Compiler and Architecture Support for Embedded Systems (CASES'99), Washington, D.C., Oct. 1999.

1998

A Study of Loop Unrolling for VLIW-based DSP Processors, Proc. of the 1998 IEEE Workshop on Signal Processing Systems (SiPS '98), October 1998, pp. 519-527.

1997

Procedure Mapping Using Static Call Graph Estimation, in the Proceedings of the Workshop on the Interaction between Compilers and Computer Architectures, San Antonio, Texas, February 1997, also appearing in the IEEE TCCA Newsletter, 1997.


NUCAR Work in Progress:


2002

Implications of Register and Memory Temporal Locality for Distributed Microarchitectures, October 2002.

Characterization and Evaluation of Hardware Loop Unrolling, October 2002.

2000

Levo: A Resource Flow Computer, December 2000.

1999

Scalable Interconnects and Topologies for High Performance ICDA, November 1999.

Characterizing the SPEC JVM98 Benchmarks On the Java Virtual Machine, April 1999.

Code Reordering for Multi-level Cache Hierarchies, November 1999.

Using Cache Line Coloring to Perform Aggressive Procedure Inlining, November 1999.

1998

Sensitivity Analysis of System Performance using Synthetic Workloads, Presented at the Workshop on Computer Architecture Evaluation Using Commercial Workloads, December 1998.


Recent Invited Talks:


2005

I/O Storage Research at Northeastern, Presented at EMC in May 2005 by Dave Kaeli.

Soft Error Modeling and Mitigation, Presented at EMC in May 2005 by Mehdi Tahoori.

2002

Realizing High IPC Through a Scalable Memory-Latency Tolerant Multipath Microarchitecture, Presented at MEDEA Workshop, Charlottesville, VA, September 2002.

Realizing High IPC Through a Scalable, Multipath Microarchitecture, Presented at Universitat Politecnica de Catalunya, Barcelona, Spain, July 29, 2002.

2001

Profile-Guided Instruction and Data Memory Layout, Presented at Tufts University, Medford, MA, February 7, 2001.

1997

Issues in Trace-Driven Simulation, Presented at Intel Corporation, Santa Clara, CA, February 24, 1997.

Procedure Mapping Using Static Call Graph Estimation, Presented at Microsoft Research, Redmond, WA, May 19, 1997, and at Boston University's Computer Science Colloquium, February 26, 1997.


Recent NUCAR Research Reports:


2006

Hunting Trojan Horses, M. Moffie and D. Kaeli, NUCAR Technical Report, January 2006.

2002

A Software Communications Architecture Compliant Software Defined Radio Implementation, Sabri Murat Bicer, MS Thesis, June 2002.

1997

A Code Annotation Tool for Capturing Operating System Execution, June 1997.