Kalamatianos Yiannis

My name is Yiannis Kalamatianos. I was a member of the Computer Architecture Group (NUCAR), working under the supervision of Prof. David R. Kaeli, and a research assistant in the ECE Dept. at Northeastern University. I worked for 4 years with the Microprocessor design group at Sun Microsystems, Burlington MA, doing performance/power modeling and conducting microarchitectural studies. I am currently working with the Microprocessor design group at Advanced Micro Devices, Boxborough MA, doing similar work.

I received my undergraduate degree from the Dept. of Computer Engineering and Informatics at the University of Patras, Greece, in 1992, with a concentration in Computer Architecture and VLSI design, and my Master's degree from the ECE Dept. at Northeastern University in 1995. Here is what I did while at school :

Ph.D. thesis :

Use temporal locality to improve code layout

This was achieved by extracting temporal locality from the instruction stream of applications using a graph-based model (Conflict Miss Graph, CMG). We developed a procedure reordering algorithm based on the CMG model combined with Cache Line Coloring to improve instruction cache miss ratios. Simulation results were very encouraging (see HPCA'98 paper). A similar approach was developed by researchers at Harvard University and UCSD. We have developed a code reordering framework where procedure reordering using a Call Graph (CG), Temporal Relationship Graph (TRG) or a CMG can be combined with intraprocedural basic block reordering. The framework also extends cache line coloring to multiple cache levels. We have modified SimpleScalar v3.0 to simulate code reordering at the basic block level and obtained cycle-based results. This work was supported by NSF and by Microsoft Research.
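As a rough illustration of the idea behind cache line coloring (not the algorithm from the thesis or the HPCA'98 paper), the sketch below maps a procedure to the set of cache lines ("colors") it occupies in a direct-mapped cache and slides a second procedure forward until the two no longer overlap. The cache geometry (32-byte lines, 256 lines) and the function names `cache_colors` and `place_without_conflict` are assumptions made for this example only.

```python
def cache_colors(start, size, line_size=32, num_lines=256):
    """Return the set of cache-line indices (colors) a procedure occupies
    in a direct-mapped cache, given its start address and size in bytes."""
    first = start // line_size
    last = (start + size - 1) // line_size
    return {line % num_lines for line in range(first, last + 1)}


def place_without_conflict(avoid, size, start, line_size=32, num_lines=256):
    """Slide a procedure forward one cache line at a time, starting at
    `start`, until its colors do not intersect the colors in `avoid`.
    Returns the chosen start address, or None if no such placement exists."""
    for step in range(num_lines):
        candidate = start + step * line_size
        colors = cache_colors(candidate, size, line_size, num_lines)
        if not (colors & avoid):
            return candidate
    return None


# Hypothetical example: two 64-byte procedures that call each other often.
caller_colors = cache_colors(0, 64)            # occupies lines 0 and 1
callee_start = place_without_conflict(caller_colors, 64, 8192)
```

In this example the callee would naively start at address 8192, which maps onto the same two cache lines as the caller; the sketch moves it forward until it occupies different lines. A real reordering pass would pick which pairs to separate from the edge weights of the graph (CG, TRG or CMG).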

Examine the temporal locality and invariance of parameter values

I spent the summer of 1998 at Microsoft Research working with the Advanced Development Tools Group on parameter value profiling. We traced several applications (Word97, Excel97, Powerpoint97, Access97, Mso97.dll, VC++ 6.0 Linker, Foxpro 6.0, SQLserver 7.0 and several SPEC95 integer benchmarks) on a Pentium platform running Windows NT 4.0. We measured the temporal locality and invariance of parameter values in order to explore opportunities for code specialization optimizations.
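To make the two metrics concrete, here is a minimal sketch of how they might be computed for a single procedure parameter from a trace of observed values. The definitions used here are assumptions for illustration (invariance as the share of calls passing the most frequent value, temporal locality as the share of calls repeating one of the last few values); the actual study may have defined them differently, and the function names are hypothetical.

```python
from collections import Counter, deque


def invariance(values):
    """Fraction of calls that pass the single most frequent value."""
    if not values:
        return 0.0
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


def temporal_locality(values, history=4):
    """Fraction of calls whose value matches one of the previous
    `history` values seen for the same parameter."""
    if not values:
        return 0.0
    recent = deque(maxlen=history)  # sliding window of recent values
    hits = 0
    for value in values:
        if value in recent:
            hits += 1
        recent.append(value)
    return hits / len(values)


# Hypothetical trace of one parameter across eight calls.
trace = [8, 8, 8, 16, 8, 8, 32, 8]
print(invariance(trace))                    # 0.75
print(temporal_locality(trace, history=4))  # 0.625
```

A parameter with high invariance is a natural candidate for specialization, since a version of the procedure compiled for the dominant value would serve most calls.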

Exploit predictability and path correlation of indirect branches

Using the Alpha AXP ISA as an example architecture, we studied the statistical behavior and predictability of indirect branches and their supporting memory references. Our goal was to find techniques for reducing or tolerating the overhead associated with the use of those branches. We traced several C and C++ benchmarks on Alpha-based workstations with the ATOM tool. Our work so far has proposed several strategies for improving Indirect Branch Prediction, such as Prediction by Partial Matching (the PPM algorithm, adopted from the field of data compression), Dynamic Selection of Correlation Type, and Compiler-assisted mapping of indirect branches onto a hardware-based predictor. The first two attempt to better predict the nature of the branch, while the last one focuses on reducing aliasing in an indirect branch predictor. For more interesting work on Indirect Branch Prediction, visit the Web page of the OOCSB group at UCSB and the HPS group at the University of Texas at Austin.
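The PPM idea can be sketched in software as a stack of Markov predictors of decreasing order: the predictor first looks up the longest available history of previous targets and falls back to shorter histories when that context has never been seen. The code below is only an illustration of that fallback scheme, not the hardware design from the papers; the class name, the use of the branch address as part of the key, and the unbounded dictionary tables are all assumptions.

```python
class PPMIndirectPredictor:
    """Software sketch of a PPM-style indirect branch target predictor."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # tables[k] maps (branch_pc, last k targets) -> {target: count}
        self.tables = [dict() for _ in range(max_order + 1)]
        self.history = []  # most recent indirect branch targets

    def _contexts(self, pc):
        # Yield (order, key) pairs from the highest usable order down to 0.
        length = len(self.history)
        for order in range(min(self.max_order, length), -1, -1):
            yield order, (pc, tuple(self.history[length - order:]))

    def predict(self, pc):
        # Use the highest-order context that has been seen before.
        for order, key in self._contexts(pc):
            counts = self.tables[order].get(key)
            if counts:
                return max(counts, key=counts.get)
        return None  # no information about this branch yet

    def update(self, pc, target):
        # Record the resolved target in every order, then extend the history.
        for order, key in self._contexts(pc):
            counts = self.tables[order].setdefault(key, {})
            counts[target] = counts.get(target, 0) + 1
        self.history.append(target)
        if self.max_order:
            self.history = self.history[-self.max_order:]
        else:
            self.history = []


# Hypothetical branch at 0x100 that alternates between two targets.
predictor = PPMIndirectPredictor(max_order=3)
for target in [0xA0, 0xB0] * 10:
    predictor.update(0x100, target)
next_target = predictor.predict(0x100)  # expected to be 0xA0
```

An order-0 predictor alone would see both targets equally often here; the higher-order contexts capture the alternation.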

Download a gzipped version of my thesis here : phd-thesis.ps.gz

MSEE thesis :

I worked in the area of parallelizing compilers, and my Master's thesis was essentially a continuation of the Ph.D. research of Haris Stellakis. The topic was the systematic synthesis of SIMD algorithms for signal processing applications on a general purpose SIMD machine such as the MasPar-1/2. A modified model of an algorithm-to-architecture mapping methodology used for VLSI systolic arrays was employed to accomplish the synthesis procedure. This modified model takes advantage of the communication and computation properties of the specific machine to derive minimum latency SIMD algorithms, starting from a nested-loop description of the problem. This approach was applied to the Higher-Order Moments estimation problem as a case study. We developed SIMD code for estimating all Higher-Order Moments up to the 3rd and 4th order of a one-dimensional input signal. Actual experiments performed on the MasPar-1/2 machines showed significant speedups compared to simulations run on a series of workstations based on the Alpha 21064, the MIPS R8000 and the Sun SuperSparc CPU. This work was supported by an NSF grant. As an MSEE student I was a member of the Parallel Architectures Group supervised by Prof. Manolakos.
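For readers unfamiliar with the case study, the following is a minimal sequential sketch of a third-order moment estimate for one pair of lags, written as the kind of loop a nested-loop description would contain. It assumes the common biased estimator (the sum of triple products divided by the signal length); the thesis may have used a different normalization, and the function name is hypothetical. It is not the MasPar SIMD code.

```python
def third_order_moment(x, tau1, tau2):
    """Biased estimate of the third-order moment of signal x at lags
    (tau1, tau2): (1/N) * sum over n of x[n] * x[n+tau1] * x[n+tau2],
    taken over the indices n for which all three samples exist."""
    n_samples = len(x)
    if n_samples == 0:
        return 0.0
    lo = max(0, -tau1, -tau2)
    hi = min(n_samples, n_samples - tau1, n_samples - tau2)
    total = sum(x[n] * x[n + tau1] * x[n + tau2] for n in range(lo, hi))
    return total / n_samples


# Hypothetical four-sample signal.
signal = [1, 2, 3, 4]
print(third_order_moment(signal, 0, 0))  # 25.0
print(third_order_moment(signal, 1, 0))  # 12.5
```

Each lag pair can be computed independently of the others, which is one reason the problem lends itself to data-parallel machines.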

Research Interests : Processor Microarchitecture, Memory Hierarchy design, Compiler Optimizations, VLSI design.

Resume :

Long version : longcv.ps.gz

Short version : shortcv.ps.gz

Industrial Experience in the U.S. :

June 2004 - present : Working at the Boston Design Center of AMD, doing performance/power analysis on new-generation microprocessors.

February 2000 - May 2004 : Worked at the Boston Design Center of Sun Microsystems Inc., mainly doing performance analysis of the core of several Sparc CPUs. Other tasks included workload characterization, power and FIT rate modeling, and performance verification.

Summer 1998 : Worked as a summer intern at Microsoft Research under the supervision of Dr. Amitabh Srivastava on parameter value profiling.

Summer 1996 : Worked as a summer intern at IBM T.J. Watson Center under the supervision of Dr. Phil Emma on performance evaluation studies using a microarchitectural simulator (called timer in IBM terminology) of the IBM S/390 CPU.

Published papers :

Lampros Kalampoukas, Dimitris Nikolos, Costas Efstathiou, Haris Vergos, John Kalamatianos: "High-Speed, Parallel-Prefix Modulo 2^n-1 Adders", IEEE Transactions on Computers, July 2000.

John Kalamatianos, David Kaeli: "Accurate Simulation and Evaluation of Code Reordering", International Symposium on Performance Analysis of Systems and Software, June 2000.

John Kalamatianos, David Kaeli: "Indirect Branch Prediction using Data Compression Techniques", Journal of Instruction Level Parallelism, December 1999. ps.gz

John Kalamatianos, Ronnie Chaiken, David Kaeli: "Parameter Value Locality of Windows NT-based applications", Workshop on PC-Performance and Analysis held in conjunction with Micro-31, November 1998.

John Kalamatianos, Alireza Khalafi, David Kaeli and Waleed Meleis: "Analysis of temporal-based program behavior for improved cache performance", IEEE Transactions on Computers, February 1999.

John Kalamatianos, David Kaeli: "Improving the Accuracy of Indirect Branch Prediction via Branch Classification", Workshop on the Interaction between Compilers and Computer Architectures (INTERACT-3) held in conjunction with ASPLOS-VIII, October 1998. ps.gz

John Kalamatianos, David Kaeli: "Improving Indirect Branch Prediction via Data Compression", Proceedings of the 31st International Symposium on Microarchitecture, November 1998. pdf

Alireza Khalafi, John Kalamatianos, David Kaeli, Waleed Meleis: "Memory performance tuning using graph-based analysis", PAID Workshop held in conjunction with ISCA-25, June 1998.

John Kalamatianos, David Kaeli: "Temporal-based Procedure reordering for Improved Instruction Cache Performance", Proceedings of the 4th International Symposium on High Performance Computer Architecture, pp. 244-253, February 1998.

John Kalamatianos, Elias Manolakos: "Parallel Computation of Higher Order Moments on the MasPar-1 Machine", International Conference on Acoustics, Speech and Signal Processing, May 1995.

Elias Manolakos, Harris Stellakis, John Kalamatianos: "Parallel Algorithms and Architectures for the Estimation of Higher Order Statistics", IEEE Workshop on Nonlinear Signal and Image Processing, June 1995.

Costas Efstathiou, Dimitris Nikolos, John Kalamatianos: "Area-time efficient modulo 2^N - 1 adder design", IEEE Transactions on Circuits and Systems, July 1994.

Some interesting technical reports :

"VHDL model for a 1-D scalable DWT (Discrete Wavelet Transform) architecture", September 1995.

Documentation : dwt_doc.ps.gz

"VHDL model for a Column Associative Cache", June 1995.

VHDL code for the cache model : cache_vhdl.tar.gz

US mail address :

Kalamatianos Yiannis (John)
AMD Boston Design Center
MS 83/29
90 Central Street
Boxborough, MA 01719
office phone number: (978) 795-2602

e-mail : kalamat at ece dot neu dot edu
e-mail : john dot kalamatianos at amd dot com
last updated 23/09/2005