NUCAR

Performance Evaluation and Optimization Mechanisms for Interoperable Graphics and Computation on GPUs

Evaluates the performance and efficiency of the OpenCL-OpenGL (CL-GL) interoperability (interop) mode and explores different methods to improve the execution performance of CL-GL interop-based applications. We propose a slot-based rendering mechanism for CL-GL interop to increase application efficiency.

Participants
Yash Ukidave
Xiang Gong

Support for Cache Non-Coherence in Many-Core Architectures

Design and implementation of a cache coherence protocol that includes a non-coherent state for shared, modified blocks. The protocol allows a seamless transition between memory consistency models that require coherence and those that do not.

Participants
Dana Schaa
Rafael Ubal

Advanced Ultrasound with OpenCL

Implementation of filtering, motion compensation, and other image processing algorithms for a real-time ultrasound system.

Participants
Dana Schaa
Xiangyu Li

Improving Synchronization on GPUs

This research focuses on improving synchronization on GPUs through a hardware-based synchronization mechanism, in order to extend the applicability of GPUs to a broader class of general-purpose applications. Hierarchical Queuing Locks (HQL) is a hardware-based synchronization mechanism that favors blocking for efficiency and uses hardware-based hierarchical queuing locks for scalability.

Participants
Ayse Yilmazer

Improving SIMD Efficiency on GPUs

Design micro-architectural techniques to improve the efficiency of SIMD execution on GPUs.
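One common way to quantify SIMD efficiency (assumed here for illustration; the project's own metric is not stated) is the fraction of lane slots doing useful work across all issued SIMD instructions. The sketch below computes it from per-instruction active-lane masks; the function name, trace format, and 4-lane width are hypothetical.

```python
def simd_efficiency(active_masks, width):
    """Fraction of SIMD lane slots doing useful work over all issued instructions.

    active_masks: one lane mask (list of bools, length == width) per issued instruction.
    """
    if not active_masks:
        return 0.0
    active_lanes = 0
    for mask in active_masks:
        if len(mask) != width:
            raise ValueError("mask length must equal the SIMD width")
        active_lanes += sum(mask)  # each True counts as one active lane
    return active_lanes / (len(active_masks) * width)


# Hypothetical 4-lane trace: one converged instruction, then both sides of a divergent branch.
trace = [
    [True, True, True, True],    # all lanes active
    [True, True, False, False],  # "then" path: half the lanes are masked off
    [False, False, True, True],  # "else" path: the other half
]
print(simd_efficiency(trace, 4))
```

Here 8 of the 12 lane slots do useful work, so the efficiency is 2/3; techniques that improve SIMD efficiency raise this ratio by keeping more lanes busy after divergence.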

Participants
Ayse Yilmazer

Computed Tomography Image Segmentation on GPGPU Architecture

A general detection system can be divided into several stages: sensing, segmentation, feature extraction, detection, and post-processing. One of its bottlenecks is the image segmentation phase, which is among the most time-consuming stages and largely determines the accuracy of the final result. Image segmentation therefore remains a widely studied open research topic. My approach is to consider different segmentation algorithms that can be parallelized to take advantage of GPGPU architectures. Currently, I am focusing on graph-based and connected component labeling (CCL) approaches; both techniques can be sped up using the OpenCL or CUDA frameworks.
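Connected component labeling (CCL) parallelizes well because it can be expressed as repeated, independent per-pixel updates. The sketch below shows one generic formulation, iterative minimum-label propagation over 4-connected neighbors, written sequentially in Python; each pixel update in the inner loop corresponds to the work one GPU thread or work-item would do. It is an illustrative assumption, not the algorithm used in this project, and the function name is hypothetical.

```python
def label_components(image):
    """Label 4-connected foreground regions of a binary image by minimum-label propagation.

    image: list of rows of 0/1 values. Returns a same-sized grid where background pixels
    are 0 and every foreground pixel holds the smallest initial label in its component.
    """
    rows, cols = len(image), len(image[0]) if image else 0
    # Give every foreground pixel a unique starting label (its linear index + 1).
    labels = [[(r * cols + c + 1) if image[r][c] else 0 for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:  # on a GPU, each pass would be one kernel launch
        changed = False
        new_labels = [row[:] for row in labels]  # read old labels, write new ones
        for r in range(rows):
            for c in range(cols):  # independent per pixel: one work-item each
                if not image[r][c]:
                    continue
                best = labels[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc]:
                        best = min(best, labels[nr][nc])
                if best < labels[r][c]:
                    new_labels[r][c] = best
                    changed = True
        labels = new_labels
    return labels


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
print(label_components(image))
```

The two regions in this example receive labels 1 and 8. The labels are not consecutive; a separate compaction pass could renumber them. GPU CCL implementations often add union-find style pointer jumping to reduce the number of passes.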

Participants
Fanny Nina Paravecino

Analysis of Power-Performance Efficiency of Optimizations Applied to Heterogeneous Applications

The power-performance efficiency of different optimization techniques, such as coalesced memory access, local memory usage, and loop unrolling, is evaluated on heterogeneous devices. The fast Fourier transform (FFT) is used as the test application for evaluating power consumption on GPUs, APUs, and embedded SoCs with OpenCL support. An AMD Southern Islands GPU, an AMD Fusion APU, an NVIDIA Kepler GPU, an Intel Ivy Bridge APU, and a Qualcomm Snapdragon SoC are evaluated for their power-performance efficiency.
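An FFT computes the discrete Fourier transform in O(n log n) operations by recursively splitting the input into even- and odd-indexed halves and recombining them with "butterfly" operations. The plain-Python sketch below illustrates the radix-2 Cooley-Tukey variant, assuming power-of-two input lengths; it is a reference for the computation only, not the OpenCL implementation evaluated in this project.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the half-size transforms using the twiddle factor e^(-2*pi*i*k/n).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

For example, `fft([1, 1, 1, 1])` returns (up to rounding) `[4, 0, 0, 0]`, since a constant signal has energy only at frequency zero. GPU versions typically restructure the recursion into iterative stages so that each butterfly can be assigned to a separate thread.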

Participants
Yash Ukidave

Accelerating Phase Field Simulation

The phase-field model is a mathematical model for solving interfacial problems in physics (such as the growth of snowflakes). Phase-field approaches have long been restricted to the microscopic scale because of the small mesh size needed to retain reasonable accuracy. Porting the Fortran/C implementation of the phase-field algorithm to the GPU makes it possible to reach experimentally relevant length and time scales. Compared with the CPU implementation, the GPU-accelerated phase-field algorithm achieved a 30x speedup on a single GPU.
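The sketch below illustrates why phase-field codes map well to GPUs: each time step is a stencil update in which every grid point is computed independently from its neighbors. It assumes the simple Allen-Cahn equation d(phi)/dt = eps^2 * laplacian(phi) - (phi^3 - phi) on a periodic 2D grid with explicit Euler time stepping; the project's actual model, parameters, and discretization are not given here, and all names and values are hypothetical.

```python
def allen_cahn_step(phi, dt, dx, eps):
    """One explicit Euler step of d(phi)/dt = eps^2 * laplacian(phi) - (phi^3 - phi).

    phi is a 2D list on a periodic grid with spacing dx. Each output point depends only on
    its own value and its four neighbors, so a GPU port can assign one thread per point.
    """
    ny, nx = len(phi), len(phi[0])
    out = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            p = phi[i][j]
            # 5-point Laplacian with periodic wrap-around.
            lap = (phi[(i - 1) % ny][j] + phi[(i + 1) % ny][j]
                   + phi[i][(j - 1) % nx] + phi[i][(j + 1) % nx] - 4.0 * p) / (dx * dx)
            out[i][j] = p + dt * (eps * eps * lap - (p ** 3 - p))
    return out


# Hypothetical setup: a square region of phase +1 inside a background of phase -1.
n = 32
phi = [[1.0 if 12 <= i < 20 and 12 <= j < 20 else -1.0 for j in range(n)] for i in range(n)]
for _ in range(100):
    phi = allen_cahn_step(phi, dt=0.1, dx=1.0, eps=1.0)
```

Explicit stepping is only stable for a small time step (for the diffusion term, roughly dt <= dx^2 / (4 * eps^2) in 2D), so refining the mesh also shrinks the time step, which compounds the cost of fine-mesh runs.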

Participants
Xiang Gong

Scalar-Vector Opportunities on GPU Architectures

This research examines scalar opportunities and scalar-vector GPU architectures. I take advantage of scalar opportunities observed in GPGPU workloads to improve performance and power efficiency on a novel scalar-vector GPU architecture.
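In the usual sense of the term, a scalar opportunity arises when every active lane of a SIMD instruction reads the same source operand values, so the work could be done once on a scalar unit instead of across all vector lanes. The sketch below classifies a hypothetical instruction trace and reports the fraction of scalarizable instructions; the trace format and function names are assumptions for illustration, not this project's tooling.

```python
def is_scalar_instruction(sources, active_mask):
    """True if every source operand holds the same value in all active lanes."""
    for lane_values in sources:
        active = [v for v, on in zip(lane_values, active_mask) if on]
        if not active or any(v != active[0] for v in active):
            return False
    return True


def scalar_fraction(trace):
    """Fraction of traced instructions that are candidates for a scalar unit."""
    if not trace:
        return 0.0
    return sum(is_scalar_instruction(srcs, mask) for srcs, mask in trace) / len(trace)


# Hypothetical 4-lane trace: (per-operand lane values, active mask) for each instruction.
full = [True, True, True, True]
trace = [
    ([[7, 7, 7, 7], [2, 2, 2, 2]], full),         # uniform operands: scalarizable
    ([[0, 1, 2, 3], [2, 2, 2, 2]], full),         # lane-dependent operand: needs the vector unit
    ([[5, 9, 5, 5]], [True, False, True, True]),  # differing value is in an inactive lane
]
print(scalar_fraction(trace))
```

Two of the three instructions qualify, so the reported fraction is 2/3. Ignoring inactive lanes matters: the third instruction is uniform only once the masked-off lane is excluded.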

Participants
Zhongliang Chen

OpenCL Benchmarking

The Valar benchmark suite is a new suite of real-world applications that effectively leverage heterogeneous devices. The main characteristics of these benchmarks are their data-dependent behavior and their usefulness for studying the interaction between computation and data movement on different heterogeneous devices.

Participants
Perhaad Mistry

Profiling and Performance Analysis Tools

HAPTIC is a software framework that implements analysis devices as an extension to OpenCL, used to specialize algorithms and utilize multiple devices. Analysis devices allow optimization and profiling to be inserted into a compute pipeline. HAPTIC supports discrete and fused platforms, and we have used it to study the performance of analysis devices when integrated into applications where host and device behavior is coupled.

Participants
Perhaad Mistry

Finite Difference Time Domain (FDTD) Acceleration Using GPUs

Porting a three-dimensional finite-difference time-domain (FDTD) algorithm written in Fortran to GPUs using the OpenCL programming model.
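FDTD maps well to GPUs because each field update within a time step is an independent stencil operation. The following is a one-dimensional sketch of the standard Yee leapfrog scheme in normalized units, written in Python for illustration; the project itself ports a three-dimensional Fortran code to OpenCL, and the grid size, Courant number, Gaussian source, and function name here are hypothetical.

```python
import math


def fdtd_1d(n_cells=200, n_steps=300, source_cell=100, courant=0.5):
    """1D FDTD (Yee scheme) in normalized units with an additive Gaussian source.

    ez and hy are staggered in space and time. Each update loop is data-parallel, so a
    GPU port would assign one work-item per cell. ez[0] stays 0 (a perfect electric
    conductor) and hy[-1] stays 0, so both ends reflect.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for t in range(n_steps):
        # Update the magnetic field from the spatial difference of the electric field.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update the electric field from the spatial difference of the magnetic field.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Inject a Gaussian pulse at the source cell (additive, or "soft", source).
        ez[source_cell] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez, hy


ez, hy = fdtd_1d()
```

The Courant number must stay at or below 1 in 1D for the scheme to be stable; in 3D the H and E updates each become three curl components per cell, but the structure of two alternating data-parallel sweeps per step is the same.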

Participants
Zhongliang Chen