
Parallel Hardware and
Software Systems 


Project 3

Utilizing Accelerators for Visualization and Data Discovery (Prof. David Kaeli and Prof. Jennifer Dy) - This interdisciplinary project pursues research that aims to break down computational barriers in a number of important applications, including machine learning and deep learning. Our collaborative work characterizes large data sets from the field, including identifying environmental factors that impact preterm birth rates and identifying anomalies in tissue images used for detecting cancer. REU students will receive training on, and access to, the latest high-performance hardware and software, and will be immersed in developing algorithms and applications that apply to real-world problems. Students will have the opportunity to utilize the resources of MGHPCC.

Possible REU projects include:

        • Mapping an outlier detection algorithm to a graphics processor utilizing state-of-the-art parallel languages such as CUDA and OpenCL.

        • Identifying trends in pesticide use in the home to identify adverse health outcomes.

        • Visualizing large well water datasets across 100 different environmental parameters.

Students selecting this project should have some background in programming (any language), though no experience in CUDA or OpenCL is expected. Students will learn how to develop parallel programs on parallel hardware.
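As a rough illustration of the kind of computation the first bullet refers to, the sketch below flags outliers with a simple z-score test in plain Python. The project description does not say which outlier detection algorithm is used, so the z-score method, the function name zscore_outliers, and the threshold value are all assumptions made for this example. The point is that each element is tested independently once the mean and standard deviation are known, which is the kind of per-element work that maps naturally to a graphics processor.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values whose z-score exceeds the threshold.

    Hypothetical helper for illustration only; not the project's algorithm.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    # Each element is checked independently of the others; on a GPU,
    # one thread per element could perform this comparison.
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
outlier_indices = zscore_outliers(readings, threshold=2.0)
```

In a CUDA or OpenCL version, the mean and standard deviation would typically be computed with a parallel reduction before the per-element test runs.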



Project 4

Scheduling Data Mining Graph Analysis on MapReduce (Prof. Ningfang Mi) - Large-scale data analysis is of great importance in a number of data analytics applications. MapReduce has become one of the most commonly used programming paradigms for scalable processing of semi-structured and unstructured data. The open source implementation, Apache Hadoop, has been widely adopted as the primary platform for parallel data processing. Recently, the Hadoop MapReduce ecosystem evolved into its next generation, Hadoop YARN (Yet Another Resource Negotiator), which adopts fine-grained resource management schemes for job scheduling. A Hadoop (or YARN) cluster is typically shared among multiple users with different types of workloads. One challenging problem is scheduling efficiently in such a shared cluster while adapting to the diversity of users' applications and submission patterns. Therefore, Mi’s group currently focuses on developing new resource management schemes to achieve high efficiency and high performance for MapReduce-based applications.
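For readers new to the paradigm, the following single-machine Python sketch shows the three MapReduce stages (map, shuffle, reduce) applied to a small graph problem: counting the degree of each node from an edge list. It does not use Hadoop or YARN; the function names and the sample edge list are hypothetical and chosen only for illustration. In a real cluster, the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(edges):
    """Map: emit a (node, 1) pair for each endpoint of every edge."""
    for src, dst in edges:
        yield src, 1
        yield dst, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values for each key to get the node degree."""
    return {key: sum(values) for key, values in groups.items()}


def node_degrees(edges):
    return reduce_phase(shuffle(map_phase(edges)))


# Hypothetical sample graph given as an undirected edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = node_degrees(edges)
```

Because each edge is mapped independently and each key is reduced independently, a scheduler is free to decide where and when each task runs, which is the kind of decision the scheduling research described above targets.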

To engage REU students, projects will focus on data mining of heterogeneous networks by investigating scalable end-to-end approaches spanning the application layer, the data processing backend, and the lower-level system layers. Research questions are motivated by analysis of communication patterns. Evaluation of the proposed techniques will be performed on data from massively multiplayer online games. This project is in collaboration with Prof. Yizhou Sun, as well as with other faculty.

Possible REU projects include:

        • Contributing to the evaluation of new scheduling techniques for a YARN cluster

        • Documenting the use of MapReduce frameworks, such as Apache Hadoop, YARN, Spark, and Storm

        • Testing new algorithms and techniques on these MapReduce frameworks, comparing their performance and scalability.


REU students will have time allotted on Amazon’s Elastic MapReduce Cloud.