The volume of data generated today across application domains, including scientific exploration, web search, e-commerce, and medical research, continues to grow unbounded. The value of leveraging machine learning to analyze big data has driven the popularity of high-level distributed computing frameworks such as Apache Hadoop and Spark. These frameworks have significantly improved the programmability of distributed systems, accelerating the analysis of big data workloads that typically exceed the processing and storage capacity of a single machine.
GPUs have been shown to provide an effective path to accelerate machine learning tasks. These devices offer high memory bandwidth and thousands of parallel cores, delivering up to an order of magnitude better performance on machine learning applications compared to multi-core CPUs.
While distributed systems and GPUs have each been shown to provide benefits when processing machine learning tasks on big data, developing an integrated framework that can exploit the parallelism provided by GPUs, while maintaining an easy-to-use programming interface, has not been aggressively explored.
In this thesis, we explore the seamless integration of GPUs with Hadoop and Spark to achieve performance and scalability, while preserving their flexible programming interfaces. We propose techniques that expose GPU details for fine-tuned kernels in a Java/Scala-based distributed computing environment, reduce JVM overhead, and increase on-heap and off-heap memory efficiency. We evaluate our approach with a set of representative machine learning data analytics applications and achieve promising performance improvements over Hadoop/Spark's multi-core CPU implementations.
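To illustrate the off-heap memory efficiency mentioned above, a minimal sketch follows. It uses the standard Java NIO direct-buffer API (not the thesis's actual implementation) to allocate data outside the garbage-collected heap, which is the usual prerequisite for handing a buffer's native memory to a GPU binding such as a JNI/CUDA wrapper without an extra copy; the class and sizes here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OffHeapSketch {
    public static void main(String[] args) {
        int n = 1 << 20; // 1M floats (illustrative size)

        // allocateDirect places the buffer outside the JVM heap: it is not
        // moved or scanned by the GC, so a native GPU library can read its
        // memory address directly, avoiding a heap-to-native copy.
        ByteBuffer buf = ByteBuffer.allocateDirect(n * Float.BYTES)
                                   .order(ByteOrder.nativeOrder());

        // Fill the buffer with absolute-index puts (no position bookkeeping).
        for (int i = 0; i < n; i++) {
            buf.putFloat(i * Float.BYTES, 1.0f);
        }

        // Verify the contents; in a real pipeline this read would instead
        // happen on the device after a host-to-device transfer.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += buf.getFloat(i * Float.BYTES);
        }
        System.out.println(sum);
    }
}
```

The design point is that a direct buffer's lifetime and layout are stable from the native side's perspective, which is what lets a Hadoop/Spark task exchange data with fine-tuned GPU kernels without serialization through the on-heap object graph.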
- Professor David Kaeli (Advisor)
- Professor Xue Shelley Lin
- Professor Ningfang Mi