Lin & Kaeli to Work on $800K NSF Grant Collaboration

September 14, 2017

ECE Assistant Professor Xue Lin and Professor David Kaeli, in collaboration with CUNY City College, received an $800K NSF grant to develop "A Framework of Simultaneous Acceleration and Storage Reduction on Deep Neural Networks Using Structured Matrices".

Abstract Source: NSF

Deep neural networks (DNNs) have emerged as a class of powerful techniques for learning solutions in a number of challenging problem domains, including computer vision, natural language processing and bioinformatics. These solutions have been enabled mainly because we now have computational accelerators able to sift through the myriad of data required to train a neural network. As the size of DNN models continues to grow, computational and memory resource requirements for training will also grow, limiting deployment of deep learning in many practical applications.

Leveraging the theory of structured matrices, this project will develop a general framework for efficient DNN training and inference, providing a significant reduction in algorithmic complexity measures in terms of both computation and storage.
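To illustrate how a structured matrix can cut both storage and computation, here is a minimal sketch using a circulant matrix. The abstract does not say which structured matrices the project will use, so the circulant choice and the helper names `circulant_dense` and `circulant_matvec` are assumptions made purely for illustration. A circulant n x n matrix is fully described by its first column (n values rather than n²), and its matrix-vector product can be computed with FFTs in O(n log n) time instead of O(n²).

```python
import numpy as np

def circulant_dense(c):
    """Build the full n x n circulant matrix whose first column is c."""
    n = len(c)
    # Entry (i, j) depends only on (i - j) mod n, so n values define n*n entries.
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def circulant_matvec(c, x):
    """Multiply the circulant matrix defined by c with x without forming it.

    A circulant product is a circular convolution, which becomes an
    element-wise product in the Fourier domain.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Hypothetical example data: a single 8 x 8 "weight matrix" stored as 8 numbers.
rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)  # the only parameters that need to be stored
x = rng.standard_normal(n)  # an input vector

dense_result = circulant_dense(c) @ x  # O(n^2) reference computation
fast_result = circulant_matvec(c, x)   # O(n log n) structured computation
max_abs_diff = float(np.max(np.abs(dense_result - fast_result)))
print("largest difference between the two results:", max_abs_diff)
```

The two results should agree up to floating-point rounding, while the structured version stores n values instead of n². Whether this particular structure preserves a network's accuracy is a separate question that the sketch does not address.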

The project, if successful, should fundamentally impact a broad class of deep learning applications. It will explore accelerating this new structure for deep learning algorithms targeting emerging accelerator architectures, and will evaluate the benefits of these advances across a number of application domains, including big data analytics, cognitive systems, unmanned vehicles and aerial systems, and wearable devices. The interdisciplinary nature of this project bridges the areas of matrix theory, machine learning, and computer architecture, and will affect education at both Northeastern and CCNY, including the involvement of underrepresented and undergraduate students in the rich array of research tasks.

The project will: (1) for the first time, develop a general theoretical framework for structured matrix-based DNN models and perform detailed analysis and investigation of error bounds, convergence, fast training algorithms, etc.; (2) develop low-space-cost and high-speed inference and training schemes for the fully connected layers of DNNs; (3) impose a weight tensor with structure and enable low computational and space cost convolutional layers; (4) develop high-performance and energy-efficient implementations of deep learning systems on high-performance parallel platforms, low-power embedded platforms, as well as emerging computing paradigms and devices; (5) perform a comprehensive evaluation of the proposed approaches on different performance metrics in a variety of platforms. The project will deliver tuned implementations targeting a range of computational platforms, including ASICs, FPGAs, GPUs and cloud servers. The hardware optimizations will focus on producing high-speed and low-cost implementations of deep learning systems.