Machine Learning, Data Mining, and Analytics


Project 1

Machine Learning and Data Fusion for Environmental Study and Process Control (Prof. Amy Mueller) – Though the idea of “big data” has become mainstream over the past few years, the study and protection of the environment have yet to benefit from such high-resolution, widespread characterization. The reasons for this are threefold: (1) “the environment” is geographically huge and heterogeneous, (2) the number and types of parameters we are interested in measuring are large, and (3) only a limited number of sensors exist for measuring parameters in the field (“in-situ”), where samples contain a complex mix of natural and anthropogenic chemicals. The Environmental Sensors Laboratory at Northeastern is a cross-college initiative (Civil and Environmental Engineering, Marine and Environmental Science) broadly addressing the challenges of measuring pollutants, nutrients, and other analytes of interest in-situ, either in the field (lakes, rivers, oceans, air) or in process (wastewater treatment, industrial process water usage, etc.). Our projects use a scientific understanding of the systems we study (chemistry, biology, physics) to select an optimized suite of sensors that characterizes system parameters, coupled with application-optimized signal processing techniques designed to recover as much information as possible from these noisy, often cross-interfering data streams.

Possible REU projects include:

        • Analysis of sensor signals for quantification of nutrients (nitrogen, phosphorus) in novel wastewater treatment reactors to support more sustainable, optimized online control schemes.

        • Development of event-based sampling strategies for in-situ water sample collection at ocean moorings, looking at previously collected data as a complete unit to predict times of interest for studying phytoplankton growth and trace nutrients.

        • Integration of multiple sensor data streams characterizing soils to determine the quality of the soil for plant growth and estimate fertilizer demand.




Project 2

Detecting Health-Related Behaviors and Habits from Mobile Phones and Wearable Sensors (Prof. Stephen Intille) – The mHealth Research Group invents and validates new systems, methodologies, and algorithms that use wearable and ubiquitous sensors, mobile phones, and advanced human-computer interfaces to support health and wellness research and practice. A key aspect of the work is automatic detection of everyday health-related behaviors using applied machine learning. We are interested in getting consumer electronics devices to reliably and continuously detect human behaviors, states, and habits, especially physical activities, sedentary behaviors and postures, sleep, and social contexts. We apply and adapt machine learning to create systems that use data collected from smartwatches and fitness trackers, such as accelerometer data, as well as mobile phone data, such as location and phone usage patterns, to infer what someone is doing. We would also like the system to reliably predict what a person might do next. Our goal is to provide health researchers with reliable, high-fidelity scientific measurement of behavior, and to create new opportunities for just-in-time, tailored health interventions delivered on mobile devices.

Possible REU projects include:

        • Testing and then extending existing algorithms for detecting specific types of physical activity and sleep states on several different datasets obtained from study participants wearing wrist-worn monitors for a week (up to 18,000 individuals are included in one important dataset from a national health study).

        • Wearing a suite of 12+ wearable sensors and collecting detailed data on one’s own body movement, behavior, and habits for at least 4 weeks (i.e., becoming a cyborg), while simultaneously adapting algorithms to detect patterns of behavior from subsets of the data collected.

Programming experience using Java, Python, R, or MATLAB is recommended.