Mirek Riedewald

Associate Professor
Northeastern University

Khoury College of Computer Sciences, 202 West Village H
360 Huntington Avenue
Boston, MA 02115

Phone: +1-617-373-4766

2002 Ph.D. (UC Santa Barbara)
2002-2008 Research Associate (Cornell University)
Since 2009 Associate Professor (Northeastern University)


Expertise

Cloud computing, distributed big-data management and analysis, data stream processing, data-driven science

Research collaborations: I have been collaborating with industrial partners and with scientists from various disciplines since 1999. While the specific challenges vary, the common theme is always the same: everybody is collecting and generating an ever-increasing amount of data. In this world of big data and data-driven science, groundbreaking discoveries depend on the ability to process and analyze these massive amounts of data efficiently. We have been designing scalable data management and analysis techniques for neuroscience, discovery and linking of personal information (e.g., as mandated by GDPR), ornithology, ecology, rocket science (really!), astronomy, and high-energy physics, to name a few.

Research

Research vision: Create algorithms that scale in the size and complexity of data, with a focus on analysis problems motivated by grand challenges in Open Data and data-driven science.

What our PhD students do: design novel algorithms; prove lower bounds, upper bounds, and optimality; build big-data systems; publish results in the premier CS and domain-science venues.

Prof. Riedewald is co-founder and co-leader of the DATA Lab @ Northeastern. Currently he focuses on the development of novel techniques for large-scale distributed data analysis, data management, and data mining. His research agenda is driven by collaborations with domain scientists and industry, with the goal of producing results that are publishable both in premier computer science venues and in those of the application domain.

Publications

Current Projects

Distributinator: Scalable Big-Data Analytics

How do we effectively and efficiently use many machines in a cluster or in a cloud to solve a big-data-analysis challenge? What is the best way to partition a dataset so that the running time of the distributed computation is minimized? How do we abstract a complex distributed computation so that we can learn a mathematical model of how running time depends on the parameters that affect data partitioning?

NCTracer Web

How do we turn 20,000 3D image stacks (10 terabytes per mouse brain) taken by a high-resolution light microscope into a coherent 3D image of the brain? How do we extract from this massive dataset a graph representing the neurons captured in the image? And how do we analyze this graph efficiently? Can we extend this approach to include other brain data, e.g., from fMRI and electron microscopes? And can we generalize our techniques to graph problems in other domains such as social network analysis?

Any-k: Optimal Ranked Enumeration for Conjunctive Queries

When a query on big data produces huge output, can we quickly return the "most important" results without even computing the entire output? If the notion of importance is difficult to define, can we return the top-ranked results so quickly that the user can try out different options (nearly) interactively? For what types of queries and data can this functionality be supported? And what are the best time and space guarantees we can provide?

 

Selected Past Projects

Scolopax: Making Analysis of Scientific Data Fast and Easy

Cayuga: A Scalable System for Data Stream Processing

Additive Groves Prediction Technique and Automatic Interaction Detection


Teaching

Advising