Associate Professor
Northeastern University
College of Computer and Information Science, 202 West Village H
360 Huntington Avenue
Boston, MA 02115
+1-617-373-4766, fax (dept): +1-617-373-5121
2002 Ph.D. (UC Santa Barbara)
2002-2008 Research Associate (Cornell University)
Since 2009 Associate Professor (Northeastern University)
Big Data; database systems, with an emphasis on large-scale distributed data analysis, data management, and data mining for the sciences.

I have been collaborating with scientists from various disciplines since 1999. While specific challenges vary, there is always a common theme: scientists are collecting and generating rapidly growing amounts of data. In this new world of data-driven science, groundbreaking discoveries depend on the ability to analyze and process these massive amounts of data efficiently. To let scientists do science rather than forcing them to become experts in parallel algorithms, data mining, and databases, we are developing Scolopax, a tool for scientific discovery. It will support a user-friendly interface for declaratively specifying discovery goals. All data processing will then be optimized automatically for fast and efficient execution on multiple processors, relying on novel data management techniques.
Consider a citizen scientist or casual observer who spots an interesting bird. Later at home, she wants to know the species of this bird. Despite the availability of excellent bird guides, identifying it is often a tedious process. Traditional classification techniques are not effective because of the nature of the problem, which includes wrong and uncertain user input. Similar problems occur in many other contexts. We are developing novel interactive category identification techniques whose goal is to minimize user effort. Merlin is part of a major inter-institutional collaboration led by the Cornell Lab of Ornithology. The overall goal is to build a social networking site that connects citizen scientists, bird experts, and ecology researchers. Users can contribute data, explore birds, interact with others to learn more about ecology, and play online "games with a purpose". This system will broaden interest in (citizen) science and contribute to science education. (Recently started. More information coming soon.)
Cayuga: A Scalable System for Data Stream Processing
Additive Groves Prediction Technique and Automatic Interaction Detection
Interactive Search Queries for Online Communities (UC Berkeley, April 2012)
Scolopax: Supporting Exploratory Analysis of Scientific Data (University of Wisconsin, Madison, March 2012)
Scolopax: Supporting Exploratory Analysis of Scientific Data (MIT, February 2012)
Scolopax: Supporting Exploratory Analysis of Scientific Data (Brown University, February 2012)
Scolopax: Supporting Exploratory Analysis of Scientific Data (UPenn, Philadelphia, November 2011)
Near-Optimal Parallel Join Processing in MapReduce (Yahoo! Research; Google; IBM Almaden Research Lab; May 2011)
PC area vice chair for data warehousing, statistics, aggregate processing for the 2014 IEEE Int. Conf. on Data Engineering (ICDE)
Co-Chair of the program committee of the demo track for the 2012 IEEE Int. Conf. on Data Engineering (ICDE)
Member of the Editorial Advisory Board, Information Systems, Elsevier
2013 ACM SIGMOD Int. Conf. on Management of Data, Demo track, Program Committee
2013 Int. Conf. on Extending Database Technology (EDBT)
2012 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
2012 ACM SIGMOD Int. Conf. on Management of Data, Demo track, Program Committee
2012 Int. Conf. on Extending Database Technology (EDBT), Program Committee
2012 Int. Conf. on Extending Database Technology (EDBT), Data Analytics in the Cloud Workshop, Program Committee
2012 IEEE Int. Conf. on Distributed Computing Systems (ICDCS)
2012 ACM Symposium on Applied Computing (SAC), "Mobile Computing and Applications" track, Program Committee
2012 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2012 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP)
2012 Int. Symp. on Methodologies for Intelligent Systems (ISMIS), Warehousing and OLAPing Complex, Spatial and Spatio-Temporal Data track
2011 ACM SIGMOD Int. Conf. on Management of Data, Demo track, Program Committee
2011 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2011 ACM Symposium on Applied Computing (SAC), "Mobile Computing and Applications" track, Program Committee
2011 Summit of the New England Database Society (NEDSummit), Program Committee
2011 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2011 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2011 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP)
2011 Int. Conf. on Complex, Intelligent and Software Intensive Systems (CISIS), Program Committee
Program committee membership before 2011:
ACM SIGMOD Int. Conf. on Management of Data: 2004, 2009, 2010
Int. Conf. on Very Large Databases (VLDB): 2007
IEEE Int. Conf. on Data Engineering (ICDE): 2006, 2007, 2008, 2009, 2010
ACM Conf. on Information and Knowledge Management (CIKM): 2005, 2006, 2008
IEEE Int. Conf. on Intelligence and Security Informatics (ISI) (formerly NSF/NIJ Symp. on Intelligence and Security Informatics (ISI)): 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK): 2005, 2006, 2007, 2008
ACM Int. Workshop on Data Warehousing and OLAP (DOLAP): 2005, 2006, 2007, 2008
Int. Conf. on Complex, Intelligent and Software Intensive Systems (CISIS): 2010
East-European Conf. on Advances in Database and Information Systems (ADBIS): 2010
Int. Symp. on Temporal Representation and Reasoning (TIME): 2008
IEEE Int. Conf. on Computational Science and Engineering: 2008
AAAI Nectar (New sCientific and Technical Advances in Research): 2007
Int. Workshop on Mining Multimedia Streams in Large-Scale Distributed Environments (MMSDE): 2008
Int. Conf. on Geosensor Networks (GSN): 2006, 2009
SIGMOD Ph.D. Workshop on Innovative Database Research (IDAR): 2008
Int. Workshop on Scalable Stream Processing Systems (SSPS): 2007, 2008
ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining: 2004
Int. Conf. on Machine Learning (ICML): 2003
Int. Conf. on Asian Digital Libraries (ICADL): 2003
Reviewer for leading research journals: ACM Transactions on Database Systems (TODS), ACM Transactions on Information Systems (TOIS), VLDB Journal, IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Transactions on Multimedia, IEEE Computer, Data and Knowledge Engineering (DKE), International Journal of Business Intelligence and Data Mining (IJBIDM), Information Systems, Information Processing Letters (IPL), and others