Associate Professor
College of Computer and Information Science
202 West Village H
360 Huntington Avenue
Boston, MA 02115
Phone: +1-617-373-4766, fax (dept.): +1-617-373-5121
My general areas of interest are databases and information systems. Currently I am focusing on the following areas:
Since September 2004 I have been working on novel approaches for tracking environmental change based on bird abundance data. We are currently mining a wealth of observational data hosted by Cornell's Lab of Ornithology in order to determine the relationship between environmental features and the abundance of wild bird species in North America [CEMR+06, KHFR+09]. A major direction of our research is the development of highly accurate prediction models [HCFM+07]; this work has already resulted in a novel regression technique that produces better predictions than state-of-the-art methods [SCR07]. We also recently started to explore new approaches that enable scientists to discover interesting patterns in the complex prediction models trained on the collected data [PRF10]. One important type of pattern is the statistical interaction between predictor variables [SCRF08, SCRH+09].
This material is based upon work supported by the National Science Foundation under Grant Nos. 0427914, 0612031, and 0748626. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
I have worked with domain scientists from different areas since 1999. In recent collaborations with physicists at Cornell's Wilson Lab, the emphasis was on mining high-energy physics data and on managing metadata and provenance for elementary particle physics [DGJK+08]. Our work with the Cornell Astronomy Department is surveyed in [CCD+04]. Data flow challenges in managing and analyzing astronomy data, elementary particle physics data, and snapshots of the WWW are discussed in [AAC+06]. We recently started working with researchers in Cornell's Sibley School of Mechanical and Aerospace Engineering; the goal of this collaboration is to improve the performance of long-running, complex combustion simulations [PRPG+06, PRGP07].
[PRF10] B. Panda, M. Riedewald, and D. Fink. The Model Summary Problem and a Solution for Trees. To appear in Proc. IEEE Int. Conf. on Data Engineering (ICDE), 2010.
[KHFR+09] S. Kelling, W. M. Hochachka, D. Fink, M. Riedewald, R. Caruana, G. Ballard, and G. Hooker. Data-Intensive Science: A New Paradigm for Biodiversity Studies. BioScience, 59(7):613-620, 2009.
[SCRH+09] D. Sorokina, R. Caruana, M. Riedewald, W. M. Hochachka, and S. Kelling. Detecting and Interpreting Variable Interactions in Observational Ornithology Data. To appear in Proc. IEEE Int. Workshop on Domain Driven Data Mining (DDDM), 2009.
[SCRF08] D. Sorokina, R. Caruana, M. Riedewald, and D. Fink. Detecting Statistical Interactions with Additive Groves of Trees. In Proc. Int. Conf. on Machine Learning (ICML), pages 1000-1007, 2008.
[DGJK+08] A. Dolgert, L. Gibbons, C. D. Jones, V. Kuznetsov, M. Riedewald, D. Riley, G. J. Sharp, and P. Wittich. Provenance in High-Energy Physics Workflows. IEEE Computing in Science and Engineering (CiSE), 10(3):22-29, 2008.
[SCR07] D. Sorokina, R. Caruana, and M. Riedewald. Additive Groves of Regression Trees. In Proc. European Conf. on Machine Learning (ECML), pages 323-334, 2007 (Best Student Paper).
[PRGP07] B. Panda, M. Riedewald, J. Gehrke, and S. B. Pope. High-Speed Function Approximation. In Proc. IEEE Int. Conf. on Data Mining (ICDM), pages 613-618, 2007.
[HCFM+07] W. M. Hochachka, R. Caruana, D. Fink, A. Munson, M. Riedewald, D. Sorokina, and S. Kelling. Data-Mining Discovery of Pattern and Process in Ecological Systems. Journal of Wildlife Management, 71(7):2427-2437, 2007.
[PRPG+06] B. Panda, M. Riedewald, S. B. Pope, J. Gehrke, and L. P. Chew. Indexing for Function Approximation. In Proc. Int. Conf. on Very Large Databases (VLDB), pages 523-534, 2006.
[CEMR+06] R. Caruana, M. Elhawary, A. Munson, M. Riedewald, D. Sorokina, D. Fink, W. M. Hochachka, and S. Kelling. Mining Citizen Science Data to Predict Prevalence of Wild Bird Species. In Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 909-915, 2006.
[AAC+06] W. Y. Arms, S. Aya, M. Calimlim, J. Cordes, J. Deneva, P. Dmitriev, J. Gehrke, L. Gibbons, C. D. Jones, V. Kuznetsov, D. Lifka, M. Riedewald, D. Riley, A. Ryd, and G. J. Sharp. Three Case Studies of Large-Scale Data Flows. In Proc. IEEE Workshop on Workflow and Data Flow for Scientific Applications (SciFlow), 2006.
[CCD+04] M. Calimlim, J. Cordes, A. Demers, J. Deneva, J. Gehrke, D. Kifer, M. Riedewald, and J. Shanmugasundaram. A Vision for PetaByte Data Management and Analysis Services for the Arecibo Telescope. Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, 27(4), 2004.
Cayuga is a highly scalable data stream processing system that can sustain very high throughput (up to thousands of events per second, depending on the application) even while processing tens of thousands of concurrently active stream monitoring queries [DGHRW06, DGPR+07]. Cayuga supports a variety of applications, ranging from monitoring of large distributed computing systems and networks, automated stock trading, Business Activity Monitoring (BAM), and Business Process Management (BPM) to expressive publish/subscribe for intelligent filtering and dissemination of RSS feeds and blogs [BDGH+07].
Cayuga achieves scalability and high performance through an automaton-based implementation and aggressive multi-query optimization (MQO). In recent work, we showed that these MQO benefits are not limited to automaton-based implementations. We developed a novel operator-based MQO framework that unifies traditional database optimization, relational-style data stream query optimizations, and automaton-style query optimizations [HRKG+09]. As a by-product, this multi-query optimizer eliminates the need for separating stream processing systems into operator-based (e.g., STREAM) and automaton-based (e.g., SASE, Cayuga) ones: all types of stream processing can be done efficiently in an operator-based system.
The work on Cayuga led to several related research directions. One challenge for users of event stream monitoring systems like Cayuga is formulating the right queries. For example, when monitoring computing systems, which event patterns signal major software problems or hardware components that are about to fail? One way to find the right queries is to analyze event logs and discover frequent sequence patterns that end with severe faults; these patterns can then be monitored in real time by the Cayuga engine. In practice, however, bursts of common events make sequence mining costly and tend to produce irrelevant high-support patterns that bury more interesting ones. We proposed a data transformation that addresses this issue and proved that it has desirable properties [LR08].
We also developed novel techniques for efficiently processing a large number of concurrently active join queries that correlate the contents of multiple streams of XML documents [HDGK+07]. In addition, we developed an axiomatic framework for temporal models in event processing [WRGD07]. Using this framework, we showed that requirements for a "reasonable" semantics of event pattern queries dramatically limit the choice of an appropriate temporal model.
[HRKG+09] M. Hong, M. Riedewald, C. Koch, J. Gehrke, and A. Demers. Rule-Based Multi-Query Optimization. In Proc. Int. Conf. on Extending Database Technology (EDBT), pages 120-131, 2009.
[LR08] A. Lachmann and M. Riedewald. Finding Relevant Patterns in Bursty Sequences. Proc. of the VLDB Endowment (PVLDB), 1(1):78-89, 2008.
[HDGK+07] M. Hong, A. Demers, J. Gehrke, C. Koch, M. Riedewald, and W. White. Massively Multi-Query Join Processing in Publish/Subscribe Systems. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 761-772, 2007.
[BDGH+07] L. Brenna, A. Demers, J. Gehrke, M. Hong, J. Ossher, B. Panda, M. Riedewald, M. Thatte, and W. White. Cayuga: A High-Performance Event Processing Engine (Demo Paper). In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1100-1102, 2007.
[WRGD07] W. White, M. Riedewald, J. Gehrke, and A. Demers. What is "Next" in Event Processing? In Proc. ACM Symp. on Principles of Database Systems (PODS), pages 263-272, 2007.
[DGPR+07] A. Demers, J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. White. Cayuga: A General Purpose Event Monitoring System. In Proc. Biennial Conf. on Innovative Data Systems Research (CIDR), pages 411-422, 2007.
[DGHRW06] A. Demers, J. Gehrke, M. Hong, M. Riedewald, and W. White. Towards Expressive Publish/Subscribe Systems. In Proc. Int. Conf. on Extending Database Technology (EDBT), pages 627-644, 2006.
Finding Patterns in Large-Scale Observational Data (Int. Conf. on Computational Sustainability, Working Group on Species Distribution, June 2009)
Indexing for Function Approximation (Northwest Database Society seminar at the University of Washington, Seattle, December 2006)
Indexing for Function Approximation (database and data mining seminar at Microsoft Research, Redmond, November 2006)
Towards Expressive and Scalable Publish/Subscribe (invited talk at Microsoft Research, Redmond, October 2005)
Cayuga: Internet-Scale Monitoring of Data Streams (CS colloquium at the University of Florida, Gainesville, April 2005)
Data Warehouse Meets Data Stream (Dagstuhl Perspectives Workshop: Data Warehousing at the Crossroads, August 2004)
Efficient Processing of Data Streams for Mining and Monitoring (35th Symp. on the Interface, Salt Lake City, Utah, March 2003)
Efficient Analysis of Massive Data in Data Warehouses and Data Stream Processing Systems (CS colloquium at the University of Rostock, Germany, December 2002)
2010 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2009 ACM SIGMOD Int. Conf. on Management of Data, Program Committee
2009 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2009 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2009 Int. Conf. on Geosensor Networks (GSN), Program Committee
2008 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2008 ACM Conf. on Information and Knowledge Management (CIKM), Program Committee
2008 Int. Symp. on Temporal Representation and Reasoning (TIME), Program Committee
2008 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2008 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), Program Committee
2008 Int. Workshop on Mining Multimedia Streams in Large-Scale Distributed Environments (MMSDE), Program Committee
2008 Int. Workshop on Scalable Stream Processing Systems (SSPS), Program Committee
2008 IEEE Int. Conf. on Computational Science and Engineering, Program Committee
2008 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2008 SIGMOD Ph.D. Workshop on Innovative Database Research (IDAR), Program Committee
2007 Int. Conf. on Very Large Databases (VLDB), Program Committee
2007 AAAI Nectar (New sCientific and Technical Advances in Research), Program Committee
2007 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2007 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2007 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2007 Int. Workshop on Scalable Stream Processing Systems (SSPS), Program Committee
2006 ACM Conf. on Information and Knowledge Management (CIKM), Program Committee
2006 Int. Conf. on Geosensor Networks (GSN), Program Committee
2006 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2006 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), Program Committee
2006 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2006 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2005 ACM Conf. on Information and Knowledge Management (CIKM), Program Committee
2005 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2005 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), Program Committee
2005 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
Program committee membership before 2005: ACM SIGMOD Int. Conf. on Management of Data, ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Int. Conf. on Machine Learning (ICML), Int. Conf. on Asian Digital Libraries (ICADL), NSF/NIJ Symp. on Intelligence and Security Informatics (ISI)
Reviewer for leading research journals: ACM Transactions on Database Systems (TODS), ACM Transactions on Information Systems (TOIS), VLDB Journal, IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Transactions on Multimedia, IEEE Computer, Data and Knowledge Engineering (DKE), International Journal of Business Intelligence and Data Mining (IJBIDM), Information Systems, Information Processing Letters (IPL), and others