Mirek Riedewald

Associate Professor
College of Computer and Information Science
202 West Village H

360 Huntington Avenue

Boston, MA 02115

phone +1-617-373 4766, fax (dept): +1-617-373 5121
 

Research Interests

My general areas of interest are databases and information systems. Currently I am focusing on the following areas:

eScience: Data Management and Analysis Services for the Sciences

Since September 2004 I have been working on novel approaches for tracking environmental change based on bird abundance data. Currently we are mining a wealth of observational data hosted by Cornell's Lab of Ornithology in order to determine the relationship between environmental features and the abundance of wild bird species in North America [CEMR+06, KHFR+09]. A major direction of our research is to develop highly accurate prediction models [HCFM+07]; this work has already resulted in a novel regression technique that produces better predictions than state-of-the-art methods [SCR07]. We also recently started to explore new approaches that enable scientists to discover interesting patterns in the complex prediction models trained from the collected data [PRF10]. One important type of pattern is the statistical interaction between predictor variables [SCRF08, SCRH+09].

This material is based upon work supported by the National Science Foundation under Grant Nos. 0427914, 0612031, and 0748626. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

I have worked with domain scientists from different areas since 1999. In recent collaborations with physicists at Cornell's Wilson Lab the emphasis was on mining high-energy physics data and on managing metadata and provenance for elementary particle physics [DGJK+08]. Our work with the Cornell Astronomy department is surveyed in [CCD+04]. Data flow challenges for managing and analyzing astronomy data, elementary particle physics data, and snapshots of the WWW are discussed in [AAC+06]. We recently started working with researchers in Cornell's Sibley School of Mechanical and Aerospace Engineering. The goal of this collaboration is to improve the performance of long-running, complex combustion simulations [PRPG+06, PRGP07].

[PRF10] B. Panda, M. Riedewald, and D. Fink. The Model Summary Problem and a Solution for Trees. To appear in Proc. IEEE Int. Conf. on Data Engineering (ICDE), 2010
[KHFR+09] S. Kelling, W. M. Hochachka, D. Fink, M. Riedewald, R. Caruana, G. Ballard, and G. Hooker. Data Intensive Science: A New Paradigm for Biodiversity Studies. BioScience, 57(7):613-620, 2009
[SCRH+09] D. Sorokina, R. Caruana, M. Riedewald, W. M. Hochachka, and S. Kelling. Detecting and Interpreting Variable Interactions in Observational Ornithology Data. To appear in Proc. IEEE Int. Workshop on Domain Driven Data Mining (DDDM), 2009
[SCRF08] D. Sorokina, R. Caruana, M. Riedewald, and D. Fink. Detecting Statistical Interactions with Additive Groves of Trees. In Proc. International Conference on Machine Learning (ICML), pages 1000-1007, 2008
[DGJK+08] A. Dolgert, L. Gibbons, C. D. Jones, V. Kuznetsov, M. Riedewald, D. Riley, G. J. Sharp, and P. Wittich. Provenance in High-Energy Physics Workflows. In IEEE Computing in Science and Engineering (CiSE), 10(3):22-29, 2008
[SCR07] D. Sorokina, R. Caruana, and M. Riedewald. Additive Groves of Regression Trees. In Proc. European Conf. on Machine Learning (ECML), pages 323-334, 2007 (Best Student Paper)
[PRGP07] B. Panda, M. Riedewald, J. Gehrke, and S. B. Pope. High-Speed Function Approximation. In Proc. IEEE Int. Conf. on Data Mining (ICDM), pages 613-618, 2007
[HCFM+07] W. M. Hochachka, R. Caruana, D. Fink, A. Munson, M. Riedewald, D. Sorokina, and S. Kelling. Data-Mining Discovery of Pattern and Process in Ecological Systems. In Journal of Wildlife Management, 71(7):2427-2437, 2007
[PRPG+06] B. Panda, M. Riedewald, S. B. Pope, J. Gehrke, and L. P. Chew. Indexing for Function Approximation. In Proc. Int. Conf. on Very Large Databases (VLDB), pages 523-534, 2006
[CEMR+06] R. Caruana, M. Elhawary, A. Munson, M. Riedewald, D. Sorokina, D. Fink, W. M. Hochachka, and S. Kelling. Mining Citizen Science Data to Predict Prevalence of Wild Bird Species. In Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 909-915, 2006
[AAC+06] W. Y. Arms, S. Aya, M. Calimlim, J. Cordes, J. Deneva, P. Dmitriev, J. Gehrke, L. Gibbons, C. D. Jones, V. Kuznetsov, D. Lifka, M. Riedewald, D. Riley, A. Ryd, and G. J. Sharp. Three Case Studies of Large-Scale Data Flows. In Proc. IEEE Workshop on Workflow and Data Flow for Scientific Applications (SciFlow), 2006
[CCD+04] M. Calimlim, J. Cordes, A. Demers, J. Deneva, J. Gehrke, D. Kifer, M. Riedewald, and J. Shanmugasundaram. A Vision for PetaByte Data Management and Analysis Services for the Arecibo Telescope. Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, 27(4), 2004

Cayuga: Managing Data Streams

Cayuga is a highly scalable data stream processing system that can sustain very high throughput, up to thousands of events per second depending on the application, even if it has to process tens of thousands of active stream monitoring queries [DGHRW06, DGPR+07]. Cayuga supports a variety of applications, ranging from monitoring of large distributed computing systems and networks, automated stock trading, Business Activity Monitoring (BAM), and Business Process Management (BPM), all the way to expressive publish-subscribe for intelligent filtering and dissemination of RSS feeds and blogs [BDGH+07].

Cayuga achieves scalability and high performance through an automaton-based implementation and aggressive multi-query optimization (MQO). In recent work, we showed that these MQO benefits are not limited to automaton-based implementations. We developed a novel operator-based MQO framework that unifies traditional database optimization, relational-style data stream query optimizations, and automaton-style query optimizations [HRKG+09]. As a by-product, this multi-query optimizer eliminates the need for separating stream processing systems into operator-based (e.g., STREAM) and automaton-based (e.g., SASE, Cayuga) ones: all types of stream processing can be done efficiently in an operator-based system.
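The core idea behind multi-query optimization, sharing work across many concurrently active queries, can be illustrated with a small sketch. This is not Cayuga's automaton implementation and not the rule-based framework of [HRKG+09]; it is a minimal counting-style filter index, assuming that every query is a conjunction of equality predicates on event attributes. The class name SharedFilterIndex and its methods are hypothetical.

```python
from collections import defaultdict


class SharedFilterIndex:
    """Hypothetical sketch: each distinct (attribute, value) predicate is
    stored once and shared by every query that uses it."""

    def __init__(self):
        self._by_predicate = defaultdict(set)  # (attr, value) -> query ids
        self._predicate_count = {}             # query id -> number of predicates

    def add_query(self, query_id, predicates):
        # predicates: dict mapping attribute name -> required (hashable) value
        preds = set(predicates.items())
        self._predicate_count[query_id] = len(preds)
        for pred in preds:
            self._by_predicate[pred].add(query_id)

    def match(self, event):
        # One index lookup per event attribute, regardless of how many
        # queries are registered; a query matches when all of its
        # predicates were satisfied by the event.
        hits = defaultdict(int)
        for attr, value in event.items():
            for query_id in self._by_predicate.get((attr, value), ()):
                hits[query_id] += 1
        return sorted(q for q, n in hits.items()
                      if n == self._predicate_count[q])
```

With this layout the per-event cost depends on the number of event attributes and matching predicates rather than on the total number of registered queries, which is the kind of sharing that makes tens of thousands of active queries feasible.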

The work on Cayuga resulted in several related research paths. A challenge for users of event stream monitoring systems like Cayuga is to come up with the right queries. For example, when monitoring computing systems, which event patterns signal major software problems or hardware components that are about to fail? One way to find the right queries is to analyze event logs and to discover frequent sequence patterns that end with severe faults. These patterns can then be monitored in real time by the Cayuga engine. In practice, bursts of common events make sequence mining costly and tend to produce irrelevant patterns with high support that bury more interesting ones. We propose a data transformation to address this issue and prove desirable properties of the transformation [LR08].
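To make the burst problem concrete, here is a minimal sketch. It is not the transformation from [LR08]; as a simple stand-in it assumes that a burst is a run of identical consecutive events and collapses each run to a single event before counting the fixed-length event sequences that immediately precede a fault. The function names and the event labels used with them are hypothetical.

```python
from collections import Counter
from itertools import groupby


def collapse_bursts(events):
    """Replace each run of identical consecutive events by one event.
    Assumption: a simple stand-in, not the transformation of [LR08]."""
    return [key for key, _ in groupby(events)]


def patterns_before_fault(events, fault, length=2):
    """Count the contiguous sequences of `length` events that
    immediately precede each occurrence of `fault`."""
    counts = Counter()
    for i, event in enumerate(events):
        if event == fault and i >= length:
            counts[tuple(events[i - length:i])] += 1
    return counts
```

On a raw log, bursts of different lengths split what is conceptually one pattern into several low-support variants; after collapsing, those variants are counted together.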

We also developed novel techniques for efficiently processing a large number of concurrently active join queries, which correlate the contents of multiple streams of XML documents [HDGK+07]. In addition, we developed an axiomatic framework for temporal models for event processing [WRGD07]. Using this framework, we show that the requirements for "reasonable" semantics of event pattern queries dramatically limit the choice of an appropriate temporal model.

[HRKG+09] M. Hong, M. Riedewald, C. Koch, J. Gehrke, and A. Demers. Rule-Based Multi-Query Optimization. In Proc. Int. Conf. on Extending Database Technology (EDBT), pages 120-131, 2009
[LR08] A. Lachmann and M. Riedewald. Finding Relevant Patterns in Bursty Sequences. In Proc. of the VLDB Endowment (PVLDB), 1(1):78-89, 2008
[HDGK+07] M. Hong, A. Demers, J. Gehrke, C. Koch, M. Riedewald, and W. White. Massively Multi-Query Join Processing in Publish/Subscribe Systems. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 761-772, 2007
[BDGH+07] L. Brenna, A. Demers, J. Gehrke, M. Hong, J. Ossher, B. Panda, M. Riedewald, M. Thatte, and W. White. Cayuga: A High-Performance Event Processing Engine (Demo Paper). In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1100-1102, 2007
[WRGD07] W. White, M. Riedewald, J. Gehrke, and A. Demers. What is "Next" in Event Processing? In Proc. ACM Symp. on Principles of Database Systems, pages 263-272, 2007
[DGPR+07] A. Demers, J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. White. Cayuga: A General Purpose Event Monitoring System. In Proc. Biennial Conf. on Innovative Data Systems Research (CIDR), pages 411-422, 2007
[DGHRW06] A. Demers, J. Gehrke, M. Hong, M. Riedewald, and W. White. Towards Expressive Publish/Subscribe Systems. In Proc. Int. Conf. on Extending Database Technology (EDBT), pages 627-644, 2006

Selected Professional Activities

Invited Talks

Finding Patterns in Large-Scale Observational Data (Int. Conf. on Computational Sustainability, Working Group on Species Distribution, June 2009)
Indexing for Function Approximation (Northwest Database Society seminar at University of Washington, Seattle, December 2006)
Indexing for Function Approximation (database and data mining seminar at Microsoft Research, Redmond, November 2006)
Towards Expressive and Scalable Publish/Subscribe (invited talk at Microsoft Research, Redmond, October 2005)
Cayuga: Internet-Scale Monitoring of Data Streams (CS colloquium at the University of Florida, Gainesville, April 2005)
Data Warehouse Meets Data Stream (Dagstuhl Perspectives Workshop: Data Warehousing at the Crossroads, August 2004)
Efficient Processing of Data Streams for Mining and Monitoring (35th Symp. on the Interface, Salt Lake City, Utah, March 2003)
Efficient Analysis of Massive Data in Data Warehouses and Data Stream Processing Systems (CS colloquium at the University of Rostock, Germany, December 2002)

Professional Service (most recent)

2010 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2009 ACM SIGMOD Int. Conf. on Management of Data, Program Committee
2009 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2009 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2009 Int. Conf. on Geosensor Networks (GSN), Program Committee
2008 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2008 ACM Conf. on Information and Knowledge Management (CIKM), Program Committee
2008 Int. Symp. on Temporal Representation and Reasoning (TIME), Program Committee
2008 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2008 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), Program Committee
2008 Int. Workshop on Mining Multimedia Streams in Large-Scale Distributed Environments (MMSDE), Program Committee
2008 Int. Workshop on Scalable Stream Processing Systems (SSPS), Program Committee
2008 IEEE Int. Conf. on Computational Science and Engineering, Program Committee
2008 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2008 SIGMOD Ph.D. Workshop on Innovative Database Research (IDAR), Program Committee
2007 Int. Conf. on Very Large Databases (VLDB), Program Committee
2007 AAAI Nectar (New sCientific and Technical Advances in Research), Program Committee
2007 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2007 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2007 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2007 Int. Workshop on Scalable Stream Processing Systems (SSPS), Program Committee
2006 ACM Conf. on Information and Knowledge Management (CIKM), Program Committee
2006 Int. Conf. on Geosensor Networks (GSN), Program Committee
2006 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2006 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), Program Committee
2006 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2006 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2005 ACM Conf. on Information and Knowledge Management (CIKM), Program Committee
2005 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2005 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), Program Committee
2005 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee

Program committee membership before 2005: ACM SIGMOD Int. Conf. on Management of Data, ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Int. Conf. on Machine Learning (ICML), Int. Conf. on Asian Digital Libraries (ICADL), NSF/NIJ Symp. on Intelligence and Security Informatics (ISI)

Reviewer for leading research journals: ACM Transactions on Database Systems (TODS), ACM Transactions on Information Systems (TOIS), VLDB Journal, IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Transactions on Multimedia, IEEE Computer, Data and Knowledge Engineering (DKE), International Journal of Business Intelligence and Data Mining (IJBIDM), Information Systems, Information Processing Letters (IPL), and others

30 Oct 2009