Cayuga: A Scalable System for Data Stream Processing

The main project homepage is located at Cornell University. Below is a brief summary.

Cayuga is a highly scalable data stream processing system. It can sustain high throughput, up to thousands of events per second depending on the application, even when it has to process tens of thousands of active stream monitoring queries [DGHRW06, DGPR+07]. Cayuga supports a variety of applications: monitoring of large distributed computing systems and networks, automated stock trading, Business Activity Monitoring (BAM), Business Process Management (BPM), and expressive publish-subscribe for intelligent filtering and dissemination of RSS feeds and blogs [BDGH+07].

Cayuga achieves scalability and high performance through an automaton-based implementation and aggressive multi-query optimization (MQO). In recent work, we showed that these MQO benefits are not limited to automaton-based implementations. We developed a novel operator-based MQO framework that unifies traditional database optimization, relational-style data stream query optimizations, and automaton-style query optimizations [HRKG+09]. As a by-product, this multi-query optimizer eliminates the need for separating stream processing systems into operator-based (e.g., STREAM) and automaton-based (e.g., SASE, Cayuga) ones: all types of stream processing can be done efficiently in an operator-based system.
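The general idea behind multi-query optimization, sharing work across many concurrently registered queries, can be illustrated with a small sketch. This is not Cayuga's implementation or the rule-based framework of [HRKG+09]; the class `SharedFilterIndex`, its methods, and the equality-only predicates are hypothetical simplifications chosen for illustration. Queries that use the same predicate are grouped, so each distinct predicate is evaluated once per event no matter how many queries depend on it.

```python
from collections import defaultdict


class SharedFilterIndex:
    """Toy multi-query index (hypothetical, not Cayuga's design).

    Queries with the same (attribute, value) equality predicate are grouped,
    so each distinct predicate is evaluated once per incoming event.
    """

    def __init__(self):
        self._by_predicate = defaultdict(list)
        self.evaluations = 0  # number of predicate evaluations performed

    def register(self, query_id, attribute, value):
        """Register a query that matches events where event[attribute] == value."""
        self._by_predicate[(attribute, value)].append(query_id)

    def process(self, event):
        """Return the sorted ids of all queries matched by `event` (a dict)."""
        matched = []
        for (attribute, value), query_ids in self._by_predicate.items():
            self.evaluations += 1  # one evaluation per distinct predicate
            if event.get(attribute) == value:
                matched.extend(query_ids)
        return sorted(matched)


index = SharedFilterIndex()
index.register("q1", "symbol", "IBM")
index.register("q2", "symbol", "IBM")   # shares its predicate with q1
index.register("q3", "symbol", "MSFT")
matches = index.process({"symbol": "IBM", "price": 101.5})
```

In this sketch three queries are registered, but only two predicates are evaluated for the event, because `q1` and `q2` share one. Real systems share much richer state (for example, common automaton prefixes or operator subplans), which this toy example does not attempt to model.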

The work on Cayuga led to several related lines of research. A challenge for users of event stream monitoring systems like Cayuga is to come up with the right queries. For example, when monitoring computing systems, which event patterns signal major software problems or hardware components that are about to fail? One way to find the right queries is to analyze event logs and discover frequent sequence patterns that end with severe faults. These patterns can then be monitored in real time by the Cayuga engine. In practice, bursts of common events make sequence mining costly, and they tend to produce irrelevant patterns with high support that bury more interesting ones. We proposed a data transformation to address this issue and proved desirable properties of the transformation [LR08].
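The effect of bursts on pattern discovery can be seen in a small sketch. It is not the transformation from [LR08], whose details are not given here; as a stand-in it uses a naive assumption, namely that a run of identical consecutive events can be collapsed into a single event. The function names, the toy log, and the event labels are all hypothetical.

```python
from collections import Counter
from itertools import groupby


def collapse_bursts(events):
    """Replace each run of identical consecutive events with one event.

    Naive stand-in for burst handling; NOT the transformation of [LR08].
    """
    return [key for key, _ in groupby(events)]


def patterns_before_faults(events, fault, length):
    """Count the windows of `length` events that immediately precede `fault`."""
    counts = Counter()
    for i, event in enumerate(events):
        if event == fault and i >= length:
            counts[tuple(events[i - length:i])] += 1
    return counts


# Hypothetical event log: the first fault is preceded by a burst of warnings.
log = ["ok", "warn", "warn", "warn", "disk_err", "FAULT",
       "ok", "warn", "disk_err", "FAULT"]

raw_counts = patterns_before_faults(log, "FAULT", 3)
collapsed_counts = patterns_before_faults(collapse_bursts(log), "FAULT", 3)
```

On the raw log the two faults are preceded by two different length-3 windows, each with support 1. After collapsing the burst, both faults are preceded by the same window `("ok", "warn", "disk_err")`, which now has support 2 and could be turned into a monitoring query.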

We also developed novel techniques for efficiently processing a large number of concurrently active join queries, which correlate the contents of multiple streams of XML documents [HDGK+07]. In addition, we developed an axiomatic framework for temporal models for event processing [WRGD07]. Using this framework, we showed that requirements for the "reasonable" semantics of event pattern queries dramatically limit the possibilities for choosing the appropriate temporal model.

[CGBR+11] B. Chandramouli, J. Goldstein, R. Barga, M. Riedewald, and I. Santos. Accurate Latency Estimation in a Distributed Event Processing System. To appear in Proc. IEEE Int. Conf. on Data Engineering (ICDE), 2011
[HRKG+09] M. Hong, M. Riedewald, C. Koch, J. Gehrke, and A. Demers. Rule-Based Multi-Query Optimization. In Proc. Int. Conf. on Extending Database Technology (EDBT), pages 120-131, 2009
[LR08] A. Lachmann and M. Riedewald. Finding Relevant Patterns in Bursty Sequences. In Proc. of the VLDB Endowment (PVLDB), 1(1):78-89, 2008
[HDGK+07] M. Hong, A. Demers, J. Gehrke, C. Koch, M. Riedewald, and W. White. Massively Multi-Query Join Processing in Publish/Subscribe Systems. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 761-772, 2007
[BDGH+07] L. Brenna, A. Demers, J. Gehrke, M. Hong, J. Ossher, B. Panda, M. Riedewald, M. Thatte, and W. White. Cayuga: A High-Performance Event Processing Engine (Demo Paper). In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1100-1102, 2007
[WRGD07] W. White, M. Riedewald, J. Gehrke, and A. Demers. What is "Next" in Event Processing? In Proc. ACM Symp. on Principles of Database Systems, pages 263-272, 2007
[DGPR+07] A. Demers, J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. White. Cayuga: A General Purpose Event Monitoring System. In Proc. Biennial Conf. on Innovative Data Systems Research (CIDR), pages 411-422, 2007
[DGHRW06] A. Demers, J. Gehrke, M. Hong, M. Riedewald, and W. White. Towards Expressive Publish/Subscribe Systems. In Proc. Int. Conf. on Extending Database Technology (EDBT), pages 627-644, 2006