Database, Data Mining, Information Retrieval, and Machine Learning Seminar

What: cutting-edge research in data management, data mining, information retrieval, and machine learning
Who: everybody who is interested in the seminar topic, in particular PhD students, faculty, and researchers
When: Mondays 12:15-1:15pm
Where: 166 WVH

Date Topic Presenter Comments
September 21 Organizational meeting Mirek Riedewald  
September 28 Introduction to Probabilistic Databases Maryam Bashir and Evangelos Kanoulas  
October 5 A Unified Approach to Ranking in Probabilistic Databases
by Li, Saha, and Deshpande (Univ. of Maryland)
Peter Golbus  
October 12 No seminar (Columbus Day)    
October 19 Artificial Scientific Markets for Combinatorial Optimization: Can Data Mining and Machine Learning Help? Karl Lieberherr and Ahmed Abdelmeged More information here
October 26 No seminar (Amazon visit day)    
November 2 Generating Example Data for Dataflow Programs
by Olston, Chopra, and Srivastava
Alper Okcan  
November 9 Programming Search Engines With High-Level Languages Stefan Savev Search engines are typically implemented in C, C++, and Java. In this paper, we argue that because of the choice of programming language and the design principles applied with it, those implementations are non-modular and unduly complicated. We show that search engines are better implemented in functional languages using the stratified design principle, known from the days of LISP, of creating layers of domain-specific languages within the main (host) language. Those languages are then used to implement any conceivable program in the domain of interest. Instead of implementing a search engine as a monolithic application, we show that it is better to implement it as a script using a few domain-specific languages for parsing, streaming, physical data organization, numerical optimization, querying, and plotting, all embedded in the host functional language. The benefit is that developers can roll any kind of search engine in a few lines of code using the combinators provided.
Many of those languages (e.g. Parsec) are already known but have never been applied in the domain of Information Retrieval. Others (LINQ and PigLatin) are based on the standard list library of functional languages and have been recently proposed for large-scale data processing.
We also developed a language for physical data layout allowing us to express concisely how various components of the index should be compressed. Our argument is about design, not language. Functional languages, however, express the designs most easily.
November 16 Named Entities, Facts, and Patterns in Text Virgil Pavlu and Shahzad Rajput First, we will discuss two papers on fact extraction. Essentially, the idea is to generate patterns from very scarce input and then use those patterns to match entity relations. Second, Shahzad will talk about entity tagging, including our own efforts.

If you have any questions, please contact Mirek Riedewald.