What: cutting-edge research in data management, data mining,
information retrieval, and machine learning
Who: everybody who is interested in the seminar topic, in particular PhD
students, faculty, and researchers
When: Mondays 12:15-1:15pm
Where: 166 WVH
| Date | Topic | Presenter | Comments |
|---|---|---|---|
| September 21 | Organizational meeting | Mirek Riedewald | |
| September 28 | Introduction to Probabilistic Databases | Maryam Bashir and Evangelos Kanoulas | |
| October 5 | A Unified Approach to Ranking in Probabilistic Databases by Li, Saha, and Deshpande (Univ. of Maryland) | Peter Golbus | |
| October 12 | No seminar (Columbus Day) | | |
| October 19 | Artificial Scientific Markets for Combinatorial Optimization: Can Data Mining and Machine Learning Help? | Karl Lieberherr and Ahmed Abdelmeged | More information here |
| October 26 | No seminar (Amazon visit day) | | |
| November 2 | Generating Example Data for Dataflow Programs by Olston, Chopra, and Srivastava | Alper Okcan | |
| November 9 | Programming Search Engines With High-Level Languages | Stefan Savev | Search engines are typically implemented in C, C++, and Java. In this paper, we argue that because of the choice of programming language and the design principles that are applied with it, those implementations are non-modular and unduly complicated. We show that search engines are better implemented in functional languages using the stratified design principle known from the days of LISP: creating layers of domain-specific languages within the main host language. Those languages are then used to implement any conceivable program in the domain of interest. Instead of implementing a search engine as a monolithic application, we show that it is better to implement it as a script using a few domain-specific languages for parsing, streaming, physical data organization, numerical optimization, querying, and plotting, embedded in the host functional language. The benefit is that developers can build any kind of search engine in a few lines of code using the provided combinators. Many of those languages (e.g. Parsec) are already known but have never been applied in the domain of Information Retrieval. Others (LINQ and PigLatin) are based on the standard list library of functional languages and have recently been proposed for large-scale data processing. We also developed a language for physical data layout that allows us to express concisely how various components of the index should be compressed. Our argument is about design, not language. Functional languages, however, express the designs most easily. |
| November 16 | Named Entities, Facts, and Patterns in Text | Virgil Pavlu and Shahzad Rajput | First we are going to discuss two papers on fact extraction. Essentially the idea is to generate patterns from very scarce input, and then use the patterns to match entity relations. Secondly, Shahzad will talk about entity tagging, including our own efforts. |
If you have any questions, please contact Mirek Riedewald.