ACL-02 Tutorial

Machine Learning for Text Classification Applications

David D. Lewis, Independent Consultant, Chicago, IL - Dave@DavidDLewis.com

Research on machine learning for text classification has exploded in the past decade. Practical applications have followed, in areas such as knowledge management, process improvement, customer service automation, text mining, alerting, intelligence and law enforcement, information feeds, spam and porn filtering, bioinformatics, and survey research.

After introducing text classification as an information processing task, I will discuss how machine learning can be used to reduce human effort in this task. Text classification poses particular challenges for machine learning, and I will look in detail at how algorithms for learning both rule-based and statistical/numerical classifiers are affected. I will also emphasize techniques for data preparation and transformation that are critical to success in operational settings.

Text classification is far from a solved problem, and I will conclude by discussing areas where further progress is likely to depend on contributions from computational linguistics.

Presenter: David D. Lewis is an independent consultant and researcher working with clients on technology for information retrieval, machine learning, natural language processing, and data mining. He previously held research positions at AT&T Labs, Bell Labs, and the University of Chicago. He received his Ph.D. in computer science from the University of Massachusetts, Amherst, in 1992.
