In the last ten years we have seen the creation of massive digital text collections, from Twitter feeds to million-book libraries. At the same time, researchers have developed text mining methods that go beyond simple word frequency analysis to uncover thematic patterns. By combining big data with powerful algorithms, we can enable analysts in many fields to enhance qualitative perspectives with quantitative measurements. But these methods are only useful if we can apply them at massive scale and distinguish consistent patterns from random variation. In this talk I will describe my work building reliable topic-mining methodologies for humanists, social scientists, and science policy officers.
David Mimno is a postdoctoral researcher in the Computer Science department at Princeton University. He received his PhD from the University of Massachusetts Amherst. Before graduate school, he was Head Programmer at Tufts University's Perseus Project, a digital library for cultural heritage materials. He is supported by a CRA Computing Innovation fellowship.