David Smith

Assistant Professor

Current Research Projects

  • Inferring the structure of networks from text, in particular by discovering and tracking reused passages embedded in terascale text collections: This project involves collaborations with colleagues at Northeastern, on modeling social and communication networks in the 19th century, and with researchers at the University of Washington, on modeling and visualizing networks of political influence.
  • Exploiting clustering for structured prediction: Learned models of document similarity help both machine learning models and human users of information retrieval systems make better use of large amounts of unlabeled data. With collaborators from UMass Amherst, this project addresses applications ranging from optical character recognition and speech recognition to named entity linking and relation extraction.
  • Modeling topical similarity in multilingual corpora: Documents in multiple languages often discuss the same topics without being direct translations of each other. Modeling these shared topics enables training translation systems on these “comparable corpora” and tracing the flow of ideas around the world.

Research Interests

Efficient inference for machine learning models with complex latent structure; modeling natural language structures, such as morphology, syntax, and semantics; modeling the mutations in texts as they propagate through social networks and in language across space and time; interactive information retrieval and machine learning for expert users.


  • BA in Classics (Greek) | Harvard
  • PhD in Computer Science | Johns Hopkins University


David A. Smith is an assistant professor in the College of Computer and Information Science and a founding member of the NULab for Texts, Maps, and Networks, Northeastern’s center for the digital humanities and computational social sciences.

He received a B.A. summa cum laude in Classics (Greek) from Harvard and worked for Tufts’ Perseus Digital Library Project, one of the most widely used linguistic and cultural research systems in the humanities, before completing his Ph.D. in Computer Science at Johns Hopkins University in 2010. Prior to joining Northeastern, he was a research assistant professor at the University of Massachusetts, Amherst. Professor Smith has published widely in the areas of natural language processing and computational linguistics, information retrieval, digital libraries, digital humanities, and political science. His research has been funded by the NSF, NEH, DARPA, ONR, AFRL, the Mellon Foundation, and Google.
