Exploring the humanities with digital tools

David Smith (left), assistant professor of computational social science in the College of Computer and Information Science, and Ryan Cordell, assistant professor of English and digital humanities in the College of Social Sciences and Humanities. Photo by Brooks Canaday.

In the past, a scholar would have had to spend years on intensive research to assemble a broad humanities-based assessment of a topic like the role of race in 19th-century literature.

“That would require reading for years,” said Ryan Cordell, a new assistant professor of English in the College of Social Sciences and Humanities at Northeastern. “And after all that time, he or she would have read 0.0001 percent of what was written in that era. There are limits of what you can physically read.”

Enter the emerging field of digital humanities, which applies computer and network-science techniques to digitized texts, like the massive volumes of literature that have been scanned and stored over the past two decades.

“The Internet Archive has scanned more than 2 million public-domain books spanning 500 years, so we can see how language, words and syntax change over time, or look at any broad trend that exists,” said David Smith, a new assistant professor in the College of Computer and Information Science. He was previously a research assistant professor at the University of Massachusetts-Amherst and in 2010 received a Ph.D. from Johns Hopkins University.

Smith and Cordell are among the faculty members founding Northeastern’s new Centers for Digital Humanities and Computational Social Science, an interdisciplinary base for researchers from schools including the College of Computer and Information Science, the College of Social Sciences and Humanities and the College of Science.

“By turning these archives into data, we can make quantitative and replicative analysis,” said Smith. Such analysis might trace how information spreads through a society over time, or mine literature for evidence on issues like social mobility during a particular era.

Cordell, who received his Ph.D. from the University of Virginia in 2010, enters the field from a humanities perspective: While working on his dissertation, he began to track the (usually uncredited) spread of a piece by Nathaniel Hawthorne through newspapers and publications across the United States. Hawthorne himself used the term “pirating” to describe his work’s spread, well before the word came into pervasive use, and Cordell was curious whether the same phenomenon existed with other publications.

“If you don’t know what is going to be reprinted, you’re left comparing everything to everything else,” said Smith, who explained how digital-humanities methods allow researchers to turn text into searchable data, which can be organized and assessed with network-science techniques. “What you ultimately get are network maps that let us theorize how these publications were talking to one another and explain how this information spread.”
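For readers curious what “comparing everything to everything else” might look like in practice, here is a minimal Python sketch. It is an illustration only, not the method Smith and Cordell use: the function names, the five-word window and the 0.2 threshold are all assumptions made for this example. Each text is broken into overlapping five-word sequences, every pair of texts is scored by the share of sequences they have in common, and pairs that score above the threshold become links in a simple network.

```python
from itertools import combinations


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Share of sequences two sets have in common, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def reprint_edges(documents, n=5, threshold=0.2):
    """Compare every document with every other one.

    `documents` maps a (hypothetical) publication name to its text.
    Returns (name, name, score) links for pairs at or above `threshold`.
    """
    sets = {name: shingles(text, n) for name, text in documents.items()}
    edges = []
    for left, right in combinations(sorted(sets), 2):
        score = jaccard(sets[left], sets[right])
        if score >= threshold:
            edges.append((left, right, score))
    return edges
```

Because every pair is checked, the work grows roughly with the square of the number of texts, which is why a collection of millions of pages calls for more efficient techniques than this sketch.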

Both Cordell and Smith will be teaching courses for undergraduates and graduates this fall: Smith a course on information retrieval, and Cordell one on technologies of text, which he jokes covers “a history of reading from the scroll to the scroll.”