Northeastern University
College of Computer and Information Science

Data Mining in a Complex World

By bironje
Tuesday, January 29th, 2013

 

Yizhou Sun

Gold mining requires a certain amount of patience: For example, you would have to sift through about 300 tons of earth and rock to come up with enough of the precious metal to make a single wedding ring. Data mining is similar. Every day, terabytes of data accumulate in the technology that society has come to rely on. But turning that chaotic mess of zeros and ones into meaningful knowledge can be a complex mathematical challenge.

Typically, researchers try to simplify this challenge by limiting the scope of their questions. But Yizhou Sun, a newly appointed assistant professor in the College of Computer and Information Science, believes that making useful predictions and inferences with new data requires us to account for its complexity.

“My philosophy is that in the real world, objects are connected together but those objects belong to different types,” she said, pointing to humans, buildings, and digital devices as examples. “Even with humans we can still identify different groups.”

Instead of looking at two-dimensional relationships in an isolated system, her approach brings together a series of complex algorithms that simultaneously address objects from multiple domains and their interactions in a much bigger, real-world environment. She has used the method to probe social networks like Flickr and Twitter for similarities and patterns.

As a graduate student at the University of Illinois at Urbana-Champaign, Sun took on the task of mining the Digital Bibliography & Library Project’s (DBLP) dataset of computer science publications. She hoped to unearth some interesting and unexpected patterns, and she did.

She found that a researcher’s social connectedness was the most important factor for determining whom they would collaborate with in the future. She also found, thankfully, that social connections did not figure very highly in a researcher’s citations.

But perhaps most important, Sun found that her questions were always more complicated than she had expected. For instance, automatically identifying the most highly ranked authors in the DBLP dataset might require examining the ranking of the conferences they attended. But that requires automatically identifying conference ranking, which depends on the ranking of the authors in attendance.

The problem was that the data in question make up a complex, heterogeneous network wherein each piece affects every other. If Sun wanted to trust the products of her algorithm, she was going to have to understand the network it acted upon.
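The circular dependency between author rank and conference rank can be illustrated with a small iterative scheme. What follows is a minimal sketch of generic mutual reinforcement on a two-type network, not Sun's published method. The function name `mutual_rank`, the toy `attendance` data, and the choice to normalize scores so they sum to 1 are all assumptions made for illustration.

```python
def mutual_rank(attendance, iterations=50):
    """Rank authors and conferences against each other.

    attendance: dict mapping each author to the set of conferences
    they attended (hypothetical toy data; must be non-empty).
    """
    authors = sorted(attendance)
    conferences = sorted({c for confs in attendance.values() for c in confs})

    # Start with every author equally ranked.
    author_score = {a: 1.0 / len(authors) for a in authors}
    conf_score = {c: 1.0 / len(conferences) for c in conferences}

    for _ in range(iterations):
        # A conference scores highly when highly ranked authors attend it.
        raw = {c: sum(author_score[a] for a in authors if c in attendance[a])
               for c in conferences}
        total = sum(raw.values())
        conf_score = {c: s / total for c, s in raw.items()}

        # An author scores highly when they attend highly ranked conferences.
        raw = {a: sum(conf_score[c] for c in attendance[a]) for a in authors}
        total = sum(raw.values())
        author_score = {a: s / total for a, s in raw.items()}

    return author_score, conf_score


# Hypothetical authors and attendance records, for illustration only.
attendance = {
    "ann": {"kdd", "vldb", "icdm"},
    "bob": {"kdd", "vldb"},
    "cat": {"icdm"},
}
author_score, conf_score = mutual_rank(attendance)
for name in sorted(author_score, key=author_score.get, reverse=True):
    print(name, round(author_score[name], 3))
```

Neither ranking is computed first. Both start from a uniform guess and are refined in alternation until they settle, which is one common way to break this kind of chicken-and-egg dependency.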

Sun made it her life’s work to understand and then design strategies for examining heterogeneous networks. Last year, she published the seminal book on the matter, Mining Heterogeneous Information Networks: Principles and Methodologies.

The implications of Sun’s work are vast. In order to take advantage of the terabytes of data now describing our world, we must understand the complex networks of which they are a part. “In the real world, there are so many different types of objects that interact with each other,” said Sun. “The real world system can be viewed as [a] gigantic heterogeneous information network.”
