Data Science

Two major forces are driving research in databases and information retrieval: the exponential growth of data and the emergence of new architectures. Both are evident throughout medicine, the physical sciences and the social sciences, creating an urgent need for intelligent systems that can detect patterns and extract information.

Northeastern researchers are investigating important topics emerging from these forces, including new methodologies for ranking information retrieved from massive data sets and search engines for high-dimensional data. In collaboration with their campus colleagues and others, they are pursuing interdisciplinary research on ontologies for mental health and knowledge of diseases, biomedical text analysis, patterns in ornithology data and scalable tools for the analysis of social network data.

Members of the Data Science Group have expertise in machine learning, spatial indexing, data visualization, the Semantic Web and database management. In machine learning, they have built a diagnostic tool that automatically analyzes patient records, learns rules from them and predicts diagnoses. In data mining, these researchers have developed some of the most widely used search techniques.

Team Achievements

  • Collaborated with Massachusetts General Hospital to explore the benefits of applying data mining and information retrieval techniques in radiation oncology
  • Developed the “optimal location query,” a spatial indexing method that uses multiple criteria to find “optimal” locations within a pre-specified area, for example, choosing the best site for a business within a target area based on population, demographics and similar criteria
  • Created tools to characterize and extract knowledge from biomedical literature
  • Introduced applications of ontology-based computing, including the Semantic Web, in the area of health sciences
  • Held leadership positions on the committees of major conferences, such as the Association for Computing Machinery’s International Conference on Research and Development in Information Retrieval (SIGIR), the International Conference on Management of Data (SIGMOD), the International Conference on Very Large Data Bases (VLDB), and the International Conference on Data Engineering (ICDE)
  • Developed novel approaches for tracking massive quantities of observational environmental change data in collaboration with Cornell University’s Laboratory of Ornithology
  • Secured grants from DARPA and other federal agencies to develop models that map different writing styles and languages into common representations
  • Received the best paper award at the International Conference on Advances in Social Networks Analysis and Mining
  • Received the best paper award at the IEEE International Conference on Data Mining