Northeastern University
College of Computer and Information Science

Contact Us

  • Contact Us

Search

  • Explore CCIS
    • About the College
      • Dean’s Message
    • Undergraduate Programs
      • Advising
      • Degree Programs
      • Minor in Computer Science
      • Minor in Information Science
      • Tutoring
      • Scholarships
      • Student Awards
    • Graduate Programs
      • Degree Programs
      • Current Students
    • Co-op
    • People and Organizations
      • Faculty
      • Administrative Staff
      • Student Organizations
    • Contact Us
    • Research
      • Research Groups
      • Centers and Institutes
    • Technical Help
  • Prospective Students
  • Current Students
  • Alumni
  • Employers
Layout Image
  • About the College
    • Dean’s Message
    • CCIS Videos
  • Undergraduate Programs
    • Advising
    • Degree Programs
    • Minor in Computer Science
    • Minor in Information Science
    • Scholarships
      • Bradley E. Bailey Scholarship
      • Darwin Scholarship
      • Jane K. Wenzinger Scholarship Fund
      • Department of Defense Information Assurance Scholarship Program
      • NSF Federal Cyber Service: Scholarship for Service
    • Student Awards and Research
    • Tutoring
  • Graduate Programs
    • Degree Programs
      • Ph.D. in Computer Science
        • Admission Requirements
        • Academic Requirements
        • Time and Time Limitation
        • Transfer Credit
        • Approved Courses
        • Electives Outside the College
        • Specimen Curriculum
        • Academic Review Process
      • Ph.D. in Information Assurance
        • Admissions Requirements
        • Academic Requirements
        • Time and Time Limitation
        • Transfer Credit
        • Specimen Curriculum
        • Program Faculty
        • Contact Us
      • Ph.D. in Personal Health Informatics
      • M.S. in Computer Science
        • Admissions Requirements
        • Academic Requirements
        • Academic Probation
        • Time and Time Limitation
        • Transfer Credit
        • Approved Courses
        • Specimen Academic Schedule
        • Reading and Project Courses
        • Master’s Thesis
        • Request More Information
      • M.S. in Information Assurance
        • Admissions Requirements
        • Academic Requirements
        • Specimen Academic Schedule
        • Financial Aid and Scholarships
        • Faculty
        • Request More Information- MSIA
      • M.S. in Health Informatics
        • Program Overview
        • Master’s Degree
        • Certificates
        • Course Descriptions
        • Testimonials
        • Faculty
        • Careers
        • Student Profiles
        • Apply
        • Request More Information- MSHI
      • ALIGN
    • Apply
    • Scholarships
    • FAQ
    • Current Students
      • Course Descriptions
      • Course Schedules
      • Graduate Guidebook
      • Commencement
      • Forms
      • Travel Support
      • Wiki
      • Jobs
      • New Student Page
        • MyNeu Account
        • Course Registration
        • Health Insurance Requirements
        • ISSI Orientation
        • CCIS Orientation
        • CCIS Email Account
        • Paying Your Bill
        • Husky ID Cards
        • Online Learning
        • Housing
        • Parking
        • Public Transportation
  • Research
    • Research Groups
      • Algorithms and Theory
      • Artificial Intelligence
      • Data
      • Educational Research
      • Formal Methods
      • Game Design
      • Network Science
      • Personal Health Informatics
      • Programming Languages
      • Security
      • Software Engineering
      • Systems
    • Centers and Institutes
  • Co-op
    • Information for Students
      • FAQ
      • Information for New Students
      • Information for Upperclass Students
      • Information for Graduate Students
      • Prospective
      • Forms
    • Information for Employers
    • Co-op Manual
      • Steps to Finding A Job
      • Taking a Course
      • Academic Standards
    • Research & Data
      • Assessment
    • Calendar
    • Surveys & Evaluations
      • Student Evaluation
      • Employer Evaluation
  • People and Organizations
    • Faculty
    • Administrative Staff
    • Student Organizations
  • News & Events
    • News Archive
    • Events
    • Distinguished Speakers Series

By bironje
Thursday, August 23rd, 2012

h

Web­sites like Face­book, LinkedIn and other social-​​media net­works con­tain mas­sive amounts of valu­able public infor­ma­tion. Auto­mated web tools called web crawlers sift through these sites, pulling out infor­ma­tion on mil­lions of people in order to tailor search results and create tar­geted ads or other mar­ketable content.

But what hap­pens when “the bad guys” employ web crawlers? For Engin Kirda, Sy and Laurie Stern­berg Inter­dis­ci­pli­nary Asso­ciate Pro­fessor for Infor­ma­tion Assur­ance in the Col­lege of Com­puter and Infor­ma­tion Sci­ence and the Depart­ment of Elec­trical and Com­puter Engi­neering, they then become tools for spam­ming, phishing or tar­geted Internet attacks.

“You want to pro­tect the infor­ma­tion,” Kirda said. “You want people to be able to use it, but you don’t want people to be able to auto­mat­i­cally down­load con­tent and abuse it.”

Kirda and his col­leagues at the Uni­ver­sity of California–Santa Bar­bara have devel­oped a new soft­ware call Pub­Crawl to solve this problem. Pub­Crawl both detects and con­tains mali­cious web crawlers without lim­iting normal browsing capac­i­ties. The team joined forces with one of the major social-​​networking sites to test Pub­Crawl, which is now being used in the field to pro­tect users’ information.

Kirda and his col­lab­o­ra­tors pre­sented a paper on their novel approach at the 21st USENIX Secu­rity Sym­po­sium in early August. The article will be pub­lished in the pro­ceed­ings of the con­fer­ence this fall.

In the cyber­se­cu­rity arms race, Kirda explained, mali­cious web crawlers have become increas­ingly sophis­ti­cated in response to stronger pro­tec­tion strate­gies. In par­tic­ular, they have become more coor­di­nated: Instead of uti­lizing a single com­puter or IP address to crawl the web for valu­able infor­ma­tion, efforts are dis­trib­uted across thou­sands of machines.

“That becomes a tougher problem to solve because it looks sim­ilar to benign user traffic,” Kirda said. “It’s not as straightforward.”

Tra­di­tional pro­tec­tion mech­a­nisms, like a CAPTCHA, which oper­ates on an indi­vidual basis, are still useful, but their deploy­ment comes at a cost: Users may be annoyed if too many CAPTCHAs are shown. As an alter­na­tive, non­in­tru­sive approach, Pub­Crawl was specif­i­cally designed with dis­trib­uted crawling in mind. By iden­ti­fying IP addresses with sim­ilar behavior pat­terns, such as con­necting at sim­ilar inter­vals and fre­quen­cies, Pub­Crawl detects what it expects to be dis­trib­uted web-​​crawling activity.

Once a crawler is detected, the ques­tion is whether it is mali­cious or benign. “You don’t want to block it com­pletely until you know for sure it is mali­cious,” Kirda explained. “Instead, Pub­Crawl essen­tially keeps an eye on it.”

Poten­tially mali­cious con­nec­tions can be rate-​​limited and a human oper­ator can take a closer look. If the oper­a­tors decide that the activity is mali­cious, IPs can also be blocked.

In order to eval­uate the approach, Kirda and his col­leagues used it to scan logs from a large-​​scale social net­work, which then pro­vided feed­back on its suc­cess. Then, the social net­work deployed it in real time, for a more robust eval­u­a­tion. Cur­rently, the social net­work is using the tool as a part of its pro­duc­tion system. Going for­ward, the team expects to iden­tify areas where the soft­ware could be evaded and make it even stronger.

Categories : Uncategorized
Northeastern University
  • My NEU
  • Find Faculty & Staff
  • Find A – Z
  • Emergency Information
  • Search

360 Huntington Ave. Boston, Massachusetts 02115 • 1 (617) 373-2000

© 2013 Northeastern University

  • twitter
  • facebook
  • youtube