CS 7775: Seminar in computer security

“Security analytics: Applications of machine learning in cybersecurity”


Instructor: Alina Oprea (alinao)

Class Schedule: Tuesday and Friday 9:50-11:30am, Ryder Hall 155.

Office Hours: Tuesday, 11:30am-12:30pm, WVH 348

Class description:

 

“Big-data” analytics has enabled a number of compute-intensive applications (such as machine translation, speech recognition and precision medicine) with a large positive impact on our daily lives. Not surprisingly, “security analytics”, the application of machine learning and data mining to cyber security, has also proven effective in learning and predicting attacker behavior, detecting malicious infrastructures and designing more effective defensive techniques. This class will cover various practical applications of machine learning techniques in network security, web security, malware detection and usable authentication.

 

Compared to other areas benefiting from machine learning, security applications pose additional challenges due to the limited availability of attack datasets, the difficulty of validating new findings, the high cost of false positives, and the risk of adversarial tampering with the datasets and models. The course will also discuss directions for addressing these challenges and include advanced topics in the areas of adversarial machine learning and privacy-preserving analytics.

 

We will be reading and discussing recent research papers from security and machine learning conferences. A major component of the class is a research project conducted in a small team of 1-2 students. A detailed project report suitable for a workshop submission is expected at the end of the class.

Pre-requisites:

·       Fundamental Networking

·       Introductory security preferable

·       Basic data mining preferable

Grading

The grade will be based on:

-       Class participation – 20%

·       Participation in discussing the papers in class

·       Leading the discussion for several papers

-       Paper summaries - 20%

·       Submit paper summaries before class

·       Detailed comments on weaknesses, strengths and contributions

-       Research project - 60%

·       10% project proposal - due 10/04

·       30% final project report

·       20% presentation in class

Paper summaries

Reading will be assigned for each lecture. By midnight the day before each lecture, every student must submit a report for each assigned paper. The report should contain a one-paragraph summary of the paper, a description of three strong points and three weak points of the paper, and a discussion of its data collection and machine learning methodology. The instructor will provide a template for paper summaries.

 

Please submit the reports on Piazza.

 

Project

  • Project proposal (maximum 3 pages) should include:

-        Problem addressed by the project

-        Proposed approach

-        Milestones (main steps and timeline)

-        References: additional literature survey that you intend to do

-        Tools: software, packages

-        Data sources: publicly available datasets for your research

-        Deliverable items: implementation, simulation results, graphs, visualizations, etc.

  • Project final report (10-12 pages) should include:

-        Motivation of addressed problem

-        Description of public dataset used

-        Proposed solution/algorithm including technical details

-        Comparison with related work

-        Experimental results



 Calendar

 

Unit

Week

Date

Topic

Readings

Introduction

1

Fri

09/09

Course outline (syllabus, grading, policies)

Overview of modern attacks and their evolution

Chapter 2 of [ISL] book, pages 15-42

 

2

Tue

09/13

Introduction to data science (classification, clustering, graph mining)

Challenges of using machine learning in security applications

Outside the Closed World: On Using Machine Learning For Network Intrusion Detection. R. Sommer and V. Paxson

 

Malicious web sites

 

Fri

09/16

Detection of malicious domains on the web using classification techniques

Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs. J. Ma, L. K. Saul, S. Savage, and G. M. Voelker

 

EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis. L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi

Discussion Lead: Alina Oprea

 

3

Tue

09/20

Detection of DGA (Domain Generation Algorithm) malware using clustering

From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware. M. Antonakakis, R. Perdisci, Y. Nadji, N. Vasiloglou, S. Abu-Nimeh, W. Lee, and D. Dagon

 

Discussion Lead: Sajjad Arshad

 

 

Fri

09/23

Detection of spam-related websites using link analysis on Web graph

Combating Web spam with TrustRank. Z. Gyongyi, H. Garcia-Molina, and J. Pedersen

 

Discussion Lead: Muhammad Ahmad Bashir 

Botnets and malicious infrastructures

4

Tue

09/27

Detection of botnets using unsupervised learning

Traffic Aggregation for Malware Detection. T.-F. Yen and M. Reiter

 

Discussion Lead: Can Gemicioglu

 

BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. G. Gu, R. Perdisci, J. Zhang, and W. Lee

 

Discussion Lead: Andrea Mambretti

 

 

Fri

09/30

Anomaly Detection: A data science perspective

 

DNS domain abuse detection through supervised learning.

 

Tutorial with R examples.

Presenter: Sri Krishnamurthy

 

PREDATOR: Proactive Recognition and Elimination of Domain Abuse at Time-Of-Registration. S. Hao, A. Kantchelian, B. Miller, V. Paxson and N. Feamster

 

Discussion Lead: Deepanjan Basu

 

 

5

Tue

10/04

Evolution of malicious infrastructures

 

Project proposal presentations

Automatically Inferring the Evolution of Malicious Activity on the Internet. S. Venkataraman, D. Brumley, S. Sen and O. Spatscheck

 

Discussion Lead: Praveen Keshava

 

 

Fri

10/07

Graph analysis for detecting peer-to-peer botnets

 

Project proposal presentations

BotGrep: Finding P2P Bots with Structured Graph Analysis. S. Nagaraja, P. Mittal, C.-Y. Hong, M. Caesar, and N. Borisov

 

Discussion Lead: Hridam Basu

 

6

Tue

10/11

Topological relations among hosts in malicious web infrastructure

Finding the Linchpins of the Dark Web: a Study on Topologically Dedicated Hosts on Malicious Web Infrastructures. Z. Li, S. Alrwais, Y. Xie, F. Yu and X. Wang

 

Optional reading:

Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages. G. Stringhini, C. Kruegel, and G. Vigna

 

Discussion Lead: Matthew Jagielski

Malware detection and protection

 

Fri

10/14

Detection of malware delivery and infection

Nazca: Detecting Malware Distribution in Large-Scale Networks. L. Invernizzi, S. Miskovic, R. Torres, S. Saha, S.-J. Lee, M. Mellia, C. Kruegel and G. Vigna

 

Discussion Lead: Jack Doerner

 

The Dropper Effect: Insights into Malware Distribution with Downloader Graph Analytics. B.J. Kwon, M. Mondal, J. Jang, L. Bilge and T. Dumitras

 

Discussion Lead: Andrea Mambretti

 

7

Tue

10/18

Reputation-based detection of malicious files

CAMP: Content-Agnostic Malware Protection. M. A. Rajab, L. Ballard, N. Lutz, P. Mavrommatis, and N. Provos

 

Discussion Lead: Muhammad Ahmad Bashir

 

Guilt by Association: Large Scale Malware Detection by Mining File-relation Graphs. A. Tamersoy, K. Roundy and D. H. Chau

 

Discussion Lead: Supraja Krishnan

 

 

Fri

10/21

No class: Alina out of town

 

Abuse and fraud in social networks

8

Tue

10/25

Sybil (fake) accounts and large groups of synchronized activities

You Are How You Click: Clickstream Analysis for Sybil Detection. G. Wang, T. Konolige, C. Wilson, H. Zheng and B. Y. Zhao

 

Discussion Lead: Deepanjan Basu

 

Uncovering Large Groups of Active Malicious Accounts in Online Social Networks. Q. Cao, X. Yang, J. Yu and C. Palow

 

Discussion Lead: Ahmet Ozcan

 

 

Fri

10/28

Compromise of legitimate accounts

COMPA: Detecting Compromised Accounts on Social Networks. M. Egele, G. Stringhini, C. Kruegel, and G. Vigna

 

Discussion Lead: Praveen Keshava

 

Consequences of Connectivity: Characterizing Account Hijacking on Twitter. K. Thomas, F. Li, C. Grier and V. Paxson

 

Discussion Lead: Sri Krishnamurthy

Enterprise log analytics

9

Tue

11/01

Detection of command-and-control traffic in enterprise networks

 

Project checkpoint

 

ExecScent: Mining for New C&C Domains in Live Networks with Adaptive Control Protocol Templates. T. Nelms, R. Perdisci and M. Ahamad

 

Discussion Lead: Can Gemicioglu

 

 

Fri

11/04

Security log analytics for enterprise breach detection

 

Project checkpoint

Operational security log analytics for enterprise breach detection. Z. Li and A. Oprea

 

Discussion Lead: Alina Oprea

Behavior-based authentication

10

Tue

11/08

Implicit authentication by learning typical user profiles over time

Implicit Authentication through Learning User Behavior. E. Shi, Y. Niu, M. Jakobsson, and R. Chow

 

Discussion Lead: Mukund Sarma

 

Progressive authentication: deciding when to authenticate on mobile phones. O. Riva, C. Qin, K. Strauss and D. Lymberopoulos

 

Discussion Lead: Supraja Krishnan

 

 

Fri

11/11

Vacation: Veterans’ Day

 

 

11

Tue

11/15

 

Probabilistic model for measuring user authenticity at login time

Who Are You? A Statistical Approach to Measuring User Authenticity. D. M. Freeman, S. Jain, M. Durmuth, B. Biggio and G. Giacinto

 

Discussion Lead: Sevtap Duman

 

Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks. W. Melicher, B. Ur, S. Segreti, S. Komanduri, L. Bauer, N. Christin, and L. Cranor

 

Discussion Lead: Ahmet Ozcan

Privacy-preserving analytics

 

Fri

11/18

Large-scale privacy-preserving regression

Privacy-Preserving Ridge Regression on Hundreds of Millions of Records. V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, N. Taft

 

Discussion Lead: Jack Doerner

 

12

Tue

11/22

Classification on encrypted data

Machine Learning Classification over Encrypted Data. R. Bost, R. A. Popa, S. Tu and S. Goldwasser

 

Discussion Lead: Hridam Basu

 

 

 

Fri

11/25

Vacation: Thanksgiving

 

Adversarial machine learning

13

Tue

11/29

Terminology and evasion attacks against classification systems

Adversarial Machine Learning. L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein and J. D. Tygar

 

Discussion Lead: Alina Oprea

 

Practical Evasion of a Learning-Based Classifier: A Case Study. N. Srndic and P. Laskov

 

Discussion Lead: Matthew Jagielski

 

 

Fri

12/02

Poisoning and model inversion attacks

ANTIDOTE: Understanding and Defending against Poisoning of Anomaly Detectors. B. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S. Lau, S. Rao, N. Taft and J. D. Tygar

 

Discussion Lead: Alina Oprea

 

Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing. M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page and T. Ristenpart

 

Discussion Lead: Matthew Jagielski

Final research project presentation

14

Tue

12/06

Research project presentations

 

 

 

Fri

12/09

Research project presentations

 

 

 

Projects

 

Additional reading

 

 

Anomaly Detection: A Survey. V. Chandola, A. Banerjee, and V. Kumar

 

 

Other resources

 

Books:

 

[ISL] An introduction to statistical learning with applications in R. G. James, D. Witten, T. Hastie and R. Tibshirani

 

[ESL] The elements of statistical learning: data mining, inference, and prediction. T. Hastie, R. Tibshirani and J. Friedman