CS7775 Fall 2016: Seminar in computer security


“Security analytics: Applications of machine learning in cybersecurity”

Class Information


Instructor: Alina Oprea (alinao).

Class Schedule: Tuesday and Friday 9:50-11:30am, Ryder Hall 155.

Office Hours: Tuesday, 11:30am-12:30pm, WVH 348.

Class description:

“Big-data” analytics has enabled a number of compute-intensive applications, such as machine translation, speech recognition, and precision medicine, that have had a large positive impact on our daily lives. Not surprisingly, “security analytics,” the application of machine learning and data mining to cybersecurity, is also effective at learning and predicting attacker behavior, detecting malicious infrastructure, and designing more effective defensive techniques. This class will cover a range of practical applications of machine learning in network security, web security, malware detection, and usable authentication.

Compared to other areas that benefit from machine learning, security applications face additional challenges: limited availability of attack datasets, difficulty of validating new findings, the high cost of false positives, and the risk of adversarial tampering with datasets and models. The course will also discuss directions for addressing these challenges and cover advanced topics in adversarial machine learning and privacy-preserving analytics.

We will read and discuss recent research papers from security and machine learning conferences. A major component of the class is a research project conducted in teams of one or two students. A detailed project report, suitable for submission to a workshop, is expected at the end of the course.

Prerequisites:

· Fundamentals of networking

· Introductory security (preferred)

· Basic data mining (preferred)

Grading

The grade will be based on:

- Class participation - 20%

· Participation in discussing the papers in class

· Leading the discussion of several papers

- Paper summaries - 20%

· Submit paper summaries before class

· Detailed comments on weaknesses, strengths and contributions

- Research project - 60% (the component percentages below are out of the total grade)

· 10% project proposal (due 10/04)

· 30% final project report

· 20% presentation in class

Paper summaries

Readings will be assigned for each lecture. By midnight on the day before each lecture, every student must submit a report for each assigned paper. The report should contain a one-paragraph summary of the paper, three strengths and three weaknesses of the paper, and a discussion of its data collection and machine learning methodology. The instructor will provide a template for paper summaries.

Please submit the reports on Piazza.

Project

The project proposal (maximum 3 pages) should include:

- Problem addressed by the project

- Proposed approach

- Milestones (main steps and timeline)

- References: the additional literature you intend to survey

- Tools: software and packages you plan to use

- Data sources: publicly available datasets for your research

- Deliverable items: implementation, simulation results, graphs, visualizations, etc.

The final project report (10-12 pages) should include:

- Motivation for the problem addressed

- Description of the public dataset(s) used

- Proposed solution/algorithm including technical details

- Comparison with related work

- Experimental results

Calendar

Unit	Week	Date	Topic	Readings
Introduction	1	Fri 09/09	Course outline (syllabus, grading, policies); overview of modern attacks and their evolution	Chapter 2 of the [ISL] book, pages 15-42
	2	Tue 09/13	Introduction to data science (classification, clustering, graph mining); challenges of using machine learning in security applications	Outside the Closed World: On Using Machine Learning For Network Intrusion Detection. R. Sommer and V. Paxson
Malicious web sites		Fri 09/16	Detection of malicious domains on the web using classification techniques	Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs. J. Ma, L. K. Saul, S. Savage, and G. M. Voelker; EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis. L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi (Discussion Lead: Alina Oprea)
	3	Tue 09/20	Detection of DGA (Domain Generation Algorithm) malware using clustering	From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware. M. Antonakakis, R. Perdisci, Y. Nadji, N. Vasiloglou, S. Abu-Nimeh, W. Lee, and D. Dagon (Discussion Lead: Sajjad Arshad)
		Fri 09/23	Detection of spam-related websites using link analysis on the Web graph	Combating Web Spam with TrustRank. Z. Gyongyi, H. Garcia-Molina, and J. Pedersen (Discussion Lead: Muhammad Ahmad Bashir)
Botnets and malicious infrastructures	4	Tue 09/27	Detection of botnets using unsupervised learning	Traffic Aggregation for Malware Detection. T.-F. Yen and M. Reiter (Discussion Lead: Can Gemicioglu); BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. G. Gu, R. Perdisci, J. Zhang, and W. Lee (Discussion Lead: Andrea Mambretti)
		Fri 09/30	Anomaly Detection: A data science perspective; DNS domain abuse detection through supervised learning	Tutorial with R examples (Presenter: Sri Krishnamurthy); PREDATOR: Proactive Recognition and Elimination of Domain Abuse at Time-Of-Registration. S. Hao, A. Kantchelian, B. Miller, V. Paxson and N. Feamster (Discussion Lead: Deepanjan Basu)
	5	Tue 10/04	Evolution of malicious infrastructures; project proposal presentations	Automatically Inferring the Evolution of Malicious Activity on the Internet. S. Venkataraman, D. Brumley, S. Sen and O. Spatscheck (Discussion Lead: Praveen Keshava)
		Fri 10/07	Graph analysis for detecting peer-to-peer botnets; project proposal presentations	BotGrep: Finding P2P Bots with Structured Graph Analysis. S. Nagaraja, P. Mittal, C.-Y. Hong, M. Caesar, and N. Borisov (Discussion Lead: Hridam Basu)
	6	Tue 10/11	Topological relations among hosts in malicious web infrastructure	Finding the Linchpins of the Dark Web: a Study on Topologically Dedicated Hosts on Malicious Web Infrastructures. Z. Li, S. Alrwais, Y. Xie, F. Yu and X. Wang (Discussion Lead: Matthew Jagielski); Optional reading: Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages. G. Stringhini, C. Kruegel, and G. Vigna
Malware detection and protection		Fri 10/14	Detection of malware delivery and infection	Nazca: Detecting Malware Distribution in Large-Scale Networks. L. Invernizzi, S. Miskovic, R. Torres, S. Saha, S.-J. Lee, M. Mellia, C. Kruegel and G. Vigna (Discussion Lead: Jack Doerner); The Dropper Effect: Insights into Malware Distribution with Downloader Graph Analytics. B. J. Kwon, M. Mondal, J. Jang, L. Bilge and T. Dumitras (Discussion Lead: Andrea Mambretti)
	7	Tue 10/18	Reputation-based detection of malicious files	CAMP: Content-Agnostic Malware Protection. M. A. Rajab, L. Ballard, N. Lutz, P. Mavrommatis, and N. Provos (Discussion Lead: Muhammad Ahmad Bashir); Guilt by Association: Large Scale Malware Detection by Mining File-relation Graphs. A. Tamersoy, K. Roundy and D. H. Chau (Discussion Lead: Supraja Krishnan)
		Fri 10/21	No class: Alina out of town
Abuse and fraud in social networks	8	Tue 10/25	Sybil (fake) accounts and large groups of accounts with synchronized activity	You Are How You Click: Clickstream Analysis for Sybil Detection. G. Wang, T. Konolige, C. Wilson, H. Zheng and B. Y. Zhao (Discussion Lead: Deepanjan Basu); Uncovering Large Groups of Active Malicious Accounts in Online Social Networks. Q. Cao, X. Yang, J. Yu and C. Palow (Discussion Lead: Ahmet Ozcan)
		Fri 10/28	Compromise of legitimate accounts	COMPA: Detecting Compromised Accounts on Social Networks. M. Egele, G. Stringhini, C. Kruegel, and G. Vigna (Discussion Lead: Praveen Keshava); Consequences of Connectivity: Characterizing Account Hijacking on Twitter. K. Thomas, F. Li, C. Grier and V. Paxson (Discussion Lead: Sri Krishnamurthy)
Enterprise log analytics	9	Tue 11/01	Detection of command-and-control traffic in enterprise networks; project checkpoint	ExecScent: Mining for New C&C Domains in Live Networks with Adaptive Control Protocol Templates. T. Nelms, R. Perdisci and M. Ahamad (Discussion Lead: Can Gemicioglu)
		Fri 11/04	Security log analytics for enterprise breach detection; project checkpoint	Operational security log analytics for enterprise breach detection. Z. Li and A. Oprea (Discussion Lead: Alina Oprea)
Behavior-based authentication	10	Tue 11/08	Implicit authentication by learning typical user profiles over time	Implicit Authentication through Learning User Behavior. E. Shi, Y. Niu, M. Jakobsson, and R. Chow (Discussion Lead: Mukund Sarma); Progressive authentication: deciding when to authenticate on mobile phones. O. Riva, C. Qin, K. Strauss and D. Lymberopoulos (Discussion Lead: Supraja Krishnan)
		Fri 11/11	Vacation: Veterans’ Day
	11	Tue 11/15	Probabilistic model for measuring user authenticity at login time	Who Are You? A Statistical Approach to Measuring User Authenticity. D. M. Freeman, S. Jain, M. Durmuth, B. Biggio and G. Giacinto (Discussion Lead: Sevtap Duman); Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks. W. Melicher, B. Ur, S. Segreti, S. Komanduri, L. Bauer, N. Christin, and L. Cranor (Discussion Lead: Ahmet Ozcan)
Privacy-preserving analytics		Fri 11/18	Large-scale privacy-preserving regression	Privacy-Preserving Ridge Regression on Hundreds of Millions of Records. V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh and N. Taft (Discussion Lead: Jack Doerner)
	12	Tue 11/22	Classification on encrypted data	Machine Learning Classification over Encrypted Data. R. Bost, R. A. Popa, S. Tu and S. Goldwasser (Discussion Lead: Hridam Basu)
		Fri 11/25	Vacation: Thanksgiving
Adversarial machine learning	13	Tue 11/29	Terminology and evasion attacks against classification systems	Adversarial Machine Learning. L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein and J. D. Tygar (Discussion Lead: Alina Oprea); Practical Evasion of a Learning-Based Classifier: A Case Study. N. Srndic and P. Laskov (Discussion Lead: Matthew Jagielski)
		Fri 12/2	Poisoning and model inversion attacks	ANTIDOTE: Understanding and Defending against Poisoning of Anomaly Detectors. B. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S. Lau, S. Rao, N. Taft and J. D. Tygar (Discussion Lead: Alina Oprea); Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing. M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page and T. Ristenpart (Discussion Lead: Matthew Jagielski)
Final research project presentation	14	Tue 12/6	Research project presentations
		Fri 12/9	Research project presentations

Projects

Additional reading

Anomaly Detection: A Survey. V. Chandola, A. Banerjee, and V. Kumar

Other resources

Books:

[ISL] An Introduction to Statistical Learning with Applications in R. G. James, D. Witten, T. Hastie and R. Tibshirani

[ESL] The Elements of Statistical Learning: Data Mining, Inference, and Prediction. T. Hastie, R. Tibshirani and J. Friedman