Project 4: Raw Sockets
This project is due at 11:59pm on March 25, 2013.
Description
The goal of this assignment is to familiarize you with the low-level operations of the Internet protocol stack. Thus far, you have had experience using sockets and manipulating application-level protocols. However, working above the operating system's networking stack hides all the underlying complexity of creating and managing IP and TCP headers. Your task is to write a program called rawhttpget that takes a URL on the command line and downloads the associated web page. You may use any HTTP code that you wrote for project 2 to aid in the process. However, your program must use a SOCK_RAW/IPPROTO_RAW socket, which means that you are responsible for building the IP and TCP headers in each packet. In essence, you will be rebuilding the operating system's network stack within your application.
WARNING: DO NOT TEST YOUR PROGRAM ON PUBLIC WEBSITES
As with project 2, you should not test your program against public web servers. Only test your program against pages hosted by CCIS. For example, you can test your program against Fakebook, or against this assignment page. When you are testing your program, you will almost certainly send packets with invalid IP and TCP headers. These packets may trigger security warnings if you send them to public websites, e.g. the website administrator may think someone is trying to hack their site with spoofed packets.
High Level Requirements
Your goal is to write a program called rawhttpget that takes one command line parameter (a URL), downloads the associated web page or file, and saves it to the current directory. The command line syntax for this program is:
An example invocation of the program might look like this:
This would create a file named project4.html in the current directory containing the downloaded HTML content. If the URL ends in a slash ('/') or does not include any path, then you may use the default filename index.html. For example, the program would generate a file called index.html if you ran the following command:
Since the point of this assignment is not to focus on HTTP, there are many things your program does not need to handle. Your program does not need to support HTTPS. Your program does not need to follow redirects, or handle HTTP status codes other than 200. In the case of a non-200 status code, print an error to the console and close the program. Your program does not need to follow links or otherwise parse downloaded HTML.
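The filename rule and the minimal HTTP handling described above can be sketched in Python (one of the languages the assignment mentions). This is a hypothetical sketch, not project 2's code: the function names are illustrative, and requesting HTTP/1.0 is an assumption made so that the body arrives without chunked transfer encoding and the server closes the connection when it is done. Only string handling is shown; no network traffic is involved.

```python
import posixpath
import sys
from urllib.parse import urlparse

def output_filename(url: str) -> str:
    """Local filename for a URL: last path component, or index.html."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        return "index.html"
    return posixpath.basename(path)

def build_get_request(url: str) -> bytes:
    """Minimal GET request (HTTP/1.0 is an assumption, see lead-in)."""
    parts = urlparse(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {parts.netloc}\r\n\r\n").encode("ascii")

def parse_response(raw: bytes) -> bytes:
    """Return the body of a 200 response; raise ValueError otherwise."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete HTTP response headers")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    fields = status_line.split(" ", 2)          # e.g. ["HTTP/1.1", "200", "OK"]
    if len(fields) < 2 or not fields[0].startswith("HTTP/"):
        raise ValueError(f"malformed status line: {status_line!r}")
    if fields[1] != "200":
        raise ValueError(f"server returned status {fields[1]}")
    return body

def body_or_exit(raw: bytes) -> bytes:
    """Print an error and exit on any non-200 response."""
    try:
        return parse_response(raw)
    except ValueError as err:
        print(f"rawhttpget: {err}", file=sys.stderr)
        sys.exit(1)
```

In the full program, the bytes passed to body_or_exit would be the reassembled TCP payload, and the returned body would be written to output_filename(url) in the current directory.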
Low Level Requirements
The primary challenge of this assignment is that you must use a raw socket. A raw socket is a special type of socket that bypasses some (or all) of the operating system's network stack. For example, in C a socket of type SOCK_RAW/IPPROTO_RAW bypasses the operating system's IP and TCP/UDP layers. This is the type of socket you must use in this assignment. There are also raw sockets of type SOCK_RAW/IPPROTO_TCP, which bypass only the operating system's TCP layer: you supply the TCP header, and the kernel still builds the IP header. This type of socket may be useful during development, e.g. start by implementing TCP on top of a SOCK_RAW/IPPROTO_TCP socket, then, once you have TCP working, switch to a SOCK_RAW/IPPROTO_RAW socket and implement IP.
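A minimal sketch of opening the required socket in Python on Linux. It assumes one common arrangement: because an IPPROTO_RAW socket is send-only on Linux (see raw(7)), replies are read from a second SOCK_RAW/IPPROTO_TCP socket, which returns each incoming TCP packet with its IP header still attached. Whether such a receive-side socket is acceptable is for the course staff to confirm, and the function name open_raw_sockets is hypothetical. Opening either socket requires root or the CAP_NET_RAW capability.

```python
import socket

def open_raw_sockets():
    """Return (send_sock, recv_sock) for hand-built TCP/IP packets on Linux."""
    # IPPROTO_RAW implies IP_HDRINCL: every buffer given to sendto() must
    # begin with a complete IP header that the program built itself.
    send_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)
    try:
        # Assumed receive path: a raw TCP socket sees every incoming TCP
        # packet, delivered with the IP header included.
        recv_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
    except OSError:
        send_sock.close()  # do not leak the first socket if the second fails
        raise
    return send_sock, recv_sock
```

Packets would then be sent with send_sock.sendto(packet, (dst_ip, 0)); raw(7) says the port field is ignored for raw sends and should be 0. Because the receive socket sees every TCP packet arriving at the host, the program has to filter on source IP and ports itself.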
There are many tutorials online for doing raw socket programming. I recommend Silver Moon's tutorial as a place to get started. That tutorial is in C; Python also has native support for raw socket programming. However, not all languages support raw socket programming. Since many of you program in Java, I will allow the use of the RockSaw Library, which enables raw socket programming in Java.
Your program must implement all features of IP packets. This includes setting the correct version, header length and total length, protocol identifier, and checksum in each packet. Obviously, you will also need to correctly set the source and destination IP in each packet. You may use existing OS APIs to query for the IP of the remote HTTP server (i.e. handle DNS requests) as well as the IP of the source machine. Your code must be defensive, i.e. you must check the validity of IP headers from the remote server. Is the remote IP correct? Is the checksum correct? Does the protocol identifier match the contents of the encapsulated header?
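The IP requirements above can be sketched as follows, assuming Python and a header with no options (IHL = 5). inet_checksum is the standard RFC 1071 Internet checksum; build_ip_header and parse_ip_packet are hypothetical names, and the Don't Fragment flag and TTL of 64 are illustrative choices rather than requirements from the assignment. A header that carries a correct checksum sums to zero, which is how the receive path verifies it.

```python
import socket
import struct

def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ip_header(src: str, dst: str, payload_len: int, ident: int,
                    proto: int = socket.IPPROTO_TCP, ttl: int = 64) -> bytes:
    """20-byte IPv4 header (no options) with the checksum filled in."""
    ver_ihl = (4 << 4) | 5                   # version 4, IHL 5 words = 20 bytes
    total_len = 20 + payload_len             # header plus encapsulated segment
    flags_frag = 0x4000                      # Don't Fragment set, offset 0 (a choice)
    header = struct.pack("!BBHHHBBH4s4s", ver_ihl, 0, total_len, ident,
                         flags_frag, ttl, proto, 0,   # checksum 0 while summing
                         socket.inet_aton(src), socket.inet_aton(dst))
    return header[:10] + struct.pack("!H", inet_checksum(header)) + header[12:]

def parse_ip_packet(packet: bytes, expected_src: str) -> bytes:
    """Validate an incoming IPv4 packet; return the TCP segment it carries."""
    if len(packet) < 20:
        raise ValueError("truncated IP header")
    ver_ihl, _tos, total_len = struct.unpack("!BBH", packet[:4])
    ihl = (ver_ihl & 0x0F) * 4
    if ver_ihl >> 4 != 4 or ihl < 20 or not ihl <= total_len <= len(packet):
        raise ValueError("bad version, header length, or total length")
    if inet_checksum(packet[:ihl]) != 0:     # a valid header sums to zero
        raise ValueError("IP checksum mismatch")
    if packet[9] != socket.IPPROTO_TCP:
        raise ValueError("protocol field does not say TCP")
    if socket.inet_ntoa(packet[12:16]) != expected_src:
        raise ValueError("packet did not come from the remote server")
    return packet[ihl:total_len]
```

One debugging caveat: raw(7) documents that for IP_HDRINCL sockets (which IPPROTO_RAW implies) Linux always fills in the IP checksum and total length itself, so a wrong value computed by build_ip_header may never show up in a Wireshark capture. Checking the function directly, for instance that inet_checksum of the finished header is 0, is a more reliable test.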
TCP is much too complicated for you to completely implement on your own (e.g. RTT and RTO estimation). Thus, you may implement a bare-bones version of TCP. Your implementation must be able to handle all basic TCP features, including: choosing a valid local port, managing sequence and acknowledgement numbers, performing connection setup and tear-down, and calculating the offset and checksum in each packet. Your code may manage the advertised window in any way you want. As with IP, your code must be defensive: check to ensure that all incoming packets have valid checksums and in-order sequence numbers. If your program does not receive any data from the remote server within a few minutes, your program can assume that the connection has failed or timed-out.
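A sketch of building and checking TCP segments in Python, assuming a 20-byte header with no options. Per RFC 793, the checksum covers a pseudo-header (source IP, destination IP, a zero byte, the protocol number, and the TCP length) followed by the segment itself. build_tcp_segment, tcp_checksum_ok, and pick_local_port are hypothetical helpers; drawing the local port at random from the IANA dynamic range (49152 to 65535) and advertising a window of 65535 are assumed policies, not requirements.

```python
import random
import socket
import struct

FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10  # TCP flag bits

def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum (the same routine the IP header uses)."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header that the TCP checksum covers."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, socket.IPPROTO_TCP, tcp_len)

def build_tcp_segment(src_ip, dst_ip, src_port, dst_port, seq, ack,
                      flags, payload=b"", window=65535) -> bytes:
    """TCP segment with a 20-byte header (data offset 5, no options)."""
    offset_flags = (5 << 12) | flags          # data offset in 32-bit words, then flags
    header = struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                         offset_flags, window, 0, 0)   # checksum 0 for now
    csum = inet_checksum(pseudo_header(src_ip, dst_ip, len(header) + len(payload))
                         + header + payload)
    return header[:16] + struct.pack("!H", csum) + header[18:] + payload

def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """For incoming segments, src_ip is the server and dst_ip is this host."""
    return inet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0

def pick_local_port() -> int:
    """Random port from the IANA dynamic range (a simple, assumed policy)."""
    return random.randint(49152, 65535)
```

For example, a SYN might be built as build_tcp_segment(my_ip, server_ip, port, 80, isn, 0, SYN) with isn = random.getrandbits(32); because a SYN consumes one sequence number, the first data segment then carries sequence number isn + 1, and all sequence arithmetic wraps modulo 2**32.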
Your code does not need to implement all of TCP's reliable, in-order delivery features. Obviously, your code must ACK all data received from the remote server. If your program receives a corrupt packet (i.e. checksum failure), an out-of-order packet, or a duplicate packet, you may send a TCP RST (Reset) to the server to close the connection, print an error to the console, and exit. In essence, your program does not need to handle or recover from any abnormalities in the data stream; it can simply abort the connection and close. Your code should include basic time-out functionality, i.e. if a packet has not been ACKed within a minute or so, assume that it is lost and retransmit. You may implement more of TCP's reliability mechanisms if you wish.
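The simplified receive policy above (ACK in-order data, abort on anything unexpected) fits in a few lines. In this hypothetical ReceiveState helper, rcv_nxt is the next sequence number expected from the server; it would be initialized to the server's ISN + 1 after the SYN-ACK, and the checksum result is assumed to come from a separate TCP checksum routine.

```python
from dataclasses import dataclass

@dataclass
class ReceiveState:
    """Receive-side bookkeeping for the simplified, abort-on-anomaly TCP."""
    rcv_nxt: int  # next sequence number expected from the server

    def on_segment(self, seq: int, payload_len: int, checksum_ok: bool,
                   fin: bool = False):
        """Return ("ACK", ack_number), ("NONE", None), or ("RST", reason)."""
        if not checksum_ok:
            return ("RST", "checksum failure")
        if seq != self.rcv_nxt:
            # Covers both duplicates (already received) and out-of-order
            # data (arrived early); the simplified policy aborts either way.
            return ("RST", f"unexpected sequence number {seq}, wanted {self.rcv_nxt}")
        if payload_len == 0 and not fin:
            return ("NONE", None)              # pure ACK: nothing to acknowledge
        # A FIN consumes one sequence number, just like a byte of data.
        self.rcv_nxt = (self.rcv_nxt + payload_len + (1 if fin else 0)) % 2**32
        return ("ACK", self.rcv_nxt)
```

When on_segment returns "RST", the caller would send a RST segment, print the reason, and exit; for "ACK", it would send an ACK segment whose acknowledgement number is the returned value.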
Your code does not need to implement RTO estimation or congestion control. Rather than use a complicated sliding window approach, the simplest alternative is a stop-and-wait approach: only send one packet at a time, and wait for it to be ACKed before sending another packet. In other words, you may implement a version of TCP that uses a fixed cwnd of 1 (i.e. no slow start or AIMD). You may develop a more performant sliding window mechanism if you wish.
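A stop-and-wait sender with a fixed retransmission timeout can be sketched as below. transmit and wait_for_ack are hypothetical hooks into the rest of the program (wait_for_ack(timeout) is assumed to return True once the matching ACK arrives), and the 60-second timeout and 180-second give-up limit are assumptions chosen to match "a minute or so" and "a few minutes" in the requirements. The clock parameter only exists so the timing source can be swapped, for example for a fake clock.

```python
import time
from typing import Callable

def send_stop_and_wait(segment: bytes,
                       transmit: Callable[[bytes], None],
                       wait_for_ack: Callable[[float], bool],
                       rto: float = 60.0,
                       give_up_after: float = 180.0,
                       clock: Callable[[], float] = time.monotonic) -> int:
    """Send one segment and block until it is ACKed (cwnd fixed at 1).

    Returns the number of transmissions it took. Raises TimeoutError once
    give_up_after seconds pass without an ACK.
    """
    start = clock()
    attempts = 0
    while clock() - start < give_up_after:
        transmit(segment)
        attempts += 1
        if wait_for_ack(rto):
            return attempts
        # No ACK within the fixed timeout: assume the segment was lost and resend.
    raise TimeoutError("no ACK from server; assuming the connection failed")
```

Because only one segment is ever outstanding, the sender never needs a retransmission queue; the next segment is built only after this call returns.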
Debugging raw socket code can be very challenging. You will need to get comfortable with Wireshark in order to debug your code. Wireshark is a packet sniffer, and can parse all of the relevant fields from TCP/IP headers. Using Wireshark, you should be able to tell if you are formatting outgoing packets correctly, and if you are correctly parsing incoming packets.
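When comparing your own packets against a capture, it can help to print them in a layout similar to Wireshark's packet bytes pane (offset, sixteen hex bytes, then printable ASCII). The hexdump helper below is a hypothetical debugging aid, not part of any required interface.

```python
def hexdump(packet: bytes) -> str:
    """Format bytes as offset, up to 16 hex bytes, and ASCII on each line."""
    lines = []
    for off in range(0, len(packet), 16):
        chunk = packet[off:off + 16]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.', as packet viewers usually do.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:04x}  {hexpart:<47}  {text}")
    return "\n".join(lines)
```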
You can write your code in whatever language you choose, as long as your code compiles and runs on unmodified CCIS Linux machines on the command line. It is extremely important that you test your code on the CCIS machines: raw socket programming is very challenging, and not all operating systems handle raw sockets the same way. For example, you must have administrator access to use raw sockets on many Windows versions, and on Linux you need root privileges (or the CAP_NET_RAW capability) to create a raw socket.
Also, be aware that many languages do not support development using raw sockets. I am making an explicit exception for Java, allowing the use of the RockSaw library. If you wish to program in a language (other than Java) that requires third party library support for raw socket programming, ask me for permission before you start development.
As usual, do not use libraries that are not installed by default on the CCIS Linux machines (with the exception of RockSaw). Similarly, your code must compile and run on the command line. You may use IDEs (e.g. Eclipse) during development, but do not turn in your IDE project without a Makefile. Make sure your code has no dependencies on your IDE.
Submitting Your Project
Before turning in your project, you and your partner(s) must register your group. To register yourself in a group, execute the following script:
$ /course/cs5700sp13/bin/register project4 [team name]
This will either report back success or will give you an error message. If you have trouble registering, please contact the course staff. You and your partner(s) must all run this script with the same [team name]. This is how we know you are part of the same group.
To turn in your project, you should submit your (thoroughly documented) code along with three other files:
- A Makefile that compiles your code.
- A plain-text (no Word or PDF) README file. In this file, you should briefly describe your high-level approach, what TCP/IP features you implemented, and any challenges you faced.
- If your code is in Java, you must include a copy of the RockSaw library.
$ /course/cs5700sp13/bin/turnin project4 [project directory]
[project directory] is the name of the directory with your submission. The script will print out every file that you are submitting, so make sure that it prints out all of the files you wish to submit! The turn-in script will not accept submissions that are missing a README or a Makefile. Only one group member needs to submit your project. Your group may submit as many times as you wish; only the last submission will be graded, and the time of the last submission will determine whether your assignment is late.
Grading
This project is worth 16 points. You will receive full credit if 1) your code compiles, runs, and produces the expected output, 2) you have not used any illegal libraries, and 3) you use the correct type of raw socket. All student code will be scanned by plagiarism detection software to ensure that students are not copying code from the Internet or each other.
5 points will be awarded for each of the three protocols you must implement, i.e. 5 points for HTTP, 5 points for TCP, and 5 points for IP. 1 point will be awarded for your documentation. Essentially, 6 points should be easy to earn; the other 10 are the challenge.
Extra Credit
There is an opportunity to earn 4 extra credit points on this assignment. To earn these points, you must use an AF_PACKET raw socket in your program, instead of a SOCK_RAW/IPPROTO_RAW socket. An AF_PACKET raw socket bypasses the operating system's layer-2 stack as well as layers 3 and 4 (TCP/IP). This means that your program must build Ethernet frames for each packet, as well as IP and TCP headers. You can assume that we will only test your code on machines with Ethernet connections, i.e. you do not need to worry about alternative layer-2 protocols like WiFi or 3G. This extra credit will be quite challenging, since it will involve doing MAC resolution with ARP requests. We have not discussed ARP in class, and you will need to learn about and handle this protocol on your own. Essentially, the challenge is to figure out the MAC address of the gateway, since this information needs to be included in the Ethernet header.
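A partial sketch of the Ethernet and ARP pieces, in Python on Linux. arp_request builds a 42-byte broadcast frame (a 14-byte Ethernet II header plus a 28-byte RFC 826 ARP payload for Ethernet/IPv4), parse_arp_reply extracts the gateway's MAC address from a reply, and default_gateway reads the gateway IP from the text of /proc/net/route, where Linux prints addresses as hex in host byte order. All names are hypothetical, and MAC addresses are assumed to be 6-byte bytes objects.

```python
import socket
import struct

ETH_P_IP, ETH_P_ARP = 0x0800, 0x0806
BROADCAST = b"\xff" * 6

def mac_bytes(mac: str) -> bytes:
    """'02:00:00:00:00:01' -> 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def ethernet_header(dst_mac: bytes, src_mac: bytes, ethertype: int) -> bytes:
    """14-byte Ethernet II header: destination, source, EtherType."""
    return dst_mac + src_mac + struct.pack("!H", ethertype)

def arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Broadcast frame asking which MAC address owns target_ip."""
    arp = struct.pack("!HHBBH6s4s6s4s",
                      1,                 # hardware type: Ethernet
                      ETH_P_IP,          # protocol type: IPv4
                      6, 4,              # hardware / protocol address lengths
                      1,                 # opcode 1 = request
                      src_mac, socket.inet_aton(src_ip),
                      b"\x00" * 6,       # target MAC unknown, left as zeros
                      socket.inet_aton(target_ip))
    return ethernet_header(BROADCAST, src_mac, ETH_P_ARP) + arp

def parse_arp_reply(frame: bytes, wanted_ip: str):
    """Return the sender MAC if frame is an ARP reply from wanted_ip, else None."""
    if len(frame) < 42 or struct.unpack("!H", frame[12:14])[0] != ETH_P_ARP:
        return None
    opcode = struct.unpack("!H", frame[20:22])[0]
    sender_mac, sender_ip = frame[22:28], frame[28:32]
    if opcode == 2 and sender_ip == socket.inet_aton(wanted_ip):  # 2 = reply
        return sender_mac
    return None

def default_gateway(route_table: str):
    """Gateway IP of the default route in /proc/net/route text, or None."""
    for line in route_table.splitlines()[1:]:          # skip the column header
        fields = line.split()
        if len(fields) > 2 and fields[1] == "00000000":   # destination 0.0.0.0
            # The kernel prints the address in host byte order, so repack natively.
            return socket.inet_ntoa(struct.pack("=I", int(fields[2], 16)))
    return None
```

The frames would be sent and received on an AF_PACKET socket, e.g. socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(0x0003)) bound to the interface with bind((ifname, 0)); 0x0003 is ETH_P_ALL, so the socket receives frames of every protocol, and the interface name is an assumption about the test machine.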
If you complete the extra credit, make sure to mention this in your README. Explain how you implemented Ethernet functionality, and any additional challenges you faced (e.g. ARP).