The Skinny - IS1320 Information Retrieval - Spring 2003

Professor Futrelle

A constantly updated list of important items and some odds and ends

Version of 24 March 2003

If you don't know what "the skinny" means, check out Evan Morris' discussion of the phrase.

Making your web page private
"Logging in" to a web server
Simple Java code to download web pages

#1 3/12/03 Making your web page private
Your web page for your course assignments should be set up so that it is accessible only to us, Professor Futrelle and the teaching assistant. This is easy to do on our Solaris systems and is described in a separate document.
#2 3/15/03 "Logging in" to a web server
Working in a terminal session you can execute the command
telnet 80
This gets you to the HTTP (Web) server port on the system. You then execute the command (using upper case as shown):
GET /home/futrelle/tiny.html HTTP/1.0
and press Return twice. This is essentially what a browser does when you type in a URL or click on a link on a page, and you can get the same page in your browser by opening its URL. What you will then see in the terminal is the reply from the web server, shown below:
HTTP/1.1 200 OK
Date: Sat, 15 Mar 2003 18:48:18 GMT
Server: Apache/1.3.14 (Unix) mod_perl/1.24_01 mod_ssl/2.7.1 OpenSSL/0.9.6
Last-Modified: Tue, 31 Dec 2002 18:19:19 GMT
ETag: "15319e-28-3e11dfa7"
Accept-Ranges: bytes
Content-Length: 40
Connection: close
Content-Type: text/html

Hi there.

Connection closed by foreign host.
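The telnet exchange above can also be sketched in Java with a plain socket. This is only a minimal illustration, not the course's own code: the class name RawHttpGet and its method names are hypothetical, and the host and port are taken from the command line because the notes do not name them here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mimics typing a GET request into a telnet session.
public class RawHttpGet {

    // Builds the request text: the GET line followed by an empty line
    // (the "press Return twice" step). HTTP uses CR LF line endings.
    public static String buildRequest(String path) {
        return "GET " + path + " HTTP/1.0\r\n\r\n";
    }

    // Extracts the numeric status from a status line such as "HTTP/1.1 200 OK".
    public static int statusCode(String statusLine) {
        String[] parts = statusLine.trim().split(" ");
        return Integer.parseInt(parts[1]);
    }

    // Opens a TCP connection, sends the request, and reads until the server
    // closes the connection, as it does after an HTTP/1.0 reply.
    public static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                reply.write(buffer, 0, n);
            }
            // Headers and body together, exactly as telnet would display them.
            return reply.toString("ISO-8859-1");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java RawHttpGet <host> <port> <path>");
            return;
        }
        String reply = fetch(args[0], Integer.parseInt(args[1]), args[2]);
        System.out.print(reply);
    }
}
```

Run with a host, port 80, and a path such as /home/futrelle/tiny.html, it should print the headers, a blank line, and then the page, much like the transcript above. Servers that host several sites on one address may also expect a Host header, which this sketch leaves out.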
#3 3/15/03 Simple Java code to download web pages
Here is a Java program that you can study, compile and use. It downloads certain information about a web page and prints it out. The URL of the web page is hardwired into the code. Click this link for the Java source. You can then paste it into a file and compile and run it. (Your browser may show you the code or may download it as a file. If you have problems, download this copy with the extension .txt, which should work without trouble.) With slight variations, this code can be used to download the entire contents of a web page, which can then be analyzed for content or for additional links to other pages, images, etc.
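For readers without the linked source at hand, here is a rough sketch of the same idea using java.net.URLConnection. It is not the program referred to above: the class name PageInfo and its methods are hypothetical, and the URL is passed on the command line rather than hardwired.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Date;

// Hypothetical sketch: print some information about a web page, then its contents.
public class PageInfo {

    // Collects a few facts the server reports about the page.
    public static String summarize(URLConnection conn) {
        return "Content-Type: " + conn.getContentType() + "\n"
             + "Content-Length: " + conn.getContentLengthLong() + "\n"
             + "Last-Modified: " + new Date(conn.getLastModified()) + "\n";
    }

    // Reads everything from the reader, line by line, into one string.
    // Each line is re-terminated with '\n' regardless of the original ending.
    public static String readAll(Reader source) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageInfo <url>");
            return;
        }
        URLConnection conn = URI.create(args[0]).toURL().openConnection();
        System.out.print(summarize(conn));

        // The "slight variation": download the full contents as well.
        // ISO-8859-1 is assumed here; a real page may declare another charset.
        Reader body = new InputStreamReader(conn.getInputStream(), StandardCharsets.ISO_8859_1);
        System.out.print(readAll(body));
    }
}
```

The string returned by readAll is what you would then scan for content or for links to other pages and images.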
