Java Sockets Tutorial

Introduction

This is a brief introduction to the Java Socket API. Java Sockets are a mechanism for communication over the Internet. All the classes discussed in this tutorial are in the java.net package.

A socket is an endpoint for communication. There are two kinds of socket, depending on whether one wishes to use a connectionless or a connection-oriented protocol. The connectionless communication protocol of the Internet is called UDP. The connection-oriented communication protocol of the Internet is called TCP. UDP sockets are also called datagram sockets.

Internet Addresses and Ports

Each socket is uniquely identified on the entire Internet with two numbers. The first number is a 32-bit integer called the Internet Address (or IP address). The second number is a 16-bit integer called the port of the socket. The IP address uniquely identifies one machine (also called a host or node) on the Internet. The new version of the Internet protocol (IP version 6 or IPv6) increases the size of the IP address to 128 bits. Java encapsulates the concept of an IP address with the class InetAddress. Java represents ports with 32-bit integers, even though a 16-bit integer would suffice.
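
As an illustration of the sizes mentioned above, the following minimal sketch builds the IPv4 and IPv6 loopback addresses from raw bytes and pairs one of them with a port. The class name AddressDemo, its helper methods, and the port number 8080 are assumptions chosen for this example only.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class AddressDemo {
    // IPv4 loopback address 127.0.0.1, built from its 4 raw bytes (no name lookup).
    static InetAddress loopbackV4() throws UnknownHostException {
        return InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
    }

    // IPv6 loopback address ::1, built from its 16 raw bytes.
    static InetAddress loopbackV6() throws UnknownHostException {
        byte[] raw = new byte[16];
        raw[15] = 1;
        return InetAddress.getByAddress(raw);
    }

    // Size of the address in bits: 32 for IPv4, 128 for IPv6.
    static int bits(InetAddress address) {
        return address.getAddress().length * 8;
    }

    // True if Java accepts the value as a port number.
    static boolean isValidPort(InetAddress address, int port) {
        try {
            new InetSocketAddress(address, port);
            return true;
        } catch (IllegalArgumentException e) {
            // Ports are passed as int, but only 0..65535 is accepted.
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress v4 = loopbackV4();
        InetAddress v6 = loopbackV6();
        System.out.println(v4.getHostAddress() + ": " + bits(v4) + " bits");
        System.out.println(v6.getHostAddress() + ": " + bits(v6) + " bits");
        System.out.println("port 8080 valid: " + isValidPort(v4, 8080));
        System.out.println("port 70000 valid: " + isValidPort(v4, 70000));
    }
}
```

An InetSocketAddress combines an InetAddress with a port, which corresponds to the pair of numbers that identifies a socket.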

While IP addresses uniquely identify machines, the reverse is not true. A machine can have several IP addresses. Note that each 16-bit port number actually represents two distinct ports: a UDP port and a TCP port.
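
The separation of UDP and TCP ports can be observed directly. The sketch below, with the assumed class name SharedPortNumber, lets the system pick a free TCP port on the loopback address and then binds a UDP socket to the same port number. The retry loop is only a precaution, in case some other program already holds the UDP port with that number.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.ServerSocket;

public class SharedPortNumber {
    // Binds a TCP server socket and a UDP socket to the same port number
    // at the same time, and returns that number.
    static int bindBoth() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        for (int attempt = 0; attempt < 5; attempt++) {
            // Port 0 asks the system to choose any free TCP port.
            try (ServerSocket tcp = new ServerSocket(0, 50, loopback)) {
                int port = tcp.getLocalPort();
                try (DatagramSocket udp = new DatagramSocket(port, loopback)) {
                    // Both sockets are open here, with no conflict.
                    return port;
                } catch (BindException e) {
                    // The UDP port with this number is in use elsewhere; try again.
                }
            }
        }
        throw new IOException("no port number was free for both TCP and UDP");
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP and UDP sockets shared port number " + bindBoth());
    }
}
```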

Connectionless Sockets

The simplest kind of socket is a UDP socket. Such a socket is analogous to a mailbox. Data is sent and received in units called datagrams (analogous to letters and parcels) from any UDP socket to any other UDP socket. For this reason UDP sockets are also called datagram sockets. Java encapsulates the concept of a UDP socket with the class DatagramSocket, and the concept of a datagram with the class DatagramPacket.

A DatagramPacket consists of a fixed-length array of bytes together with an IP address and port. When one sends a DatagramPacket, its array of bytes is sent to the socket with the specified IP address and port (if it exists). When one receives a DatagramPacket, the data is copied into its array of bytes, and the IP address and port of the sender are copied into the IP address and port of the DatagramPacket.
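
The following sketch shows one datagram travelling between two UDP sockets on the same machine. The class name DatagramDemo, the 512-byte buffer, and the two-second timeout are assumptions made for the example. Note that UDP does not guarantee delivery in general; the loopback address is used here so that the example is self-contained.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class DatagramDemo {
    // Sends one datagram from one local UDP socket to another and
    // returns the text that was received.
    static String sendAndReceive(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket(0, loopback)) {
            receiver.setSoTimeout(2000); // do not wait forever for a lost datagram

            // Outgoing packet: the bytes plus the destination address and port.
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            DatagramPacket out = new DatagramPacket(
                    payload, payload.length, loopback, receiver.getLocalPort());
            sender.send(out);

            // Incoming packet: an empty buffer that receive() fills in.
            byte[] buffer = new byte[512];
            DatagramPacket in = new DatagramPacket(buffer, buffer.length);
            receiver.receive(in);

            // After receive(), the packet carries the sender's address and port.
            System.out.println("received from " + in.getAddress().getHostAddress()
                    + ":" + in.getPort());
            return new String(in.getData(), in.getOffset(), in.getLength(),
                    StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendAndReceive("hello"));
    }
}
```

Only getLength() bytes of the buffer hold received data, which is why the string is built from that range rather than from the whole array.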

Connection-Oriented Sockets

A TCP socket is analogous to (one side of) a telephone connection. TCP sockets are of two kinds: ordinary sockets and server sockets. A server socket is never used for transmission of information. Its sole purpose is to listen for incoming connection requests. When a client process wishes to make a connection with a server, it first constructs an ordinary socket and then asks for a connection with the server. When a server socket receives a connection request, it constructs an ordinary socket that completes the connection. This new socket uses the same local port number as the server socket; the connection is distinguished from other connections by the IP address and port of the client. The server socket then goes back to listening for connections. Once the connection is established, the two connected sockets can communicate with each other using ordinary read and write operations in either direction.

Java encapsulates the concept of an ordinary TCP socket with the class Socket, and the concept of a server socket with the class ServerSocket. Data is read from a socket through an InputStream and written to a socket through an OutputStream.
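
The exchange between a client socket and a server socket can be sketched as follows. The class name EchoDemo, the "echo: " reply format, and the use of a second thread for the server side are assumptions made so that both ends can run in one program; a real server would normally be a separate process that accepts connections in a loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class EchoDemo {
    // Wraps the socket's InputStream for reading lines of text.
    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    // Wraps the socket's OutputStream for writing lines of text (auto-flushed).
    static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Server side: accept one connection, read one line, send it back.
    static void serveOnce(ServerSocket server) {
        try (Socket connection = server.accept()) {
            String line = reader(connection).readLine();
            if (line != null) {
                writer(connection).println("echo: " + line);
            }
        } catch (IOException e) {
            System.err.println("server side failed: " + e.getMessage());
        }
    }

    // Client side: connect, send one line, and return the reply.
    static String roundTrip(String line) throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOnce(server));
            serverThread.start();
            String reply;
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(2000); // give up if no reply arrives
                writer(client).println(line);
                reply = reader(client).readLine();
            }
            serverThread.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```

The ServerSocket never carries data itself; all reading and writing goes through the Socket returned by accept and the Socket constructed by the client.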

Connections to Web Servers

One special case of a TCP socket is a socket that is communicating with a Web server. Web servers normally listen on TCP port 80. One could construct a socket and connect directly to a Web server if one knows its IP address. However, when one connects to a Web server, it is usually because one is interested in a document. Documents on the Internet are identified in a very different manner from sockets. The standard identifier for a document on the Internet is its Uniform Resource Locator (URL). Java encapsulates the concept of a URL with the class URL.
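
A URL object gives access to the parts of the identifier, including the default port mentioned above. The sketch below assumes the class name UrlParts and the placeholder address http://www.example.com/docs/index.html; it builds the URL through the URI class, which is one of several ways to obtain a URL object. No network access takes place.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UrlParts {
    // Parses the text into a URL object without contacting any server.
    static URL parse(String text) throws MalformedURLException {
        return URI.create(text).toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = parse("http://www.example.com/docs/index.html");
        System.out.println("protocol:     " + url.getProtocol());
        System.out.println("host:         " + url.getHost());
        System.out.println("path:         " + url.getPath());
        // getPort() returns -1 when the URL does not name a port explicitly.
        System.out.println("port:         " + url.getPort());
        // getDefaultPort() gives the protocol's usual port, 80 for http.
        System.out.println("default port: " + url.getDefaultPort());
    }
}
```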

When one constructs a URL object, the URL is checked to make sure that it is well-formed. Interacting with the URL requires that one first establish a connection with the Web server that is responsible for the document identified by the URL. Invoking the openConnection method on the URL object returns an object of type URLConnection that represents this connection, but it does not yet contact the server. The connection to the Web server is requested by calling connect on the URLConnection object. At that point the name resolution necessary to determine the IP address of the Web server is performed and the TCP socket for the connection is opened. Input from the document and output to it (when defined) are encapsulated in Java using the InputStream and OutputStream classes, respectively. If the URL specifies the http protocol, then the URLConnection will actually be an object of the subclass HttpURLConnection, which has additional methods specific to the HTTP protocol. Other protocols are also supported, but only a few of them, such as https (HttpsURLConnection) and jar (JarURLConnection), have a public subtype associated with them.
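
The steps described above can be sketched as follows. The class name ConnectionDemo, the placeholder address http://www.example.com/, and the five-second timeouts are assumptions made for the example. Running main requires network access, while the open method alone does not contact any server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class ConnectionDemo {
    // Creates the URLConnection object for a URL; nothing is sent yet.
    static URLConnection open(String text) throws IOException {
        URL url = URI.create(text).toURL();
        return url.openConnection();
    }

    public static void main(String[] args) throws IOException {
        String address = args.length > 0 ? args[0] : "http://www.example.com/";
        URLConnection connection = open(address);
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);

        // The actual connection to the server is made here.
        connection.connect();

        if (connection instanceof HttpURLConnection) {
            // The HTTP subtype exposes protocol details such as the status code.
            HttpURLConnection http = (HttpURLConnection) connection;
            System.out.println("status: " + http.getResponseCode());
        }
        System.out.println("content type: " + connection.getContentType());

        // The document itself is read through an InputStream.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), StandardCharsets.UTF_8))) {
            System.out.println("first line: " + in.readLine());
        }
    }
}
```

The example decodes the document as UTF-8 for simplicity; a more careful program would take the character encoding from the content type reported by the server.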


Ken Baclawski
324 WVH
College of Computer and Information Science
Northeastern University
360 Huntington Avenue
Boston, MA 02115
ken@baclawski.com
(617) 373-4631

Copyright © 1998, 2004 by Kenneth Baclawski. All rights reserved.