Two-hour Review of Java
for sophomore-level programmers

****************************************************************

Primary reference.

James Gosling, Bill Joy, Guy Steele, Gilad Bracha.
The Java Language Specification, Third Edition.

Online at
http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html

[The Third Edition describes the version 5.0 release of 2004.
These notes, however, describe Java as it was specified by the
Second Edition, version 1.4.x and older.]

****************************************************************

Outline of review.

Values
Types
Names
Program structure
Execution rules
Common mistakes

****************************************************************

Values.

A value is a datum that can be stored in a variable, passed as
an argument, returned by a method, or operated upon.  In general,
we describe a value by writing an expression, and a large part
of the computer's job is to compute the values of expressions.

The values of Java fall into two major categories, primitive and
reference.  These categories can be divided further into
subcategories as follows:

    primitive values
        boolean values
            true
            false
        numeric values
            integral values
                byte values                       (8-bit signed)
                short values                     (16-bit signed)
                int values                       (32-bit signed)
                long values                      (64-bit signed)
                char values                    (16-bit unsigned)
            floating point values
                float values             (IEEE single precision)
                double values            (IEEE double precision)
    reference values
        null
        pointers to class instances
        pointers to arrays

In Java's official terminology, an object is defined as a class
instance or an array.  Officially, therefore, a pointer to an
object is a value, but an object is not itself a value.

Unofficially, just about everyone ignores this distinction.  It's
just too much to have to say "pointer to an object" all the time,
when we could say "an object" instead.  When we speak as though a
Java object were a value, what we mean is that a pointer to the
object is a value.

We couldn't afford to do that if we were C++ programmers, because
the values of C++ include both objects and pointers to objects,
and the distinction between objects and pointers is essential to
understanding C++.  In Java, however, the distinction between
objects and pointers is not very important, and we can ignore it
most of the time.
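
Example.  Here is a minimal sketch of the difference between the
two categories.  The class and method names are made up for
illustration.  Copying a primitive value copies the value itself,
but copying a reference value copies only the pointer, so both
variables then refer to the same object.

```java
// Hypothetical demo class; the names are illustrative only.
class ValuesDemo {

    // Copying an int copies the value, so changing the copy
    // leaves the original alone.
    static int primitiveCopy () {
        int a = 1;
        int b = a;
        b = 2;
        return a;                       // still 1
    }

    // Copying an array variable copies the pointer, so both
    // variables refer to the same array.
    static int referenceCopy () {
        int[] a = new int[1];
        a[0] = 1;
        int[] b = a;
        b[0] = 2;
        return a[0];                    // now 2
    }

    public static void main (String[] args) {
        System.out.println(primitiveCopy());
        System.out.println(referenceCopy());
    }
}
```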

****************************************************************

Objects.

In Java, an object is like a piece of paper.  Several things are
written on that piece of paper in indelible ink:

    an identification number
    lines that divide the paper into locations
    the type of each location

In addition, each location contains a value, which is normally
written in pencil.  So long as the value is written in pencil,
the value written in a location can be changed by erasing it
and writing another value in its place.

The type of a location is just a constraint on what kinds of
values can be written into the location.  If the type of a
location is int, for example, then programs are not allowed to
write a boolean value into that location.

In Java, objects are created by evaluating a new expression.

    new int[15];              // allocates a new array
    new Vector();             // allocates a new instance

Evaluating a new expression is like taking a clean sheet of
paper of the appropriate size and using indelible ink to

    write a unique identification number, different from the
        identification numbers of all other objects
    divide the paper into locations
    write the type of each location

and then using pencil to write an appropriate value into each
location.

For example, the Java system would write 15 into one location
of a 15-element array to indicate its length, and would write
0 into each of the 15 element locations.

Unofficially, I use the word "allocation" to refer to the part
of this process that uses indelible ink, and the word
"construction" to refer to the part that uses pencil.

After the object has been allocated and constructed, the new
expression returns a pointer to the object.
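
Example.  The following sketch checks the claims made above about
a 15-element array.  NewDemo and its methods are hypothetical
names.  It also shows that two new expressions always yield
pointers to two different objects, even when the objects have the
same contents.

```java
// Hypothetical demo class; the names are illustrative only.
class NewDemo {

    // The length recorded in a freshly created array.
    static int lengthOfNewArray () {
        int[] a = new int[15];
        return a.length;
    }

    // True if every element location starts out holding 0.
    static boolean allZero () {
        int[] a = new int[15];
        for (int i = 0; i < a.length; i = i + 1) {
            if (a[i] != 0) {
                return false;
            }
        }
        return true;
    }

    // Each new expression creates a distinct object, so comparing
    // the two pointers with == yields false.
    static boolean sameObject () {
        int[] a = new int[15];
        int[] b = new int[15];
        return a == b;
    }

    public static void main (String[] args) {
        System.out.println(lengthOfNewArray());
        System.out.println(allZero());
        System.out.println(sameObject());
    }
}
```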

Java programs typically use an object for a while and then
forget about it completely.  Part of the computer's job is to
locate objects that the program has forgotten, and to recycle
their paper by turning it into clean blank paper that can be
used to create new objects.  Their forgotten identification
numbers can be recycled as well.  For historical reasons, these
recycling processes are known as garbage collection.

****************************************************************

Types.

Types serve several distinct purposes in Java.  On one level,
types constrain the set of values that can be written into a
location, passed to a method, or used in some other context.
These constraints make programming easier by improving

    reliability
    readability
    performance
    ease of expression (we'll come back to this)

In general, two Java types T1 and T2 are the same if and only
if they name the same primitive type, the same class, or the
same interface.  This can be a little confusing because Java
allows the names of classes and interfaces to be abbreviated.

Example.  Math and java.lang.Math usually name the same type.

Example.  If a file of Java code imports java.util.Collection,
then Collection and java.util.Collection name the same interface
type within that file.

The types of Java are

    primitive types
        boolean
        numeric types
            integral types
                byte
                short
                int
                long
                char
            floating point types
                float
                double
    reference types
        null type
        class types
        interface types
        array types

Intuitively, we can often regard each type as the subset of
all Java values that is allowed by the type constraint.  For
example, we can regard the int type as standing for the set
of all int values.  In general, each of the primitive types
stands for the set of all primitive values of that type.

The reference types are more interesting, and therefore more
confusing.  For example, there is a one-to-one correspondence
between classes and class types, but classes are not at all
the same as class types.

The null type stands for the set that contains only the null
value.

Each class type A stands for the set of values

    { null } \union
    { p | B is A or a subclass of A,
            and p is a pointer to an instance of B }

Example.  If classes D and E extend class C, and classes B and
C extend class A, and these are the only classes that extend A
or B or C or D or E, then the type A stands for the set of
values that includes null together with all pointers to an
instance of classes A, B, C, D, or E.

Each interface type T stands for the union of all class types
A such that class A was declared to implement the interface T.

Example.  If T is a Java interface, and classes A and B are
declared to implement T, and these are the only classes that
implement T, then T stands for the union of the sets of values
represented by the types A and B.

Each array type T[] stands for null together with the set of
pointers to arrays whose elements are constrained to hold values
of type U, where U is a subtype of T.

U is a subtype of T if and only if the set of values for which U
stands is a subset of the values for which T stands.

Warning:  This isn't quite right.  The truth is much more
complicated than this, but most Java programmers will never
need to know the real truth.

Example.

public interface Shape {
    double area ();
}

class Circle implements Shape { ... }
class Rectangle implements Shape { ... }
class Square extends Rectangle implements Shape { ... }
class Triangle extends Rectangle implements Shape { ... }

With these declarations,

    Circle is a subtype of Shape
    Rectangle is a subtype of Shape
    Square is a subtype of Shape
    Triangle is a subtype of Shape

    Square is a subtype of Rectangle
    Triangle is a subtype of Rectangle

The last of these subtyping relationships may be a bad idea, but
Java allows programmers to express bad ideas as well as good.
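
Here is one way the elided class bodies might be filled in so the
subtyping relationships can be tried out.  The fields,
constructors, and the SubtypeDemo class are assumptions of mine,
Triangle is left out, and Shape is not declared public so that
everything fits in one file.

```java
interface Shape {
    double area ();
}

class Circle implements Shape {
    double diameter;

    Circle (double diameter) {
        this.diameter = diameter;
    }

    public double area () {
        double radius = diameter / 2.0;
        return Math.PI * radius * radius;
    }
}

class Rectangle implements Shape {
    double width;
    double height;

    Rectangle (double width, double height) {
        this.width = width;
        this.height = height;
    }

    public double area () {
        return width * height;
    }
}

class Square extends Rectangle implements Shape {
    Square (double side) {
        super(side, side);
    }
}

// Hypothetical demo class; the names are illustrative only.
class SubtypeDemo {

    // A Square may be stored wherever a Shape is expected.
    static double areaOfSquareAsShape (double side) {
        Shape s = new Square(side);
        return s.area();
    }

    // Square is a subtype of Rectangle.
    static boolean squareIsRectangle () {
        Object o = new Square(1.0);
        return o instanceof Rectangle;
    }

    // Circle is a subtype of Shape but not of Rectangle.
    static boolean circleIsRectangle () {
        Object o = new Circle(1.0);
        return o instanceof Rectangle;
    }

    public static void main (String[] args) {
        System.out.println(areaOfSquareAsShape(3.0));
        System.out.println(squareIsRectangle());
        System.out.println(circleIsRectangle());
    }
}
```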

****************************************************************

Variables.

In Java, a variable is defined as the combination of a location
together with the type that constrains the values that can be
written into the location.

****************************************************************

Names.

In Java, a name is either a single identifier or a qualified
name, which consists of a sequence of identifiers separated by
periods.

By convention, the names of classes and interfaces begin with an
upper case letter.  If the name is formed from words, the first
letter of each word is capitalized but the subsequent letters of
each word are in lower case.

Example.

    String
    TreeSet
    HashMap

By convention, the names of packages, methods, variables, and
values begin with a lower case letter.  If the name is formed
from several words, then the first letter of each word except
the first is capitalized.

Example.

    java
    java.lang
    java.util
    gui
    main
    intValue
    toArray
    length
    System.out                          (standard output stream)
    System.out.println

By convention, the names of constants and final variables are
in upper case, with words or components separated by underscores.

Example.

    Math.PI
    java.lang.Integer.MAX_VALUE
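
Example.  The following hypothetical class, which is not part of
any library, follows all three conventions at once.

```java
// Hypothetical class; every name in it is made up for illustration.
class TemperatureReading {                      // class name

    static final double FREEZING_POINT = 32.0;  // constant

    double degreesFahrenheit;                   // variable

    TemperatureReading (double degreesFahrenheit) {
        this.degreesFahrenheit = degreesFahrenheit;
    }

    boolean isBelowFreezing () {                // method
        return degreesFahrenheit < FREEZING_POINT;
    }

    public static void main (String[] args) {
        TemperatureReading reading = new TemperatureReading(20.0);
        System.out.println(reading.isBelowFreezing());
    }
}
```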

****************************************************************

Structure of a Java program.

A Java program is divided into packages, which, like directories,
can be nested hierarchically.

With Sun's JDK, the package structure of a program normally
corresponds to the directory structure of the program's source
code.  That is, the default package corresponds to the directory
that holds the program's source code, and each named package
corresponds to a subdirectory whose name is exactly the same as
the name of the package to which it corresponds.  If this
correspondence is broken, the program will not compile properly;
this is a common mistake.

Example.  A student's compiler might have the following package
structure.

    <default>
        ast
            attributes
            parser
            scanner
        codegenerator
            env
            target
        typechecker
            typenv
            types

The fully qualified package names of this program would be

    <none>
    ast
    ast.attributes
    ast.parser
    ast.scanner
    codegenerator
    codegenerator.env
    codegenerator.target
    typechecker
    typechecker.typenv
    typechecker.types

Each subpackage is considered to be a member of its parent
package.  For example, both typechecker.typenv and
typechecker.types would be members of the typechecker package.
This nesting is only a naming convenience: code in a subpackage
gets no special access to the code in its parent package.

The Java code in each package is usually divided into files.
Each file should declare at least one interface or class.  The
name of a Java file should be the same as the name of the first
interface or class that it declares, followed by a .java suffix.

Example.

    Parser.java

****************************************************************

Compiling and running a Java application.

Every Java application must declare at least one class, by
convention a public class, that declares a static method named
main whose declaration looks like

    public static void main (String[] args) { ... }

Suppose this method is declared in the TestShape class, which is
in the default (top-level) package, and is declared in a file
named TestShape.java.  On our Unix systems, the program can be
compiled (translated) from Java to byte code by going into the
directory that corresponds to the default package and saying

    % javac TestShape.java

This creates TestShape.class and a bunch of other .class files,
one for each class that will be required to run the program.
The program is then run by saying

    % java TestShape

The argument to main will be an array of strings, one for each
of the command-line arguments.

Example.  If the application is run by saying

    % java TestShape 1 2 3

then

    args.length will be 3
    args[0] will be "1"
    args[1] will be "2"
    args[2] will be "3"
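
Here is a sketch of what TestShape.java might contain if all it
did was report its command-line arguments.  The describe method
is a hypothetical helper, not something Java requires.

```java
class TestShape {

    // Builds one line of text for the length and one per argument.
    static String describe (String[] args) {
        StringBuffer sb = new StringBuffer();
        sb.append("args.length = " + args.length + "\n");
        for (int i = 0; i < args.length; i = i + 1) {
            sb.append("args[" + i + "] = \"" + args[i] + "\"\n");
        }
        return sb.toString();
    }

    public static void main (String[] args) {
        System.out.print(describe(args));
    }
}
```

Run with the arguments 1 2 3, this sketch would print

    args.length = 3
    args[0] = "1"
    args[1] = "2"
    args[2] = "3"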

In Sun's JDK, the javac compiler will not work properly if it
is executed from within a directory that does not correspond to
the default package of the program being compiled.  This is a
common mistake.

****************************************************************

Structure of a Java file.

Files normally begin with a block comment that explains the
purpose of the code in the file, names the authors, gives the
version number, and so on.

After that comment comes a package declaration that names the
package in which the file's code belongs.  If the file is part
of the default (top-level) package, then this declaration is
omitted.

After the package declaration (if any) comes a series of import
declarations that name the types from other packages that are
mentioned by the file's code.  Types in java.lang, such as String
and Math, need not be imported.

After the import declarations come the class declarations.  The
name of the class that is declared by the very first class
declaration should match the name of the file.

Example.  Here is a hypothetical (untested!) file named
CodeGenerator.java.

/**
 * Interface to code generators for a Simula 67 compiler.
 * 
 * @author      Ole-Johan Dahl
 * @author      Kristen Nygaard
 * @version     %I%, %G%
 */

package codegenerator;

import java.io.PrintStream;
import ast.Ast;

public interface CodeGenerator {

    public void generateCode (Ast pgm, PrintStream out);

}

****************************************************************

Structure of a class declaration.

A class declaration describes a set of objects that have similar
behavior.  Each class declaration also creates a type whose name
is the same as the name of the class.

In its simplest form, a class declaration consists of the word
class, followed by the name of the class, followed by a pair of
matching curly braces that enclose the class body declarations.

The class declaration may also include one or more of these
modifiers before the word class:

    public
    abstract
    final

The public modifier should be used if the class is intended to
be used by code in other packages.  (With Sun's JDK, a public
class should be the first class declared within its file.)

The abstract modifier should be used if the class has any
abstract methods, and may also be used to prevent any instances
of the class from being created.

The final modifier may be used to prevent any subclasses of the
class from being declared.

Java does not allow a class to be both abstract and final.

The class declaration may also include one or both of these
clauses after the name of the class:

    extends <class>
    implements <interface>

An extends clause names the immediate superclass, or parent, of
the class.  An implements clause names an interface that the
class implements.  Both kinds of clause imply that the type
declared by the class declaration will be a subtype of the
class or interface type that the declaration extends or
implements.

A class may extend only one class, but it may implement more
than one interface, in which case the interfaces are separated
by commas in the implements clause.

If no extends clause is present, then the class extends the
Object class, which is the root of the Java class hierarchy.

Example.

    class C { }

    public class C { }

    abstract class C { }

    final class C { }

    public abstract class C { }

    public class C extends B { }

    public class C extends B implements CodeGenerator { }

In the last example, the type C will be a subtype of the class
type B and also a subtype of the interface type CodeGenerator.
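
The following sketch puts several of these pieces together.  All
of the names in it (Named, Animal, Dog, ClassDeclDemo) are
hypothetical.  It shows an abstract class, a final class that
extends it and implements an interface, and the two subtype
relationships that result.

```java
interface Named {
    String name ();
}

abstract class Animal {                 // no instances can be created
    abstract String sound ();           // abstract method: no body

    String describe () {
        return "An animal that says " + sound();
    }
}

final class Dog extends Animal implements Named {   // no subclasses

    String sound () {
        return "woof";
    }

    public String name () {
        return "dog";
    }
}

// Hypothetical demo class; the names are illustrative only.
class ClassDeclDemo {

    // Dog is a subtype of the class type Animal.
    static String viaSuperclass () {
        Animal a = new Dog();
        return a.describe();
    }

    // Dog is also a subtype of the interface type Named.
    static String viaInterface () {
        Named n = new Dog();
        return n.name();
    }

    public static void main (String[] args) {
        System.out.println(viaSuperclass());
        System.out.println(viaInterface());
    }
}
```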

****************************************************************

Class body declarations.

The body of a class declaration may contain declarations of the
following things:

    initializers  [rare, so I'll ignore them]
    constructors
    members

Constructors are not members.  In particular, a constructor is
not a method, even though a constructor declaration resembles a
method declaration.  Constructors are never inherited, so they
cannot be overridden or hidden within a subclass.

The visibility of a constructor or member can be controlled by
prefixing its declaration with a visibility modifier:

    modifier                            visibility
    --------                            ----------
    public                              universal
    protected                           package + subclasses
                  (no modifier)         package
    private                             class

If you explicitly say that something is protected, then it is
less protected than it would have been had you not specified
the visibility at all.  This is a peculiarity of Java.
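
Example.  A single file can only illustrate part of this table,
since protected and public differ from package visibility only
when a second package is involved.  In this sketch, Account and
VisibilityDemo are hypothetical classes in the same package.

```java
class Account {
    private int balance;        // visible only inside Account
    int deposits;               // no modifier: visible in the package
    protected String owner;     // package + subclasses

    public Account (String owner) {
        this.owner = owner;
    }

    public void deposit (int amount) {
        balance = balance + amount;
        deposits = deposits + 1;
    }

    public int getBalance () {
        return balance;
    }
}

// Hypothetical demo class; the names are illustrative only.
class VisibilityDemo {

    static int depositTwice () {
        Account a = new Account("pat");
        a.deposit(10);
        a.deposit(5);
        // a.balance = 100;     // would not compile: balance is private
        return a.getBalance();
    }

    static int countDeposits () {
        Account a = new Account("pat");
        a.deposit(10);
        a.deposit(5);
        return a.deposits;      // allowed: same package
    }

    public static void main (String[] args) {
        System.out.println(depositTwice());
        System.out.println(countDeposits());
    }
}
```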

****************************************************************

Constructor declarations.

At least one constructor is called whenever an instance of the
class is created by a new expression.  The purpose of a
constructor is to initialize the non-static members of an
instance of the class.

The name of a constructor is the same as the name of the class
in which it is declared.  A constructor declaration looks like
a method declaration, except that a constructor declaration has
no return type.

Example.

class Circle implements Shape {                 // class declaration

    Circle (double diameter) {                  // constructor
        this.diameter = diameter;
    }

    ...

}

If a class does not explicitly declare any constructors, then
the Java compiler will automatically generate a default
constructor with no arguments that does nothing except to invoke
the superclass's constructor with no arguments.  The visibility
of a default constructor is the same as the visibility of the
class.

Example.  If the class Circle did not declare any constructors,
then the effect would be the same as if it had declared

    Circle () {
        super();
    }
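
The implicit call to the superclass's constructor can be observed
with a sketch like the following, in which all of the names are
hypothetical.  CountedBase counts how many times its constructor
runs, and CountedDerived declares no constructors at all.

```java
class CountedBase {
    static int constructed = 0;         // counts calls to CountedBase()

    CountedBase () {
        constructed = constructed + 1;
    }
}

// Declares no constructors, so the compiler supplies
//     CountedDerived () { super(); }
class CountedDerived extends CountedBase {
    int size;                           // starts out as 0
}

// Hypothetical demo class; the names are illustrative only.
class DefaultConstructorDemo {

    // Creates n instances of CountedDerived and reports how many
    // times the superclass constructor ran as a result.
    static int baseCallsFor (int n) {
        int before = CountedBase.constructed;
        for (int i = 0; i < n; i = i + 1) {
            new CountedDerived();
        }
        return CountedBase.constructed - before;
    }

    public static void main (String[] args) {
        System.out.println(baseCallsFor(3));
    }
}
```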

To prevent any instances of class Circle from being created by
code that is outside that class, a programmer can make all of
its constructors private.  To prevent the compiler from
creating a non-private default constructor, programmers can
declare an explicit private constructor with no arguments
that does nothing.

Example.

class Circle implements Shape {

    // Don't let anyone else create instances of this class!

    private Circle () { }

    ...

}
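
A class with only private constructors usually creates its own
instances and hands them out through a static method.  That
pattern is my addition here, not something stated above, and the
names Origin and instance are hypothetical.

```java
class Origin {

    // The only instance, created inside the class itself.
    private static final Origin THE_ORIGIN = new Origin();

    // Private, so code outside Origin cannot say  new Origin()
    private Origin () { }

    static Origin instance () {
        return THE_ORIGIN;
    }
}

// Hypothetical demo class; the names are illustrative only.
class PrivateConstructorDemo {

    static boolean sameInstance () {
        // Origin o = new Origin();     // would not compile
        return Origin.instance() == Origin.instance();
    }

    public static void main (String[] args) {
        System.out.println(sameInstance());
    }
}
```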

****************************************************************

Member declarations.

The members of a class are not only the members that are
explicitly declared within the class declaration, but also any
members that are inherited from the class's superclass or from
the interfaces that it implements.

The following things can be declared as members:

    variables (also known as fields)
    methods
    classes
    interfaces

Example.

    double PI = Math.PI;

    double diameter;

    double area () {
        double radius = diameter / 2.0;
        return PI * radius * radius;
    }

Members can be declared static, which means there is only one
thing that is declared by the member declaration, and that one
thing is associated with the class in which the declaration
occurs.  If a member is not declared static, then each instance
of the class will have its own member, distinct from the members
of other instances.
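
Example.  In this sketch (hypothetical names throughout), the
static variable issued is shared by every Ticket, while each
Ticket has its own serial variable.

```java
class Ticket {
    static int issued = 0;      // one variable for the whole class
    int serial;                 // one variable per instance

    Ticket () {
        issued = issued + 1;
        serial = issued;
    }

    static int totalIssued () { // static method: no instance needed
        return issued;
    }

    int serialNumber () {       // non-static: uses this instance
        return serial;
    }
}

// Hypothetical demo class; the names are illustrative only.
class TicketDemo {

    // Returns { difference between the two serial numbers,
    //           number of tickets issued by this call }.
    static int[] issueTwo () {
        int before = Ticket.totalIssued();
        Ticket a = new Ticket();
        Ticket b = new Ticket();
        return new int[] { b.serialNumber() - a.serialNumber(),
                           Ticket.totalIssued() - before };
    }

    public static void main (String[] args) {
        int[] result = issueTwo();
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```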

Exception: an interface is always static, even if it is not
declared static.

The distinction between static and non-static members is so
important that it has given rise to special terminology:

                              static               non-static

    variables             class variable       instance variable
    methods               static method        dynamic method
    classes               static nested class  inner class
    interfaces            member interface     (none)

This review will not cover inner classes.

A member may be declared final.

If a variable is declared final, then the Java compiler will
not allow its value to be changed after the variable has been
initialized.  A final instance variable that has no initializer
must be assigned exactly once during construction, normally by
the constructor; methods are not allowed to assign it at all.  A
static final variable should have an explicit initial value.  If
that initial value is a compile-time constant expression, then
the static final variable is effectively a constant, and may be
used in the case labels of a switch statement, for example.

Example.

    static final int READ_ONLY  = 1;
    static final int WRITE_ONLY = 2;
    static final int READ_WRITE = 3;
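
Here is a sketch of those three constants being used as case
labels.  The enclosing class FileMode and its describe method are
hypothetical.

```java
class FileMode {

    static final int READ_ONLY  = 1;
    static final int WRITE_ONLY = 2;
    static final int READ_WRITE = 3;

    // The case labels must be constants, which is why the
    // variables above are static final with constant initializers.
    static String describe (int mode) {
        switch (mode) {
            case READ_ONLY:  return "read only";
            case WRITE_ONLY: return "write only";
            case READ_WRITE: return "read and write";
            default:         return "unknown";
        }
    }

    public static void main (String[] args) {
        System.out.println(describe(READ_WRITE));
    }
}
```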

So far as I can tell, there is no good reason to declare that a
static method is final.

If a dynamic method is declared final, then the method cannot be
overridden within a subclass.

If a class is declared final, then it cannot be extended by a
subclass.
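
Example.  In this sketch (hypothetical names throughout), the
final variable greeting is set by the constructor, and the final
method greet cannot be overridden, although the ordinary method
suffix that it calls can be.

```java
class Greeter {
    final String greeting;              // final instance variable

    Greeter (String greeting) {
        this.greeting = greeting;       // assigned during construction
    }

    final String greet (String who) {   // cannot be overridden
        return greeting + ", " + who + suffix();
    }

    String suffix () {                  // may be overridden
        return ".";
    }
}

class LoudGreeter extends Greeter {

    LoudGreeter (String greeting) {
        super(greeting);
    }

    String suffix () {
        return "!";
    }

    // String greet (String who) { ... }   // would not compile:
    //                                     // greet is final in Greeter
}

// Hypothetical demo class; the names are illustrative only.
class FinalDemo {
    public static void main (String[] args) {
        System.out.println(new Greeter("Hello").greet("world"));
        System.out.println(new LoudGreeter("Hello").greet("world"));
    }
}
```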

****************************************************************