SoftInst: details

Software Packages

A software package is installed in such a way that all of the installation is contained within a single directory.

The name of that directory is [softwarename]-[version]. (If the version happens to contain a '-', it should be changed to a '_' or some other character, so that the tools can parse the name and version parts.)

For example, emacs version 19.22 would be installed in a directory called emacs-19.22
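A minimal sketch of this naming rule in Python. The helper names `package_dirname` and `parse_package_dirname` are hypothetical. The parser also assumes that software names may themselves contain dashes, which is why it splits on the *last* '-'. That split is only unambiguous because the version never contains one.

```python
from typing import Tuple


def package_dirname(software: str, version: str, replacement: str = "_") -> str:
    """Return the [software]-[version] directory name.

    Any '-' inside the version is replaced ('_' by default) so that the
    name can later be split back into its two parts unambiguously.
    """
    return f"{software}-{version.replace('-', replacement)}"


def parse_package_dirname(dirname: str) -> Tuple[str, str]:
    """Split a [software]-[version] directory name into its two parts.

    The split is made on the last '-', so a software name that contains
    dashes still parses correctly, because the version never does.
    """
    software, sep, version = dirname.rpartition("-")
    if not sep or not software or not version:
        raise ValueError(f"not a [software]-[version] name: {dirname!r}")
    return software, version
```

For example, `package_dirname("emacs", "19.22")` gives `emacs-19.22`, and a version such as `1.0-beta` becomes `1.0_beta`.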

The software, when configured, should reference the directory name with the version number. This allows one to install a different version of a software package without having collisions between the two versions. This means that the whole compilation process must be done for each new version of software.

In this way, everything related to a software package is located in one directory, namely [software]-[version]. There will be various references to these files, perhaps even copies of them local to machines, but for the most part the entire package is in this one directory.

Software Package Subdirectories

Each package directory should contain a set of standard directories. These can be configured differently at each site. (See Suggested Directory Conventions below.)

[SoftInst tools will be very flexible in this area, allowing sites to configure the "default" directories for a new package in a particular collection, as well as what to do with each of these directories at the 'linking' stage.]


Collections

Using just the above scheme, it would be possible for users to get the programs that they want by putting the [package]/bin directories for each software package in their path. However, this is intractable because the path would get intolerably long (too long for some shells) and it would take an enormous amount of maintenance. Therefore we combine many software packages into a collection, defining common directories that unite those packages.

Specifically, a collection of software contains a directory called packages. This directory contains a set of all the software packages in this collection. (All in the form of [software]-[version].)

The current version of each package in that directory should have a symbolic link pointing to it, named after the package without a version number, i.e.:

  [software]  ->  [software]-[version]
By definition, the symbolic link points at the version of the software currently supported.
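A minimal sketch of maintaining that link, assuming a POSIX file system and a version string that already follows the naming rule. `set_current_version` is a hypothetical helper, not part of any existing tool. It writes a temporary link and renames it over the old one, so the unversioned name never disappears while it is being switched.

```python
import os


def set_current_version(packages_dir: str, software: str, version: str) -> None:
    """Point packages/[software] at packages/[software]-[version].

    The link text is the bare directory name, so the link is relative to
    packages_dir. A temporary link is renamed over the old one, and
    rename() replaces the old link itself rather than following it.
    """
    target = f"{software}-{version}"
    if not os.path.isdir(os.path.join(packages_dir, target)):
        raise FileNotFoundError(os.path.join(packages_dir, target))
    link = os.path.join(packages_dir, software)
    tmp = link + ".tmp-link"
    if os.path.lexists(tmp):
        os.remove(tmp)  # leftover from an interrupted earlier run
    os.symlink(target, tmp)
    os.replace(tmp, link)  # atomic on POSIX file systems
```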

The software collection directory also has a directory for each of the directories that are known to be in software package directories. Thus it has a bin, a doc, an etc, and so on. Each of these directories is populated with symbolic links pointing to the various package-specific files.

As an example, consider an arbitrary collection of software. It contains the following directories, which users will reference:

  [collection]/bin
  [collection]/etc
  [collection]/man/man?
  [collection]/lib
  [collection]/packages
  [collection]/src
If emacs-19.25 is installed as part of this collection, then it will be in the packages subdirectory: [collection]/packages/emacs-19.25

Assuming it is the current version, it will have a symbolic link pointing to it:

  [collection]/packages/emacs  ->  emacs-19.25
The files in the emacs package will be mirrored in the collection directories, so if emacs has these files:
  [collection]/packages/emacs-19.25/bin/emacs
  [collection]/packages/emacs-19.25/bin/etags
  [collection]/packages/emacs-19.25/lib/emacs-lisp
Then these symbolic links will be created:
  [collection]/bin/emacs       -> ../packages/emacs-19.25/bin/emacs
  [collection]/bin/etags       -> ../packages/emacs-19.25/bin/etags
  [collection]/lib/emacs-lisp  -> ../packages/emacs-19.25/lib/emacs-lisp
Notice that the symbolic links are relative, not absolute. This typically minimizes the number of file system references needed to locate the file. Further, it allows collections of software packages to be moved as a group without depending on the specific absolute pathname of the collection.
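A minimal sketch of this linking step in Python, assuming a POSIX file system. The names `link_package` and `_link_dir` and the `SKIP`/`DESCEND` defaults are hypothetical illustrations, not SoftInst's actual interface. Each entry of a package subdirectory is linked one level deep, so lib/emacs-lisp becomes a single link. The man directory is descended one extra level so that several packages can share man1. Any existing entry that points elsewhere is reported as a name collision rather than overwritten.

```python
import os

# Hypothetical defaults: package subdirectories that are not mirrored, and
# subdirectories whose entries (man1, cat1, ...) are shared by many
# packages, so linking has to happen one level further down.
SKIP = {"Install", "src"}
DESCEND = {"man"}


def _link_dir(collection, pkg_root, rel, depth, collisions):
    """Link each entry of <pkg_root>/<rel> into <collection>/<rel>."""
    os.makedirs(os.path.join(collection, rel), exist_ok=True)
    for name in sorted(os.listdir(os.path.join(pkg_root, rel))):
        rel_entry = os.path.join(rel, name)
        src = os.path.join(pkg_root, rel_entry)
        if depth > 0 and os.path.isdir(src):
            _link_dir(collection, pkg_root, rel_entry, depth - 1, collisions)
            continue
        dest = os.path.join(collection, rel_entry)
        # Relative link text, e.g. ../packages/emacs-19.25/bin/emacs
        target = os.path.relpath(src, os.path.dirname(dest))
        if os.path.lexists(dest):
            # Already linked to this package: nothing to do. Anything else
            # is a name collision that an administrator has to resolve.
            if not (os.path.islink(dest) and os.readlink(dest) == target):
                collisions.append(rel_entry)
            continue
        os.symlink(target, dest)


def link_package(collection, package):
    """Mirror packages/<package> into the collection; return collisions."""
    pkg_root = os.path.join(collection, "packages", package)
    collisions = []
    for sub in sorted(os.listdir(pkg_root)):
        if sub in SKIP or not os.path.isdir(os.path.join(pkg_root, sub)):
            continue
        _link_dir(collection, pkg_root, sub, 1 if sub in DESCEND else 0, collisions)
    return collisions
```

Because `package` is the version-specific name (emacs-19.25), every link text carries the version, and running the function twice is harmless.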

Also notice that the symbolic links reference the version-specific directory (i.e. "emacs-19.25", not just "emacs"). It might seem to make sense to link to "emacs" and then change only that one link on a later update. However, since linking is automated by the tools, they can avoid the unneeded extra symbolic link. Also, a new version of an application does not always contain the same programs/binaries, so it makes more sense to use the tools to "un-link" the previous version and "link" the new version.
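A matching sketch of the un-link step, under the same assumptions as before: a POSIX file system, and a hypothetical function name `unlink_package`. It walks the collection directories and skips packages/ itself and any .packages-N partition directories (see Growth of Partitions). It removes only the links whose target, resolved relative to the link's own directory, lies inside the old version's package directory. Upgrading emacs would then mean un-linking emacs-19.22 and linking emacs-19.25.

```python
import os


def unlink_package(collection, package):
    """Remove every link in the collection that points into packages/<package>.

    The package directories themselves (packages/ and any .packages-N
    partitions) are never touched. Returns the removed link paths,
    relative to the collection.
    """
    pkg_root = os.path.abspath(os.path.join(collection, "packages", package))
    removed = []
    for dirpath, dirnames, filenames in os.walk(collection):
        if dirpath == collection:
            dirnames[:] = [d for d in dirnames
                           if d != "packages" and not d.startswith(".packages-")]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            # Resolve the link text lexically, relative to its directory.
            target = os.path.abspath(os.path.join(dirpath, os.readlink(path)))
            if target == pkg_root or target.startswith(pkg_root + os.sep):
                os.remove(path)
                removed.append(os.path.relpath(path, collection))
        # Only descend into real directories, never into links
        # (including links that were just removed).
        dirnames[:] = [d for d in dirnames
                       if os.path.isdir(os.path.join(dirpath, d))
                       and not os.path.islink(os.path.join(dirpath, d))]
    return removed
```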


Architecture Specific Directories

Compiled software is typically architecture-specific; that is, a binary program that runs on a SPARC will not run on an HP. It makes sense to put all the architecture-specific directories into one directory so that all computers of that type can access it. This location is configurable via SoftInst, but in our environment we will use a single directory named /arch. This directory is available on every computer on the network; its contents will, of course, differ from one architecture to another.

Under the /arch directory there will be a directory for each software collection. Collections, as defined above, can be made of any arbitrary set of software, but it makes sense to group them either by function or by obvious association.

We will initially have these collections:

/arch/adm
Administration software.
/arch/beta
Software undergoing beta testing.
/arch/gnu
The set of software from the GNU project.
/arch/unix
General software. (unix is not the best term for this, but we like it better than misc :)
/arch/X11R5
X Windows (R5).
/arch/X11R6
X Windows (R6).
/arch/Xapps
X Applications. (We realize there may be a few collisions between programs that are X11R6-specific, but it's easier to deal with these case by case than to try to maintain separate collections.)
/arch/com
Commercial Applications.

Shared Directories

In a multiple-architecture network, many files can be used by computers of all types, while others are architecture-specific. For example, the emacs binary only runs on one type of computer (typically the one it was compiled on), but emacs-lisp files can be read by an emacs running on any type of platform. Thus it makes good sense to have one emacs-lisp repository on the network. This minimizes disk space usage, makes it easier to install emacs-lisp packages, and reduces the likelihood of differences between platforms.

However, we have decided that 'split installs' (installing one package partly under /share and partly under /arch) complicate things more than is necessary. The solution we have proposed is to keep /share and /arch somewhat 'separated'.

The benefits to this solution are:
[Note that our choice of this method of using /arch and /share will not be inherent in the SoftInst implementation.]


Source Organization

Source code is another thing that we don't want spread across computers or duplicated many times, and that we do want organized nicely. We have decided to place the source code in /ccs/src/[collection]/[software]-[version]. Although this is a separate location from /share and /arch, we like it for a number of reasons.

The reasons for this include:

Other options we considered, but have decided against include:
Physically place the source in /share/[collection]/src
This detracts from the collection scheme by making src a special case.

Physically place the source in the package directory and create symbolic links from /share/[collection]/src
This seems more complicated, although the method does have some distinct benefits. For instance, the source is directly associated with the installation: if you want to remove some software, you can remove the entire package and be done with it. It also helps in keeping source around for the installed programs. With this method, a link such as /share/[collection]/src/[software]-[ver] -> /share/[collection]/packages/[software]-[ver]/src could be made.

Homegrown programs

Homegrown source code is a special case. We want to be certain that this source does not accidentally get erased by someone who thinks it can be retrieved from the net at any time. It would also be nice to be able to do an 'ls' in one place and admire all of our handiwork. :)

Thus, we have decided to make a directory /ccs/src/homegrown to contain all of our own programs. Note that this is not a collection name, as most of the programs in here fall under the other collections.

The programs from this location will often be installed in collections such as /share/adm or /share/unix, or in some cases they may live in /priv on certain machines. When this source is installed in a 'collection', a symbolic link should be created from that collection's source tree to the homegrown tree, for example /ccs/src/adm/req-1.1 -> /ccs/src/homegrown/req-1.1. This makes it easier to locate the source for programs, since the location is still standardized. This is especially important for new members of the systems group who may not know about our own programming adventures.

Therefore we organize the source this way:

/ccs/src/adm
Source to administration software.
/ccs/src/beta
Source to software undergoing beta testing.
/ccs/src/gnu
Source to GNU software.
/ccs/src/unix
Source to general software.
/ccs/src/X11R5
Source to X Windows R5.
/ccs/src/X11R6
Source to X Windows R6.
/ccs/src/Xapps
Source to X Applications.
/ccs/src/homegrown
Source to Homegrown programs.
To tie each installed application to its source, we make a link in each package directory to the source:
  /arch/gnu/packages/emacs-19.25/src -> /ccs/src/gnu/emacs-19.25

Name Collisions

The question comes up: what do you do with software programs that have the same name? How do they get linked into the directories populated with symbolic links?

Essentially, you decide which will be in the default environment for most users. Then you make the other one available by either:


Suggested Directory Conventions

To make the tools understand how to link a particular package into the collection, some standardized directory names should be adhered to. We suggest the following conventions, and the tools will recognize these by default.

Install
Administrator created log files (such as notes on how it was compiled, with what options, etc), SoftInst created log files, and configuration files for SoftInst specific to this package.
bin
Executables run by users
lib
Files referenced during run-time, including libraries
man
Manual page directories, containing man?? and cat?? subdirectories. Autogenerate whatis files for the man directories. Automatically link pointers into the help system to these.
doc
Documentation in various formats. Automatically link pointers into the help system to these files.
etc
Executables never run by users
rc
Program 'resource configuration' files. Not many programs use a separate directory for this, but it would be nice if they could be adapted to do so. It greatly aids administration if we know where to look. Most programs use etc for such things, but etc is also used for programs, and we think making the distinction is a win.
include
Header files
info
GNU-style info files. Autogenerate `DIR' files from all of the info files.
log
Program-generated log files
src
The source code used to build the package (this may be a link to somewhere else)
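One way a site could record these conventions, together with what to do with each directory at the 'linking' stage, is a small rule table with per-site overrides. This is a hypothetical sketch: `DirRule`, `DEFAULT_RULES`, `rules_for_site`, `rule_for` and the post-link task names are illustrations only. The choice to leave Install, log and src unmirrored is our assumption, not something SoftInst defines.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DirRule:
    link: bool                       # mirror the entries into the collection?
    descend: int = 0                 # extra levels to descend first (man/manN)
    post_link: Tuple[str, ...] = ()  # follow-up tasks, described informally


# Hypothetical defaults mirroring the suggested conventions above.
DEFAULT_RULES: Dict[str, DirRule] = {
    "Install": DirRule(link=False),
    "bin": DirRule(link=True),
    "lib": DirRule(link=True),
    "man": DirRule(link=True, descend=1,
                   post_link=("rebuild whatis", "update help system")),
    "doc": DirRule(link=True, post_link=("update help system",)),
    "etc": DirRule(link=True),
    "rc": DirRule(link=True),
    "include": DirRule(link=True),
    "info": DirRule(link=True, post_link=("rebuild info dir",)),
    "log": DirRule(link=False),
    "src": DirRule(link=False),
}


def rules_for_site(overrides: Optional[Dict[str, DirRule]] = None) -> Dict[str, DirRule]:
    """Start from the defaults and apply a site's overrides on top."""
    rules = dict(DEFAULT_RULES)
    rules.update(overrides or {})
    return rules


def rule_for(rules: Dict[str, DirRule], name: str) -> DirRule:
    """Directories with no rule are left alone rather than guessed at."""
    return rules.get(name, DirRule(link=False))
```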

Growth of Partitions

Growth of the individual partitions that a collection occupies is a concern in large distributed environments. Our solution will be to have multiple package directories, called ".packages-##" in our scheme, each representing a different disk. This scheme allows indefinite growth of the collection, since we can simply add another partition to the available pool. For example:
/arch/gnu/packages/emacs-19.25 -> ../.packages-1/emacs-19.25
/arch/gnu/packages/gcc-2.59    -> ../.packages-2/gcc-2.59
[Note that the /share and /arch .packages-## directories can differ.]

When installing a new package, the tools will be able to automatically select a partition with free space, or let you override their selection. There will also be tools to move a package from one partition to another afterward.
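A sketch of that selection step, assuming each partition appears in the collection as a directory named .packages-N. It also assumes that "a partition with free space" means the one with the most free bytes, which is our choice. `choose_partition` and `create_package_dir` are hypothetical names, and the `override` argument stands in for the administrator overriding the tool's choice.

```python
import os
import re
import shutil

PARTITION_RE = re.compile(r"^\.packages-(\d+)$")


def choose_partition(collection, needed_bytes=0, override=None):
    """Return the name of the .packages-N directory to install into."""
    parts = sorted(
        (d for d in os.listdir(collection) if PARTITION_RE.match(d)),
        key=lambda d: int(PARTITION_RE.match(d).group(1)),
    )
    if override is not None:
        if override not in parts:
            raise ValueError(f"unknown partition {override!r}")
        return override
    best, best_free = None, -1
    for d in parts:
        free = shutil.disk_usage(os.path.join(collection, d)).free
        if free >= needed_bytes and free > best_free:
            best, best_free = d, free
    if best is None:
        raise OSError("no partition has enough free space")
    return best


def create_package_dir(collection, package, partition):
    """Create the real directory and the packages/<package> link to it."""
    real = os.path.join(collection, partition, package)
    os.makedirs(real)
    # Relative link, e.g. packages/emacs-19.25 -> ../.packages-1/emacs-19.25
    os.symlink(os.path.join("..", partition, package),
               os.path.join(collection, "packages", package))
    return real
```

Moving a package later would amount to copying its directory to another partition and re-pointing the single packages/ link. The collection links go through packages/[software]-[version], so they do not need to change.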


Suggested Mounting

How the file systems associated with software are actually mounted is a separate issue from the method of installing the software, but it is still an important part of the overall software organization. None of these specifications are hard-coded into the system; they are given here only as suggestions, describing how we handle them here at CCS.

We use amd here at the College of Computer Science; although it is not necessary, it greatly simplifies administration. We use the convention of having /net be our general amd toplvl file system, which is used to mount partitions exported from other machines.

Given the terms mentioned above regarding the share and arch parts of a collection, the mounted file systems are named to reflect all of these facts.

/net/[arch]-[collection] will be the [arch] disk for [collection]. The shared collection will be /net/share-[collection]. When these collections grow beyond the capacity of their disks, we will add a partition number to the names of the new disks. Each new partition is then used by the collection through a '.packages-##' directory that points to it.
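A small sketch of this naming rule. The document does not fix how the partition number is appended, so the '-N' suffix here is a hypothetical choice. The function names `mount_name` and `partition_link`, and the architecture name "sun4" in the example, are likewise only illustrations.

```python
def mount_name(collection, arch=None, partition=None):
    """Name of the /net mount point for one disk of a collection.

    arch=None means the shared (architecture-independent) part. The
    '-<partition>' suffix format is a hypothetical choice.
    """
    prefix = arch if arch is not None else "share"
    name = f"/net/{prefix}-{collection}"
    if partition is not None:
        name += f"-{partition}"
    return name


def partition_link(collection, arch, partition):
    """Return (link, target) for a .packages-N entry pointing at its disk."""
    root = "/share" if arch is None else "/arch"
    return (f"{root}/{collection}/.packages-{partition}",
            mount_name(collection, arch, partition))
```

For example, `partition_link("gnu", "sun4", 2)` gives the pair ("/arch/gnu/.packages-2", "/net/sun4-gnu-2").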

Another important part to our mounting system is the fact that we create /arch and /share as actual directories on each machine. This may seem odd at first, but we believe there are a few good reasons for doing this.


The SoftInst Project