Software Packages
A software package is installed in such a way that all of the installation
is contained within a single directory.
The name of that directory is [softwarename]-[version]. (If
the version happens to contain a '-' it should be changed to a '_' or
some other character, so that the tools can distinguish [parse] the
name and version parts.)
For example, emacs version 19.22 would be installed in a directory
called emacs-19.22.
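The naming rule above can be sketched in Python. This is only an illustration, not part of SoftInst; the function name split_package_dir is hypothetical. It assumes the version contains no '-' (as the rule requires), so the last '-' in the directory name separates the two parts.

```python
def split_package_dir(dirname):
    """Split '[softwarename]-[version]' into (name, version).

    Assumes the version part contains no '-', as the naming rule
    requires, so the last '-' separates name from version.
    """
    name, sep, version = dirname.rpartition("-")
    if not sep or not name or not version:
        raise ValueError("not a [softwarename]-[version] name: %r" % dirname)
    return name, version


print(split_package_dir("emacs-19.22"))
```

Splitting at the last '-' rather than the first keeps software names that themselves contain a '-' intact, which is why the rule asks for '-' to be replaced inside the version.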
The software, when configured, should reference the directory name
with the version number. This allows one to install a different
version of a software package without having collisions between the
two versions. This means that the whole compilation process must be
done for each new version of software.
In this way, everything related to a software package is located in
one directory, namely [software]-[version]. There will be various
references to these files, perhaps even copies of them local to
machines, but for the most part the entire package is in this one
directory.
Software Package Subdirectories
Each package directory should contain a set of standard directories.
These can be configured differently at each site; see Suggested Directory
Conventions below.
[SoftInst tools will be very flexible in this area, allowing sites to
configure the "default" directories for a new package of a particular
collection, as well as what to do with each of these directories
at the 'linking' stage.]
Collections
Using just the above scheme, it would be possible for users to get the
programs that they want by putting the [package]/bin
directories for each software package in their path. However, this is
intractable because the path would get intolerably long (too long for
some shells) and it would take an enormous amount of maintenance.
Therefore we combine many software packages into a collection,
defining common directories that unite those packages.
Specifically, a collection of software contains a directory called
packages. This directory contains a set of all the software packages
in this collection. (All in the form of [software]-[version].)
Each package in that directory that is the current version should
have a symbolic link pointing to it whose name is the name of the package
without a version number, i.e.:
[software] -> [software]-[version]
By definition, the symbolic link points at the version of the software
currently supported.
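A minimal sketch of maintaining that link, assuming a POSIX system with symbolic link support. The function name set_current_version is hypothetical and is not a SoftInst interface; the scratch directory merely stands in for [collection]/packages.

```python
import os
import tempfile


def set_current_version(packages_dir, software, version):
    """Point packages_dir/[software] at [software]-[version] (relative link)."""
    target = "%s-%s" % (software, version)
    if not os.path.isdir(os.path.join(packages_dir, target)):
        raise FileNotFoundError(target)
    link = os.path.join(packages_dir, software)
    if os.path.islink(link):
        os.remove(link)  # drop the link to the previously supported version
    os.symlink(target, link)
    return link


# Demonstration in a scratch directory standing in for [collection]/packages.
packages = tempfile.mkdtemp()
os.mkdir(os.path.join(packages, "emacs-19.22"))
os.mkdir(os.path.join(packages, "emacs-19.25"))
set_current_version(packages, "emacs", "19.22")
set_current_version(packages, "emacs", "19.25")
print(os.readlink(os.path.join(packages, "emacs")))
```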
The software collection directory also has a directory for each of the
directories that are known to be in software package directories. Thus it
has a bin, a doc, an etc, and so on. Each of these directories is
populated with symbolic links pointing to the various package-specific
files.
As an example, consider an arbitrary collection of software. It
contains the following directories, which users will reference:
[collection]/bin
[collection]/etc
[collection]/man/man?
[collection]/lib
[collection]/packages
[collection]/src
If emacs-19.25 is installed as part of this collection, then it will be
in the packages subdirectory: [collection]/packages/emacs-19.25
Assuming it is the current version, it will have a symbolic link
pointing to it:
[collection]/packages/emacs -> emacs-19.25
The files in the emacs package will be mirrored in the collection directories,
so if emacs has these files:
[collection]/packages/emacs-19.25/bin/emacs
[collection]/packages/emacs-19.25/bin/etags
[collection]/packages/emacs-19.25/lib/emacs-lisp
Then these links will be created:
[collection]/bin/emacs -> ../packages/emacs-19.25/bin/emacs
[collection]/bin/etags -> ../packages/emacs-19.25/bin/etags
[collection]/lib/emacs-lisp -> ../packages/emacs-19.25/lib/emacs-lisp
Notice that the symbolic links are made relative, not absolute. This
will typically minimize the number of file systems references that will
have to be made to locate the file. Further, it allows collections of
software packages to be moved as a group without depending on the specific
absolute pathname of the collection.
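The mirroring with relative links described above might be sketched as follows. This is an illustration under stated assumptions, not the SoftInst implementation: the function name link_package_files and the choice of subdirectories are hypothetical, and a POSIX system is assumed.

```python
import os
import tempfile


def link_package_files(collection, package, subdirs=("bin", "lib")):
    """Create relative links in [collection]/<subdir> for each package entry."""
    made = []
    for sub in subdirs:
        src_dir = os.path.join(collection, "packages", package, sub)
        if not os.path.isdir(src_dir):
            continue
        dst_dir = os.path.join(collection, sub)
        os.makedirs(dst_dir, exist_ok=True)
        for entry in sorted(os.listdir(src_dir)):
            # Target is relative to the directory holding the link, e.g.
            # ../packages/emacs-19.25/bin/emacs
            target = os.path.relpath(os.path.join(src_dir, entry), dst_dir)
            link = os.path.join(dst_dir, entry)
            os.symlink(target, link)
            made.append((link, target))
    return made


# Build a scratch collection containing the emacs files from the example.
collection = tempfile.mkdtemp()
pkg = os.path.join(collection, "packages", "emacs-19.25")
os.makedirs(os.path.join(pkg, "bin"))
os.makedirs(os.path.join(pkg, "lib", "emacs-lisp"))
for name in ("emacs", "etags"):
    open(os.path.join(pkg, "bin", name), "w").close()

for link, target in link_package_files(collection, "emacs-19.25"):
    print(os.path.relpath(link, collection), "->", target)
```

Because every target is computed relative to the link's own directory, the scratch collection can be renamed or moved and the links still resolve.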
Also notice that the symbolic links reference the version-specific
directory (i.e. "emacs-19.25", not just "emacs"). It would
seem to make sense to link to "emacs", and then only need to change
this one link on a later update. However, we have decided against
this: since the linking is automated by tools, they will be able to
resolve unneeded symbolic links. Also, an update to an application does
not always contain the same programs/binaries, thus it makes more sense
to use the tools to "un-link" the previous version, and "link" the new
version.
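The "un-link" step could look roughly like this. It is a sketch only: unlink_package is a hypothetical name, the package fileutils-3.9 is invented for the demonstration, and a POSIX system is assumed. It removes exactly those links whose target lies inside the named package directory.

```python
import os
import tempfile


def unlink_package(collection, package, subdirs=("bin", "lib")):
    """Remove links in [collection]/<subdir> that point into `package`."""
    prefix = os.path.join("..", "packages", package) + os.sep
    removed = []
    for sub in subdirs:
        dst_dir = os.path.join(collection, sub)
        if not os.path.isdir(dst_dir):
            continue
        for entry in sorted(os.listdir(dst_dir)):
            link = os.path.join(dst_dir, entry)
            if os.path.islink(link) and os.readlink(link).startswith(prefix):
                os.remove(link)
                removed.append(link)
    return removed


# Scratch collection with links from two packages (targets need not exist).
collection = tempfile.mkdtemp()
bin_dir = os.path.join(collection, "bin")
os.mkdir(bin_dir)
os.symlink("../packages/emacs-19.25/bin/emacs", os.path.join(bin_dir, "emacs"))
os.symlink("../packages/emacs-19.25/bin/etags", os.path.join(bin_dir, "etags"))
os.symlink("../packages/fileutils-3.9/bin/ls", os.path.join(bin_dir, "ls"))

unlink_package(collection, "emacs-19.25")
print(sorted(os.listdir(bin_dir)))
```

Since each link names its version-specific target, programs dropped by a newer version are cleaned up here instead of being left as dangling links.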
Architecture Specific Directories
Compiled software is typically architecture-specific, that is, a
binary program that runs on a sparc will not run on an HP. It makes
sense to put all the architecture-specific directories into one
directory so that computers of that type can all access it. This is
configurable via SoftInst, but in our environment we will use a single
directory named /arch. This directory is available on all computers
on the network; its contents will, of course, differ between
architectures.
Under the /arch directory there will be a directory for each
software collection. Collections, as defined above, can be made of
any arbitrary set of software, but it makes sense to group them either
by function or obvious associations.
We will initially have these collections:
- /arch/adm
- Administration software.
- /arch/beta
- Software undergoing beta testing.
- /arch/gnu
- The set of software from the GNU project.
- /arch/unix
- General software. (unix is not the best term
for this, but we like it better than misc :)
- /arch/X11R5
- X Windows (R5).
- /arch/X11R6
- X Windows (R6).
- /arch/Xapps
- X Applications. (We realize there may be a few
collisions between programs that are X11R6
specific, but it's easier to deal with these on
a case-by-case basis than to try to have
separate collections.)
- /arch/com
- Commercial Applications.
Shared Directories
In a multiple architecture network, many files can be used by computers
of all types, while others are architecture-specific. For example, the
emacs binary only runs on one type of computer (typically the one it was
compiled on), but emacs-lisp files can be read by an emacs running on any
type of platform. Thus it makes good sense to have one emacs-lisp
repository on the network. This minimizes disk space usage, makes it
easier to install emacs-lisp packages, and reduces the likelihood of
differences between platforms.
However, we have decided that 'split installs' complicate things
more than is necessary. Our proposed solution is
to have /share and /arch be somewhat 'separated'.
- When a package contains binaries or other architecture-specific
things, it will be installed entirely in /arch.
- When a package is not architecture-specific, such as a set of
scripts, it will be installed into /share.
The benefits to this solution are:
- All in one place, installation greatly simplified.
- Can be placed on a slower, non-critical partition.
- Dependency on a shared file system is reduced; some
architectures could go down while others remain usable.
- We still have the ability to have things 'shared'. Shared scripts
are a big-win. A lot of what we write is in the form of utility
scripts, and it just makes sense to share them.
[Note that our choice of this method of using /arch and /share will not
be inherent in the SoftInst implementation.]
Source Organization
Source code is another thing that we really don't want to be spread
across computers, that we don't want lots of copies of, and we do want
organized nicely. We have decided to place the source code in
/ccs/src/[collection]/[software]-[version]. Although this
is a separate location from /share and /arch, we
like it for a number of reasons:
- It is similar to the system we have been using for a while now.
- Source code is for the most part architecture independent (at
least that is the theory), with a few exceptions.
- It makes it simple to retrieve the source and play around with
it; SoftInst is not really needed at this point.
- All of the source lives in one location. It really lives there,
rather than being a symbolic link farm.
Other options we considered, but decided against, include:
- Physically place the source in /share/[collection]/src
- This detracts from the collection scheme by making src a special case.
- Physically place the source in the package directory and create
symbolic links from /share/[collection]/src
- This seems to make things more complicated, although the method does
have some distinct benefits. For instance, the source is directly
associated with the installation: if you want to remove some software,
you can remove the entire package and be done with it. It also helps
in keeping source around for the installed programs. In this method,
a link such as /share/[collection]/src/[software]-[ver] ->
/share/[collection]/packages/[software]-[ver]/src could be made.
Homegrown programs
Homegrown source code is a special case. We want to be certain that
this source does not accidentally get erased by someone thinking it
could be retrieved from the net anytime. It would also be nice to be
able to do an 'ls' in one place and admire all of our handiwork. :)
Thus, we have decided to make a directory /ccs/src/homegrown
to contain all of our own programs. Note that this is not a
collection name, as most of the programs in here fall under the other
collections.
The programs from this location will often be installed in collections
such as /share/adm or /share/unix, or perhaps in
some cases they will live in /priv on certain machines. When
this source is installed in a 'collection', a symbolic link from that
collection's source tree should be created to point to the homegrown
tree, e.g. /ccs/src/adm/req-1.1 ->
/ccs/src/homegrown/req-1.1. This makes it easier to locate the
source for programs since it is still standardized. This is
especially important for new members of the systems group who may not
know about our own programming adventures.
Therefore we organize the source this way:
- /ccs/src/adm
- Source to administration software.
- /ccs/src/beta
- Source to software undergoing beta testing.
- /ccs/src/gnu
- Source to GNU software.
- /ccs/src/unix
- Source to general software.
- /ccs/src/X11R5
- Source to X Windows R5.
- /ccs/src/X11R6
- Source to X Windows R6.
- /ccs/src/Xapps
- Source to X Applications.
- /ccs/src/homegrown
- Source to Homegrown programs.
To try to tie the installed application to the source, we would make a
link in each package directory to the source:
/arch/gnu/packages/emacs-19.25/src -> /ccs/src/gnu/emacs-19.25
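The link above follows mechanically from the collection and package names. A small sketch of that mapping, with the hypothetical function name source_link and with /arch and /ccs/src taken from our site's layout as default arguments:

```python
import posixpath


def source_link(collection, package, arch_root="/arch", src_root="/ccs/src"):
    """Return (link, target) tying an installed package to its source tree."""
    link = posixpath.join(arch_root, collection, "packages", package, "src")
    target = posixpath.join(src_root, collection, package)
    return link, target


print("%s -> %s" % source_link("gnu", "emacs-19.25"))
```

The roots are parameters because, as noted earlier, the /arch and /share choice is a site decision and not something inherent in the tools.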
Name Collisions
The question comes up - what do you do with software programs that have the
same name? How do they get linked into the directories populated with
symbolic links?
Essentially, you decide which will be in the default environment for most
users. Then you make the other one available by either:
- Not linking it into the main directories, but putting it
explicitly in the path of the people who want to access it.
(Soft can be used to simplify this for users, and this is
where the ability to let SoftInst manage Soft's configuration
file would help tie the two together.)
- Putting it in a separate collection and letting the user order the
collections in their path (the GNU stuff is a model for this).
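Before deciding how to handle a collision, it has to be found. A minimal sketch of detecting program names provided by more than one package follows; the function name find_collisions and the package contents in the example (including ctags-1.0) are hypothetical.

```python
from collections import defaultdict


def find_collisions(package_files):
    """Map each program name to the packages providing it, keeping only
    names provided by more than one package."""
    providers = defaultdict(list)
    for package, names in package_files.items():
        for name in names:
            providers[name].append(package)
    return {name: pkgs for name, pkgs in providers.items() if len(pkgs) > 1}


# Hypothetical package contents, for illustration only.
example = {
    "emacs-19.25": ["emacs", "etags"],
    "ctags-1.0": ["ctags", "etags"],
}
print(find_collisions(example))
```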
Suggested Directory Conventions
To make the tools understand how to link a particular package into the collection, some standardized directory names should be adhered to. We suggest the following conventions, and the tools will recognize these by default.
- Install
- Administrator created log files (such as notes on how it was
compiled, with what options, etc), SoftInst created log files, and
configuration files for SoftInst specific to this package.
- bin
- Executables run by users
- lib
- Files referenced during run-time, including libraries
- man
- Manual page directories, containing man?? and cat?? subdirectories
Autogenerate whatis files for the man directories.
Automatically link pointers into the help system to these.
- doc
- Documentation of various types of formats
Automatically link pointers into the help system to these files.
- etc
- Executables never run by users
- rc
- Program 'resource configuration' files. Not too many programs
use a separate directory for this, but it would be nice if
they could be adapted to. It greatly aids in administration
if we know where to look. Most programs use etc for such
things, but etc is used for programs as well, and we think
making the distinction is a win.
- include
- header files
- info
- GNU style info files
Autogenerate `DIR' files from all of the info files.
- log
- Program-generated log files
- src
- The source code used to build the package (this may be a link to somewhere else)
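As a rough illustration of how a tool might recognize these names, the sketch below sorts a package's top-level entries into recognized and unrecognized ones. The directory names come from the list above; the function name classify_package_dirs and the example entry "scratch" are hypothetical, and what a tool then does with each directory is left open.

```python
import os
import tempfile

# Directory names taken from the suggested conventions above.
STANDARD_DIRS = ("Install", "bin", "lib", "man", "doc", "etc", "rc",
                 "include", "info", "log", "src")


def classify_package_dirs(package_dir):
    """Split a package's top-level entries into (recognized, unrecognized)."""
    entries = sorted(os.listdir(package_dir))
    recognized = [e for e in entries if e in STANDARD_DIRS]
    unrecognized = [e for e in entries if e not in STANDARD_DIRS]
    return recognized, unrecognized


# Scratch package directory with two standard names and one unknown name.
package_dir = tempfile.mkdtemp()
for name in ("bin", "man", "scratch"):
    os.mkdir(os.path.join(package_dir, name))
print(classify_package_dirs(package_dir))
```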
Growth of Partitions
Growth of the individual partitions that a collection occupies is a
concern in large distributed environments. Our solution will be to
have multiple "packages" directories, called ".packages-##" in our
scheme, which will represent different disks. This scheme will allow
for indefinite growth of the collection, as we can simply add another
partition to the available pool. For example:
/arch/gnu/packages/emacs-19.25 -> ../.packages-1/emacs-19.25
/arch/gnu/packages/gcc-2.59 -> ../.packages-2/gcc-2.59
[Note that the /share and /arch .packages-## directories can differ.]
When installing a new package, the tools will be able to
automatically select a partition with free space, or give you the
ability to override their selection. There will also be tools to move a
package from one partition to another afterwards.
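One possible selection policy, sketched under assumptions: pick the candidate directory with the most free space and fail if even that is too small. The function name pick_partition and the policy itself are illustrative, not the SoftInst design. Free space comes from shutil.disk_usage by default; the example injects made-up figures so it runs anywhere.

```python
import shutil


def pick_partition(candidates, needed_bytes, free_bytes=None):
    """Return the candidate directory with the most free space, provided it
    can hold needed_bytes; raise if none can."""
    if free_bytes is None:
        free_bytes = lambda path: shutil.disk_usage(path).free
    best = max(candidates, key=free_bytes)
    if free_bytes(best) < needed_bytes:
        raise RuntimeError("no partition has %d bytes free" % needed_bytes)
    return best


# Hypothetical free-space figures, injected so the example runs anywhere.
fake_free = {"/arch/gnu/.packages-1": 10_000_000,
             "/arch/gnu/.packages-2": 250_000_000}
print(pick_partition(list(fake_free), 50_000_000, fake_free.get))
```

An administrator override, as mentioned above, would simply bypass this function and name the partition directly.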
Suggested Mounting
How the filesystems associated with software are actually mounted is a
separate issue from the method of installing the software. But it is
still an important part of the overall software organization. Rather
than being hard-coded into the system, these specifications are offered
here only as suggestions, describing how we handle mounting at CCS.
We use amd here at the College of Computer Science. Although
this is not necessary, it greatly simplifies administration. We use
the convention of having /net be our general amd
toplvl filesystem, which is used to mount partitions exported
from other machines.
Given the terms mentioned above regarding the share and arch parts of a
collection, the mounted file systems are named to signify all of
these facts.
/net/[arch]-[collection] will be the [arch] disk for
[collection]. The shared collection will be
/net/share-[collection]. When these collections grow beyond
the capacity of the disks, we will add a partition number to the names
to add new disks. Each new partition will be used under the
collection by having the corresponding '.packages-##' directory point
to it.
Another important part of our mounting system is the fact that we
create /arch and /share as actual directories on
each machine. This may seem odd at first, but we believe there are a
few good reasons for doing this.
- Additional collections can be added from 'new disks' easily.
- The symbolic links pointing to the 'top of each collection' are
"highly-referenced". Making them local rather than having lstats happen
over the net makes sense.
So for example, our /share and /arch trees look
something like:
/arch/X11R5 -> /net/sun4-X11R5/
/arch/Xapps -> /net/sun4-Xapps/
/arch/adm -> /net/sun4-adm/
/arch/com -> /net/sun4-com/
/arch/gnu -> /net/sun4-gnu/
/arch/unix -> /net/sun4-unix/
/share/adm -> /net/share-adm/
/share/unix -> /net/share-unix/
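The listing above follows directly from the naming convention, which a short sketch can make explicit. The function name collection_links is hypothetical; the /net/[arch]-[collection] and /net/share-[collection] patterns are the ones described in this section.

```python
def collection_links(arch, arch_collections, share_collections):
    """Return {local link: amd mount point} for /arch and /share."""
    links = {}
    for name in arch_collections:
        links["/arch/%s" % name] = "/net/%s-%s/" % (arch, name)
    for name in share_collections:
        links["/share/%s" % name] = "/net/share-%s/" % name
    return links


for link, target in sorted(collection_links("sun4", ["gnu", "adm"], ["adm"]).items()):
    print(link, "->", target)
```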
The SoftInst Project