# Maze generation

I have Scheme code to generate random mazes using an algorithm based on Tarjan's disjoint-set data-structure with amortised union-find operations. It runs on top of Scheme 48. However, it is pretty simple code, and should be easy to port to other systems.

The program has two back-ends: one uses the Functional PostScript package to render to PostScript, the other prints out mazes with a text representation. The maze is generated on a hexagonal tiling of a square area, just for fun.

You can see a large sample maze, it's solution, a smaller maze, or a maze for the truly persistent, all in PostScript format.

By the way: one of the students here, Robert Givan, has solved the maze for the truly persistent by hand. So it can be done. If you are as crazy as Robert is.

## The Algorithm

How should one dig a maze? Here's a simple "straw man" algorithm: Tile the maze area (I used hexagons for fun; squares are much simpler to work with). Pick a random starting point at the top of the maze area, start there, and do a random depth-first search of the maze, knocking down walls as you search out the maze. Mark cells as you go, so you never visit a cell twice. Then open up a random cell at the bottom.

The problem with this algorithm is that the initial paths will be long corridors snaking through the maze. As a result, the later branches off of these corridors will be constrained to be short digressions. So the true branch factor is not high; once you get onto one of the long "highways," it's hard to get lost.

A better algorithm uses the fast union/find disjoint-set data-structure. Again, we start out by tiling the maze area, then we randomly knock down walls in the maze until all the cells are connected. We never knock down a wall between two cells c1 and c2 if there is already a path from c1 to c2, so that when we are done, there is only one path between any two cells.

The algorithm maintains sets of cells representing connected components in the maze. That is, cells c1 and c2 are in set S iff there is a path from c1 to c2. When we knock down a wall between two cells, we union their sets together.

The algorithm starts by putting each cell in a singleton set, so there are as many sets as there are cells. Then it builds a vector containing all the walls in the maze; this vector is then randomly permuted. The inner loop of the maze builder scans the wall vector; for each wall w, it considers the wall's neighboring cells, c1 and c2. If c1 and c2 are in the same set, there is already a path between the two cells, so the wall is left in place. Otherwise, the wall is knocked down (which joins the two regions of the maze), and the two sets are unioned together. If the resulting set contains all the cells in the maze, we are done.

After digging the maze, the program finds the path length between every pair of top-row and bottom-row cells. The pair with the longest path between them are chosen as the entry and exit cells, and are opened up accordingly. So we choose the most difficult entry and exit cell we can, given the maze. This is done by using DFS to build a tree for the maze, then re-rooting the tree at each top-row cell, and searching backwards from each bottom-row cell, counting path length.

Finally, the maze is printed out using one of the back-ends.

## The big one

The biggest maze (mobymaze.ps) is 350x450 hex cells, about the limit of what I figured I could render on a single 8.5" x 11" sheet with a standard laser printer. (Actually, I could go finer, but most senior faculty here at MIT can't resolve the 350x450 maze without using a magnifying glass, so I think we'll leave it at that). It renders to about 6Mb of PostScript. It took about 102 CPU minutes to build the maze on a 90Mhz Pentium box running Scheme 48 with 13 Mword semispaces. Permuting the wall vector seems to be particularly slow, which is odd; perhaps my random number generator is bad. Printing the maze in PostScript format is also quite slow, probably due to S48's slow I/O.

The Pentium box was thrashing the disk the whole time it ran; wall time was much, much greater than 102 CPU minutes, and the entire computer was essentially unusable for any other task while the maze was being built. For starters, an algorithm that traverses a data-structure in a random order is, by design, going to exhibit zero locality. Secondly, Scheme 48's garbage collector is not generational, and hence copies the megabytes of maze data-structure back and forth on every GC. I could probably have helped the GC issue by dumping out a custom heap image containing only the maze code --- flushing the rest of the runtime (compiler, repl loop, and so forth and so forth) --- thus minimising the heap data.

It is intriguing to consider the color PostScript printer we have over at the Media Lab. It prints on rolls of paper three feet wide; images can be as long as you like. Then you could really get lost.