Some Lessons Learned at SuperComputing '95
David N. Blank-Edelman
v 1.0  12/12/95

Sitting in my own little puddle of light on a dark airplane headed back to Boston, I thought I would take the time to record some of the lessons I learned from SC95. Jacked into the seat armrest (listening to music with all the fidelity the two straws in my ears can provide), I've had a chance to reminisce and catch my breath from a trip that included 21-hour days and more networking equipment than you can shake a teraflop at.

Before I start, I have to note with a certain level of irony that the VCR on the plane has started to have intermittent troubles and the captain just headed down the aisle towards it. I can certainly commiserate.

-- Building the Network

Most of the first week at SC95 consisted of physically and logically building the pre-designed Scinet/WAVE network. This network (not including any equipment in the vendor areas) consisted of at least 10 routers, 6 large-scale ATM switches, 3 FDDI concentrators, wireless network equipment, many Ethernet switches and concentrators, approximately 850-1200 fiber patches (multi- and single-mode), numerous patch panels, hundreds of 10-BaseT cables, and so on. This was the largest network I had ever had experience with, and perhaps the largest conference network ever constructed.

Here are some of the things I learned as part of the process of building the network:

1) Getting the physical network connected and tested must be the first priority. Almost all other activities block if the network devices (routers, switches, etc.) are not connected to each other and to their edge devices. For the first week, almost everyone (including me) participated, unplanned, in plugging things in.

2) The network was physically connected in a breadth-first manner. Since this can be an inherently parallel task (fifty people can plug in fifty different cables faster than ten can), this may seem the logical way to construct such a large network. This approach proved to be problematic for a few reasons:

o it was difficult at any one time to know how much of the network was wired, or even which parts. This was a constant frustration for me, because just when I thought the entire network had been assembled, more unfinished patches would turn up.

o there were no priorities associated with the process. Had the process been prioritized, smaller subsets of the network could have been constructed and/or tested first, potentially allowing some members of the other teams to begin their work.

o this approach did not allow for point-to-point connectivity testing (the only kind one really cares about, i.e. is there a path in place from this router to that router?). Aside from the occasional cue from a network device when a path was completed (e.g. the lights on a Fore ATM switch go off when it has correct physical connectivity to another ATM device), we had to rely on other people to let us know when something was improperly patched, patched with bad cables, or connected to a broken device (sometimes all of the above!). Multiple people most likely checked the same connection multiple times. Besides causing grief for the physical team, the people configuring the network devices had to include physical connectivity in their debugging cycle.

As a result, I believe any large network needs to be constructed and tested point-to-point (depth-first) for reliable operation. A rough sketch of the kind of point-to-point check I have in mind follows this item.
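To make that concrete, here is a sketch, in Python, of the sort of point-to-point check I mean. The device names and patch list are invented for illustration, and nothing like this actually ran at the show; the real connection information lived in the big spreadsheet described later on.

from collections import defaultdict, deque

# Each entry is one physical patch between two devices. These names are
# made up for illustration; the real list would come from the master
# connection spreadsheet.
patches = [
    ("router-1", "atm-switch-1"),
    ("atm-switch-1", "atm-switch-2"),
    ("atm-switch-2", "router-2"),
    ("router-1", "fddi-conc-1"),
]

# The point-to-point paths we actually care about, in priority order.
required_paths = [
    ("router-1", "router-2"),
    ("router-1", "fddi-conc-2"),   # nothing is patched to fddi-conc-2 yet
]

# Build an adjacency list from the patch list.
adjacency = defaultdict(set)
for a, b in patches:
    adjacency[a].add(b)
    adjacency[b].add(a)

def path_exists(src, dst):
    """Breadth-first search over the patches: is any physical path in place?"""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return False

for src, dst in required_paths:
    status = "ok" if path_exists(src, dst) else "NOT YET PATCHED"
    print(src, "<->", dst, ":", status)

Even something this simple, run against the master connection list as patches were completed, would have told us at a glance which priority paths were done and which still needed work, instead of waiting for someone downstream to notice.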
3) The SC95 network was designed to include a separate Ethernet control network consisting strictly of connections to network devices (i.e. so one could talk to a router in order to configure it). This network needs to be installed and tested _first_ so that it does not hold up the device configuration process. With it in place, network devices can be configured even before all of their edge/peer devices have been physically connected. If done right, a control net can be a very good thing.

4) I learned never to buy 3COM etherswitches (or at least never to use them for anything but dumb switching). The entire team lost considerable time because they did not function as we expected them to (they were used in part to connect the control network, but were incapable of not bridging all of their traffic between separate network connections).

5) I learned my way around fiber (FC, ST, SC, and MIC connectors, single- vs. multi-mode, etc.).

6) I learned about FDDI ring topologies (how they are constructed and maintained, and how to tell when they are not working).

7) *Every* cable (no exceptions) in a large network needs to be labeled with both source and destination. When chasing cables, every labeled cable is a blessing and every unlabeled cable is a curse.

8) A cable labeling machine (like the Brady units we used) makes this task significantly easier and more reasonable, and the labels last longer. I would recommend CCS purchase one.

9) The cable installer has to get it right. We encountered at least one important patch panel that had been strangely connected (fiber pairs were split among cabinets). This caused a few hours' delay while we deciphered it.

10) Sun-made FDDI SBus cards are bad (Interphase makes better ones). Sun's ATM boards are good.

11) There are better ways to represent network construction than spreadsheets (pictures can be crucial). We worked off of a very large spreadsheet listing connections, which contributed to some of the problems connecting the network mentioned above.

12) When constructing a network with a large group of people, there need to be multiple people coordinating all of the connections, and those people need access to a shared, current set of network connection lists/maps. If this is not in place, there are many opportunities to block while waiting for a single person to provide new or changed cable information.

13) Out-of-band communication is essential. I do not believe we could have constructed and tested the network without the distribution of walkie-talkies to nearly everyone on the project. A MUD (which allows for virtual access to people without requiring physical proximity) would also have been useful at the show.

14) The wireless network, albeit fairly small, appeared to work flawlessly once it was properly connected to the main network. Wireless network technology should be considered another out-of-band communication method worth exploring.

-- Getting and Keeping the Network Up

Late in the first week the network started to gel, but then began to fluctuate as it was expanded to include other outside sites that needed connectivity for WAVE/I-WAY use.

1) Besides good networking hardware, having stable software running in the equipment is crucial. For instance, towards the end of the show both of our Bay Networks backbone routers crashed consistently and almost continuously during our efforts to configure them (using the prescribed Bay networking GUI!).
We later found out that they were running beta code that included a feature set we did not need (SVCs), and that the previous version was considered stable. Grrr. We also had several hardware failures (bad or flaky boards, etc.).

2) It is crucial to have experienced vendor/carrier support available. It was clear that some of the reps from the vendors/carriers who donated people did not have enough personal experience with their products and with network technology to be helpful without spending large amounts of time communicating back to their home base.

3) There were two basic tools we used to debug network problems: ping and traceroute.

4) ATM, from a management point of view, could use some maturation. There are no ATM ping or traceroute programs; one tends to rely on cell counters to determine connectivity. Debugging problems can require doing binary arithmetic. ATM sniffers are still relatively new beasts (we had one commercial box and one experimental box available, but they were used much less than the FDDI/Ethernet sniffers around).

5) The connection method we used (PVCs as opposed to SVCs) requires adding two pieces of config info (i.e. info for both directions of data flow) for every single connection. The interface to the switches (the Fore GUI was mostly used) does no error checking, so finding a wrong PVC config usually consists of tedious rechecking of figures.

6) It is not clear how good the interoperability of the other method (SVCs) is among vendors. Towards the end of the show, the SVC approach had to be abandoned by the main networking team. Fore boxes support two methods of signalling (UNI 3.0 and SPANS), but the company vociferously promotes the latter, proprietary method.

7) Because of the sheer amount of connectivity required for I-WAY and other networking, there were points at which the network appeared to be in severe flux. This led me to postulate that it might have been useful to view the network as consisting of discrete parts. With this view, one could say that some of these parts needed to be up at particular moments in time, rather than looking at the whole and attempting to get the whole thing "working." "Healthy" could not be measured by the aggregate status of all the pieces.

8) Given that, it is still very important to have a notion of the current status of the net at any one time (this is useful information for the construction stage as well). It is not an easy dataset to gather or to represent, but it is necessary. SunNet Manager and OpenView were both present at the show, but there were not enough resources to make use of them. During some idle time, I wound up writing a tool or two to provide this information (learning Perl5 in the process).

9) During the show, Internet access was treated by both support and non-support personnel as having the same priority level as air (i.e. they had to have it all the time). When it went down, many people (though they weren't gasping) got awfully unhappy.

10) HP Jetdirect (printing software) is exceptionally easy to set up on Solaris 2.4 machines.

-- Misc Thoughts

1) Setting up a NOC (Network Operations Center) to minimize interrupts is crucial. That broke down in several ways at the show:

o the physical layout did not prevent access to key personnel

o the badging system was either breached, not honored, or too flat to prevent the above

2) Workflow analysis for things like trouble-ticket systems needs to be completely fleshed out before the show (who sees what, and in what order, to get something accomplished). A toy sketch of the kind of flow I mean follows this item.
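The sketch below (in Python, with invented states and roles; we had nothing like this at the show) is only meant to illustrate the idea of knowing, for every ticket, what stage it is in and who is responsible for it at that stage.

# A toy trouble-ticket flow. The states and the roles attached to them
# are made up; the point is that every ticket has a known stage and a
# known owner at that stage.
TICKET_FLOW = [
    ("reported",   "NOC front desk"),
    ("dispatched", "on-call network engineer"),
    ("escalated",  "vendor support rep"),
    ("resolved",   "NOC front desk"),   # closes the loop with the reporter
]

tickets = {}   # ticket number -> index into TICKET_FLOW

def open_ticket(number):
    tickets[number] = 0

def advance(number):
    """Move a ticket to the next stage of the flow."""
    if tickets[number] < len(TICKET_FLOW) - 1:
        tickets[number] += 1

def who_has(number):
    """The question a pile of paper tickets cannot easily answer."""
    state, owner = TICKET_FLOW[tickets[number]]
    return "ticket #%d is '%s', currently with the %s" % (number, state, owner)

open_ticket(1234)
advance(1234)
print(who_has(1234))   # ticket #1234 is 'dispatched', currently with the on-call network engineer

With the flow written down this explicitly, the question in the next item answers itself.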
3) Distributed trouble-ticket systems based mostly on discrete pieces of paper make workflow management (if attempted) very difficult ("OK, who has ticket #1234?").

4) Adding a quiet space to a NOC is an excellent idea (rest, meetings, etc.).

----

That's a very incomplete list. With two intense weeks to sift through, I am certain that there are many more pearls of wisdom to be gleaned. Perhaps in a future version of this document...

Oh, and the airplane VCR eventually worked fine. Comforting, no?