Systems Seminar
Thursdays, 1:30 (Spring, 2008)
366 WVH
Sponsor: Gene Cooperman

The systems seminars will be ideally held on Thursdays at 1:30 in 366 WVH during Spring, 2008 semester. We've been temporarily bumped by the PhD Admissions Committee during January. Stay tuned to this page for the final time.

Systems is the subject of how to organize computations in the presence of limited resources (which typically requires _system_ calls). We have discussed subjects ranging from parallel computing to overcoming systems bottlenecks due to RAM/Network/Disk; and a variety of scientific applications. This area of research typically involves a fine interplay among algorithms, software, and applications.

===============================

Currently Scheduled Talks (Fall, 2007 Semester)

May 8, 2008
Title: Summing up for this academic year
Gene Cooperman
We will use this to sum up where we are as a working group, and how to organize best for the remaining opportunities.
May 1, 2008
Title: Graceful Degradation of Compact Hash Arrays
Peter Dillinger
I'm working on using lossless sets of hash values to over-approximate sets. I use these rather than inherently probabilistic structures so that I can rehash the sets to accomodate more elements in the same amount of space. Further, I want to be able to rehash within the same space itself (rather than allocating a new copy and copying all the elements).

A data structure by John G. Cleary[1] is clearly capable of this, but making the structure reasonably fast but also compact and flexible while supporting these kinds of rehashing presents some interesting challenges.

A data structure of mine, the Hash Pyramid, appears to be capable of this kind of rehashing except that (1) it would be much slower to rehash and (2) it's not clear whether it can rehash in place to a k=1 Bloom filter, as the Cleary structure can.

I plan to discuss these challenges and get some brainstorming going.

[1] John G. Cleary: Compact Hash Tables Using Bidirectional Linear Probing. IEEE Trans. Computers 33(9): 828-834 (1984) http://ieeexplore.ieee.org/iel5/12/35240/01676499.pdf

Apr. 23, 2008
Title: Report on Trip to CERN
Gene Cooperman
In the visit to CERN, Xin Dong and I proposed a software collaboration based on: (1) automatic multi-threaded parallelism; and (2) checkpointing. This was part of the Workshop on Multi-Core Computing and Virtualization in the LHC. Those two topics are WP8 and WP9 in the planned software efforts at CERN. We also received a presentation by the CMS group, where they describe a 2 million line program with about 1,000 packages, and 900 shared libraries (of which 400 are present in a given application). At CERN, they have discovered new limits on software scalability not typically seen elsewhere (although they anecdotally report that Firefox/Mozilla is having similar experiences). Two issues are: (1) the time to link so many shared libraries; and (2) a 1.5 MHz malloc rate due to use of STL/C++ code from non-C.S. specialists.
Apr. 17, 2008
Title: Dry Run of Thesis Defense
Eric Robinson
I'll be doing a dry-run of my thesis defense at the system's seminar this week, April 17th, 1:30-2:30PM. I'd appreciate as much feedback as I can get.
Mar. 27, 2008
Title: Linear Algebra for Graph Algorithms: Betweenness Centrality
Eric Robinson
With the advances that have been made with modern mathematical software and specifically with sparse matrix representation and storage, representing a graph as an adjacency matrix is no longer space-inefficient. It is now possible to effectively compute over large graphs using software such as Matlab, Maple, Mathematica, etc.

To demonstrate this, the ease of implementing a few of the basic graph algorithms using an adjacency matrix representation is shown. After this, a more complicated algorithm, betweenness centrality, is examined. Betweenness centrality seeks to compute the importance of each node in a graph by determining the number of shortest paths that it lies on. The algorithm's implementation using an adjacency matrix graph representation and only basic matrix and vector operations, such as matrix multiplication and element by element operations, is presented.

The ability to represent complex graph algorithms such as this as linear algebra problems in Matlab, etc. provides a reduced development time in many cases. The resulting code is significantly smaller. In addition, it provide access to "free" parallel versions of the algorithms with software such as pMatlab, starP, and parallel Matlab.

Mar. 20, 2008
Title: A New Approach to Direct Condensation using Parallel Disk as Primary Storage
Ana Maria Visan
Condensation is a method through which we can reduce the size of the object to be analyzed and that guarantees to retain enough information for usefulness. Condensation is a critical technique for the Modular Atlas project, which analyzes modular matrix representations for finite groups. For large simple groups, these representations are more efficient computationally than the next alternative: permutation representations. A new approach to direct condensation is presented that uses parallel disk as primary storage. The running time of the new approach is O(SN), where N is the size of the orbit being condensed and S the size of the generating set. Previous algorithms had a running time of O(LSN), where L is the landmark ratio, thus the new algorithm is an improvement over the previous algorithm. Using the new disk-based approach, we produce condensation matrices for the sporadic simple Janko Group J4 on an orbit of size 131,358,148,251. Each point of the orbit is a 4-dimensional subspace in a vector space of dimension 112 over GF(2). The new algorithm demonstrates a condensation of a group whose orbit is 10 times larger than the largest previous such group for which direct condensation was applied.
Mar. 12, 20008; 12 noon (unusual time, place and day)
212 West Village H
Endomorphism rings and their character tables
Jürgen Müller
Starting points are the notions of distance-transitivity and -regularity, as well as the Ramanujan property, in algebraic graph theory. These are closely connected to the action of the graph automorphism group. From that point of view, graphs appear as orbital graphs associated with a permutation action of a given finite group.

Given a permutation module, representation theory tells us to consider its endomorphism ring, which similar to the group case turns out to have a character table. The latter finally contains information about purely graph theoretic properties, like the ones mentioned above, of the associated orbital graphs.

Together with T. Breuer we have built a database containing the character tables of the endomorphism rings of the multiplicity-free permutation actions of the sporadic simple groups. We report on this, as well as on the computations necessary to compile the data. In particular, to tackle the large cases, together with M. Neunhoeffer and R. Wilson we have invented a new general technique to enumerate very large orbits.

Mar. 6, 2008
"Proving 25 Moves Suffice for Rubik's Cube; Disk-based Computation Wins Again (PART TWO)"
Daniel Kunkle
This is a continuation of last week's talk. Having heard some of the background, we will now hear about the algorithmic improvements needed to show that 25 moves suffice.
Feb. 28, 2008
"Proving 25 Moves Suffice for Rubik's Cube; Disk-based Computation Wins Again"
Daniel Kunkle
Last year, we proved that 26 moves suffice to solve Rubik's cube, beating the previous best upper bound on solution lengths of 27. This was accomplished with two separate computations: a disk-based breadth first search over 1.4 trillion states, which proved 29 moves sufficient; and a large trivially parallel RAM-only computation to refine the upper bound to 26. This presentation will describe our continuing work to further refine the upper bound to 25 (expected to complete in the next one to two months). To accomplish this, we needed to design a new disk-based version of the existing RAM-based refinement method. This example of transforming an existing RAM-based computation into a disk-based computation is representative of many such efforts in our research group, and will serve as a general guide for the process.
Feb. 21, 2008
"Introduction to Distributed Multi-Threaded Checkpointing (DMTCP)"
Kapil Arya
DMTCP (Distributed Multi-Threaded CheckPointing) is a user level transparent checkpointing tool for distributed, multithreaded applications. DMTCP can also be used for debugging processes as well as for process migration. It doesn't require any modification of kernel or application binary and is based on TCP sockets which gives it full generality. There are many subtleties to designing a package with broad coverage over such varied parallel libraries as OpenMPI, MPICH-2, SciPy (python), along with parallel applications such as ParGeant4 (a Monte Carlo simulation), etc. Some of the issues to be discussed are: whole checkpointing procedure including checkpointing of pseudo terminals, sockets, shared memory etc.; and Scalability.
Feb. 14, 2008
"Overview of the NVIDIA G8 Architecture and Design Issues"
Gene Cooperman
This talk will review the NVIDIA G8 architecture with an eye toward deducing the tradeoffs and design issues that were faced by the designers of the NVIDIA G8 architecture.
Jan. 30, 2008, (11:45 a.m., 366 WVH -- note unusual time)
"Multi-Core Computing"
Gene Cooperman
While the PhD admissions committee has our time slot, we continue to have a floating systems seminar. :-)

This Wednesday (tomorrow), we will hold a joint PL/Systems seminar at: 11:45 a.m., Wed., 366 WVH

I will be talking informally (white board) about multi-core computing. The goal is to find common issues affecting both systems and PL. Part of the talk will be review, and part of the talk will be speculation about the future.

Jan. 18, 2008 (1:45 p.m. -- note unusual time)
"Alternative Designs for Automatic Transformations to Implement Thread Parallelism in the Presence of C++ Inheritance
Xin Dong
Xin Dong will speak on the issue of implementing automated thread parallelism in the presence of object inheritance in C++. Xin will briefly present the issues and then ask the seminar attendess to help make suggestions and debug design possibilities. Xin already has a simple transformation for the common case encountered in Geant4 code. In the presence of inheritance, his simple transformation becomes harder to implement (since it must take account of the base class), and Xin is looking for an audience to discuss alternative designs.
Jan. 10, 2008, 1:30 p.m.
"Experience in Implementing Thread Parallelism for Geant4"
Xin Dong
Xin Dong will present his experience in implementing thread parallelism for Geant4. Some of the key ideas were simple transformations for singleton class, and the careful use of the relatively new __thread keyword, introduced in C99 and C++98.
Nov. 8, 2007 (1:30 p.m. -- note unusual time)
"Disk-Based Parallel Coset Enumeration" (cont.)
Vlad Slavici
Vlad will continue the discussion by explaining his parallel, disk-based strategy for coset enumeration. If time permits, Gene will give the background for BDDs (Binary Decision Diagrams) --- another pointer-chasing problem. BDDs are used in verification, and a similar parallel, disk-based strategy may be available for this algorithm.
Nov. 1, 2007 (1:30 p.m. -- note unusual time)
"Disk-Based Parallel Coset Enumeration"
Vlad Slavici
First, a short description of the general way to convert an NFA to a DFA. If the NFA has n states, the DFA will have at most 2^n states. All the states of the DFA can be represented by an n-bit string. Generation of a new state takes at most O(n) time using bit string operations. Pointer chasing is an issue because the next state may be owned by any node. Still, all the generated states are valid, so the disk space will be continually increasing until reaching the final configuration.

Then a problem where pointer chasing visibly dominates the computation: Coset enumeration is the problem of converting a presentation representation of a group into a permutation representation. Many problems require the permutation representation of a group as input. In order to obtain the result - a complete coset table - all known algorithms define additional cosets, most of which turn out to be duplicates. Basically, there are two approaches to this problem:

HLT - which defines as many cosets as necessary to fill a relator loop( a circular sequence of cosets of length equal to the order of a generator) and then tries to collapse them

Felsch - which tries to collapse cosets as soon as it defines new ones

HLT is easier to implement, but since previous approaches only used RAM, Felsch was preferred for using less space. For HLT, the intermediate space necessary overflowed RAM, or even one disk.

Our approach is to use the HLT strategy, but using parallel disks, so that the space necessary for keeping the intermediate states would not be an issue any more. The strategy will be presented, as well as some rough estimates of the number of accesses to disk for a given input size.

Oct. 25, 2007 (3:00 p.m. -- note unusual time)
"Twenty-Six Moves Suffice for Rubik's Cube"
Daniel Kunkle
The minimum number of moves sufficient to solve any configuration of Rubik's Cube has been a matter of long-standing conjecture -- since the puzzle appeared over 25 years ago. This number is sometimes known as "God's Number". This talk will describe the methods we have used to show that God's Number is no more than 26.

To achieve this result, we used new techniques in two areas: group theoretic methods for efficiently representing and computing over the possible configurations of Rubik's Cube; and, parallel disk-based algorithms for enumerating large search spaces. This talk will primarily focus on the disk-based search and enumeration techniques we have introduced.

Oct. 18, 2007
No systems seminar
The systems seminar is preempted by a faculty meeting at the same time.
Oct. 11, 2007 (3:00 p.m. -- note unusual time)
"Distributed Disk-Based Search and Enumeration: Discovering the Shortest Path to a Thesis"
Eric Robinson
With the introduction of multi-core computing, Moore's Law continues to hold. Problems that were previously infeasible due to computation time are now being re-examined. For many of these problems, however, the issue at hand is no longer CPU time, but rather available storage. This is especially true in the case of search and enumeration problems, which are typically characterized as being on the order of $O(N)$, but may require the storage of billions or trillions of ``values''.

Large search and enumeration is examined, and the case for using distributed disk is made. Fundamentally, disk is on the order of 10 times slower than RAM when streamed. This slowdown can be compensated for through the use of a cluster of disks or a RAID system. For this, the application programmer has access to on the order of 100 times more storage. Many techniques for search and enumeration are presented, including one method developed in-house, and a ``fair'' method for comparing these techniques is given.

The talk shall also follow my research as it developed through my 5.13 years (currently) at Northeastern, and will demonstrate some of the goals and pitfalls of being a Ph.D. student. These will be presented through a look at the progress, or lack thereof, of my research during my time at Northeastern.

Oct. 4, 2007
No meeting this week.
Sept. 27, 2007
"Parallel algorithms in numerical algebraic geometry"
Anton Leykin,
   Institute for Mathematics and its Applications (IMA),
   U. Minnesota
Numerical algebraic geometry (NAG) is a young area encompassing numerical methods that deal with systems of polynomial equations. The main engine of NAG, the polynomial homotopy continuation procedure, is easily parallelizable. This makes NAG algorithms suited for parallel and distributed computing better than those based on classical symbolic techniques such as Gröbner bases.

This talk will start with an introduction to NAG and continue with sketches of two parallel algorithms. One is a core algorithm for parallel monodromy breakup, the other is an application to pure mathematics; it computes Galois groups of simple Schubert problems.

Time permitting we will discuss the recent algorithm for numerical primary decomposition, which enables us to solve the ideal membership problem numerically.

Sept. 20, 2007
"Structures for potentially probabilistic state enumeration in RAM"
Peter Dillinger
Part 1: Classical approaches to representing visited lists can require much more memory than is theoretically required to store subsets of some finite universe of elements. In particular, when the subset is neither particularly dense nor particularly sparse in the universe, discrete information theory dictates that classical approaches are far from optimal. For example, it is possible to represent 200,000 arbitrary 32-bit values (such as IP addresses) using less than 16 bits per value. My data structure, the "hash pyramid," shares enough structure between elements to use only 17.5 bits per value in this case. A nearly-forgotten structure by Cleary is similarly compact, but only when made quite slow.

Part 2: When an exact visited list cannot fit in RAM, it is useful in many applications to resort to an over-approximated visited list. Because the elements are hashed, there is a predictable probability of whether a new state will erroneously be considered visited. Once again, discrete information theory can tell us how much space is required to achieve certain probabilities. One underacknowledged structure, the "hash compact" table, is nearly optimal, but only if the number of reachable elements is known in advance. In contrast, the popular Bloom filter is more generally mediocre, meaning it is likely to perform relatively well when the number of reachable elements is not well known. Pete Manolios and I were the first to contrast these structures in this way.

Part 3: Structures for exact sets can also be used as over-approximations, by exactly storing a set of hashes. The hash pyramid and Cleary structure are quite competitive when used in this way, and they have a property that neither the hash compact table nor the Bloom filter has: they can be rehashed to accomodate more entries. The Cleary structure in particular can be rehashed in place with linear scans.

Part 4: All of these schemes depend on hashing. Two hash functions in particular are quite useful in explicit-state verification; these have been implemented in our 3Spin and 3Murphi tools based on Spin and Murphi. The first is the Jenkins LOOKUP2 function, which is heuristically accurate and predictably fast. The second is H3, which has "universal hashing" accuracy guarantees, and can use a differential optimization to make it heuristically fast. Thus, these two hash functions are nice counterparts.

Sept. 13, 2007
"Organizational Session"
Gene Cooperman
We will have an organizational session in which Gene will give a short report on interesting applications he found on his recent travels, and others will give a five minute report on the current status of their research projects. As with all systems seminars, additional participants/listeners are welcome.
===============================

Talks from Previous Semester (Fall, 2006)

Nov. 30, 2006
"Large-Space Computations: Using Disk as RAM"
Gene Cooperman
We have at least two examples in our working group demonstrating the solution of problems by using many terabytes of disk space spread among mainy distributed disks. This seminar will be interactive. We will use the wireless connection and google to help us widen the set of applications in real time. Some sample places to look are very large Markov chains (with hundreds gigabytes of training data?), password breaking using very large rainbow tables, large airline scheduling problems, the search for large codes, the travelling salesman (a perennial favorite), integer programming (and branch-and-bound style solutions), robust optimization in linear programming, and satisfiability (with potential applications to VLSI design).

Nov. 16, 2006
"Disk-Based Distributed Search: A New Approach"
Eric Robinson
When looking at previous methods for search space enumeration and notice an obvious partitioning. Some methods emphasize performing additional computations in order to reduce the overall space of the enumeration. While these methods seem advantageous at first, upon closer examination it is seen that streaming disk is actually faster than accessing memory randomly. The second class of methods take advantage of this by requiring more space, which is accessed only in a streaming manner, however performing less computations.

Both of these techniques focus in on a particular level in the memory hierarchy. The first aims primarily at using memory during the computation, the second at using disk. We offer an alternative that takes advantage of both the memory and disk levels in the memory hierarchy. Through this, we provide an algorithm that improves upon the best known disk-based algorithm in many cases. We offer several examples of where our technique has been used.

Finally, we show how all of these search space enumeration techniques fall into a hierarchy which has a natural tradeoff between space and time. Our goal is to define under exactly what circumstances each method will be optional based on criteria such as the problem size and architectural parameters such as disk and memory speed, along with the number of machines used in the computation and the speed of the network.

Nov. 9, 2006
"Parallelization of the ILU(k) Preconditioner for Solving the Matrixi-Vector Equation Ax=b" (cont.)
Xin Dong
This is a continuation from last week.

Nov. 2, 2006
"Parallelization of the ILU(k) Preconditioner for Solving the Matrixi-Vector Equation Ax=b"
Xin Dong
There is no generally optimal way to speedup the solving of Ax = b when A is a sparse matrix. The direct LU factorization results in so many fill-ins that A often becomes dense. The iterative method requires a good preconditioner to decrease iteration while the preconditioning may not be achieved easily. The incomplete LU factorization for level k (ILU(k)) is one such preconditioner. Although its computation is more effficient for small k, ILU(k) may still be a potential bottleneck even for small k. We implement an algorithm to compute ILU(k) on a cluster. In the area of linear solvers, many researchers believe it difficult to get better speedup with the distributed memory model due to the need for extensive communication and memory synchronization. However, our results show that a good balance between computation and communication leads to exciting speedups in practice.

Oct. 26, 2006
"Duplicate Detection in Very Large Search and Enumeration"
Dan Kunkle
And this week, we will finish up our extended review of duplicate detection methods.

Oct. 19, 2006
"Duplicate Detection in Very Large Search and Enumeration"
Dan Kunkle
Continued from two weeks ago.

Oct. 5, 2006
"Duplicate Detection in Very Large Search and Enumeration"
Continued from previous sesssion.
Sept. 28, 2006
"Duplicate Detection in Very Large Search and Enumeration"
Dan Kunkle
We examine the challenges associated with the enumeration of very large implicit search spaces. An implicit search space is one which is defined by a start state and a number of generating functions, which produce a set of neighboring states. A fundamental problem for all such search spaces is the enumeration of a spanning tree defining a minimum length path from some identity element to all other elements of the search space.
The fundamental challenge in such enumerations is duplicate detection. That is, determining when any particular state has been previously generated. This is especially challenging when the search space does not fit in memory, requiring the use of disk. We examine a number of different duplicate detection strategies, most using some form of hashing and/or sorting. We conclude with current work towards a general framework for duplicate detection strategies in general.

Sept. 21, 2006
"Very Large Disk-Based Search"
Eric Robinson
For large distributed search, duplicate detection through hashes will not always fit in main memory. We discuss these problems and the issues that arise in dealing with them effectively. We show that a solution that emphasizes the use of additional CPU power is not always as effective as one that is able to take advantage of the distributed disks of the cluster. We present a large distributed search problem where using distributed disk leads to a faster solution than any that could be obtained strictly through using distributed memory. In addition, we present other disk-based algorithms and compare them, determining under what conditions each is optimal.

Sept. 14, 2006
"Introductory Session"
Gene Cooperman
This will be a brief discussion os opportunities for research in high performance computing and in performance-related applications. High performance computing includes both parallel computing, and architectural issues: balancing the resources of CPU/RAM/disk/network. CPU, RAM, disk and network can potentially all operate simultaneously, Algorithms play a part both in reducing the difficulty of applications, and in adapting those algorithms to take advantage of parallel computing using a balance of architectural resources.
===============================

Talks from Previous Semester (Spring, 2006)

Mar. 30, 2006
"How Does one Solve Rubik's Cube Optimally?"
Dan Kunkle
A classic search problem is finding shortest solutions to Rubik's cube. Researchers have looked at this problem with a single processor and external memory (disk-based algorithms), and with parallel processors. We ask how to use all of the weapons available (parallel processors and external memory) to do better than previously. This talk will describe some building blocks of such an algorithm. In the remainder of the talk, methods to putg together the building blocks will be examined as a group activity, with audience participation.

Mar. 15, 2006 (Wed.)
"Symmetry, Search, and Research Problems in Applications to Sudoku and Latin Sqares"
Ravi Sundaram, Dan Kunkle, and Gene Cooperman
For this week, only, we hold the systems seminar on Wednesday. We will employ the often-used triple speaker format --- choosing the variation where lectures are given in sequence rather than in parallel. Gene will present the formalisms of how to use computational group theory to efficiently support symmetries for pruning large search problems. Ravi Sundaram will present results on Sudoku and Latin Squares from the Theory Community, while Dan Kunkle will present results and open problems from outside the Theory Community.

Mar. 9, 2006
"The Issues of Choosing a Good Thesis Topic"
Dan Kunkle
I first considered applying a novel methodology in top-down style, in order to demonstrate on a concrete problem (numerical analysis). I am now considering the opposite, bottom-up style: starting with a concrete "challenge problem" in search and applying novel out-of-core techniques to solve it. He would then use the eventual solution to enunciate a general methodology.

Mar. 2, 2006
"Task Oriented Parallel ILU(k) Factorization and Its Optimization"
Ye Wang
This is a continuation of the previous week's talk.

Feb. 23, 2006
"Task Oriented Parallel ILU(k) Factorization and Its Optimization"
Ye Wang
I will briefly introduce the classic(sequencial) ILU factorization algorithm and the task oriented parallel ILU(k) algorithm. According to the features of the algorithm and the spase matrices, I will optimize the TOP-ILU algorithm.

Dec. 7, 2005
"Checkpointing Master-Worker Parallel Computations"
Jason Ansel
I will explore a system-level transparent method of checkpointing in which a snapshot is taken only of one node. In addition I will talk about utilizing system core dumps as a way to checkpoint a process and some of the issues encountered on the way.

Nov. 30, 2005
"Developing Task-Oriented Parallelism for Linear Algebra" (continued)
Ye Wang
This is a continuation of the previous talk in this seminar.

Nov. 16, 2005
"Developing Task-Oriented Parallelism for Linear Algebra"
Ye Wang
This talk will focus on solving important linear algebra problems, such as ILU preconditioner, using a new tool TOP-C. I will introduce the "Classic" algorithm for ILU preconditioner as well as my new task-oriented algorithm, and make comparison of the two algorithms.

Nov. 9, 2005
"Computational Algorithms for Finding Short Solutions for Rukik's Cube"
Gene Cooperman
This lecture will be a "fun" lecture --- no prior experience necessary. Rubik's cube has long held a fascination for computer scientists ranging, for example, from Korf (in A.I.) to Shamir (in Theory). Finding good solutions (or the best solution) requires enormous computational resources in both space and time. This makes it a wonderful testbed for systems: stressing all of network, CPU, disk and bandwidth of RAM.

We will review a variety of algorithms, including: Sims's algorithm for finding some solution; Korf's modification for finding a reasonable solutions (within a factor of two); Shamir's algorithm for finding the best solution (computationally feasible for solutions up to about 18 or 19 moves); my own enumeration of a subset of states of Rubik's cube using only two bits (0.25 bytes) per state in a Cayley graph from which shortest moves in the subset can be recovered; algebraic bounds on the maximum solution length over all states of Rubik's cube; and the "shape" of any breadth-first enumeration of states, in terms of the size of each successive level.

Nov. 2, 2005
Automatic Algorithm Selection
Dan Kunkle
I will describe my proposed automatic algorithm selection system: a system for determining the right algorithm and its parameters given a specific problem to solve. This method is particularly useful when here are many possible algorithm variants for a single problem, the appropriate algorithm is dependent on the actual problem instance being solved, there are no simple rules mapping problem instance to algorithm variant, and the algorithms are dependent on implementation and computing architecture. The system involves methods from multiobjective optimization, case based reasoning, and supervised learning. I will highlight the process of constructing the automatic algorithm selection system with an illustrative example: high performance solving of linear systems of equations.

Oct. 26, 2005
"Distributed Framework for Computation of Very Large Trees"
Eric Robinson
This talk focuses on the computation of trees too large to be computed on a single machine. For trees such as this, it is useful to have an abstract framework to change a local tree computation into a distributed one.

We provide such a framework and show how it can be used to compute over two different trees. One used in the computation of matrix groups, and the other used to compute chess endgame trees.

===============================

Talks from Previous Semester (Spring, 2005)

Feb. 3, 2005
"Ischemic Stroke and a new approach in Machine Learning"
Viet Ha Nguyen

For the introductory lecture this semester, Viet Ha Nguyen will talk on his work in learning and ischemic stroke. He will informally describe how his application-oriented thesis led to a new approach in machine learning.

Ha has worked closely with a medical research group at MGH (Mass. General Hospital) for six months. At the end of six months, he had solved the problem that the "doctors" had assigned him: improve the accuracy of a model predicting which parts of the brain were at risk of dying after ischemic stroke.

However, this striking success in a particular medical application still needed to be translated into a general computer science technique, suitable for a thesis. Ha will describe both his original result, and how he generalized that result, so as to apply to more than a single application.

Feb. 18, 2005 (note, Friday)
"Fast Query Processing Using Cooperative CPU Caching for Index Structures"
Xiaoqin Ma

Data Intensive applications on clusters often require that requests quickly be sent to the node managing the desired data. In many applications, one must look through a sorted tree structure to determine the responsible node for accessing or storing the data. Examples include sensor networks, network routing tables, routing in publish-subscribe middleware, and database indexes. When the tree structure is larger than the CPU cache, the standard implementation potentially incurs a cache miss for a lookup at each successive level of the tree. This significantly degrades the performance because a single cache miss of one word will force an entire cache line to be loaded (e.g. 128 bytes in the case of the Pentium 4). As the CPU-RAM gap grows, this performance degradation will only become worse in the future.

Mar. 10, 2005
"The Conquering of the Baby Monster: A Problem of Resource Balancing"
Eric Robinson

The Baby Monster is a permutation group in a well-known ladder of challenge problems, the sporadic simple groups. Solving for this group is difficult not only in terms of the time for the computation, but also space. We provide a distributed disk-based solution and show that this solution is only marginally slower than an optimal solution given infinite disk and memory resources. In addition, we generalize this solution and show its application in the area of large tree discovery.

Mar. 31, 2005
"Some Research Problems in Systems"
Gene Cooperman

I will discuss some open research problems in systems. This will be an interactive talk. We will learn to propose research problems "by doing it". Some of the topics below are meant simply to be seeds for thought.

They include semi-automatic parallelization (one of my continuing favorites). The TOP-C model should make it exceptionally easy to parallelize for/while loops. There are standard programs for transforming fragments of C, such as CIL. What other constructs could be parallelized?

The MBRAM model can be used as the basis for choosing among compiler optimizations. How can one do that effectively? This could be a very hot area.

What about extending MBRAM to SMP or to distributed memory? For a dual-core chip, what can you say?

The clusters of tomorrow will probably be multiple CPUs at each node, just because of the trend toward dual and quad-core CPU chips. How does one reorganize a parallel computation (TOP-C or other) to take advantage of this?

Can the search methods of computational algebra be repackaged for other kinds of search. In the case of Eric, chess end games had been suggested.

On the area of neural nets, what will be an efficient memory access pattern if you have millions of neural net nodes at each layer?

The current rules of efficient memory access dictate different kinds of sorting algorithms. Can one produce a faster integer sorting algorithm than quicksort or mergesort? (Note that one can test one's ideas without coding by using the MBRAM model.)

Groebner bases are an interesting construct in computational algebra, which, in some sense, "search" for solutions to systems of polynomial equations. Can that search be executed in a more cache-efficient manner?

Cayley graphs are a kind of deterministic automaton. There are interesting types of graphs that are most easily constructed using Cayley graphs. Can one search for a good graph in such a space of graphs?

Apr. 7, 2005
"Towards a Mathematical Justification for Spatial Preconditioning in Spatially Distributed Feature Learning"
Gene Cooperman
This Thursday, I will talk on the current work of Viet Ha Nguyen (since he's in Japan). Ha had a nice applied solution to a problem in brain scans. By generalizing, he developed a new heuristic: spatial preconditioning. The idea is:
  1. learn the spatial influence first
  2. Then do feature learning locally, using the modified features based on the spatial influence of nearest neighbors.
The idea has been demonstrated to work nicely in experiments. We now need to look for a mathematical justification. The talk will discuss the first attempts in that direction. - Gene