The History of Non-Chronological Backtracking
I approached Gerald Sussman about his early work in the 1970s on non-chronological backtracking (which he calls dependency-directed backtracking). This work has been built upon in the Artificial Intelligence community, in particular by the SAT and CSP solver communities, resulting in clause-learning solvers that can solve large instances.

Also in the 1970s, I independently developed clause-learning technology which I termed Superresolution. I formalized Superresolution as a complete proof system and showed its polynomial equivalence to general resolution.

Gerry suggested that we both write down our stories behind non-chronological backtracking. He pointed out that it does not matter who was first and that he has never claimed priority about any idea, certainly not about "clause learning." In his words, "the two explanations can show how it is possible for people to obtain similar results from completely different motivations and by different paths. This is a good lesson for students to be exposed to." First comes Gerry Sussman's story, followed by mine (note: RMS = Richard Stallman). Gerry Sussman wrote:

For me science is a cooperative endeavor
rather than a competitive game: the most important
thing is that we all have fun and help each other to
do good work.  The real goal is for the community to
discover good ideas and find ways to use these
discoveries for the good of humanity.  In my view, we
(the community of scientists and engineers) invent
and discover things together.

I think that the history of our work on this matter
is rather interesting.  The 1975 work of RMS and me
was directed at capturing the way I observed how
expert circuit designers analyzed circuits.  (Note: I
am a "Professor of Electrical Engineering," not of
"Theoretical Computer Science.")

I was teaching introductory circuit theory to
beginning students at the time.  Indeed, my goal was
understanding how to improve my teaching by finding a
way to give my students a "formal" description about
HOW TO quickly determine the behavior of a circuit.
To this end Richard and I wrote a program that could
explain the results of an electrical analysis (MIT
AIM-328, March 1975), in terms of dependencies -- the
"audit trail" of derivations of circuit variables.

However, some circuit components can be thought of as
operating in a small number of distinct "states."
For example, a diode may be conducting or not
conducting.  If the wrong choice is made of the
states of such components the analysis will yield a
contradiction.  We naturally looked at the
explanations of the contradictions.  Once we could
explain the reason for deriving a contradiction we
realized that we could automatically extract the set
of base assumptions that the contradiction depended
upon and make the code that resolved the
contradiction remember not to reassume any such
"nogood" set again.  So we came to the idea of
dependency-directed backtracking by a path that was
very focused on the circuit problem.
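To make the mechanism just described concrete, here is a minimal sketch of nogood recording (my own illustration, not the original MIT code), with assumptions represented as hypothetical (component, state) pairs:

```python
def nogood_from_contradiction(dependencies):
    """From the audit trail of a contradiction, extract the set of
    base assumptions it depends on and record it as a "nogood"."""
    return frozenset(dependencies)

def violates_nogood(assumptions, nogoods):
    """True if the current assumption set re-assumes any recorded
    nogood set; such a combination must not be tried again."""
    current = set(assumptions)
    return any(nogood <= current for nogood in nogoods)

# Assuming diode D1 conducting and D2 not conducting led to a
# contradiction, so that pair is recorded and never re-assumed:
nogoods = {nogood_from_contradiction([("D1", "on"), ("D2", "off")])}
```

Any candidate assignment of component states that contains a recorded nogood as a subset can then be skipped without re-deriving the contradiction.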

I was predisposed to think about things in that way
because of my PhD thesis (1973), which was about how
to make programs that debug themselves by automatic
analysis of their mistakes.  Thus it was natural for
me to think in terms of how to determine and assign
blame for a failure.

My former students Jon Doyle and David McAllester,
and later Ken Forbus and Johan de Kleer, abstracted
dependency-directed backtracking into the various
forms of TMS now commonly used in AI programs.  Thus
the invention of these valuable mechanisms and
powerful ideas was really the result of a long-term
joint effort of many contributors, no doubt including
your nice work.

I hope that my view of the history is helpful.
Perhaps we can have a beer together in celebration of the
scientific enterprise and the fact that we have both
enjoyed contributing to it!

Gerald Jay Sussman
Panasonic Professor of Electrical Engineering
Massachusetts Institute of Technology
Now comes Karl Lieberherr's story.
I was fascinated by David Johnson's work on approximation
algorithms for the satisfiability problem.
Johnson published a simple algorithm based on analyzing
the clause structure of the CNF (conjunctive normal form)
and showed that one could efficiently find an assignment
at least as good as the prediction. The prediction was
the expected fraction of clauses satisfied by a random
assignment.
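A standard way to present this flavor of greedy algorithm is the method of conditional expectations: fix each variable to the value that keeps the expected number of satisfied clauses highest. The sketch below is my own illustration (hypothetical names), assuming clauses are given as lists of signed integers:

```python
def expected_satisfied(clauses, assignment):
    """Expected number of satisfied clauses when every unassigned
    variable is set uniformly at random (literal 5 means x5,
    literal -5 means "not x5")."""
    total = 0.0
    for clause in clauses:
        satisfied, free = False, 0
        for lit in clause:
            var, want = abs(lit), lit > 0
            if var in assignment:
                if assignment[var] == want:
                    satisfied = True
                    break
            else:
                free += 1
        if satisfied:
            total += 1.0
        elif free:
            # Clause survives unless all free literals come up wrong.
            total += 1.0 - 2.0 ** -free
    return total

def johnson_greedy(clauses, variables):
    """Greedy derandomization: fix each variable, in order, to the
    value that keeps the expected count as high as possible."""
    assignment = {}
    for var in variables:
        assignment[var] = max(
            (True, False),
            key=lambda b: expected_satisfied(clauses, {**assignment, var: b}))
    return assignment

# For the CNF (x1 or x2) and (not x1) and (x2), the greedy choices
# are x1 = False, then x2 = True, satisfying all three clauses.
```

The final assignment satisfies at least as many clauses as the initial expectation predicts, which is the guarantee mentioned above.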

I asked the question: how can we find a better assignment 
than the one that Johnson's algorithm guarantees?
I wanted to enhance Johnson's algorithm to satisfy all clauses.
Johnson's algorithm, while creating a partial assignment,
will eventually run into a conflict (a clause becomes
falsified by the partial assignment).

In this situation we would like to learn a clause that
prevents the "mistake" we just made from happening again,
without restricting the variable ordering. First (in the
ETH Informatik TR 14) I used resolvents to prevent the
mistake. Later, for my PhD thesis, I introduced
superresolution, which is simpler and allows for an
easy generalization to the MAX-CSP case.
Superresolution, in its simplest form, adds a clause that
consists of the disjunction of negated decision literals.
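For illustration, with decision literals encoded as signed integers (a hypothetical encoding, not taken from the thesis), this simplest form can be sketched as:

```python
def superresolvent(decision_literals):
    """Simplest superresolution step: from a conflict reached under
    the current decisions, learn the clause that is the disjunction
    of the negated decision literals, which forbids re-assuming
    that exact combination of decisions."""
    return [-lit for lit in decision_literals]

# Decisions x1 = true, x3 = false, x5 = true led to a conflict;
# the learned clause says "not all of these again":
# superresolvent([1, -3, 5]) -> [-1, 3, -5]
```

Adding this clause to the CNF makes the conflicting decision combination falsify a clause immediately, so it can never be fully re-assumed.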

In my dissertation I first researched various algorithms that
produce "minimal" superresolvents based on different
learning schemes. Later I developed an abstract proof system
for superresolution, as described in modern form in:

New Paper on Superresolution for MAX-CSP.

An important idea was that superresolution and Johnson's algorithm would enter into a symbiosis: the learned clauses modify the clause structure and will hopefully lead to better choices by Johnson's algorithm. This idea is still under investigation.

Back in the 1970s, I talked about my investigation with Ernst Specker, a much-liked mathematics professor at ETH with whom I had already done my Master's thesis. He took the investigation to the next level of complexity and asked the following fundamental question: which fraction of the clauses in a CNF can always be satisfied? From Johnson's algorithm we get the trivial answer: 0.5. But what happens if we consider 2-satisfiable CNFs, where each pair of clauses is satisfiable? The answer turned out to be g = (sqrt(5) - 1)/2, and a completely different algorithm was born, based on doubly-transitive permutation groups, resulting in a FOCS 1979 and a J. ACM 1981 publication. This result is best possible also from an algorithmic point of view: deciding whether the fraction g + e of the clauses of a 2-satisfiable CNF can be satisfied is NP-complete (e an arbitrarily small constant).

I developed these ideas a bit further and freed the algorithm from doubly-transitive permutation groups, using an idea by Erdos and Spencer for derandomization. With Steve Vavasis I introduced the idea of look-ahead polynomials. Only in 2006, during my sabbatical at Novartis, did I resume my interest in MAX-CSP because of its many applications in biology. A 2007 technical report with Ahmed Abdelmeged, Christine Hang, and Daniel Rinehart combines superresolution and look-ahead polynomials into a promising algorithmic system called SPOT.

Karl J. Lieberherr
Professor of Computer and Information Science
Northeastern University

This is an interesting example of how two teams, an MIT team and an ETH/Princeton/Northeastern team, developed similar ideas from different motivations. I agree with Gerry that we should celebrate the scientific enterprise. Maybe we can even have a glass of champagne and look forward to many more years of contributing to it!