CSG 260 Advanced Software Development

Tools for Systems Biology

Why MAX-SAT?

The Maximum Satisfiability (MAX-SAT) problem is an example of a constraint satisfaction problem and it plays an important role (also outside Systems Biology). Bayesian Networks are important in Systems Biology and there is a strong similarity between the MPE problem and MAX-SAT (Park 2002). The Most Probable Explanation (MPE) is the problem of finding the variable instantiation of a Bayesian network that has the highest probability given some evidence.

Markov Logic Networks (Pedro Domingos et al. 2005) combine probability and first-order logic in a single representation and to find the most likely world involves solving a MAX-SAT problem. Markov Logic Networks are believed to be important in Systems Biology. In Markov Logic Networks, the constraints are first-order predicate logic formulas. By normalizing and grounding those formulas, a MAX-SAT problem is obtained.

The MAX-SAT problem also plays an important role in inferring protein interactions (Zhang et al.). The SAT problem (MAX-SAT where all clauses are satisfied) is also important in haplotype inference (Ines Lynce et al. 2006). Using SAT, large instances can be solved.

Given the central role of MAX-SAT we will first apply advanced software development techniques to implement MAX-SAT and its variations. We will focus on the Generalized MAX-SAT problem where the constraints are formulated in terms of a fixed set of relations. Generalized MAX-SAT research falls into two categories:

1. Algorithms that do well in practice (based on empirical comparisons)

2. Algorithms that do well in theory. There are two camps: (2.1) algorithms that come provably close to the optimum. There is a large number of specialized results in this area. (2.2) algorithms that provably satisfy a fraction of the total number of constraints. There is a very nice uniform algorithm (called MAXMEAN) that solves the problem in an optimal way: If there would be a provably better algorithm, then P=NP. It is this uniform algorithm that we are going to implement using adaptive and aspect-oriented programming and we will compare its performance with other published algorithms. This comparison has not been done before so there is a good chance that we discover some new, exciting knowledge during the course. Of course, my hope is that MAXMEAN will outperform the currently best algorithms.

Why is MAXMEAN an interesting candidate for implementation?

1. I know it intimately: I have developed it with Ernst Specker while at ETH Zurich and Princeton University: P-Optimal algorithms.

2. Since the algorithm was developed some 30 years ago, the importance of the satisfiability problem steadily increased. There are conferences dedicated to satisfiability: SAT Live.

3. While the original Journal of the ACM paper:

@article{322260,
author = {K. J. Lieberherr and E. Specker},
title = {Complexity of Partial Satisfaction},
journal = {J. ACM},
volume = {28},
number = {2},
year = {1981},
issn = {0004-5411},
pages = {411--421},
doi = { http://doi.acm.org/10.1145/322248.322260 },
publisher = {ACM Press},
address = {New York, NY, USA},
}
has been frequently cited by the theoretical algorithms community, the more practical paper:
Karl J. Lieberherr, Algorithmic extremal problems in combinatorial optimization, 
Journal of Algorithms, Volume 3, Issue 3, , September 1982, Pages 225-244.
(http://www.sciencedirect.com/science/article/B6WH3-4D7JMY4-2X/2/0013e87a3b79ff12cd3c71596646ac77)
seems to be forgotten by the empirical algorithms community. One reason is that the Journal of Algorithms paper is safely hidden behind Elsevier's web page although it would be much more accessible to practically motivated computer scientists athan the JACM paper.

3. MAXMEAN is very simple, broadly applicable, provably optimal in a very interesting way and we have an excellent opportunity to compare it with other practically useful MAXSAT algorithms.

MAXMEAN is basically a randomized algorithm that has been derandomized using the Method of Conditional Expectations (Erdoes/Selfridge 1973). The randomized algorithm uses carefully bent coins. MAXMEAN is actually a family of algorithms, one algorithm for each fixed family of relations. Adaptive Programming (AP) is an advanced software development technique that is designed for implementing families of algorithms and it will be interesting to explore AP on MAXMEAN.

Aspect-Oriented Programming (AOP) will be used to incrementally develop the algorithm by adding new features using aspects. AOP will also be used to instrument the algorithm with statistics without modifying the source code of the basic algorithm. AOP will be used to turn the maximization algorithm into a minimization algorithm using an aspect.

There is an extensive literature on empirically comparing algorithms for MAX-SAT. We will have to find the best and compare with our own implementation. Algorithms we should consider are: Discrete Lagrangian Multipliers (DLM) (Wah and Shang 1997) and Guided Local Search (GLS) (Mills and Tsang 2000) and MaxWalkSat. We want a useful general purpose tool for systems biology that is easily adaptable and specializable. An AOP organization should help.