CSG 260 Advanced Software Development

Tools for Systems Biology

Biclustering

Biclustering is an important subroutine in analyzing chemical and biological data. Mark Boguski and Bernhard Rohde from Novartis brought this problem to my attention.

Biclustering (at least some versions) has the same property as MAX-SAT: It is NP-hard in general and we are interested in algorithms that work well on problems in drug development.

Data from the 60 NIH cancer cell lines: http://discover.nci.nih.gov/nature2000/ could be used to test your algorithm.

From Wikipedia: There are many biclustering algorithms developed, including: Block clustering, CTWC, ITWC, FLOC, OPC, Plaid Model, OPSMs, Gibbs, SAMBA, cMonkey, PRMs and DCC. As you can see, biclustering in itself is a rich problem with many proposed solutions.