Studies of the biotechnology industry have shown that employers are increasingly interested in hiring people with strong backgrounds in both biology and computing. This course has been designed with such people in mind, but it should prove useful to people with a variety of backgrounds.
Many people in industry "pick up" some programming by reading and working on their own. But a lack of proper grounding in the principles underlying computer languages and programming is often a serious limitation to their work and continued learning. This course emphasizes the basics that underlie all programming languages, whether Java, Perl, C, C++, Python, or any of the many scripting languages. It does this through specific, hands-on examples in what is perhaps the cleanest and broadest language to date, Java. Many biologists use Perl, mainly for its pattern-matching capabilities. But Java, with its "write once, run anywhere" design, its built-in graphics, its use in browser applets, and its powerful pattern matching (comparable to Perl 5's), is a superb language for a variety of applications. And of course, most of the principles illustrated by Java are the same as those that underlie the other languages just mentioned.
It follows that the most important statement we can make about this course is that it will give you the background you need to deal with virtually any programming language you want to learn and use in the future. (It should also give you more insight into any language(s) you currently use.)
A brief syllabus for the course is given near the end of this document. It is not fully fleshed out because we intend to adapt the course to the needs and skills of the students as the course progresses. But the following discussion, keyed to the numbered syllabus topics, should help clarify what will be covered. To learn more, simply take the course!
1. C++, Java, Python, and now Perl are "object-oriented", and Java is rapidly becoming the most popular object-oriented language. An "object" can contain a variety of related data items, arranged in an orderly way. At the same time, it contains functionality that lets you do things with and to the data. For example, you could design an object that holds a DNA sequence together with functions that produce the complementary sequence or the amino acid sequence of the coding regions, all neatly packaged together. You could then create as many of these objects as you like, each corresponding to a different sequence, but all sharing the same functionality.
2. To solve complex problems, one usually breaks the problem into subproblems, the "divide-and-conquer" strategy. In a program, this is reflected by collections of objects and functions that solve the subproblems and combine their results into the final solution. Conditionals decide which subproblem to attack at each stage. For example, in an interactive program, entering one thing causes the program to behave in one way, while entering something else causes different behavior; you handle this by including conditionals in your program.
3. Recursion is one of the simplest and most natural ways to solve many problems that yield to a divide-and-conquer strategy or that simply require processing a collection of elements one at a time. To solve a problem recursively, you only need to design a single function that does one of two things: solve a specific subproblem (the "end case"), or apply the function again to the next subproblem or subproblems. Because of its simplicity and elegance, recursion is taught early in the course.
4. Large data objects, such as sequences, and collections of similar entities, such as GenBank entries, can be stored in arrays. An array is very much like a column in a spreadsheet or database. An entire array can be processed by recursion or by simple iteration, in which the same process is applied once to each element of the array.
5. The beauty of objects is that you can design and use your own objects to match the data and the problems you have to solve. This section of the course explains how to go about this in a systematic way.
6. Not only can you design objects of your own, but you can also build them up from existing objects (a technique called inheritance), adding only the extra data and functions you need. This is particularly useful in graphics and animation, where many powerful objects are already predefined and you only need to augment them a bit for your purposes.
7. Another aspect of Java's rich infrastructure is its ability to construct interactive dialogs with buttons, fill-in text fields, menus, and more. This section of the course will show you how to create such dialogs and respond to the user's input. Most of the fancy interactive animated displays that you see on biology websites are applets, written in Java, that are downloaded and run in your browser. An applet is simply Java code that follows a few conventions specific to applets. After seeing some examples, you can write applets yourself and try them out.
8. Writing a complete applet with an interactive interface is the final topic of the course. Once you've done this, you will have a powerful and flexible set of tools at your command, and you can use them to write real, useful applications.
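To make topics 1, 3, 4, and 6 a little more concrete, here is a minimal Java sketch. It is an illustration only, not taken from the course materials: the class names DnaSequence and AnnotatedSequence, their methods, and the accession string "X00001" are all hypothetical. It assumes a sequence is a plain string of the letters A, C, G, and T, and maps any other character to N.

```java
// Minimal illustrative sketch; all class, method, and identifier names are hypothetical.
public class CourseExamples {

    // Topic 1: an object that bundles a DNA sequence with operations on it.
    static class DnaSequence {
        private final String bases; // stored in upper case, e.g. "ATGC"

        DnaSequence(String bases) {
            this.bases = bases.toUpperCase();
        }

        String getBases() {
            return bases;
        }

        // Complement of a single base; a conditional (topic 2) selects the result.
        static char complementBase(char b) {
            switch (b) {
                case 'A': return 'T';
                case 'T': return 'A';
                case 'G': return 'C';
                case 'C': return 'G';
                default:  return 'N'; // assumption: unknown characters become N
            }
        }

        // Topic 3: recursion. End case: the empty string.
        // Otherwise complement the first base and apply the same method to the rest.
        static String complementOf(String s) {
            if (s.isEmpty()) {
                return "";
            }
            return complementBase(s.charAt(0)) + complementOf(s.substring(1));
        }

        // Returns a new object holding the complementary sequence.
        DnaSequence complement() {
            return new DnaSequence(complementOf(bases));
        }

        // Topic 4: simple iteration, applying the same test once to each base.
        int countGC() {
            int count = 0;
            for (char b : bases.toCharArray()) {
                if (b == 'G' || b == 'C') {
                    count++;
                }
            }
            return count;
        }
    }

    // Topic 6: build on an existing class, adding only the extra data needed.
    static class AnnotatedSequence extends DnaSequence {
        private final String accession; // hypothetical identifier

        AnnotatedSequence(String accession, String bases) {
            super(bases);
            this.accession = accession;
        }

        String getAccession() {
            return accession;
        }

        @Override
        public String toString() {
            return accession + ": " + getBases();
        }
    }

    public static void main(String[] args) {
        // Topic 4: an array holding several sequence objects, processed by iteration.
        DnaSequence[] seqs = {
            new DnaSequence("ATGC"),
            new AnnotatedSequence("X00001", "ggcc"),
        };
        for (DnaSequence s : seqs) {
            System.out.println(s.getBases() + " -> " + s.complement().getBases()
                    + " (GC count " + s.countGC() + ")");
        }
    }
}
```

Running main should print "ATGC -> TACG (GC count 2)" followed by "GGCC -> CCGG (GC count 4)". Note that AnnotatedSequence gets complement() and countGC() without redefining them. The recursive complementOf mirrors the end-case/next-subproblem pattern of topic 3, but each call adds a stack frame and copies a substring, so for long genomic sequences the loop style used in countGC is the safer choice.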
Professor Futrelle of Northeastern, who is organizing this course, has an extensive background in both biology and computer science. He was on the Biology faculty at the University of Illinois, Urbana-Champaign, for ten years before joining the College of Computer Science at Northeastern in 1986. At Northeastern he founded and directs the Biological Knowledge Laboratory, which focuses on automated analysis of the content of the scientific literature, especially the biological literature. This involves research and development in natural language processing, diagram understanding, ontologies, and more.
Elaine Yang, the course instructor, is an MIT EECS graduate who now works at MIT. She has extensive teaching experience at both Wellesley and Northeastern and has done a good deal of teaching using Java.