The TeachScheme! (pronounced: teach Scheme, not) project produced "How to Design Programs" (HtDP), which represents the product of 12 years of intensive investigations on how to deal with and how to reach freshmen with principles.
I am proposing an analogous "Logic in Programming and Computing" (LiPC) project for a second-semester freshman course, as a natural follow-up to HtDP. The objective of LiPC should be to teach students reasoning about programming and computation, using ACL2-like automatic theorem proving as a motivational tool. The focus of LiPC is on principles that apply to a wide variety of situations, including those when our graduates will be working in a setting where automatic theorem provers aren’t available.
LiPC should have a denotational scope similar to HtDP, but a significantly shorter operational extent. In other words, it should produce insights and products similar to those of HtDP and should be disseminable in a similar manner. At the same time, it should not take 12 years to get it right; not even the five or six it took us. While it is obviously impossible to create LiPC in a semester or even two, I believe that by exploiting the experience of the TeachScheme! project, we can reach the goal in three or four years at most.
Before setting out on a description of the goals and the development process, let me enumerate a few essential "don’t"s. Some of those were "holy cows" for me when I started the project; if it weren’t for the large positive effect that "slaughtering" them had on the HtDP project, I wouldn’t list them here.
Just like HtDP does not teach Scheme, LiPC should not teach ACL2. Our freshmen are unlikely to get a job in ACL2 theorem proving now. There aren’t just that many ACL2 jobs around. Even if one of them does get such a job eventually, it will only happen because the student will have studied ACL2 and worked with the system in upper-level courses and projects.
Just like HtDP doesn’t rely on "EMACS" (put your favorite IDE here) for editing and running Scheme programs, LiPC should not rely on an environment that is suitable for professional ACL2 programmers. (If whatever we develop evolves into something useful for such programmers, so be it.) This also applies to our beloved Read-Eval-Print Loop (REPL). Giving up "EMACS" and the REPL took me a long time. But once I did, the HtDP project made a large step forward. Again, it is not a goal of LiPC to familiarize students with IDE tools; if they need them later, they will acquire this skill.
Just like HtDP doesn’t rely on the full Scheme run-time environment, LiPC should not rely on the full ACL2 run-time libraries. Since ACL2 is also a theorem prover, not just a language, this means that LiPC may not end up using the exact proof theory that is currently available for ACL2 surface syntax. As with the language and the IDE, it is a waste of time to teach exactly what ACL2 can currently (not) prove. For example, it is perfectly acceptable to use a language where cons always creates a proper list, because this idea jives with what students know and saying differently has no bearing on the course objectives.
Last but not least, just like HtDP assumes no background in programming and computing (but an understanding of simple algebra and geometry), LiPC should not assume more than a (unstable) background in HtDP and discrete mathematics. Also, LiPC should avoid bringing in too many advanced concepts (hardware, software) or teaching too many non-essential skills. Students in this age group have a difficult time distinguishing between the essence and the peripheral. Detours distract young minds.
If we agree on the goal to teach principles of logical reasoning in the context of programming, using an ACL2-like system as motivation, I see the following series of subgoals as the natural skeleton around which we create the course:
The first objective is to teach students how to formulate informal claims about functions and data definitions as theorems.
Example: In HtDP, we may mention that the length of (append l m) is the same as the sum of length of l and the length of m. In LiPC, we should express this as the theorem that "for all lists l and m, (= (length (append l m)) (+ (length l) (length m)))."
We could do so with just the built-in functions of ACL2 and never define our own and get quite far. It wouldn’t catch students’ imagination, however. Thus, before we implement this objective, we must teach students some teaching subset of ACL2.
The zeroth objective is to switch syntax, i.e., to introduce the basic vocabulary for expressing programs from the HtDP Beginning Student language in ACL2. Following 22 years of experience with The Little LISPer, I insist on avoiding numbers at first and focusing on "structures" and "lists" instead.
[(null? x) '()]
[else (cons 'a (f (rest x)))]))
be the "poem" that expresses this idea.
The second objective is to teach students that some theorems are automatically provable.
Example: The theorem on the relationship between length and append is a natural. We should make a catalog of theorems that students could formulate and prove without any intervention in the chosen subset of ACL2.
The third objective is to teach students that ACL2 cannot prove all theorems automatically. At this point we should leave it open whether this proof fails due to shortcomings of ACL2 (e.g., a lack of lemmas) or due to problems with logic (i.e., some true theorems aren’t provable).
This forces us to take a step back and to study what "theorem", "proof", and "truth" means.
The fourth objective is to introduce propositional logic and then ACL2-like logic with quantifiers, always using programming and computing as examples. That is, "theorems" are about programs, computing is "truth", and "proofs" is a finite "game." The umpire of the game is a formal system, which we and students can use to judge whether a finite sequence of reasoning is a proof (or not).
Example: A Type System could be used to introduce propositional logic (though it is of course constructive, not clasical).
For this stage, LiPC must focus on proofs in the formal logic. It is the only way in which students can understand how rigorous a proof must be to count as such. In particular, students should learn to use lemmas in this phase, seeing them as nothing but re-usable building blocks.
Motivation: continue to ask students for simple programs in ACL2 as the "subject matter."
The fifth goal is to introduce students to the workings of ACL2 based on their knowledge of "proofs" and "truths". In particular, we need to show them that ACL2 can prove certain theorems if we help it along with theorems.
Example: sort (flesh out)
Last but not least, the sixth goal is to introduce the idea of theorems that are true but not provable. I do not consider it essential that we prove this meta-theorem; stating it and giving an intuition is enough at this stage in their career.
When I created the TeachScheme! project, all of us (two professors, an instructor, five or six graduate students) were extremely naive about teaching freshman course. Sure all of us had done it once, but none of us realized what it really meant. It wasn’t until I started visiting the lab for hours and keeping my office doors open all the time for all the students, that I saw how badly even the very best (1600 SATs) students understood the principles and how irrelevant details confused them.
So what did we learn? We learned many things, large and small ideas, but there are three key ideas that are so interrelated that we ended up calling them the "trinity:" the language, the IDE, and the design recipe. Here I am going to focus on explaining what they are. The "why" boils down to a single thesis:
Error matters most.
Everyone’s happy when things just work. When you’re stuck (in an error state), however, your "system" shows its true mettle. If it offers a way to recover, it is good; if not, design it again. The word "system" refers to everything: the system of designing programs (can you use it in your office to get a student unstuck?); the system of the language; and the system of the language context.
At some point this kid from an elite private prep school with a near-perfect SAT came to my office, in tears over some Scheme function that just didn’t do what he had expected:
(define (f l)
[(null? l) 0]
[else 1 + (f (cdr l))]))
No matter to what he applied f, all he got was 0. How could that be? For a long time, I had written off this kind of problem as the problem of weak students, kids who couldn’t even count parentheses. How could I tell a really strong student who understood all the principles that this was wrong?
I had also written off debates from "Haskellians" that "calculating is better than scheming" (Wadler’s famous pamphlet) as words from the jealous. But when I stopped being combative and started observing like a scientist, I realized that these problems existed.
Scheme suffers from many such problems, and now I know that that all programming languages suffer from such flaws. The reason is that a compiler assumes that you know the entire language when it explains a lexer, parser, or type-checking error with an error message. Of course, this assumption is wrong for novice programmers - and novice "reasoners."
Our response was the development of teaching language that the compiler knows about. Even if compiler reports the error assuming you know the language, this language is now so small that it is fair to say that a student gets to know the entire language in one lecture. (Not really true: we distribute functions, conditionals, and structures over three, but this seems to be enough.) In the end, we realized that we needed several languages: one per design concept that we introduced. This is how we ended up with five languages, and why Eli Barzilay is experimenting with one language per two-week section in his course.
To this day, people think that a "pedagogic IDE" or a "learning environment" is one that is like a professional one with a few buttons and menus turned off. My lab observations suggest that this is not so at all. The more options students have (buttons, panels, menus - in this order), the more they "click and clack" when they receive an error message. Perhaps, they think, there is some button that I overlooked. It surprises me to this day how many students fail to actually read the error message that DrScheme prints. Once they do, most of them know what to do next. So in the first few labs, we insist on making them read the message. It works wonders.
How did we get there? We carefully studied which minimal functionality we really needed. We came up with five, each represented with a button. The first three are SAVE, RUN, and STOP; there is no question why we need those. The fourth one is STEP; by now we know that we should implement it differently and we will do so some day. Lastly, we added CHECK SYNTAX so that we could explain scope and renaming and simple syntactic concepts. I consider it a mistake that this function shows up on the first day, but it doesn’t seem to disturb too much.
As for the REPL, we eventually agreed that it was important to curtail the power of the REPL. They do not need to define function, set traces, enter debuggers, and other things. They also don’t need to selectively send expressions from some buffer to the REPL. If they can do any of these things, they usually fail to understand in which state the REPL is and why it responds in certain ways. And, eventually I understood, that none of these advanced features helps students understand the principles.
What students really need is an IDE that supports a simple REPL useful for novices. Such a REPL supports the exploration of which primitive functions exist, what they do, how expressions are evaluated. That’s all. And, it should always be clear in which definitional context an evaluation takes place. There should never be any chance for confusion.
Naturally, students must grow into professional environments at some point. As I have tried to clarify, however, you are better off if some other instructor does it for you or in parallel with you. Otherwise you end up spending endless hours solving IDE problems that distract from the true objectives and just cost you too much of the little time you have with them.
The design of programs was the most difficult part. The basic idea of matching function organizations to data organizations goes back to my early 1980s readings of algebraic approaches to studying data and types. It scales to higher-order. Generative recursion came later. Accumulators go back to Mitch Wand’s paper on CPS and accumulator-style, though I had to generalize the notion to make it fit. All of these ideas show up in some form or other in my re-writings of The Little LISPer (later, Little Schemer).
The difficult part came when students came to my office and asked me to help debug a program. Like Abelson and Sussman’s SICP, we used to introduce the entire language at once, and they used it all at once. So naturally 3-line problems turned into 2-page programs. Easy! I would run it, wait for the error message, jump into the debugger, point to their problem, suggest a local fix, and - oops - the program didn’t signal this anymore. The students would leave in a much happier state than when they arrived, but eventually I realized I wasn’t helping them.
At that point, I remembered my roots and introduced a six-step process as a context in which they match data organizations to function organizations. Then I started insisting that we always ask which stages they had completed. And that we’d ask them for the intermediate products that result from each step. The more systematically we asked, the fewer problems we saw. Students learned to help themselves. They could "unstick" themselves out of error states.
As mentioned in the introduction, we should be able to exploit the HtDP experience to cut the development time of LiPC to half or a third. Let me describe how I imagine doing so, starting with the idea that LiPC can assume some understanding of HtDP on the side of students and moving on to the trinity of HtDP.
While ACL2’s programming language isn’t Scheme, it is the closest relative that comes with a theorem prover. Since the philosophy of the theorem prover matches the philosophy of the design recipe, it is still a natural match. What I mean is of course that if students design their programs using the recipe, the idea of reasoning logically and the mechanisms of the theorem prover pick up on several threads of HtDP:
The data definition step asks students to reason about the subset of the Data Universe that is relevant. For work in logic and ACL2, this subset is usually turned into an antecedent of theorems.
In HtDP, we encourage students to think of the purpose statement as a specification of "what" the function does, not "how" it does it. The "how" becomes relevant only when we move to generative recursion in chapter V. - For logic and ACL2, "what" specification can often be turned into theorems that partially characterize the function. - Note: We do not use words like "specification" and "implementation" when we teach HtDP. Freshmen are extremely limited in their vocabulary and "big words" (such as specification and implementation) seem to intimidate them. We introduce them slowly and back off often.
Most function definitions in HtDP (first order) are organized according to the template, which is of course based on the principles of structural recursion/induction. In my (admittedly limited) experience with ACL2, it is much easier to work with functions that use plain structural recursion than spaghetti-code functions. (See above: we used to give students the "simple language of first-order Lisp" and the code we saw was brutal. Students come up with the worst possible solution that you can imagine, even if you take this sentence into account in your predictions.) Thus, LiPC not only introduces students to logic it also reinforces HtDP.
Last but not least, the HtDP design recipe (task structure) concludes with "turn the examples into tests and run them." I consider it natural to turn tests into general conjectures, to test conjectures (see IDE below), and to attempt proofs if the conjectures stand up to probing. This is how real mathematicians work, and working programmers should work in this manner, too.
There are other connections. For example, the design recipe adapted to generative recursion demands a "termination argument" because the design recipe no longer guarantees termination per se. I suspect that this termination argument could be turned into formal statements in ACL2. For now, I am just trying to hit the high lights.
Having said all this, I fully understand that average students never absorb a subject as much as we, the teachers, imagine. Concretely, just because LiPC follows HtDP, we cannot assume that students are complete comfortable with the design recipe’s two dimensions (data, task structure); the syntax of the languages; the IDE; the problem statements; or the informal logical reasoning that HtDP employs. This is not a disadvantage, this is normal. In this sense, LiPC serves the natural function of "repetition material" of primary and secondary schools. There, teachers typically spend at least one third of the school year repeating what students learned before. This helps the average student to catch up and to deepen his understanding. (Yes, it may turn some talented students off.) Since a freshman is transitioning from high school to college, I believe that having some repetition function built into LiPC is actually a "good thing." It is challenging as it is; no other college has anything remotely comparable.
The first two teaching languages of HtDP are rich enough for LiPC. As explained above, the restrictions of the first language to proper lists and quote-for-symbols significantly facilitate the teaching of programming principles. I conjecture that this is also true for teaching the principles of logical reasoning about code. I cannot see how the addition of improper lists and the addition of quoted S-expression can help students any with the principles.
Naturally, I do understand that restricting the language is insufficient. The theorem prover knows about (1 . 2) and assumes that any function can be applied to such "junk" elements.
Challenge: I do not consider this a true problem, however. If your IDE separates the programming language and programming part from the theorem proving part (as DrScheme/ACL2 does, for example), it should be possible to create a (probably non-executable) theory (package) in the theorem prover in which student programs are dealt with as opposed to the built-in one. See The challenge.
I consider an alignment with HtDC/213 the proper solution, meaning I think you should start with DrScheme/ACL2 (possibly with the "sedan" prover), which is what students are guaranteed to know, and switch to Eclipse/ACL2s (aka "Sedan") when the HtDC/213 course does.
This suggestions assumes that the creation and control of teaching languages is significantly easier in DrScheme than in Eclipse. It also assumes that when/if you switch to the "Sedan", you are switching to the full language and the full theorem prover just like HtDC/213 switches to full Java and full Eclipse projects. By the way, when HtDC/213 switches from the restricted Java-like teaching languages to Eclipse/Java, we provide run-time libraries with the exact same interface as those found in the ProfessorJ context of DrScheme (for the methods. We took the liberty of introducing syntax for testing, because Java just lacks so many things in this regard.)
Finally, if we take the "conjecture" item (see first subsection) seriously, we also want QuickCheck-like tools in our toolkit. Realistically, checking and probing general conjectures (as opposed to full-fledged theorem proving) is the most practical insight that students will take away from LiPC. Indeed, I conjecture that QuickCheck-like tools will show up in a range of IDEs, so that students will actually be able to use them in their work (soon). As you know, DrScheme already includes a QuickCheck-like facility for its ACL2 teaching language.
(Personal opinion: I wouldn’t switch all. I believe that the introduction of Eclipse in HtDC/213 is necessary because students must be prepared for their first co-op or summer internship. So seeing a second IDE is necessary, just to get a taste. Based on my information, the majority of co-op and internship students do not use Eclipse or Java for their first foray into industry. Instead, they do not program or they use some scripting language, much more akin to Scheme than a heavy-iron OO language. This is another issue the College must tackle some day.)
The first subsection has already addressed how LiPC and HtDP complement each other with respect to the design recipe.
What is currently missing from the LiPC side of the equation is a method for reasoning about proofs. I appreciate the effort put into "The Method" of the ACL2 text books, but I do not think that it measures up to the design recipe.
Challenge: LiPC needs a task structure for "proof debugging" and possibly a task structure for "proof synthesis." The former refers to the problem of attempting a proof and failing. We need to create a task structure that is appropriate for freshmen so that they can recover from such "stuck states" (see above). The latter refers to the idea of coming up with a systematic way of going from functions to conjecture theorems. I suspect that the idea of enriching tests to conjectures and then probing conjectures systematically puts us on the right path.
When I first created HtDP, I expected that creating a teaching project would kill my research career. Far from it. Eventually I realized that constructing a teaching environment enriched my research program with opportunities for testing ideas in practice and for broadening it. It also created an entire second dimension to my research reputation. And on a personal level, I am extremely happy that my teaching inspires my research and my research inspires my teaching. For most faculty members, teaching – which is what pays 9 months of their salary – is an unwelcome duty that they shirk as much as possible; if they have a conscience, this should bother them a lot (and I bet many are bothered). My approach has unified my existence with my profession.
As I envision it, LiPC poses at least two concrete research problems already. The first concerns the idea of restricting ACL2’s reasoning power to match language subsets. The second concerns the idea of ’proof debugging’. This second one is something that has been on my radar screen since Mitch’s paper on "understanding type errors in HM" appeared. Understanding failures i automatic theorem provers (such as HM or ACL2) is an extremely difficult problem, with theoretical and practical dimensions, and I have used it on several occasions to publicly challenge J Moore (in invited talks). I think we can solve it (at the practical level) and LiPC demands that we do. So there is research and teaching involved in tackling LiPC.