Adapting symbolic execution to work with contracts provides two main gains: more efficient execution of programs with contracts, and modular program analysis.
Programming with contracts enables programmers to provide rich dynamic specifications for components. With the ease of writing specifications dynamically comes the expense of checking them at runtime. The ability to statically prove the absence of contract violations would provide a large efficiency gain, enabling contract checks to be excluded from execution.
Contracts also enable us to reason about modules in isolation, or modularly, since they specify how two parties should interact at their boundaries. As it turns out, analyzing modular programs is difficult! Abstract interpretation, for instance, cannot reason about concrete functions flowing into unknown contexts. Symbolic execution allows for analysis in the presence of unknown components but often fails to offer automatic, terminating analyses. If we can utilize abstract interpretation and symbolic execution to verify contracts, we'll be left with a modular, composable, and automatic program analysis.
With these two problems in mind, let's delve into the intricacies of the paper. At a high level, [1] extends CPCF, a language with contracts, to operate (non-deterministically) over symbolic values (SCPCF) in order to soundly discover blame. This is done by refining symbolic values with successful contract checks and using those refinements to skip future contract checks.
We'll start with an overview of contracts, move to an overview of symbolic execution, and close with how the semantics of contract PCF is extended to include symbolic values.
Contracts are executable specifications that enable blame to be assigned to software components that violate defined specifications. At the first-order level, blame assignment and checking of contracts is simple, but higher-order contracts are limited by Rice's Theorem: it isn't decidable whether a function argument adheres to its contract. Hence higher-order values must be wrapped in a monitor that checks the contract at each use. Monitors track agreements between parties as higher-order values flow back and forth across component boundaries, enabling blame to be assigned correctly.
Summarizing from [1]: CPCF, introduced by Dimoulas et al. [7], is an extension of PCF that enables contracts to wrap base values and first-class functions.
Types T ::= B | T → T | con(T)
Base types B ::= N | B
Terms E ::= A | X | E E | μ X:T . E | if E E E
| O1(E) | O2(E, E) | mon f,g h (C, E)
Operations O1 ::= zero? | false? | ...
O2 ::= + | - | ∧ | ∨ | ...
Contracts C ::= flat(E) | C → C | C → λ X:T . C
Answers A ::= V | ε[blame f g]
Values V ::= λ X:T . E | 0 | 1 | -1 | ... | tt | ff
Evaluation ε ::= [] | ε E | V ε | O2(ε, E) | O2(V, ε)
contexts | O1(ε) | if ε E E | mon f,g h(C, ε)
There are three types of contracts: flat(E) wraps base values with a predicate defined in the CPCF language; C1 → C2 is a function contract, with C1 for the function's argument and C2 for the function's result; lastly, C1 → λX.C2 is a dependent contract, where X binds the function's argument in the body of C2.
A contract C is attached to an expression E using the monitor construct mon f,g h (C, E), where the labels f, g, and h are used to blame specific components in the event of a contract failure. Label f names the server, or the actual expression; g labels the client, or the context; and h names the contract (since contracts are arbitrary expressions, they can also be blamed). The labels in blame f g state that component f broke its contract with g.
[7] introduces complete monitors, which correctly monitor all channels of communication between components, including contracts themselves. For the sake of simplicity, we will work with lax contract semantics, which is not a complete monitor, but is still blame correct.
The reduction relation of CPCF is mostly standard, with the last two rules handling function and flat contract checks.
if tt E1 E2 ⟼ E1
if ff E1 E2 ⟼ E2
(λ X:T . E) V ⟼ E[V/X]
μ X:T . E ⟼ E[μ X:T . E/X]
O(V) ⟼ A if δ(O, V) = A
mon f,g h(C1 → λ X:T . C2, V) ⟼ λ X:T . mon f,g h (C2, (V mon g,f h (C1, X)))
mon f,g h(flat(E), V) ⟼ if (E V) V blame f h
Function contracts are checked by reversing the monitor's blame labels on the argument, since if it doesn't satisfy the domain of the function contract, it is the context's fault. The monitor of the computation's result retains its original blame labels, since a result that doesn't satisfy the range of the contract is the function's fault. Flat contract checking involves checking the value against the contract in an if expression, raising blame if the contract isn't satisfied.
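These two reduction rules can be sketched in a few lines of Python (a first-order illustration only; Blame, mon_flat, and mon_fun are invented names, not the paper's constructs):

```python
class Blame(Exception):
    """Raised when a party breaks its contract."""
    def __init__(self, blamed, contract):
        super().__init__(f"blame {blamed} {contract}")
        self.blamed = blamed

def mon_flat(pred, value, blamed, contract):
    """mon(flat(E), V): if (E V) then V, else blame."""
    if pred(value):
        return value
    raise Blame(blamed, contract)

def mon_fun(dom, rng, f, server, client, contract):
    """Function contract C1 -> C2: blame labels are reversed on the
    argument check and kept on the result check."""
    def wrapped(x):
        # the context's fault if the argument fails the domain check
        x = mon_flat(dom, x, client, contract)
        # the function's fault if the result fails the range check
        return mon_flat(rng, f(x), server, contract)
    return wrapped
```

For instance, wrapping a stand-in for a server function with a positive? → positive? contract blames the client when the caller supplies a non-positive argument, and the server when the result is non-positive.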
Here is a simple example adapted from [9] that shows why label reversal happens for higher-order contracts:
;; foo : (Int → Int) → Int
;; with contract: ((greater-than-9? → between-0-and-10?) → between-10-and-20?)
(define foo (λ (bar) (+ 10 (bar 0))))
When foo invokes bar, foo's greater-than-9? contract fails, but this is due to foo supplying the incorrect value to bar. The greater-than-9? contract is to the left of two arrows in foo's contract, and is said to be in an even position. Base contracts in even positions have had their blame labels reversed by the reduction relation an even number of times, so blame will always go to the function.
On the other hand, if we imagine that foo applies bar to 10 and it returns -10, then this will violate between-0-and-10?, which is to the left of one arrow, and thus in an odd position. Being in an odd position means that the reduction relation has reversed the blame labels an odd number of times, leaving blame to be assigned to the context that provided the bar argument.
From here [1] introduces SCPCF, an extension of CPCF that includes symbolic values. Before we get into that, I'd like to cover the basics of symbolic execution.
Symbolic execution is a form of program analysis that lies between the extremes of testing a program and proving it correct. The idea is to generalize the notion of testing a program on concrete inputs by replacing them with symbolic input variables. These symbolic variables represent a broad class of concrete inputs. Hence, the result of a program's symbolic execution is equivalent to a large number of normal test cases run over that program.
To run a program symbolically, the execution semantics of the language must be extended to let operators work on symbolic inputs, producing symbolic formulas as output. Symbolic inputs are arbitrary symbols, and symbolic formulas are manipulations over these symbols, suited to capture data-specific information. If we are symbolically executing arithmetic operations, the formulas output by symbolic operations will be algebraic formulas, enabling us to leverage algebraic simplifications. For instance, extending the concrete version of +, δ(+, 4, 5) = 9, to operate over symbolic values returns an algebraic equation: δ(+, α1, α2) = α1 + α2.
Let's look at the following simple arithmetic function:
;; doubleFirst :: Int -> Int -> Int
(define (doubleFirst x y)
  (let ([z (+ x y)])
    (+ x (- z y))))
Executing it with concrete values, (doubleFirst 4 5), leaves us with the following variable assignments: x = 4, y = 5, z = 9, return = 8. With the symbolic execution (doubleFirst α1 α2), the function's variables are assigned the following algebraic formulas: x = α1, y = α2, z = α1 + α2, return = α1 + α1.
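This style of execution can be sketched in Python by overloading arithmetic on an invented Sym class so that operators build formula trees instead of numbers (no algebraic simplification is attempted here, so the result prints as the unsimplified formula rather than α1 + α1):

```python
class Sym:
    """A symbolic input variable, e.g. Sym('a1') for α1."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __add__(self, other):
        return Op('+', self, other)
    def __sub__(self, other):
        return Op('-', self, other)

class Op(Sym):
    """A symbolic formula built by applying an operator."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def __repr__(self):
        return f"({self.left} {self.op} {self.right})"

def double_first(x, y):
    z = x + y            # z = a1 + a2
    return x + (z - y)   # return = a1 + ((a1 + a2) - a2)
```

Because the same code runs on both kinds of inputs, double_first(4, 5) still computes 8, while double_first(Sym('a1'), Sym('a2')) yields the formula (a1 + ((a1 + a2) - a2)).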
So far, the symbolic execution we've described can only handle programs with linear control flow. How do we handle branching operations in the presence of symbolic values? Symbolic values generally won't have enough constraints to determine which branch of a branching operation to take, hence we need to add the path condition (pc) to our program's execution state. The path condition is a boolean expression over symbolic values that accumulates constraints which must be satisfied in order for the execution to follow the associated path.
Let's run through a symbolic execution of the divCond function below, instantiated with symbolic values α1 and α2, and starting with pc = true.
;; divCond :: Int -> Int -> Int
(define (divCond x y)
  (if (== y 10)
      (if (== x 0)
          (error "div by zero")
          (/ y x))
      x))
At the (== y 10) expression, y = α2, so we don't have enough information to branch on it, hence we must fork:
- pc = true ∧ (α2 == 10), and we process the (if (== x 0) ...) expression, where either:
  - pc = true ∧ (α2 == 10) ∧ (α1 == 0), and an error is raised, or
  - pc = true ∧ (α2 == 10) ∧ (α1 != 0), and a division is computed
- pc = true ∧ (α2 != 10), and we return x, which is set to α1
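This forking can be sketched as a toy enumerator in Python (the string constraints and function names are ours, and a real engine would consult a solver to prune infeasible paths):

```python
def symbolic_divcond(x="a1", y="a2"):
    """Enumerate every (path condition, outcome) pair for divCond
    applied to symbolic inputs x and y."""
    results = []
    for y_is_10 in (True, False):
        pc = [f"{y} == 10" if y_is_10 else f"{y} != 10"]
        if not y_is_10:
            results.append((pc, x))                     # return x = a1
            continue
        for x_is_0 in (True, False):
            pc2 = pc + [f"{x} == 0" if x_is_0 else f"{x} != 0"]
            outcome = "error: div by zero" if x_is_0 else f"{y} / {x}"
            results.append((pc2, outcome))
    return results
```

Running symbolic_divcond() produces exactly the three paths listed above, each tagged with its accumulated path condition.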
At this point, you might be wondering how such obvious properties could be useful.
Godefroid, Klarlund and Sen [4] use randomized program testing paired with symbolic execution to efficiently generate concrete test inputs with complete code coverage. The idea is to run the program on randomized input while keeping track of the path condition. This execution will cover one path through the program, but they are seeking complete path coverage, so they backtrack to the last place where a constraint was added to the pc, negate that constraint, and generate new test input that satisfies the new pc. When the program is run on this new test input, it will go down the other branch of the last conditional, covering an additional path. Repeating this process and generating concrete test inputs for each path through a program results in a minimal set of path-covering test inputs.
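A drastically simplified sketch of that loop in Python, with brute-force search over small integers standing in for the constraint solver (in the real algorithm every recorded decision is eventually negated, not just the last one; all names here are illustrative):

```python
def instrumented_divcond(x, y):
    """divCond instrumented to record its branch decisions."""
    trace = [y == 10]
    if y == 10:
        trace.append(x == 0)
        result = "error" if x == 0 else y / x
    else:
        result = x
    return result, tuple(trace)

def find_input(target, domain=range(-2, 12)):
    """Brute-force stand-in for a solver: find an input whose
    recorded path starts with the target decisions."""
    for x in domain:
        for y in domain:
            _, path = instrumented_divcond(x, y)
            if path[:len(target)] == target:
                return (x, y)
    return None

def covered_paths(seed=(1, 1)):
    covered, worklist = set(), [seed]
    while worklist:
        _, path = instrumented_divcond(*worklist.pop())
        if path in covered:
            continue
        covered.add(path)
        # negate the last decision and search for input taking the new path
        new_input = find_input(path[:-1] + (not path[-1],))
        if new_input is not None:
            worklist.append(new_input)
    return covered
```

Starting from the random-looking seed (1, 1), the loop discovers all three feasible paths through divCond.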
Cousot, Cousot, Fähndrich and Logozzo [5] use abstract interpretation, a technique similar to symbolic execution, to infer necessary preconditions for functions in a first-order setting. By necessary precondition, we mean the smallest possible precondition which, when violated, will definitely result in an error. The general idea behind their approach, which can be described in terms of symbolic execution, is to identify the set of path conditions that lead to errors, negate them, and set them as preconditions of the function. Take, for example, the symbolic execution of divCond: the only error occurs when the pc is (α2 == 10) ∧ (α1 == 0), thus the necessary precondition for divCond is ¬((y == 10) ∧ (x == 0)).
In general, symbolic execution is used widely in model checking and test case generation. Now let's move on to how it can be leveraged to verify contracts.
We want to verify that the composition of known and unknown program components respects their specifications. To accomplish this, [1] introduces the notion of symbolic values to CPCF, creating SCPCF. These symbolic values, denoted by ●T, represent unknown values of type T. They have arbitrary behavior of type T, which is then refined “by attaching a set of contracts that specify an agreement between the program and the missing component”.
Prevalues U ::= ●T | λ X:T . E | 0 | 1 | -1 | ... | tt | ff
Values V ::= U/Ç where Ç = {C,...}
Prevalues are the normal values of CPCF extended with the symbolic ●T. Values are prevalues extended with a set of contracts. Notationally, V ∙ C will be used to represent U/Ç ∪ {C} where V = U/Ç, and Ç will be omitted from values wherever irrelevant.
The goal is to define the semantics of SCPCF to run with unknown values such that it soundly approximates running the same program with concrete values. More formally, the semantics are sound, or preserve approximation, if given ⊢ V:T ⊑ ●T and ε[V] ⟼* A, then ε[●T] ⟼* A' and A ⊑ A', that is, A' approximates A. Hence, we can run a program with unknown values, and if it doesn't raise blame, then we know that a concrete instantiation of that program will also not raise blame.
We extend the semantics much like described in §2, but with some key differences:
if V E1 E2 ⟼ E1 if δ(false?, V) ∋ ff
if V E1 E2 ⟼ E2 if δ(false?, V) ∋ tt
(λ X:T . E) V ⟼ E[V/X]
μX:T.E ⟼ E[μX:T . E/X]
O(V) ⟼ A if δ(O, V) ∋ A
(●T→T'/Ç) V ⟼ ●T'/{C2[V/X] | C1 → λ X:T . C2 ∈ Ç}
(●T→T'/Ç) V ⟼ havocT V
Before, δ mapped arithmetic operations to algebraic equations, which satisfy multiple concrete answers. We let the δ of SCPCF map operations and arguments to sets of answers. Concrete arguments map to concrete singleton sets while symbolic arguments map to more abstract sets; for instance, δ(+, V1, V2) = {●N} if V1 or V2 = ●N/Ç.

Due to the change of δ, branching becomes non-deterministic in the presence of symbolic values, similar to §2. Unlike before, though, we don't add information to the path condition at branches. This is because we aren't really concerned with gathering constraints on symbols at the term level, but rather at the contract level.
That said, since contracts use the same language as terms, the theory in [1] could be made more precise if branches hoisted their predicates into contracts and added them to the path condition upon non-deterministic forks. For instance, given foo = ●N and that sqrt has the contract flat(positive?) → flat(positive?), then (if (positive? foo) (sqrt foo) 0) cannot be verified using [1] unless evaluating (positive? foo) resulted in foo ∙ flat(positive?).
The last two rules of the SCPCF semantics deal with applying symbolic functions. There are two possible outcomes when an argument V is passed to an unknown function ●T→T':
- V is used in an unknown manner, but every use of V returns with no failures. Hence the result of the application is refined by the ranges of the contracts over the function.
- Using V in an unknown context results in the blame of V. Possible blame is found using the havoc function, which explores the behavior of V by iteratively applying it to unknown values.

havocB = μx.x
havocT→T' = λx:T → T' . havocT'(x ●T)
Values of base type can't raise blame in an unknown context, since their contract was already satisfied when they were passed into the unknown context. Thus, havocB diverges as a means to not introduce spurious results. At the function type, havoc “produces a function that applies its argument to an appropriately typed unknown input value and then recursively applies havoc at the result type to the output”. This soundly explores possible blame behavior of values in unknown “demonic” contexts.
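Here is a rough Python rendering of havoc, under encodings that are entirely our own: types are "base" or ("fun", dom, rng), divergence is modelled with a sentinel rather than an actual loop, and the unknown ●T is collapsed to a single representative instead of the paper's non-deterministic range of behaviors:

```python
DIVERGE = object()   # sentinel standing in for havoc_B's divergence

def unknown(t):
    """A stand-in for the unknown value ●T (one representative only,
    not the full non-deterministic range the paper explores)."""
    if t == "base":
        return 0                          # one inhabitant of ●N
    _, dom, rng = t
    return lambda v: unknown(rng)         # an unknown function

def havoc(t, v):
    if t == "base":
        return DIVERGE                    # havoc_B = μx.x
    _, dom, rng = t
    # havoc_{T→T'} = λx. havoc_{T'} (x ●T)
    return havoc(rng, v(unknown(dom)))
```

Applied to a benign N→N value, havoc bottoms out in the diverging base case; applied to a higher-order value like hoExample that misuses its unknown argument, the call v(unknown(dom)) can raise an error, which is how blame is discovered.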
Let's slowly work through the example presented in [1] to see how havoc will discover blame, if it exists. Let hoExample be a higher-order function with the contract (any→any)→any and let sqrt have the contract positive? → positive?.
hoExample:(N→N)→N =
λ f:N→N . mon(any,
(λ f:N→N . (sqrt (f 0))) (λ x:N mon(any, f mon(any, x))))
We want to check that in any arbitrary context, hoExample cannot be blamed, hence we pass it to havoc((N→N)→N).
1. havoc(N→N)→N hoExample
2. havocN (hoExample ●N→N)
3. mon(any, (λ f:N→N . (sqrt (f 0))) (λ x:N mon(any, ●N→N mon(any, x))))
4. (sqrt ((λ x:N mon(any, ●N→N mon(any, x))) 0))
5. (sqrt mon(any, ●N→N mon(any, 0)))
6. (sqrt mon(any, (●N→N 0)))
7a. ... ... (sqrt havocN 0)
7b. (sqrt ●N∙any)
8. ((λ x:N . mon(positive?, sqrt mon(positive?, x))) ●N∙any)
9. mon(positive?, sqrt mon(positive?, ●N∙any))
10. havocN mon(any, blame hoExample)
In step 2, havoc applies hoExample to ●N→N, using havocN to ensure divergence if no blame is found. Steps 3 and 4 substitute ●N→N into hoExample, wrapping it in an any→any monitor. 0 passes the any contract in steps 5 and 6. In step 7, the evaluation of the symbolic function ●N→N applied to 0 introduces a fork in the reduction relation: on one branch, havocN is applied to 0 to check if it introduces blame, and on the other, (●N→N 0) evaluates with no blame, resulting in ●N∙any. havocN 0 diverges, so we continue with the second branch. In steps 8 and 9, ●N∙any is passed to sqrt, but the contract check (positive? ●N∙any) both passes and raises blame, so validation of hoExample fails.
If we change hoExample's contract to (any→positive?)→any, then in step 7b the symbolic value ●N would be refined with the contract positive?, enabling the contract check of step 9 to pass. This is made possible with the introduction of the ⊢ V : C ✓ relation presented below.
Adding non-deterministic branching at if statements in the presence of symbolic values means that the semantics of flat contract checking becomes quite imprecise if left as is. Using the contract checking semantics of §1 with symbolic values means that the following expression will incorrectly raise blame twice:

(mon f,g h (flat(prime?) → flat(even?), primePlus1)) (mon f,g h (flat(prime?), ●N))
Once when monitoring ●N to see if it satisfies flat(prime?), which results in both success and failure since (prime? ●N) = {tt, ff}, and again when ●N is checked against the domain of flat(prime?) → flat(even?) as it's passed to primePlus1.
To remedy this, we set up the path condition to keep track of when symbolic values satisfy a given contract. With this information, future contract checks can be ruled out, thus eliminating non-deterministic branching and unsound blame.
The path condition in [1] is formulated by refining symbolic values with contracts (V∙C) once a flat contract check passes. This information is then used in the following judgement relation, which says that V provably satisfies the contract C:
C ∈ Ç
───────────
⊢ V : C ✓
With these additions we can modify the contract checking semantics, remembering successful contract checks and avoiding the imprecision introduced by redundant contract checks:
mon f,g h (C,V) ⟼ V if ⊢ V : C ✓
mon f,g h (flat(E),V) ⟼ if (E V)
(V ∙ flat(E)) | where ⊬ V : flat(E) ✓
blame f g |
mon f,g h (C1 → λ X:T . C2, V) ⟼ λ X:T . mon f,g h (C2, (V mon g,f h (C1, X)))
where ⊬ V : C1 → λ X:T . C2 ✓
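The effect of remembering successful checks can be sketched in Python (Refined, mon, and the check counter are all invented; a set of contract names stands in for Ç, and a predicate that always succeeds stands in for the successful branch of the non-deterministic check):

```python
class Refined:
    """A (pre)value paired with the contracts it is known to satisfy."""
    def __init__(self, value, contracts=frozenset()):
        self.value = value
        self.contracts = frozenset(contracts)

def mon(pred, name, v):
    if name in v.contracts:        # plays the role of ⊢ V : C ✓: skip
        return v
    if pred(v.value):              # first check succeeds: remember it
        return Refined(v.value, v.contracts | {name})
    raise AssertionError(f"blame: {name}")

# Count how often the predicate actually runs on a stand-in for ●N.
checks = []
def prime_check(_):
    checks.append(1)
    return True                    # pretend the symbolic check succeeded

v = mon(prime_check, "prime?", Refined("●N"))   # predicate runs once
v = mon(prime_check, "prime?", v)               # second check is skipped
```

The second mon call returns its input unchanged, so the predicate is consulted exactly once, mirroring how the refined path condition rules out the redundant non-deterministic fork.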
As noted in §3, we are striving for a symbolic execution semantics that soundly approximates the concrete execution semantics. By sound approximation we mean that if the symbolic semantics doesn't raise blame during an execution, then the concrete semantics is guaranteed to also not raise blame. Of course it is an approximation, so it is acceptable for the symbolic semantics to find blame when it doesn't actually exist.
In order to prove that the symbolic execution semantics soundly approximates the concrete execution semantics where blame is concerned, we develop an approximation relation ⊑, where A ⊑ A' means that A' approximates, or is less precise than, A:
⊢ V : T
────────── ───────────────────
⊢ V ⊑ ●T mon(C, E) ⊑ ● ∙ C
⊢ V : C ✓ mon(C, E) ⊑ E'
─────────── ─────────── ────────────────────────
V∙C ⊑ V V ⊑ V∙C mon(C, E) ⊑ mon(C, E')
──────────────────────────────────────────────────
(λ X . mon(D, (V mon(C, X)))) ⊑ ● ∙ C → λ X . D
From top to bottom, left to right: a well-typed value is approximated by the unknown of its type; a monitored expression is approximated by an unknown refined with the monitored contract; a value and its refinement by a provably satisfied contract approximate one another; monitoring preserves approximation; and a function-contract monitor is approximated by an unknown refined with that function contract.
The soundness theorem:
E ⊑ E' ∧ E ⟼* A implies ∃ A' . E' ⟼* A' ∧ A ⊑ A'
is proved by case analysis on the reduction semantics, utilizing havoc's completeness as well as auxiliary lemmas ensuring that substitution and basic operations preserve approximation. By havoc's completeness, we mean that if ε[V] ⟼* ε'[blame l] where l is not in ε, then havoc V ⟼* ε''[blame l].
As a small aside, last semester I used Coq to prove that a non-deterministic semantics for a simple, typed language with unknown values (●T) soundly approximates its concrete counterpart (Coq code on GitHub).
And that is pretty much it! The symbolic semantics are not guaranteed to terminate, so the authors of [1] integrate the orthogonal technique of abstracting abstract machines [8]. Additionally, [1] goes on to extend the technique to an untyped core model of Racket, which makes things more complicated in the absence of type information.
In this post I've presented the fundamental ideas from “Higher-Order Symbolic Execution via Contracts” by Tobin-Hochstadt and Van Horn.
This paper introduced the idea of a modular analysis for programs with higher-order contracts that enables contract verification in the presence of unknown components. Such verification enables the safe omission of select contract checks and also provides an analysis where individual components can be verified for later composition. Symbolic execution with contracts as symbol refinements is the secret weapon, with other key components being demonic contexts (havoc) to discover blame, the approximation relation (⊑) to prove soundness, and abstract interpretation to guarantee termination.
Possible future work:
Questions:
Compilers are software that transform programs of a source language into programs of a target language. For instance, the GHC compiler transforms Haskell programs into native machine code. Of course, the difference between a language such as Haskell and assembly is very large, so compilers break the work up into multiple transformation passes between several intermediate languages.
How do we know that compilers, such as the GHC compiler, are “correct”? That is, how do we know that the meaning we encode in our source program is preserved in the compiler-emitted target program?
This is where compiler verification comes into play. The goal of compiler verification is to prove that a compiler preserves key properties of the program it is transforming. Specifically, we are concerned with preserving the semantics, or behavior, of a program, as well as the equivalence between two programs.
A semantics preserving transformation from a program P_S, of source language S, to a program P_T, of target language T, ensures that P_T behaves as prescribed by the semantics of P_S. This means that we expect the same observable behavior from running P_T that we get by running P_S. Formally, the transformation is semantics preserving when P_S ⇓ B if and only if P_T ⇓ B, where B is an observable behaviour.
Semantics preservation is what one would intuitively expect from a compiler: to transform a program while maintaining its observable behaviours. The need for equivalence preservation is more subtle, and becomes apparent when we start thinking about module compilation and linking target-level code that has been compiled from various source languages.
An equivalence preserving transformation ensures that if two source components are in some way equivalent in the source language, then their target language translations are also equivalent. The canonical notion of program equivalence for many applications is observational, or contextual equivalence. Two programs are contextually equivalent if no well-typed context can make the programs exhibit observably different behaviors.
A non-equivalence preserving transformation would mean that there exists a target context C_T that distinguishes the translations P_T1 and P_T2 of two source programs, even though P_S1 and P_S2 are equivalent in the source language. Ensuring that such a context can't arise is important for programmers, enabling them to work at the level of abstraction provided by S and not worry about the mechanics of T.
Let's take an example out of Kennedy's overview of how compiling C# to the .NET Intermediate Language is not equivalence preserving [13]. At the C# level, true and false are the only values that inhabit the type bool. On the other hand, bool and byte are interchangeable at the IL level. Hence, while the WriteLine statement below would never execute at the C# level, some tricky IL code could pass NotNot a byte value representing neither true (1) nor false (0), executing the WriteLine statement.
public void NotNot(bool b) {
  bool c = !b;
  if (!c != b) { Console.WriteLine("This cannot happen"); }
}
This means that equivalence of the following C# code is not preserved after compilation to the IL.
public bool Eq1(bool x, bool y) { return x==y; }
public bool Eq2(bool x, bool y) { return (x==true)==(y==true); }
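Python's overlap between bool and int happens to mirror the IL's overlap between bool and byte, so we can replay the distinguishing context directly (eq1 and eq2 below are transliterations of the C# above, not code from Kennedy's paper):

```python
def eq1(x, y):
    # direct comparison: 2 == 3 is False
    return x == y

def eq2(x, y):
    # normalizing comparison: (2 == True) == (3 == True) is True,
    # since both sides collapse to False
    return (x == True) == (y == True)
```

On genuine bools the two functions always agree, but a context supplying 2 and 3 (valid "bytes", invalid bools) gets False from eq1 and True from eq2, so the two are distinguishable at the target level.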
Traditionally, researchers were concerned with whether the denotational model of Plotkin's PCF was fully abstract. Here “a model is called fully abstract if two terms are semantically equal precisely when they are observationally equal, i.e., when they may interchange in all programs with no difference in observable behaviour.” [6]
In the context of compiler verification, full abstraction of a transformation means that it preserves both semantics and equivalence, or rather, that it is equivalence reflecting and preserving.
To show that a transformation is fully abstract, we will need to show it is both equivalence preserving and reflecting. Consider source programs P_S1 and P_S2 with translations P_T1 and P_T2:
Equivalence reflection means that P_T1 ≅ P_T2 implies P_S1 ≅ P_S2. Reflection captures the notion of compiler correctness, or the preservation of semantics. That is, a compiler translation is incorrect if it maps non-equivalent source terms to equivalent target terms.
Equivalence preservation means that P_S1 ≅ P_S2 implies P_T1 ≅ P_T2. Preservation is harder to prove than reflection, hence it is often proven indirectly by assuming P_T1 and P_T2 are not equivalent and deriving a contradiction.
Note that if the set of possible contexts at the target level is restricted to those that can be obtained by translating source contexts, then proving full abstraction becomes easy since equivalence preservation falls out of proving equivalence reflection. Such a restriction would rule out the ability to reason about most modern languages, for instance compiled Java code is often linked with code from other source languages at the JVM byte-code level.
There are two main approaches to compiler verification: logical relations and bisimulation.
Logical relations are a tool for generalizing, or strengthening, an induction hypothesis as a means to make a proof go through. They can be based on either denotational or operational semantics, and the basic idea behind them is to define relations on well-typed terms via structural induction on the syntax of types. Logical relations were originally introduced to prove strong normalization of the simply typed λ-calculus but have since been adapted to work with various exotic language features. For instance, step-indexing allows for reasoning about non-terminating languages [5] and indexing with Kripke worlds allows for reasoning about mutable references [7].
Bisimulations, roughly put, are relations over the operational semantics of source and target languages. They “define a relation R between pairs of configurations, such that if two configurations are related by R, and each takes a step, then the resulting configurations are related by R, and if two final configurations are related by R, then they represent the same answer” [2]. According to Dreyer et al. [9] there are two main types of bisimulations used for equivalence related compiler verification: environmental bisimulations [2, 3, 12] and normal form bisimulations [4].
Both techniques have their advantages and disadvantages, though I'll leave deeper explorations of these techniques to a future post.
One important distinction when verifying a compiler is whether to assume a closed or open world.
With a closed world, compilers must have access to entire, closed programs, which includes linked libraries. Projects like CompCert utilize this closed world assumption to do compiler verification. In the case of CompCert, a simulation proof is set up, where the source and compiled programs are run and deemed semantically equivalent if they produce the same trace of observable events [10]. Often while dealing with compilers in the real world, we cannot assume a closed world. For instance, it would be extremely useful to be able to verify that the compilation of an unlinked library is correct, which does not fit in a closed world.
On the other hand, an open world enables compilers to work with open programs that can later be linked together. In order to work in an open world, Benton and Hur [8] set up a cross-language logical relation to prove semantics preservation of a transformation from a purely functional language to the SECD machine. Using their technique, in order to prove the correctness of linking a compiled program with some target-level code e_T, one must provide a corresponding source-level term e_S and show that the two are related. Given a large e_T, the requirement to provide such a corresponding e_S becomes too onerous for this proof technique to be practically applicable.
This brings us to exciting new research by Perconti and Ahmed that attempts to remedy the need to provide a corresponding source-level term by utilizing interoperability semantics (originally introduced by Matthews and Findler [11]). As presented by Perconti at HOPE '12, they are attempting to prove semantics preservation of an ML-to-assembly compiler by embedding the source and target languages, S and T, into a larger combined language (in reality they define multiple such languages, one for each compiler pass). Corresponding semantic boundary wrappings are provided, enabling a wrapped T term to be used as an S term in proofs. Their technique for proving semantics preservation is then to use a logical relation over the combined language to prove contextual equivalence between S terms and their compiled T counterparts.
In this post we've covered what properties one looks for when verifying a compiler. Semantics preservation, also known as compiler correctness, refers to a compiler's ability to preserve the meaning of a program throughout a transformation. Equivalence preservation ensures that programmers can think in terms of source level abstractions, and not worry about how modules and libraries interact at the target level. A compiler that exhibits both properties can be said to be fully abstract.
From there, I briefly introduced two common techniques for compiler verification: logical relations and bisimulations.
Lastly, I covered the difference between closed and open world compiler verifications and alluded to some ongoing research on open world compilers.
All of this brings many more questions and things to explore:
This became apparent to me while working through the classic paper “The Essence of Compiling with Continuations” for Matthias Felleisen's Principles of Programming Languages course. As a means to better comprehend this paper, my project partner Erik Silkensen and I mechanically modeled the paper's figures in Redex.
In this post I will present the paper and link to the corresponding executable models that can be found on GitHub as I go.
Compilers transform high-level source languages, such as Java or Haskell, into lower-level target languages, such as JVM bytecode or LLVM IR. Intermediate language forms are often used to enable the application of generalized optimizations. A popular intermediate form is continuation passing style (CPS), in which procedures don't return but pass their result to a caller-provided continuation, or call-back function. Take for instance a function add1 = λ x . (+ x 1); in CPS it would look like add1' = λ x k . (k (+ x 1)), where k is the provided continuation. Aside from enabling many optimizations, CPS makes the control flow of programs explicit and is easy to translate into assembly. For more intuition on CPS, Matt Might has several excellent articles on CPS by example, compiling with CPS, and implementing language features using CPS.
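The add1/add1' pair can be transliterated into Python to make the continuation explicit (a sketch; the names simply mirror the text):

```python
def add1(x):
    return x + 1

def add1_cps(x, k):
    return k(x + 1)          # pass the result to the caller's continuation

def add2_cps(x, k):
    # chaining CPS calls threads control flow explicitly:
    # first add1, then hand its result r to another add1
    return add1_cps(x, lambda r: add1_cps(r, k))
```

Calling add1_cps(41, lambda r: r) with the identity continuation recovers the direct-style answer.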
An optimizing CPS transformation usually takes multiple passes. Flanagan et al. show in “The Essence of Compiling with Continuations” that one can forgo a standard three-pass CPS transformation while still capturing the essence of compiling with continuations by doing a single source-level transformation to A-Normal Form (ANF).
To show this, the authors present a Scheme-like language, CS, and CPS convert it into the CPS(CS) language. They then incrementally optimize an abstract machine that operates over the CPS(CS) language. To close, they prove its equivalence to a machine that operates over ANF programs in the A(CS) language.
Below is the initial Scheme-like CS language, which has expressions M, values V and operations O.
M ::= V | (let (x M) M) | (if0 M M M) | (M M ...) | (O M ...)
V ::= number | x | (λ (x ...) M)
O ::= + | - | * | /
To convert the CS language into CPS, we begin with the naive CPS translation F, which adds continuations to terms in the language.
F[V] = λ k . (k Φ[V])
F[(let (x M1) M2)] = λ k . (F[M1] (λ t . (let (x t) (F[M2] k))))
F[(if0 M1 M2 M3)] = λ k . (F[M1] (λ t . (if0 t (F[M2] k) (F[M3] k))))
F[(M M1)] = λ k . (F[M] (λ t . (F[M1] (λ t1 . (t k t1)))))
F[(O M)] = λ k . (F[M] (λ t . (O' k t)))
Φ[c] = c
Φ[x] = x
Φ[λ x . M] = λ k x . (F[M] k)
It is naive because it introduces many administrative λ terms that can be simplified away by applying β-reductions. For instance, F[λ x . x] will result in:
(λ k3 . (k3 (λ k2 x . ((λ k1 . ((λ k . (k x)) (λ t . (t k1)))) k2))))
which can then be simplified to:
(λ k3 . (k3 (λ k2 x . (x k2))))
by applying β-reductions to the λs introduced by the F function wherever possible.
The resulting language after applying the F function to CS terms is CPS(CS), where there are expressions P, values W, and the CPS equivalents of the operators found in CS, O':
P ::= (k W) | (let (x W) P) | (if0 W P P)
| (W k W ...) | (W (λ (x) P) W ...)
| (O' k W ...) | (O' (λ (x) P) W ...)
W ::= number | x | (λ (k x ...) P)
O' ::= +' | -' | *' | /'
Continuations are explicitly present in terms of the CPS(CS) language, hence control and environment are the only components needed to create an abstract machine capable of evaluating CPS(CS) programs. The authors begin with such a machine, but quickly notice that continuation variables are often redundantly bound in the environment. There is no need when evaluating ((λ k1 x . P) k2 y) to bind k1 := k2 in the environment, since both point to the same continuation. Addressing this, a control stack is added, resulting in a machine in which continuations are no longer stored in the state's environment. From there they notice that since the machine keeps track of the program's continuations on its stack, the continuation variables found in CPS(CS) terms are never used to look up continuations in the environment. Hence an unCPS transformation is designed to remove explicit continuations from terms, and a corresponding machine is formulated.
In the original machine, the continuation variables found in terms are used to look up continuations in the environment. Migrating to the stack-based machine means that these continuations are already managed by the state's control stack. This leaves continuation variables in terms useless, so let's drop them. What about the remaining terms with continuations of the form λ x . P? These can be transformed into let expressions by binding the current computation to the continuation's free x and putting the continuation body in the body of the let:
(W (λ x . P) W ...) → (let (x (W W ...)) P)
Doing so means that continuations are no longer passed around but become the surrounding context, syntactically linearizing the program. This continuation-removal transformation can be formalized as the unCPS metafunction U found in our Redex models. For instance, applying U to the following CPS expression:
(+' (λ t1 . (let (x 1) (f (λ t2 . (+' k t1 t2)) x))) 2 2)
results in the following CS expression:
(let (t1 (+ 2 2))
  (let (x 1)
    (let (t2 (f x))
      (+ t1 t2))))
The range of this function is the A(CS) language, a strict subset of the CS language.
M ::= V | (let (x V) M) | (if0 V M M)
| (V V ...) | (let (x (V V ...)) M)
| (O V ...) | (let (x (O V ...)) M)
V ::= number | x | (λ (x ...) M)
The A(CS) language that results from unCPSing the CPS(CS) language holds some interesting properties. It maintains the sequential structure of CPS code implicitly, without continuations. Programs in the A(CS) language are constrained such that non-value M terms (computations) must be let-bound or appear in tail position. These properties enable certain optimizations, such as being able to write abstract machines that take larger steps.
So far, in order to translate into the A(CS) language we've had to convert to CPS, simplify administrative λs, and translate out of CPS via the unCPS function U. One must wonder: can we forgo CPS completely and translate CS to A(CS) in one source-level pass? Flanagan et al. show this is possible by presenting a strongly normalizing A-reduction relation which takes CS terms to A(CS) terms using the evaluation context ε. The evaluation context ε is defined such that its hole represents the next reducible expression according to the original CS language's machine. For instance, when the machine sees a term (if0 (- 1 2) 2 3) it will reduce (- 1 2) first.
ε ::= [] | (let (x ε) M) | (if0 ε M M) | (F V ... ε M ...)
where F = V or F = O
The idea is to create a reduction relation, A-Reduction, where we bind intermediate computations found in context holes to variables and replace these computations with their binding variables. This results in programs with a linearized control flow that only require one continuation structure in the corresponding abstract machine implementation.
A-Reduction:
(A1) ε[(let (x M) N)] →a (let (x M) ε[N])
where ε ≠ [] and x ∉ FreeVariables(ε)
(A2) ε[(if0 V M1 M2)] →a (if0 V ε[M1] ε[M2])
(A3) ε[(F V ...)] →a (let (t (F V ...)) ε[t])
where F = V or F = O, ε ≠ [], t ∉ FreeVariables(ε),
∀ ε' . ε ≠ ε'[(let (z []) M)]
The A-Reduction is comprised of 3 rules:
(A1) lifts a let term out of its surrounding context, extending the scope of its binding x over ε[N]. This rule lifts let terms out of computational locations so that we no longer need the let continuation found in the machine.
(A2) lifts if0 terms out of the current context, duplicating the context for each branch. This rule enables us to get rid of the if continuation found in the machine.
(A3) let-binds applications and primitive operations to a fresh variable t, with the side condition that ε isn't of the shape ε'[(let (z []) M)] for some arbitrary ε', or else we would unnecessarily be introducing let-bindings. This rule ensures that applications are only found in the body of let expressions or in tail position.
All the rules enforce the side condition that ε isn't empty, because if it were, the reduction relation would never terminate.
The A-Reduction is a reduction relation over the entire program term, and it turns out that once we apply any of the →a rules, we are left with a term that can't be matched by the ε context. This is because the ε context only matches terms not yet in ANF, while the →a rules introduce terms already in ANF. In order to model this in Redex, we were forced to add a ψ context that lets us match over terms already in ANF:
ψ ::= [] | (let (x V) ψ) | (let (x (F V ...)) ψ) | (if0 V ψ M) | (if0 V M ψ)
where F = V or F = O
The A-Reduction rules can then be wrapped in a ψ context, allowing terms already in ANF to be put in the ψ context so reductions on non-ANF terms can be made. This results in the Adapted A-Reduction, which we were able to model in a straightforward manner in Redex:
ψ[M1] → ψ[M2] when M1 →a M2
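To make the linearization concrete, here is a small Haskell sketch of a one-pass A-normalizer for a hypothetical mini-language of my own (in the spirit of the A-reductions, not a transcription of the paper's relation; the λ and if0 cases are omitted for brevity):

```haskell
-- A hypothetical mini-language: applications are the only computations
data Expr = Num Int
          | Var String
          | App Expr Expr
          | Let String Expr Expr
          deriving (Eq, Show)

-- Thread a counter for fresh temporary names by hand
fresh :: Int -> (String, Int)
fresh n = ("t" ++ show n, n + 1)

-- norm e g k: normalize e; k receives an atomic expression standing
-- for e's value, plus the updated fresh-name counter
norm :: Expr -> Int -> (Expr -> Int -> (Expr, Int)) -> (Expr, Int)
norm (Num i) g k = k (Num i) g
norm (Var x) g k = k (Var x) g
norm (Let x e b) g k =                  -- cf. rule (A1): lift the let out
  norm e g (\e' g' ->
    let (b', g'') = norm b g' k
    in (Let x e' b', g''))
norm (App f a) g k =                    -- cf. rule (A3): name the result
  norm f g (\f' g1 ->
    norm a g1 (\a' g2 ->
      let (t, g3)    = fresh g2
          (body, g4) = k (Var t) g3
      in (Let t (App f' a') body, g4)))

anf :: Expr -> Expr
anf e = fst (norm e 0 (\v g -> (v, g)))

main :: IO ()
main = print (anf (App (Var "f") (App (Var "g") (Num 1))))
-- Let "t0" (App (Var "g") (Num 1)) (Let "t1" (App (Var "f") (Var "t0")) (Var "t1"))
```

The nested application f (g 1) comes out as a linear chain of lets, mirroring how the A-rules flatten a term by naming each intermediate computation.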
The typos we found in “The Essence of Compiling with Continuations” are quite minor; the only reason I mention them is to motivate Redex's LaTeX exporting features.
In the CPS transformation figure of the paper (Figure 3), the auxiliary Φ function reads
Φ[λ x . M] = λ k x . F[M]
failing to use the k variable bound by the λ, and should hence read
Φ[λ x . M] = λ k x . (F[M] k)
Additionally, the U function calls an auxiliary Ψ function that unCPSs values. The λ case of Ψ reads Ψ[λ k x . P] = λ x . U[M], but clearly should be Ψ[λ k x . P] = λ x . U[P].
If Redex had been available in 1993, such typos could have been avoided by creating Redex models and then exporting them as LaTeX figures, in the flavor of Run Your Research.
For readers interested in gaining a solid understanding of monads, I highly recommend trudging through chapters 14, 15, and 18 of Real World Haskell.
How do you code up doubly nested for-loops in a purely functional language? Is there an elegant way to pass around program state without explicitly threading it in and out of every function? How do you encode sequential actions, such as reading and writing files?
It turns out that all of these problems can be solved using the monad abstraction.
Monads are a general way to encode sequential actions. I like to think of them as containers that wrap values of a given type and expose a framework enabling action composition.
When we work with monads we want to be able to wrap values in containers (return), as well as compose together containers (bind, or >>=).
In Haskell code, a monad is a typeclass. Typeclasses are a way of defining ad hoc polymorphism.
-- Monad Typeclass
class Monad m where
  -- Chain together containers
  (>>=) :: m a -> (a -> m b) -> m b
  -- Inject value into a container
  return :: a -> m a
  -- additional helper functions that aren't necessary:
  -- >> is like >>= but throws away the first result
  (>>) :: m a -> m b -> m b
  -- fail is a technical necessity used for pattern match failures
  -- in do notation
  fail :: String -> m a
If we want to make a type an instance of the Monad typeclass, we just have to define >>= and return for that type. Let's try it out with a data type you may have seen before.
Did you know that the Maybe datatype is a monad used for anonymous exception handling? Let's define it and its monadic operators here:
data Maybe a = Nothing
             | Just a

instance Monad Maybe where
  -- bind together operations
  Just x  >>= k = k x
  Nothing >>= _ = Nothing
  -- inject value
  return x = Just x
  -- then
  Just _  >> k = k
  Nothing >> _ = Nothing
  fail _ = Nothing
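Since this instance matches the one the Prelude ships, we can try the behavior right away with the built-in Maybe (the annotation on Nothing just pins the type; this snippet is my own, not from the post):

```haskell
main :: IO ()
main = do
  -- a successful value flows through the bound function
  print (Just 3 >>= \x -> Just (x + 1))                  -- Just 4
  -- a Nothing short-circuits the whole chain
  print ((Nothing :: Maybe Int) >>= \x -> Just (x + 1))  -- Nothing
```

This short-circuiting is exactly the "anonymous exception handling" the post mentions: once a step fails, every later step is skipped.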
How does extending Maybe to be an instance of the Monad typeclass make our lives easier? Let's take a look at the animalFriendLookup function, which doesn't utilize Maybe as a monad:
animalFriends :: [(String, String)]
animalFriends = [ ("Pony", "Lion")
                , ("Lion", "Manticore")
                , ("Unicorn", "Lepricon")
                ]

-- Explicitly chaining Maybes to find Pony's friend's friend's friend
animalFriendLookup :: [(String, String)] -> Maybe String
animalFriendLookup animalMap =
  case lookup "Pony" animalMap of
    Nothing -> Nothing
    Just ponyFriend ->
      case lookup ponyFriend animalMap of
        Nothing -> Nothing
        Just ponyFriendFriend ->
          case lookup ponyFriendFriend animalMap of
            Nothing -> Nothing
            Just friend -> Just friend
The chaining of lookup statements wrapped in case expressions gets out of hand quickly. To remedy this we can make use of the bind (>>=) operator defined by the Maybe monad, along with some anonymous functions, to write a cleaner version:
-- Use Bind to chain lookups
monadicAnimalFriendLookup :: [(String, String)] -> Maybe String
monadicAnimalFriendLookup animalMap =
  lookup "Pony" animalMap
  >>= (\ponyFriend -> lookup ponyFriend animalMap
  >>= (\ponyFriendFriend -> lookup ponyFriendFriend animalMap
  >>= (\friend -> Just friend)))
Lastly, most Haskellers are used to seeing do blocks when dealing with monads. do blocks are syntactic sugar that makes monadic code look cleaner, as well as giving it an imperative feel.
-- Use Do-Block sugar magic
sugaryAnimalFriendLookup :: [(String, String)] -> Maybe String
sugaryAnimalFriendLookup animalMap = do
  ponyFriend       <- lookup "Pony" animalMap
  ponyFriendFriend <- lookup ponyFriend animalMap
  friend           <- lookup ponyFriendFriend animalMap
  return friend
While do block syntax is nice, it sugar-coats a lot of details that monad beginners should be exposed to. Hence, I'm going to stick with the more explicit syntax for now, but later on I'll explain how do block code is desugared.
The Maybe monad offers some nice abstractions to clean up code, but let's explore some examples that really show off the power of monadic abstractions.
Say you want to code up some sort of abstract syntax tree (AST) transformation program that traverses an AST, inserting unique symbols every once in a while.
To do this in a pure setting, one must pass a counter in and out of every function call that needs to create unique symbols:
-- for simplicity we'll use a string instead of an Abstract Data Type for ASTs
type Sexpr = String

-- Add unique symbol to Sexpr using naive threading of program state
transformStmt :: Sexpr -> Int -> (Sexpr, Int)
transformStmt expr counter = (newExpr, newCounter)
  where newExpr     = "(define " ++ uniqVarName ++ " " ++ expr ++ ")"
        newCounter  = counter + 1
        uniqVarName = "tmpVar" ++ (show counter)
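To see that boilerplate in action, here's a sketch of calling transformStmt twice, manually piping the counter from one call into the next (main is mine, not from the post):

```haskell
type Sexpr = String

transformStmt :: Sexpr -> Int -> (Sexpr, Int)
transformStmt expr counter = (newExpr, newCounter)
  where newExpr     = "(define " ++ uniqVarName ++ " " ++ expr ++ ")"
        newCounter  = counter + 1
        uniqVarName = "tmpVar" ++ show counter

main :: IO ()
main = do
  -- every call must receive the previous counter and surrender a new one
  let (e1, c1) = transformStmt "(foo)" 0
      (e2, c2) = transformStmt e1 c1
  putStrLn e2  -- (define tmpVar1 (define tmpVar0 (foo)))
  print c2     -- 2
```

Two calls already mean four counter variables; every additional call (or additional piece of state) multiplies the plumbing.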
This is fine, but there is potential for a lot of boilerplate. In addition, if we decide we also need to pass around an environment set, we'll have to manually change all of our function type signatures.
This problem is screaming to be abstracted, so let's see if monads help…
Like the Internet, Monads are a series of pipes
It turns out we can generalize the threading of state seen in transformStmt. Let's chop off Int -> (Sexpr, Int) from the transformStmt type signature and replace it with the State type constructor defined below. This leaves us with transformStmt :: Sexpr -> State Int Sexpr. The two are of the exact same type, but we've used the type constructor to abstract away the fact that we take in a state value and output a new state value.
-- State type constructor with the runState record syntax
-- to extract or 'run' the state
newtype State s a = State {
  runState :: s -> (a, s)
}
Now that we have a type constructor, we can start to think about making it an instance of the Monad typeclass. With return we want to take a normal value and wrap it in a function that accepts a piece of state and returns that state along with the original value.
instance Monad (State s) where
  -- return :: a -> State s a
  return a = State $ \s -> (a, s)
  -- (>>=) :: State s a -> (a -> State s b) -> State s b
  m >>= k = State $ \s -> let (a, s') = runState m s
                          in runState (k a) s'
And to compose state-carrying functions (using >>=), we want to take in a piece of state, evaluate the first function with that state, and store the resulting state and value. Then we pass those as parameters to the second function. This results in the program state getting threaded in and out of both functions.
Let's solidify this a bit by applying it to our AST transformer idea:
-- create a type for the state we want to pass around
type SexprState = State Int

-- let's wrap an Sexpr in the State monad
sexprWithState :: SexprState Sexpr
sexprWithState = return "(foo bar)"

-- Now let's run it with the initial state of 0
ghci> runState sexprWithState 0
("(foo bar)", 0)

-- wrap Sexpr in parentheses
wrapSexpr :: Sexpr -> SexprState Sexpr
wrapSexpr exp = return $ "(" ++ exp ++ ")"

-- wrap Sexpr in qux
addQux :: Sexpr -> SexprState Sexpr
addQux exp = return $ "(qux " ++ exp ++ ")"

ghci> runState (sexprWithState
                >>= (\exp -> wrapSexpr exp
                >>= (\exp2 -> addQux exp2))) 0
("(qux ((foo bar)))",0)
So now we are doing all of our S-Expression manipulation while passing around state in the background. But wait, we aren't doing anything with the state!
Accessing and Modifying State
get :: State s s
get = State $ \s -> (s, s)
put :: s -> State s ()
put s = State $ \_ -> ((), s)
-- example of getting and modifying state
ghci> runState (sexprWithState
                >>= wrapSexpr
                >>= (\exp' -> get
                >>= (\counter -> (put (counter+1))
                >> (return exp')
                >>= addQux))) 0
("(qux ((foo bar)))",1)
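get and put compose into a convenient helper, analogous to the modify found in Control.Monad.State. A self-contained sketch (it repeats the State definitions from above, plus the Functor/Applicative instances that modern GHC requires before a Monad instance):

```haskell
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f m = State $ \s -> let (a, s') = runState m s in (f a, s')

instance Applicative (State s) where
  pure a    = State $ \s -> (a, s)
  mf <*> ma = State $ \s -> let (f, s')  = runState mf s
                                (a, s'') = runState ma s'
                            in (f a, s'')

instance Monad (State s) where
  m >>= k = State $ \s -> let (a, s') = runState m s
                          in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- apply a function to the current state: read it, then write it back
modify :: (s -> s) -> State s ()
modify f = get >>= \s -> put (f s)

main :: IO ()
main = print (runState (modify (+1) >> modify (*10)) 4)  -- ((),50)
```

With modify, the increment-the-counter pattern from the example above shrinks to modify (+1).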
transformStmt revisited
With our newfound friend the State monad, we can rewrite the original transformStmt function in a monadic style. This enables us to abstract away all the explicit threading of state.
transformStmt' :: Sexpr -> SexprState Sexpr
transformStmt' expr =
  -- grab the current program state
  get
  -- increment the counter by 1 and store it
  >>= (\counter -> (put (counter+1))
  -- do the sexpr transformation
  >> (return $ "(define tmpVar" ++ (show counter) ++ " " ++ expr ++ ")"))

-- And again using do block sugar
transformStmtDo :: Sexpr -> SexprState Sexpr
transformStmtDo expr = do
  counter <- get
  put (counter+1)
  return $ "(define tmpVar" ++ (show counter) ++ " " ++ expr ++ ")"
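Running the monadic version shows the threading happening behind the scenes: chaining two transformations with >>= needs no counter plumbing at all. A self-contained sketch (it repeats the State machinery, plus the Functor/Applicative instances modern GHC requires):

```haskell
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f m = State $ \s -> let (a, s') = runState m s in (f a, s')

instance Applicative (State s) where
  pure a    = State $ \s -> (a, s)
  mf <*> ma = State $ \s -> let (f, s')  = runState mf s
                                (a, s'') = runState ma s'
                            in (f a, s'')

instance Monad (State s) where
  m >>= k = State $ \s -> let (a, s') = runState m s
                          in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

type Sexpr = String
type SexprState = State Int

transformStmt' :: Sexpr -> SexprState Sexpr
transformStmt' expr =
  get >>= \counter ->
  put (counter + 1) >>
  return ("(define tmpVar" ++ show counter ++ " " ++ expr ++ ")")

main :: IO ()
main = print (runState (transformStmt' "(foo)" >>= transformStmt') 0)
-- ("(define tmpVar1 (define tmpVar0 (foo)))",2)
```

Compare this with the hand-threaded version: the counter is nowhere to be seen in the call chain, yet both calls still draw fresh names from it.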
Haskell is known for making easy things really hard. In fact, I was never really able to iterate over a doubly nested list until I realized that lists are monads in Haskell. Specifically, lists are used in a monadic style to model nondeterminism, most commonly in Haskell's crazy list comprehension sugar.
As before, let's try and define return and >>= for the [] type constructor. In the Monad typeclass, return takes a value of type a and wraps it in a type constructor m to give the type m a. So in the case of lists, a will be wrapped in the type constructor [], resulting in [] a, more easily read as [a].
To formulate >>= for lists, let's look at the type signature and see if we can find something that matches.
ghci> :type (>>=)
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b
ghci> :type map
map :: (a -> b) -> [a] -> [b]
ghci> :type flip map
flip map :: [a] -> (a -> b) -> [b]
flip map looks close, but we really want a type of [a] -> (a -> [b]) -> [b]. Hence we can substitute [b] for b, resulting in a type of [a] -> (a -> [b]) -> [[b]], which can then be massaged into what we want using concat :: [[a]] -> [a].
The resulting formulation is:
instance Monad [] where
  return x = [x]
  xs >>= f = concat (map f xs)
  xs >> f  = concat (map (\_ -> f) xs)
  fail _   = []
What does this give us?
Well, list comprehension:
-- monadic cartesian product
ghci> [1,2] >>= (\i -> [1..4]
      >>= (\j -> [(i, j)]))
[(1,1),(1,2),(1,3),(1,4),(2,1),(2,2),(2,3),(2,4)]

-- or the same using do sugar
do i <- [1,2]
   j <- [1..4]
   return (i,j)

-- or as a list comprehension
[(i, j) | i <- [1,2], j <- [1..4]]
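The fail _ = [] case is what makes a failed pattern match inside a list do block silently disappear instead of crashing (in modern GHC this goes through the MonadFail instance for lists, but the behavior is the same). A small sketch of my own:

```haskell
-- keep only the first components of pairs whose second component is 1;
-- elements that don't match the pattern (x, 1) invoke fail, i.e. []
pairsWithOne :: [(Int, Int)] -> [Int]
pairsWithOne xs = do
  (x, 1) <- xs
  return x

main :: IO ()
main = print (pairsWithOne [(1,1), (2,2), (3,1)])  -- [1,3]
```

This is also what powers pattern-based filtering in list comprehensions: `[x | (x, 1) <- xs]` desugars to the same machinery.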
For a more detailed explanation and formulation of the list monad, see Chapter 14 of Real World Haskell.
Ahhh, the sugary sweetness of do blocks. To the novice Haskeller they make monads look like magic. We've avoided them thus far, but once the desugaring of do blocks is demystified, using do syntax is much more convenient.
So how are do blocks desugared into monadic operators?
-- this do block notation
do x <- foo
   bar
-- desugars into:
foo >>= (\x -> bar)

-- successive actions
do act1
   act2
   ...
-- desugar into:
act1 >>
act2 >>
...

-- this do block notation
do let x = expr
       x1 ...
   act1
   act2
   ...
-- desugars into:
let x = expr
    x1 ...
in do act1
      act2
      ...
One thing to note is that only types of kind * -> * (type constructors of arity 1) can be made instances of the Monad typeclass. This means that the values monads wrap are parametrically polymorphic, ensuring that monadic functions act uniformly on them. That is, >>= and return cannot directly manipulate the values they wrap, because Haskell doesn't support type introspection. This doesn't mean, however, that these functions cannot manipulate other non-polymorphic values present in the type constructor. For instance, we can construct a monad that counts the number of times >>= is called:
data BindCounter a = BindCounter Int a

instance Monad BindCounter where
  -- inject
  return a = BindCounter 0 a
  -- bind
  (BindCounter x y) >>= k = BindCounter (x+1) y'
    where BindCounter _ y' = (k y)
fooRun :: IO ()
fooRun = do
  -- inject "foo" into the BindCounter Monad
  let x = return "foo" :: BindCounter String
  -- do some monadic string appends
  let BindCounter count val = x
        >>= (\y -> BindCounter 0 (y ++ " bar"))
        >>= (\y -> BindCounter 0 (y ++ " baz"))
        >>= (\y -> BindCounter 0 (y ++ " qux"))
  putStrLn $ "bind count: " ++ (show count) ++ ", resulting value: " ++ val
ghci> fooRun
bind count: 3, resulting value: foo bar baz qux
Monads should also follow 3 laws: right identity, left identity, and associativity. Haskell doesn't enforce these laws, but if your monads don't follow them, people will probably get confused. In addition, these laws are what make a monad a monad in the category-theoretic sense, which is where the name came from [ref].
-- Right Identity:
-- no need to wrap an already wrapped value
m >>= return === m
-- do-notation version:
do { x <- m; return x } === do { m }

-- Left Identity:
-- no need to wrap and unwrap a pure value
return x >>= f === f x
-- do-notation version:
do { x' <- return x; f x' } === do { f x }

-- Associativity:
-- preservation of ordering
(m >>= f) >>= g === m >>= (\x -> f x >>= g)
-- do-notation version:
do { y <- do { x <- m; f x }; g y }
  === do { x <- m; y <- f x; g y }
We can also define these laws using the monad composition operator defined in Control.Monad [ref]:
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
m >=> n = \x -> do { y <- m x; n y }
-- Left identity
return >=> g === g
-- Right identity
f >=> return === f
-- Associativity
(f >=> g) >=> h === f >=> (g >=> h)
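These laws are equations, not compiler-checked properties, but we can at least spot-check them for Maybe on a few values (f and g below are my own throwaway functions, not from the post):

```haskell
f, g :: Int -> Maybe Int
f x = Just (x + 1)
g x = if x > 0 then Just (x * 2) else Nothing

main :: IO ()
main = do
  print (return 1 >>= f)                -- left identity:  same as f 1
  print (Just 1 >>= return)             -- right identity: same as Just 1
  print ((Just 1 >>= f) >>= g)          -- associativity, left grouping
  print (Just 1 >>= (\x -> f x >>= g))  -- associativity, right grouping
```

For a real guarantee you'd reach for property-based testing (e.g. QuickCheck over arbitrary values and functions), but checks like these catch most law-breaking instances.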