Set Constraint-Based Program Analysis

Presented by Ethan Aubin
Transcribed by Daniel Silva

operators: U: union, &: intersection, ~: complement

Set Expression calculus:

SE := c(SE ...) | X | SE U SE | SE & SE | ~SE

FORTRAN example:

Two statements:

s: x <- u + v
r: y <- v + l

We can use sets to define relevant info.
Problem: register allocation -- find live and dead variables.

operator: A IN> B: A is contained in B
operator: A <IN B: B is contained in A (B contains A)

Given statement S,
Sdef is the set of variables defined by S.
Suse is the set of variables used by S.
Sin is the set of variables live before S.
Sout is the set of variables live after S.

succ(S) is the set of statements immediately after S.

Sin = Suse U (Sout & ~Sdef) Sout = {Xin | X in succ(S)}

Back to our example with s and r:

sdef <IN {x}
suse <IN {u, v}

rdef <IN {y}
ruse <IN {v, l}

rout = {}
rin = {v, l} U ({} & ~{y}) = {v, l}

sout <IN rin = {v, l}
sin <IN {u, v} U ({v, l} & ~{x}) = {u, v} U {v, l} = {u, v, l}

Example 2:

A x <- cons(a, cons(b, cons(c, nil)))
B while (car(x) != c)
C x <- cdr(x)
D end

Estimate the values:

XA <IN [a, b, c]
XB <IN XA & ~cons(c, TOP)
XB <IN XC & ~cons(c, TOP)
XC <IN cdr(XB)
XD <IN XC & cons(c, T)
XD <IN XA & cons(c, T)

To deconstruct cons, we need to introduce set deconstruction operators:
SE := ... | c-1(SE ...)

Set-Based Analysis of ML Programs
Example 3:
let f(u, v) = [u, v] in
  f(1, 2)
  f(3, 4)
The first time f is applied,
[u -> 1, v -> 2]
The second time f is applied,
[u -> 3, v -> 4]

For possible values of the expression, we get:
[1,2], [1,4]
[3,2], [3,4]

Example 4:
let f 1 = 2
    f 2 = 3 in
  f 1
Analysis yields {2, 3}.

To define a constraint-generating scheme, we first modify our calculus:
SE := c(SE ...) | X | SE U SE | SE ? SE | apply(SE, SE)

e1 -> (X1 C1)                           e2 -> (X2 C2)
e1 e2 -> (Y, {Y <IN apply(X1, X2)} U C1 U C2)

                          e -> (X1 C1)                           
λ x. e -> (Y, {Y <IN λ x. e, ran(λ x. e) <IN X} U C)

Consider the term: e0 = e1 e2
e1 = λ f. c(f a, f b)
e2 = λ x. x
a, b, and c are constants.

We derive e0 -> (X1, C) where C is the constraints
X1 <IN apply(X2, X3)
X2 <IN e1
X3 <IN e2
ran(e1) <IN c(X4, X5)
ran(e2) <IN Xx
X4 <IN apply(Xf, a)
X5 <IN apply(Xf, b)

Logic Applications

Find out what terms X can assume.
p(X) :- q(X), r(X).
Set of terms coming out:
Retp <IN p(X)
Retq <IN q(a) U q(b)
Retr <IN r(c) U r(b)
X <IN q-1(X) & r-1(X)
   (a U b) & (b U c)

ML-style type inferencing for Scheme-style languages
Soft Typing

Start with dynamically typed languages (like λ) and infer types.
λ y. case y of
       true  : one
       false : zero
α -> one ? (α & true) U zero ? (α & false)

μ = Exp x Env -> D D+ = (D+ -> D) U c(D ...)
D = D U BOTTOM U wrong

Added wrong for termination and erroneous computation.

type in D:
ρ(0) = {BOTTOM}
ρ(1) = D - wrong
ρ(τ1 ? τ2) = ρ(τ1) if ρ(τ2) != {}
BOTTOM otherwise

Inference rules:
Note 1: read "A, S | e : τ" as "if the free variables of e have the types given by assumptions A, then e has type ρ(τ) for any solution ρ of the constraints S." (Aiken, p. 3)
Note 2: using the symbol "=" to mean "contains" instead of "equals"

A, S | e, τ1,   e2 : τ2
A, S U {τ1 = τ3 -> τ4, τ2 = τ3} | e1 e2 : τ4

A, S | e : τ
A U {vars}, S | p : τ'1, e : τ1
A, S U {τ = {τ'i, i in {1, 2}} | case e of p1 : e1, p2 : e2 : {τi ? (τ & pi), i in {1, 2}}

(true false)
true <IN α -> β
false <IN α

This is not correct (true is not an α -> β), so we can't prove that it is correct. But we don't halt analysis. Rather, insert a runtime check.

check1 -> 1 e = e in 1->1, or

μ((check1 -> 1 true) false) -> BOTTOM

To recover such programs, introduce an error term.
μ(true false) x {} -> wrong
true <IN (α -> β) U (γ & ~(1 -> 1))
false <IN α

Setting γ to BOTTOM gets the original type back.