where {c_1,...,c_m} is the set
of clauses in CNF(P_i <=> phi_i).
That is, we add the proposition P_i with weight w_i, and then add
all of the clauses in the CNF of the double implication P_i <=> phi_i,
each with weight w+. This conversion provides a correspondence between
the maximal models of the resulting WPKB and those of the original. In
particular, let M be any maximal model of CLAUSAL(KB), and let M' be
identical to M except that it does not include truth assignments for
P_1,...,P_n. We then have that M' is a maximal model of KB. Likewise
for any maximal model
M of KB, there is a corresponding maximal model of CLAUSAL(KB) that is
identical to M ignoring P_1,...,P_n.
You will be asked to give a proof of the above correspondence in your
homework. A rough sketch is as follows, for which you will need to
fill in the details. First, note that the w+ weights act as hard
constraints, forcing each formula P_i <=> phi_i to be satisfied in
any maximal model of CLAUSAL(KB). This means that P_i will be true in
a maximal model of CLAUSAL(KB) exactly when phi_i is also satisfied in
the model. This can be used to yield the desired relationship between
KB and CLAUSAL(KB).
One interesting aspect of this conversion is that it produces a WPKB
such that the only non-hard constraints are those involving unit
propositions. An unfortunate aspect of this conversion is that it can
dramatically increase the number of propositional symbols, as it adds
a new symbol for each formula in the original WPKB.
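This clausal correspondence can be checked by brute force on a tiny
example. The following Python sketch (the names and the choice of
example formula are mine) compares the maximal models of a one-formula
WPKB {<(a v b), 3>} against its clausal form, with a large finite
weight standing in for w+:

```python
from itertools import product

HARD = 10**6  # stands in for the "essentially infinite" weight w+

# Original WPKB with one soft formula: <(a v b), 3>
def weight_orig(m):
    return 3 if (m["a"] or m["b"]) else 0

# CLAUSAL form: soft unit <P1, 3> plus hard clauses of CNF(P1 <=> (a v b)):
#   (~P1 v a v b), (~a v P1), (~b v P1)
def weight_clausal(m):
    w = 3 if m["P1"] else 0
    for clause_sat in [(not m["P1"]) or m["a"] or m["b"],
                       (not m["a"]) or m["P1"],
                       (not m["b"]) or m["P1"]]:
        if clause_sat:
            w += HARD
    return w

def maximal_models(weight_fn, props):
    models = [dict(zip(props, vals))
              for vals in product([False, True], repeat=len(props))]
    best = max(weight_fn(m) for m in models)
    return [m for m in models if weight_fn(m) == best]

# Project P1 away and compare the maximal models on {a, b}.
orig_max = {(m["a"], m["b"]) for m in maximal_models(weight_orig, ["a", "b"])}
claus_max = {(m["a"], m["b"])
             for m in maximal_models(weight_clausal, ["a", "b", "P1"])}
assert orig_max == claus_max
```

The hard clauses force P1 to track (a v b), so the soft unit <P1, 3>
contributes exactly when the original formula is satisfied.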
Weight Learning:
----------------
As mentioned above, humans are often quite good at providing rules of
thumb in the form of logical rules or formulas. But they are not so
good at assigning appropriate weights to formulas, representing the
relative preferences. Here we will describe an approach to learning
appropriate weights given a set of formulas and training examples.
In order to define our learning task we will first introduce the
concept of "maximal model completion". Let KB be a WPKB over the set
of propositions {p_1,...,p_n}, and P' be a partial truth assignment,
which assigns a truth value to a subset of the propositions. A
"maximal model completion" relative to KB and P' is a model that is
consistent with P' and has maximum weight compared to all other such
models. We will denote the maximal model completion as MAX-SAT(KB |
P'). When P' is empty then the result is simply a MAX-SAT solution. If
there are multiple maximal model completions, then we assume that
MAX-SAT(KB | P') returns one of them according to a lexicographical
ordering. As an example, consider the KB
<(p => q), 10>
<(u => not(q)), 5>
and the partial truth assignment,
P' = {p = true, u = true}
In this case, we have
MAX-SAT(KB | P') = {p = true, u = true, q = true}
which has a weight of 10. Note that for this example the MAX-SAT
solutions have a weight of 15, e.g. the model where all propositions
are false. This shows that maximal model completions can have lower
weight than a MAX-SAT solution, which makes sense since a maximal
model completion must satisfy more constraints.
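A brute-force sketch of MAX-SAT(KB | P') in Python, enumerating only
the models consistent with P'. The two formulas are a hypothetical
reconstruction chosen so that the weights match the numbers in the
example above:

```python
from itertools import product

# Assumed example KB: <(p => q), 10> and <(u => not(q)), 5>
KB = [(lambda m: (not m["p"]) or m["q"], 10),
      (lambda m: (not m["u"]) or (not m["q"]), 5)]
PROPS = ["p", "q", "u"]

def weight(m, kb):
    return sum(w for phi, w in kb if phi(m))

def max_sat_given(kb, partial):
    """MAX-SAT(KB | P'): best model among those consistent with P'."""
    models = [dict(zip(PROPS, vals))
              for vals in product([False, True], repeat=len(PROPS))]
    consistent = [m for m in models
                  if all(m[p] == v for p, v in partial.items())]
    return max(consistent, key=lambda m: weight(m, kb))

best = max_sat_given(KB, {"p": True, "u": True})
assert best["q"] and weight(best, KB) == 10
assert weight(max_sat_given(KB, {}), KB) == 15  # unconstrained MAX-SAT
```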
Note that it is straightforward to use a MAX-SAT solver to compute
MAX-SAT(KB | P'). One can simply augment KB with literals that have
very large weights W (essentially infinite) that force the truth
values specified in P'. Using the above example, we could form a new
WPKB KB',

<p, W>
<u, W>
<(p => q), 10>
<(u => not(q)), 5>
for which all MAX-SAT solutions must set p=true and u=true and hence
will correspond to possible solutions to MAX-SAT(KB | P'). (Make sure
you understand why this works.)
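The hard-weight trick can be sketched the same way: augment the KB
with the unit literals of P' at an essentially infinite weight, then
run a plain (here, brute-force) MAX-SAT over all models. The formulas
and the constant BIG are illustrative assumptions:

```python
from itertools import product

BIG = 10**6  # "essentially infinite" weight forcing the partial assignment

# Assumed example KB: <(p => q), 10> and <(u => not(q)), 5>
KB = [(lambda m: (not m["p"]) or m["q"], 10),
      (lambda m: (not m["u"]) or (not m["q"]), 5)]
# Augment with hard unit literals encoding P' = {p = true, u = true}
KB_aug = KB + [(lambda m: m["p"], BIG), (lambda m: m["u"], BIG)]

PROPS = ["p", "q", "u"]
models = [dict(zip(PROPS, vals))
          for vals in product([False, True], repeat=len(PROPS))]
best = max(models, key=lambda m: sum(w for phi, w in KB_aug if phi(m)))
assert best == {"p": True, "q": True, "u": True}
```

Any model violating a hard literal loses at least BIG, so all MAX-SAT
solutions of the augmented KB respect P'.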
We now define our learning problem formulation. We will divide the set
of propositions into an input set X = {x_1,..,x_n} and an output set Y
= {y_1,...,y_m}. Intuitively, we would like to learn a set of weighted
formulas KB over X and Y such that, given a truth assignment X' for
the input propositions, MAX-SAT(KB | X') assigns the "correct" values
to the output propositions.
As an example, consider the problem of selecting agent actions in a
multi-agent real-time strategy (RTS) game. Here, X might correspond to
a set of propositions that describe the current state of the game
(giving e.g. the position of all agents, the health, etc), and Y might
correspond to a set of propositions that assign actions to each agent
(e.g. "agent1 should attack enemy1"). Given an assignment X' to the
state propositions, we would like a WPKB KB such that MAX-SAT(KB | X')
returns an assignment Y' to the set of action propositions that leads
to good performance.
The input to our learning problem is a set of propositional formulas
{phi_1,...,phi_v} over X and Y, and a set of training examples,
{<X_1,Y_1>,...,<X_N,Y_N>} where each X_i is an input with desired
output Y_i. Our goal is to learn a set of weights {w_1,...,w_v} giving
us a WPKB KB = {<phi_1,w_1>,...,<phi_v,w_v>} such that, for each
training example, MAX-SAT(KB | X_i) is consistent with Y_i. In our RTS
example, we might obtain the <X_i,Y_i> pairs by observing a human
expert play the game, or alternatively via "reinforcement learning"
(reinforcement learning is an area of AI taught in CS533).
To learn weights we can use the generalized perceptron algorithm
(Collins, 2002). Below we will introduce a feature function f_i for
each formula phi_i. The value of f_i given a model (X,Y) is,
f_i(X,Y) = 1, if phi_i is true in (X,Y)
         = 0, otherwise
You should verify that WEIGHT((X,Y),KB) = sum_i w_i*f_i(X,Y).
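This identity is easy to check in Python with indicator features over
a hypothetical two-formula KB:

```python
# Hypothetical KB: <(p => q), 10> and <(u => not(q)), 5>
KB = [(lambda m: (not m["p"]) or m["q"], 10),
      (lambda m: (not m["u"]) or (not m["q"]), 5)]

def f(i, model):
    """Indicator feature for formula phi_i."""
    return 1 if KB[i][0](model) else 0

model = {"p": True, "q": True, "u": True}
weight = sum(w for phi, w in KB if phi(model))
# WEIGHT((X,Y),KB) equals the weighted sum of the indicator features.
assert weight == sum(KB[i][1] * f(i, model) for i in range(len(KB)))
```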
The generalized perceptron algorithm iterates through training
examples and adjusts weights until either all training
examples are correctly predicted, or a maximum number of iterations is
met. For each example, the algorithm computes the prediction
<X_i,Y'> = MAX-SAT(KB | X_i)
using the current weights to define KB. Here Y' is the current
prediction that the KB makes for the input X_i. If the prediction is
correct, then we do not change the weights. Otherwise we change the
weights so that the correct model gets more weight and the
incorrect model gets less weight. By repeating this process
the hope is that the correct models will eventually have higher weight
than all other alternatives, making <X_i,Y_i> = MAX-SAT(KB | X_i) as
desired. The pseudo-code is given below.
for each i = 1 to v,
  w_i = 0
repeat for some number of iterations
  for each i = 1 to N,
    KB = {<phi_1,w_1>,...,<phi_v,w_v>}
    ;; compute the best Y' according to the current weights
    <X_i,Y'> = MAX-SAT(KB | X_i)
    if not(Y' = Y_i)
      for j = 1 to v,
        w_j = w_j + alpha*[f_j(X_i,Y_i) - f_j(X_i,Y')]
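The pseudo-code can be turned into a runnable sketch, with brute-force
enumeration standing in for a MAX-SAT solver. The formulas and the two
training examples below are hypothetical:

```python
from itertools import product

# Hypothetical setup: one input prop x, one output prop y.
Y_PROPS = ["y"]
formulas = [lambda m: m["x"] == m["y"],   # phi_1: x <=> y
            lambda m: m["y"]]             # phi_2: y
examples = [({"x": True},  {"y": True}),
            ({"x": False}, {"y": False})]

def max_sat_completion(weights, x_assign):
    """Brute-force MAX-SAT(KB | X): best Y-completion under the weights."""
    best, best_w = None, None
    for vals in product([False, True], repeat=len(Y_PROPS)):
        m = {**x_assign, **dict(zip(Y_PROPS, vals))}
        w = sum(wt for wt, phi in zip(weights, formulas) if phi(m))
        if best_w is None or w > best_w:
            best, best_w = dict(zip(Y_PROPS, vals)), w
    return best

def perceptron(examples, iterations=10, alpha=1):
    weights = [0] * len(formulas)
    for _ in range(iterations):
        for x, y in examples:
            y_pred = max_sat_completion(weights, x)
            if y_pred != y:  # mistake: move weight toward the correct model
                for j, phi in enumerate(formulas):
                    weights[j] += alpha * (phi({**x, **y}) - phi({**x, **y_pred}))
    return weights

w = perceptron(examples)
assert max_sat_completion(w, {"x": True}) == {"y": True}
assert max_sat_completion(w, {"x": False}) == {"y": False}
```

After training, the learned weights make the MAX-SAT completion agree
with both training examples.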
The critical step is the weight update,
w_j = w_j + alpha*[f_j(X_i,Y_i) - f_j(X_i,Y')]
that occurs after an incorrect prediction. If the formula phi_j is
true in the correct model (X_i,Y_i) but false in the incorrect model
(X_i,Y'), then the update will increase the weight for phi_j, which
will increase the weight on the correct model as desired. If phi_j is
false in the correct model and true in the incorrect model, then we
decrease the weight for phi_j, which will decrease the weight on the
incorrect model as desired. Finally, if phi_j has the same truth value
in both the correct and incorrect models, we do not adjust its
weight. The single parameter
alpha is the "learning rate". Often alpha is set to 1.
While it is not obvious, it can be shown that if there exists a set of
weights such that <X_i,Y_i> = MAX-SAT(KB | X_i) for all training
examples, then the above algorithm with alpha=1 will eventually find
such a set of weights.
The primary computational bottleneck of the above algorithm is the
MAX-SAT calculation for each training example in each iteration. As we
noted earlier, MAX-SAT is computationally hard in the worst case. One
solution to dealing with this problem is to use fast approximate
MAX-SAT solvers.
The above formulation assumed that we were given an initial set of
formulas with no weights. Clearly it would be desirable to develop
algorithms that can also induce new formulas. There are a number of
heuristic approaches that could be tried for this purpose, though we
will not discuss them in this course.
As an important side note, the above algorithm is straightforward to
generalize to any type of weighted constraints. That is, instead of
using propositional formulas as constraints on models, we can consider
arbitrary constraint languages on arbitrary structures. For any such
language we can attach weights to those constraints and provide an
algorithm for the corresponding MAX-SAT problem---i.e. an algorithm
that finds a structure that maximizes the weight of satisfied
constraints. With the MAX-SAT solver in hand, the above algorithm can
be applied directly.
Weighted First-Order Logic: A Template-based Approach
-----------------------------------------------------
You have now seen approaches to inference and learning for
propositional weighted logic. The key idea was to use WPKBs to assign
weights to models, representing a preference relationship over
models. Here we would like to extend that idea to first-order
models. This will provide us with the ability to compactly specify
preference knowledge that generalizes over objects.
Let us begin with an example of what we might like to represent, but
can't easily represent with WPKBs. Recall our "pacifist example" from
above. In that case we used the propositions JonIsPacifist,
JonIsQuaker, and JonIsRepublican. If we wanted to have the same rules
of thumb for another individual, e.g. Nixon, we would need to
explicitly add three additional propositions, and add the
corresponding propositional formulas. Rather, it would be more
convenient if we could simply write down weighted formula templates
that could be instantiated for any individual. For example,
<(Republican(x) => not(Pacifist(x))), 10>
<(Quaker(x) => Pacifist(x)), 20>
are weighted formula templates where x is a free variable that serves
as the template parameter. We can instantiate these templates for any
number of individuals, yielding a set of ground or propositional
weighted formulas. For example, if we are interested in Jon and Nixon,
we would get the WPKB,
<(Republican(Jon) => not(Pacifist(Jon))), 10>
<(Quaker(Jon) => Pacifist(Jon)), 20>
<(Republican(Nixon) => not(Pacifist(Nixon))), 10>
<(Quaker(Nixon) => Pacifist(Nixon)), 20>
which can be used to reason about Jon and Nixon, e.g. via a MAX-SAT
solver. Intuitively the weight associated with each template can be
viewed as a cost that must be paid for each instantiation of the
template that is not satisfied. Thus, if a particular model violates
both ground instances of the first template (one for Jon and one for
Nixon) then the weight of that model will be 20 = 2x10 less than if
the instances were satisfied.
Finite-Domain Semantics:
Before we define the syntax of weighted templates and how they are
used to assign weights to first-order models, we first introduce the
idea of "finite-domain semantics", which will serve as our formalism
for upgrading from propositional to first-order models.
Recall that for a given set of propositions there are a finite number
of propositional models. This made it straightforward to define the
notion of model weight and maximal model. The general first-order case
is not so simple as there are an infinite number of first-order
models, some of which have an infinite number of domain objects. This
makes defining the weight of a model with respect to a set of weighted
templates problematic. For example, we must worry about the
possibility of models with infinite weight and the possibility of
non-existence of maximal models.
We will avoid the above difficulties by stepping back from the full
generality of first-order logic and restricting our attention to
finite sets of objects. We will refer to this restriction as the
"finite-domain semantics". The idea is that at any moment in time we
will only be concerned with a finite set of "objects of interest" and
will only consider models over those objects. Since there are only a
finite number of such models (assuming a fixed set of predicates), we
will see below that it is relatively straightforward to define the
semantics.
Indeed, the finite-domain semantics typically correspond well with
application goals. For example, in the RTS domain described above, at
any point in the game there will only be a finite number of objects
(e.g. friendly/enemy units, buildings, gold mines, etc). Our system
need only be concerned with models involving those objects, e.g. to
select the "MAX-SAT" actions for the friendly troops. However, over
time the set of game objects will change and we would like our
knowledge base to naturally generalize to these new situations. The
ability to generalize in this way is the primary advantage of using a
first-order rather than propositional formalism.
More formally, given a finite set of constants C (representing our
objects) and a set of predicate symbols P, we define MODELS(P,C) as
the set of all first-order models with domain C (i.e. an object for
every constant) involving only predicates in P. Note that we can view
MODELS(P,C) as specifying a set of propositional models where the
propositions correspond to all possible ground atoms constructed from
P and C. For example, consider C = {Jon, Nixon} and P = {Republican,
Pacifist, Quaker}. Here each model in MODELS(P,C) will specify a truth
assignment to the following propositions,
{Republican(Jon), Republican(Nixon),
Pacifist(Jon), Pacifist(Nixon),
Quaker(Jon), Quaker(Nixon)}
There are a total of 2^6 = 64 such models. In general, there will be
on the order of |C|^v propositions in MODELS(P,C), where v is the
maximum arity of any predicate in P. This means that in general
MODELS(P,C)
will contain on the order of 2^(|C|^v) models.
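The counting argument is easy to check in Python for the example
above:

```python
from itertools import product

preds = {"Republican": 1, "Pacifist": 1, "Quaker": 1}  # predicate -> arity
C = ["Jon", "Nixon"]

# Every ground atom built from the predicates and constants.
atoms = [f"{p}({','.join(args)})"
         for p, arity in preds.items()
         for args in product(C, repeat=arity)]
assert len(atoms) == 6          # the six ground atoms listed above
assert 2 ** len(atoms) == 64    # |MODELS(P,C)| = 2^6
```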
The idea of defining semantics relative to a given finite domain of
objects is quite common. For example, as we will see later in the
course, this idea is the essence behind most work on probabilistic
relational modeling.
Weighted Formula Templates:
A "formula template" is simply a first-order logic formula with free
variables. Free variables in a formula template are not treated as
universally quantified, as they would be in first-order logic. Rather,
we will think of the variables as template parameters. We will use these
parameters in order to compile a template to a set of ground formulas
relative to a set of constants. Given a set of constants C and a
formula template T we define the "compilation of T relative to C",
denoted by COMPILE(T | C), as the set of ground formulas that results
from the following steps:
Let C = {c_1,...,c_n}
1) For each universally quantified subformula (\forall x phi(x)) in
T replace it by the conjunction
(phi(c_1) & phi(c_2) & ... & phi(c_n)). This yields a new
formula template T'.
2) Starting from T', replace each existentially quantified
subformula (\exists x phi(x)) by the disjunction
(phi(c_1) v phi(c_2) v ... v phi(c_n)). This yields a new
formula template T''.
3) Return the set of all ground formulas that are a result of
any way of substituting the free variables of T'' with constants
from C. If there are m free variables in T'', then we will
return |C|^m formulas.
Essentially the above procedure assumes that the only objects in the
world are those that will be denoted by the constants. The procedure
then replaces universal quantification by explicit conjunctions, and
existential quantification with explicit disjunctions. Consider the
following example.
C = {A,B}
T = [\exists z (R(x,z) & R(z,y))] => (\forall w Q(x,y,w))
where T is a formula template with free variables x and y (note that z
is existentially quantified and hence is not free). The steps for
computing COMPILE(T | C) are as follows:
1) T' = [\exists z (R(x,z) & R(z,y))] => [Q(x,y,A) & Q(x,y,B)]
2) T'' = [(R(x,A) & R(A,y)) v (R(x,B) & R(B,y))]
=> [Q(x,y,A) & Q(x,y,B)]
3) { SUBST(T'',{x/A,y/A})
SUBST(T'',{x/A,y/B})
SUBST(T'',{x/B,y/A})
SUBST(T'',{x/B,y/B})
}
So we see that COMPILE(T | C) results in a set of 4 ground
formulas. Recall that we can think of first-order ground formulas as
equivalent to propositional formulas whose propositions are the ground
atoms underlying MODELS(P,C), where P is the set of predicates in T. For
example, the propositions in the above formulas include the following
set of 2^2 + 2^3 = 12 ground atoms.
{R(A,A),R(A,B),R(B,A),R(B,B),
Q(A,A,A),Q(A,A,B),Q(A,B,A),....,Q(B,B,B)}
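Step 3 of the compilation (grounding the free variables) can be
sketched as follows, representing formulas as strings with quantifiers
assumed already expanded; the naive string substitution is only safe
for this toy vocabulary:

```python
from itertools import product

C = ["A", "B"]

def ground(template, free_vars, constants):
    """Step 3 of COMPILE: substitute the free variables in all |C|^m ways.

    `template` is a string whose quantifiers have already been expanded
    into explicit conjunctions/disjunctions (steps 1 and 2)."""
    out = []
    for combo in product(constants, repeat=len(free_vars)):
        g = template
        for var, const in zip(free_vars, combo):
            g = g.replace(var, const)  # toy substitution; real code would parse
        out.append(g)
    return out

# T'' from the worked example, with free variables x and y.
T2 = "[(R(x,A) & R(A,y)) v (R(x,B) & R(B,y))] => [Q(x,y,A) & Q(x,y,B)]"
grounded = ground(T2, ["x", "y"], C)
assert len(grounded) == len(C) ** 2  # |C|^m ground formulas for m = 2
```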
Now that we have defined formula templates and compilation, it is
straightforward to define weighted formula templates. A "weighted
formula template" is simply a pair of a formula template and an
integer weight. Given a weighted formula template and a set of
constants C, we define compilation as follows,
COMPILE(<T,w> | C) = { <phi,w> | phi \in COMPILE(T | C) }
That is, COMPILE(<T,w> | C) is a WPKB whose formulas are those in
COMPILE(T | C) and all have the same weight w.
A "weighted template knowledge base" (WTKB) KB is a set of weighted
formula templates. The compilation of a WTKB KB relative to C, denoted
COMPILE(KB | C), is the union of the weighted formulas that result
from compiling each template in KB relative to C. For example,
consider the following WTKB KB,
1) <(Republican(x) => not(Pacifist(x))), 10>
2) <(Quaker(x) => Pacifist(x)), 20>
3) <((Friend(x,y) & Quaker(x)) => Quaker(y)), 30>
and constants C = {Jon, Nixon}. This is similar to our previous
pacifist examples, only now we have a new preference that the friend
of a quaker is a quaker. The compilation COMPILE(KB | C) yields the
following WPKB.
<(Republican(Jon) => not(Pacifist(Jon))), 10>
<(Republican(Nixon) => not(Pacifist(Nixon))), 10>
<(Quaker(Jon) => Pacifist(Jon)), 20>
<(Quaker(Nixon) => Pacifist(Nixon)), 20>
<((Friend(Jon,Nixon) & Quaker(Jon)) => Quaker(Nixon)), 30>
<((Friend(Jon,Jon) & Quaker(Jon)) => Quaker(Jon)), 30>
<((Friend(Nixon,Jon) & Quaker(Nixon)) => Quaker(Jon)), 30>
<((Friend(Nixon,Nixon) & Quaker(Nixon)) => Quaker(Nixon)), 30>
We can now use any propositional MAX-SAT solver to reason about the
individuals in C. In general, given a WTKB KB and a set of constants C,
we define the "maximal models relative to C" to be the maximal models
of COMPILE(KB | C). Each such model is guaranteed to be in MODELS(P,C),
as intended under our finite-domain semantics. It is conceptually
straightforward to compute such a maximal model by creating
COMPILE(KB | C) and running a propositional MAX-SAT solver.
Continuing with the above example then, we see that a maximal model or
MAX-SAT solution of KB relative to C is,
Friend(Jon,Nixon) = true
Friend(Nixon,Jon) = false
Friend(Jon,Jon) = false // could also be true
Friend(Nixon,Nixon) = false // could also be true
Republican(Jon) = true
Quaker(Jon) = true
Pacifist(Jon) = true
Republican(Nixon) = true
Quaker(Nixon) = true
Pacifist(Nixon) = true
This agrees with the preferences indicated in the above rules. If
Friend(Jon,Nixon) were false, then the MAX-SAT solution would conclude
that Nixon was not a pacifist, since he is a republican. But the fact
that Friend(Jon,Nixon) is true causes the MAX-SAT solution to conclude
that Nixon is a quaker and hence also a pacifist. Note that if the
weight on rule 3 were less than that on rule 1, then the MAX-SAT would
conclude that Nixon was not a quaker and also not a pacifist. (You
should verify this for yourself.)
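A brute-force sketch of this example in Python. The predicate
abbreviations (Rep, Qua, Pac, Fr) are mine, and I treat the
Republican, Quaker(Jon), and Friend facts as fixed evidence while
maximizing over the remaining atoms, which is how the example's
conclusions arise:

```python
from itertools import product

C = ["Jon", "Nixon"]

def implies(a, b):
    return (not a) or b

def compile_wtkb(constants):
    """Ground the three weighted templates over the constants."""
    kb = []
    for x in constants:
        kb.append((lambda m, x=x: implies(m["Rep", x], not m["Pac", x]), 10))
        kb.append((lambda m, x=x: implies(m["Qua", x], m["Pac", x]), 20))
        for y in constants:
            kb.append((lambda m, x=x, y=y:
                       implies(m["Fr", x, y] and m["Qua", x], m["Qua", y]), 30))
    return kb

ATOMS = ([("Rep", c) for c in C] + [("Qua", c) for c in C] +
         [("Pac", c) for c in C] + [("Fr", a, b) for a in C for b in C])

def max_sat(kb, fixed):
    """Best completion of the fixed evidence, by enumeration."""
    best, best_w = None, -1
    free = [a for a in ATOMS if a not in fixed]
    for vals in product([False, True], repeat=len(free)):
        m = {**fixed, **dict(zip(free, vals))}
        w = sum(wt for phi, wt in kb if phi(m))
        if w > best_w:
            best, best_w = m, w
    return best

kb = compile_wtkb(C)
fixed = {("Rep", "Jon"): True, ("Rep", "Nixon"): True,
         ("Qua", "Jon"): True,
         ("Fr", "Jon", "Nixon"): True, ("Fr", "Nixon", "Jon"): False,
         ("Fr", "Jon", "Jon"): False, ("Fr", "Nixon", "Nixon"): False}
best = max_sat(kb, fixed)
# Friendship with a quaker outweighs the republican rule:
assert best[("Qua", "Nixon")] and best[("Pac", "Nixon")] and best[("Pac", "Jon")]
```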
We see that the use of weighted templates can allow for knowledge to
be expressed very compactly. However, one practical concern with the
general compilation approach is that it can result in very large sets
of weighted ground formulas. In general, the number of ground formulas
will be on the order of |C|^v, where v is the maximum number of free
variables in any template. In practice, if an application is time
critical, we can limit the number of free variables to help keep the
number of ground formulas under control. There are also various
"partial compilation" tricks that can sometimes be used that avoid
creating the entire WPKB for COMPILE(KB | C) while guaranteeing
correct inference. We will not discuss such approaches in this course.
Learning the Weights of First-Order Templates:
----------------------------------------------
Suppose now that we are given a set of first-order templates, along
with training data. We now describe an approach to learning the
weights. It turns out that we can use an algorithm that is almost
identical to the propositional case.
We define the learning problem as follows. The input is:
1) A set of first-order formula templates {phi_1,...,phi_v} over a set
of input predicates Px and output predicates Py.
2) A set of training examples {<C_1,X_1,Y_1>,...,<C_N,X_N,Y_N>} where
each C_i is a set of constants, and (X_i,Y_i) is a model in
MODELS(Px \union Py, C_i). Here we think of X_i as listing truth
assignments to atoms involving input predicates, and Y_i as listing
truth assignments for atoms involving output predicates.
The output is a set of weights {w_1,...,w_v} giving a WTKB KB =
{<phi_1,w_1>,...,<phi_v,w_v>} such that, for each training example,
MAX-SAT(KB_i | X_i) is consistent with Y_i, where KB_i is the WPKB
COMPILE(KB | C_i). That is, our goal is to find a set of weights such
that when the WTKB is compiled relative to C_i for each example we are
able to compute the target Y_i facts using MAX-SAT. Note that the set
of constants C_i need not be the same across examples. In our RTS
example, this means that the training data can come from different
situations involving different sets of game entities.
The main difference between the learning problem here versus the
propositional case is that here the X_i and Y_i vary in size across
the examples. In contrast, in the propositional setting the X_i and Y_i were
truth assignments over a fixed set of propositions. Nevertheless we
can adapt the generalized perceptron algorithm to our new setting in a
straightforward way.
To describe the learning algorithm we will redefine our previous
notion of feature function. The feature function f_i for formula
template phi_i assigns a non-negative integer to any given example (C,X,Y)
as follows:
f_i(C,X,Y) = |{phi' true in (X,Y) | phi' \in COMPILE(phi_i | C)}|
That is, f_i(C,X,Y) is the count of how many formulas in COMPILE(phi_i
|C) are true in the model (X,Y). Thus f_i will have a high value if
the template phi_i is typically true in the example, and will have a
small value if the template is frequently violated in the model. This
definition of feature function has the property that,
WEIGHT((X,Y),COMPILE(KB | C)) = sum_i w_i*f_i(C,X,Y)
that is the weight of the model (X,Y) with respect to the compiled
knowledge base is given by the weighted sum of feature functions. Just
as in the propositional case, we can now use the perceptron algorithm
to adjust the weights in a direction that increases the weight of the
target models and decreases the weight of incorrect models. The
pseudo-code is below.
for each i = 1 to v,
  w_i = 0
repeat for some number of iterations
  for each i = 1 to N,
    KB = {<phi_1,w_1>,...,<phi_v,w_v>}
    KB' = COMPILE(KB | C_i) ;; WPKB relative to C_i
    ;; compute the best Y' according to the current weights
    <X_i,Y'> = MAX-SAT(KB' | X_i)
    if not(Y' = Y_i)
      for j = 1 to v,
        w_j = w_j + alpha*[f_j(C_i,X_i,Y_i) - f_j(C_i,X_i,Y')]
Again the critical step here is the weight update,
w_j = w_j + alpha*[f_j(C_i,X_i,Y_i) - f_j(C_i,X_i,Y')]
that occurs after an incorrect prediction. This update makes intuitive
sense. If phi_j is satisfied more frequently in the correct target Y_i
than in the incorrect prediction Y', then its weight is
increased. Otherwise, if phi_j is violated more often in Y_i than in
Y', then its weight is decreased.
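The counting feature function can be sketched for a single-variable
template, here Quaker(x) => Pacifist(x) (the abbreviations are mine):

```python
C = ["Jon", "Nixon"]

def implies(a, b):
    return (not a) or b

def f_quaker(model, constants):
    """Counting feature: number of true ground instances of the
    template Quaker(x) => Pacifist(x) over the constants."""
    return sum(implies(model["Qua", c], model["Pac", c]) for c in constants)

model = {("Qua", "Jon"): True, ("Pac", "Jon"): True,
         ("Qua", "Nixon"): True, ("Pac", "Nixon"): False}
assert f_quaker(model, C) == 1  # satisfied for Jon, violated for Nixon
```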
Despite the simplicity of this update rule, we again have the
theoretical guarantee that the algorithm will converge to weights that
correctly classify all of the training data, if such weights exist.
Again here we have assumed that we are given formula templates. One can
also consider learning templates based on the training data. This is
akin to the structure learning problem in graphical models, or feature
discovery in more traditional machine learning. FOIL-like techniques
have been used for this purpose with good results.