Monday, August 9, 2004

Definite Clause Grammars


Definite Clause Grammars

Not Just for Parsing Anymore

Abstract

Definite Clause Grammars (DCG) have proved to be a
system to build parsing systems simply and
effectively. Unfortunately, DCG are so effective as a parser
building tool, that their other uses are too often ignored -- it
seems that DCG have been pigeon-holed into a parser building
niche. This paper demonstrates the common usage of DCG and then
expands these common usage patterns into areas other than
parsing, showing that DCG are effective means of increasing
programming efficiency while simultameously maintaining (or even
clarifying) declarative semantics.



Introduction to DCG



Definite Clause Grammars (DCG) are syntactic extentions to
Prolog that allow a subset of definite programs
1 to
be written as Prolog programs. DCGs, and the definite programs
they represent, facilitate, among other things, an almost
unchanged translation of grammars defined in Bacchus-Naur Form
(BNF) into (parts of) Prolog programs -- DCG is a useful tool
for generating parsers and scanners.



Take, for example, a simple command-control language for,
e.g. a robot-arm,
2 issuing
commands as per the following:



up down up up down


An example representation of the above grammar in BNF for the
above language is as follows ...




< move >
::= < step > < move >

< move >
::= < step >

< step >
::= up

< step >
::= down



... and the equivalent Prolog program using DCG is the
following:




move --> step, move.

move --> step.

step --> [up].

step --> [down].



This nearly direct translation is simply amazing, especially as
compared to other programming language families. It is
inconceivable in other programming languages to
construct a scanner for the above language so concisely in the
native representation. The other amazing facet, not discussed
here, is that the scanners and parsers scale with the grammar's
complexity linearly ... writing scanners and parsers in
other programming languages becomes much more difficult with
increasing complexity, because, usually, such scanners and
parsers increase in complexity polymonially or even
exponentially to the complexity of the grammar.



Unfortunately, perhaps, because the above scanner and
associated parsers are so facilitated by DCGs, the other uses
of DCGs are too often overlooked. Let's step back from the
application of DCGs and examine the essentials -- DCGs provide a
system: accepting some atoms and updating the state of the world
or rejecting others and backtracking to find a match, so DCGs
provide (in other words, guarantee) a limited subset symbols and
associated actions on those symbols.



Type systems can be viewed in a similar fasion: they accept
some inputs (compiling and executing on accepted instantiated
values) and reject others (causing a compilation failure). The
advantages of typeful systems has been covered exhaustively in
the literature, and programming systems developed recently
accept programming with types as a matter of course. Viewing
DCGs in a similar light of types can convey the advantages of
types in a programming language without types.



One of the oft-trumpeted advantages of programming with
(static) types is that programs with type declarations execute
faster than equivalent programs without types (or with dynamic
typing). Why? Types form a duality: those things acceptable
to perform computations, and those other things that are not to
be used in the computation. Given that duality, the system can
perform the computation in the safety that all the values are
valid (in that they conform to the formal type). Under a
dynamic typing system, the system must first check the type of
each value, ensuring that the value's type is acceptable, before
it can perform any computation. In a statically typed system,
every check is compiled away (verified) before execution
occurs.



Definite Clause Grammars can provide the same service that
static typing does. In the usual case, a scanning/parsing
system receives a language (set of data) that (we hope) conforms
to the grammar. However, we can build a system where we control
the grammar definition AND the input data set. A good example
to examine where this approach yields effective results is in
the domain of generate-and-test problem-solvers.



SEND + MORE = MONEY


(Cryptarithmic problem solvers)



Cryptarithmic problem solvers take a trivially encrypted
mathematical problem and determine what that problem
represents. The most general case is that a symbol stands for
any digit at each position, e.g.:



       6 . .
x # . .
-------------
# . .
# . . .
+ # 5 . 5
-------------
# . 5 . 4 .











where # stands for any digit other than 0, and
. stands for any digit




The above problem solved in the usual manner would take the
generate-and-test approach:



digit(0). digit(1). digit(2). digit(3). digit(4).
digit(5). digit(6). digit(7). digit(8). digit(9).

first_digit(X) :- digit(X), X > 0.


And would provide ways to construct representations of
(composite) numbers and translations to and from these
representations and the "actual" numbers they represent (so, for
example, the system can use the standard operators to perform
arithmetic):



as_number(Digits, Number) :- as_number_aux(Digits, 0, Number).

% an auxilary pred to make the above interface pred tail recursive3
as_number_aux([], Num, Num).
as_number_aux([Digit|Digits], Num, Result) :-
NewSum is Digit * 10 * Num,
as_number_aux(Digits, NewSum, Result).

as_list(0, []).
as_list(N, [Digit|Rest]) :-
N > 0,
Digit is N mod 10,
Remainder is N // 10,
as_list(Remainder, Rest).


Then, using the above, one converts the problem as written into
the equivalent representations and asks the system to find the
solution (note that the above list representation puts numbers
in reverse order: units are the first element of the list):



dots([SixNum, Multiplier, Ones, Tens, Hundreds,  Solution]) :-
digit(A), digit(B),
as_number([B, A, 6], SixNum),
digit(C), digit(D), first_digit(E),
as_number([C, D, E], Multiplier),
Ones is SixNum * C, Ones < 1000,
Tens is 10 * D * SixNum, Tens < 100000,
Fivers is SixNum * E, as_list(Fivers, [5, _, 5, _]),
Hundreds is Fivers * 100,
Solution is Ones + Tens + Hundreds,
as_list(Solution, [_, 4, _, 5, _, _]).


A Pentium Windows system using SWI Prolog
solves this problem
in 0.79 seconds, or 476,919 inferences in 0.79 seconds (602827
Lips). Pretty impressive, and, for this
general type of problem there's no need to improve on this
generic generate-and-test methodology.



What happens with a cryptarithmic problem with more stringent
constraints, such as uniqueness? To solve the equation ...



 SEND
+ MORE
------
MONEY


... one could use the generate-and-test approach, and enforce
that each letter represents an unique digit:



send_more_money([Send, More, Money]) :-
first_digit(S), first_digit(M),
digit(E), digit(N), digit(D), digit(O), digit(R), digit(Y),
all_different([S, E, N, D, M, O, R, Y], []),
Send is S * 1000 + E * 100 + N * 10 + D,
More is M * 1000 + O * 100 + R * 10 + E,
Money is M * 10000 + O * 1000 + N * 100 + E * 10 + Y,
Money is Send + More.

all_different([], _).
all_different([Num|Rest], Diffs) :-
not(member(Num, Diffs)),
all_different(Rest, [Num|Diffs]).


And such an approach, as above, is simple to implement, and
easy to understand, but the runtime cost is enormous -- it took
1,571,755,688 inferences in 1817.34 seconds (864865 Lips) to
find the solution. Instead, let us
reexamine the problem constraint (uniqueness) and use it to
provide information to the problem solver so as to facilitate
its effort, and at the same time retain the clarity of the
declarative syntax.



The uniqueness constraint tells us that each letter must
represent an unique digit. This can be described inductively:





  • S can be any first_digit, or [1..9];

  • E can be any digit except the digit captured by S,
    or [0..9] - [S];

  • N can be any digit except the digits captured by S and E,
    or [0..9] - [S, E];

  • ... etc.



Looking at the above inductive description of the unique
constraints shows a consumer pattern very much like how a DCG
system consumes symbols from a list. So, we'll implement just
such a system; in this particular case the symbols of the list
are digits. As we rewrite the system to use DCG to replace the
generate-and-test predicates, the digit/1 predicate
becomes a definite predicate that consumes a digit from the
(input) digit list and the first_digit/1 predicate
becomes definite, as well:



takeout(X, [X|Y], Y).
takeout(X, [H|T], [H|R]) :- takeout(X, T, R).

digit(X) --> takeout(X).4
first_digit(X) --> digit(X), { X > 0 }.


Then the solution predicate changes in that it now operates
under the structure of DCG with the input list being the
enumerated ten digits (and we also include an interface
predicate so that the user is not burdened with providing the
digits):



send_money_quickly([Send, More, Money]) :-
do_send_moola([Send, More, Money], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], _).

do_send_moola([Send, More, Money]) -->
first_digit(S), first_digit(M),
digit(E), digit(N), digit(D), digit(O), digit(R), digit(Y),
{ Send is (S * 1000) + (E * 100) + (N * 10) + D,
More is (M * 1000) + (O * 100) + (R * 10) + E,
Money is (M * 10000) + (O * 1000) + (N * 100) + (E * 10) + Y,
Money is Send + More }.


Did you notice that the above code now has no uniqueness
check
? By translating the code into a definite
predicate, we guarantee that each logical variable
holds an unique digit at unification! What is occuring here, by
using DCG to assign unique digits is equivalent to creating a
type for each logical variable, and each successive type is more
restrictive than its predecessor. The code (algorithm) changed
very little structurally, but what does this DCG-as-types
modification buy us?



Quite a bit. The reduction in cost at runtime is astounding!
This new system solves the equation using only 8,215,467
inferences in 36.91 seconds (222562 Lips). This is nearly
fifty times faster than the fully nondeterministic
version presented first. By iteratively restricting the search
space with DCG we have simulated a type system that allows the
system to find a solution much more efficiently without
sacrificing the clarity of the declarative semantics of the
problem description
.



Summary for DCGs as Types



The above example demonstrates when one needs to search a
restricted solution space, DCG can provide runtime benefits over
pure nondeterminism without sacrificing clarity of code (as so
many other "optimization" techniques do). The above example
also demonstrates that if the search space becomes more
restrictive in a predictable way over the duration of the
search, DCG can provide enormous runtime benefits.



DCG Implementation of Dynamic Programming



DCGs are effective parser generators and are commonly used as
such, we've also seen that DCGs can simulate types and reap the
benefits of static type systems in a dynamically typed
environment. DCGs have other uses as well. One such use is to
simulate dynamic programming,
5 which
computes solutions from an iterative bottom-up approach instead
of (usual for Prolog and other languages that rely on recursion)
the recursive top-down approach.



Dynamic programming shines where the solution must be
approached incrementally from a known-good set of states,
usually these are situations were a declarative top-down
description of the solution leads to a combinatorial explosion
of attempted solutions that can overwhelm any computational
resource. An excellent example of such a problem is computing
the nth Fibonacci number.



Only some Fibonacci numbers declaratively



Below is the formula to find the nth Fibonacci number:



fib n | n == 0    = 1
| n == 1 = 1
| otherwise = fib (n-1) + fib (n-2)


Or, as translated (naively) into Prolog:



naive_fib(0, 1).
naive_fib(1, 1).
naive_fib(X, Y) :-
X > 1,
Idx1 is X - 1,
Idx2 is X - 2,
naive_fib(Idx1, A),
naive_fib(Idx2, B),
Y is A + B.


This is all very well and good, but, unfortunately, the above
specification is exponentially recursive ... not only does it
take a long time to find a solution, but it can only find the
first few decades of solutions before running out of available
computational resources (memory). A query of
naive_fib(20, X) takes 65,671 inferences in 4.54
seconds (14476 Lips) to find the
solution, but a call for the 30th
Fibonacci number causes an "out of local stack" error.



The problem with the top-down approach is that it attempts
to find the solution by finding the two previous (unknown)
solutions, until it locates a known answer. Unfortunately, the
bottom is too far down to be reached with the resources
available, and the problem does not make conversion into a
tail-recursive solution practicable.



Any Fibonacci, quickly, with DCGs



DCGs solve this problem by simulating dynamic programming
... approach the solution from a set of known states, and keep
building from that known state until the system finds the
solution. This approach has the additional advantage that since
it is iterative, it eliminates the exponential recursive
explosion that the previous declarative approach suffered.



The idea is use a list6
to memoize
the result set as the solution space grows, until the system
reaches the desired solution;
7 that is
where DCGs assist us in maintaining a declarative description
while controlling resources. Declaratively, then, if the
requested Fibonacci number is at the head of the solution list,
then return it:



fibonacci(X, [X|_]) --> [].


Otherwise, if we're at then end of the list and still have not
found the solution, push the next two Fibonacci numbers onto the
result set and keep searching:



fibonacci(X, [A, B]) -->
next_fib(X, A, B, Fib1),
next_fib(X, B, Fib1, Fib2),
fibonacci(X, [Fib1, Fib2]).


Finally, if we're in the middle of the result set and have not
found the solution, keep searching:



fibonacci(X, [A, B|Rest]) -->
succ_push(X, A),
fibonacci(X, [B|Rest]).8


The DCG housekeeping methods are what one would expect:



% Ensures the current fib is not the one we need, generates a new
% one, and pushes the current one onto the result set.
next_fib(X, fib(Idx1, Num1), fib(Idx2, Num2), fib(Idx3, Num3)) -->
succ_push(X, fib(Idx1, Num1)),
{ Idx3 is Idx2 + 1,
Num3 is Num1 + Num2 }.

% add a definite branch to gt that pushes the compared-to element
% onto the end of the list (transformed into a difference list)
succ_push(A, B, Old, New) :-
gt(A, B),
concat([Old|T1] - T1, [B|T2] - T2, New).9


As before, we retain the declarative nature using DCGs to solve
the problem, but we also obtain the benefit of much more
powerful computational ability with much less runtime cost,
10 to wit:
quick_fib(1450, X) finds the
solution using 12,327 inferences in
0.02 seconds (615464 Lips) -- out-of-reach and impossibly fast
for the top-down standard predicate.12



Conclusion



Definite Clause Grammars (DCGs) have long been known to be an
effective tool to generate scanners and parsers. In this
article, we explore alternative uses for DCGs, such as emulating
static type system and implementing dynamic programming system.
We have seen that, unlike other "optimization" techniques, DCGs
maintain or even improve the clarity of declarative
specifications while at the same time providing the benefits
of these other systems. These benefits are manifested as
improvements by orders of magnitude in both computational power
and the speed at which the runtime achieves desired
solutions.






Endnotes



































1
Definite programs and Prolog programs using DCG
to model definite programs is discussed in detail in
[DM93].

2
This example is developed in
[Bratko01], chapter 21, with
two alternative BNF representations offered.

3

A naive version of as_number/2 would be ...


as_number([], 0).
as_number([Digit|Digits], Result) :-
as_number(Digits, Rest),
Result is Digit + 10 * Rest.

... but this is not tail recursive, an so may not benefit from
optimization from the Prolog system. In general, this kind of
transformation from a "naive" recursive system to one that uses an
accumulator is known as folding, and is the result of the fruits of
research in the functional programming community.


4
We use takeout/3 instead of
delete/3 because delete/3 is not
reversible, eliminating the desired backtracking to
explore alternate solutions. The definition, use, and a
wonderful explanation of takeout/3 come from
[Fisher99], § 2.7.

5
Dynamic programming is discussed in detail in
[RL99], which provides an
implementation of a generic dynamic programming algorithm in the
Haskell programming language.

6
Actually, [RL99] uses a
dictionary, not a list, as the data structure to memoize
the known state, but the difference list approach was so
effective that it was unnecessary to use a different data
structure.

7
As each Fibonacci number is computed from the previous
two Fibonacci numbers, I found it most convenient
to use (specifically) a difference
list
as the data structure to memoize the
results.

8
We have two iteration branches for this predicate. The first
adds two more Fibonacci numbers when we've come to the final two
Fibonacci numbers in the result set; the second continues
iteration because there are more than two remaining Fibonacci
numbers. As we need two Fibonacci numbers to compute the next
Fibonacci number, these two branches guarantee that we perform
that computation when we reach the last two numbers,
and not before. This approach forms the basis of this dynamic
programming system.

9
The implementation of concat/3 is straight
from [Bratko01], § 8.5.3. (concat(A1-Z1, Z1-Z2, A1-Z2)).

10
Not only that, but this solution also provides the
history of all the Fibonacci numbers prior to the requested one
in a difference list by calling fibonacci/4
directly;11
the traditional top-down solution provides no such history.

11

The resulting difference list can be converted into
human-readable form with the following predicate:


simple_list([]) --> [].
simple_list([H, Datum|Diff] - Diff, L0, List) :-
simple_list(H, [Datum|L0], List).

12
An implementation using monads in
Haskell for the fibonacci
computer is available at logicaltypes.blogspot.com


Works Consulted



[Bratko01]
Prolog Programming for Artificial Intelligence, 3rd
ed.
Ivan Bratko, Pearson Education Limited, Essex,
England, 2001.
[DM93]
A Grammatical View of Logic Programming, Pierre
Deransart and Jan Mal/uszyn'ski, MIT Press, Cambridge,
Massachusetts, 1993.
[Fisher99]
prolog :- tutorial, J. R. Fisher,
http://www.csupomona.edu/~jrfisher/www/prolog_tutorial/contents.html, 1999.
[RL99]
Algorithms: A Functional Programming Approach,
Fethi Rabhi and Guy Lapalme, Addison-Wesley, Essex,
England, 1999


Definitions



backtrack(ing):
exploring alternate solutions by attempting new values
before the point of failure.
BNF:
Bacchus
Naur
Form is a system to describe
grammars of (programming) languages.
DCG: Definite
Clause
Grammar(s)

difference list:
Difference lists are list that allow access to elements
within the list directly; they avoid the need to "cdr
down" (a Lisp term mean to walk the list from its first
element to the desired element) the list to reach the
desired elements.
Lips: Logical
Inferences
Per
Second

memoize:
To memoize a result is to keep that result locally
active so that it is not necessary to recompute
it.


Solutions to Problems



The dots problem has the following
(only) solution:



    645
x 721
-----
645
12900
+ 451500
--------
465045


The solution to SEND + MORE =
MONEY
is 9567 + 1085 = 10652.



The 20th Fibonacci number is
10946 (computed from fibonacci(fib(20, X), [fib(1,1), fib(0,1)], A, B)).



The 1450th Fibonacci number is
7.79078e+302; the 1500th Fibonacci number causes a
floating-point overflow error.





Last modified: Mon Aug 09 16:15:25 Eastern Daylight Time 2004
Copyright © 2003, 2004, Cotillion Group, Inc. All rights reserved.

No comments:

Post a Comment