Wednesday, December 14, 2005

Aspect-Oriented Programming in Prolog

Aspect-Oriented Programming in Prolog


The concept of aspect-oriented programming (AOP) has recently
gathered steam in the general software engineering community as a
novel approach to problem-solving. Although aspects are useful
for functional and imperative object-oriented programming, do
they add value to rule-based programming systems? We argue that
developing systems in rule-based systems expose similar issues
that aspects address well. This manual
reviews AOP's
history and utility
, describes the
practice of programming with aspects in Prolog
(using the
protocol of the aspects module),
demonstrates the
system with an extended example
, and finally
production AOP usage


Recently there has been a groundswell of interest in a new style
of programming called Aspect-Oriented Programming
Contrary to the prevalent popular "wisdom" that claims that
encapsulation is an essential practice for software engineering,
AOP claims that an "open implementation" is a
key developing useable software.2
AOP originated from a subset of
Common Lisp's metaobject
3 that
allows methods of objects to be intercepted so that auxiliary
code could be executed before, after or around the method's
execution. These auxiliary methods allow "cross-cutting"
concerns, such as debugging, to be collected into one area,
separate from the method body. This allows the programmer to
focus on implementing the functionality without the need to
consider housekeeping issues.

As mentioned above, aspects have their origin from Common Lisp.
Common Lisp's object system, unlike those of traditional
object-oriented programming languages, is
-- methods are not tied to a particular class, but
belong to their own general class of methods (this class is called
'generic-function'), this change allows aspects to interact with
methods freely, adding advice or altering method behavior as
needed. Java, and other programming languages with traditional,
covariant, object-oriented systems do not
have this built-in capability
so the AOP movement was launched to address this

Just as AOP has its roots in Common Lisp's contravariant
metaobject protocol, so, too, contravariance has its own origins.
Contravariance was developed to provide object-oriented
functional programming languages the ability to express
alternates and induction much in the style of predicate logic,
and, to a large degree, it does this task well. There is,
however, a much simpler, more powerful, and more direct way to
express these features: predicates. Logic, or rule-based,
programming languages solve problems using predicate logic
directly. This begs the question: since aspects are a subset of
contravariance which, itself, is a subset of predicate logic, is
there any need for aspects in rule-based languages?

We argue in the affirmative -- rule-based system face the same
design challenges as other systems: these systems have concerns
that cut across multiple areas of functionality, and
implementation decisions (even ones made implicitly or by
default) correct for one resolution among competing goals are
not always the best decision when new goals are introduced or old
ones change. By introducing aspects into rule-based languages
(specifically Prolog for this manual), implementations become
even more declarative and flexible in nature. We demonstrate
this simplification in the example
following the protocol


The specific implementation of aspects for Prolog flows in the
following manner: code is developed as usual. When the
programmer identifies a cross-cutting concern, that programmer
develops the aspect using the specification
. An aspect
should be declared as immediate (i.e.: aspect/3's
goal should be called immediately with :-/1, not
implied in a clause's proof) and in the user
module: usually it is best to have all aspects defined in one
file for a library or executable. If the aspects themselves
depend on user-defined syntax or require some initialization at
compilation-time, then the programmer should use the immediate

Once the program and aspects are defined, the programmer weaves
the aspects into the program by using the aopc
compiler (see program/1 for the
artifacts created).

The programmer may then opt to run the program without aspects by
compiling the modules of the program with qpc
without aspect advice and linking with qld without
the aspect assertions. To
compile in the aspects, the program modules are compiled with
qpc with the -i

initialization flag, and linked with the
aspects and the
aspect assertions with


Now that we have a description of utility of aspects and the
practice of use for aspects in Prolog, we now specify their





Declaration <callable> A declaration to insert into the aspect-artifact file
This predicate provides any
additional information needed to be able to compile the aspect
artifacts. So, for example, any op/3 declarations
used in declaring the aspects should also be captured in an
:- aspect(+Type,
+ForGoal, +Aspect)


Type<atom> one-of([before, after, around])
ForGoal <term> the goal to be advised by this aspect
Aspect <callable> The goal to execute at the point of
Type in

This declares aspects for the
program. When the time at Type
for the goal ForGoal is
reached in the program, then
Aspect is called. For
before and after aspects,
ForGoal is automatically
called at the appropriate time (for a before aspect, it is
called after Aspect exits;
for after aspects,
Aspect is called after
ForGoal exits).

Around aspects replace the call to
ForGoal entirely, but
this is not always desireable, so around aspects
have two special forms that may be used in the body of

that calls ForGoal
with its original arguments; and

that calls ForGoal
with the arguments specified. prove_goal/N must agree
with the arity of



Module<atom> The name used to create the three aspect-artifact


Takes the aspects and aspect-declarations loaded into memory
and weaves them into the predicates that are their subjects
by producting three artifact files:

  1. <aspect-module> an
    initialization file for qpc that rewrites all modules
    that are affected by aspects; and

  2. <aspect-module>_asserts.qof
    : a series of
    assertions that redirect all direct predicate calls to their
    corresponding aspect-managed predicates; and

  3. <aspect-module>.pl:
    the translation of the
    aspect declarations to inlined- (normal-) predicates.

For the program to behave normally (that is, without aspect
oversight) the program modules are compiled and linked without
these artifacts. To activate the aspects the program's
modules are now compiled with the
-i <aspect-module>
initialization flag and the executable is linked with the
<aspect-module>_asserts.qof and the
<aspect-module>.pl files. For aspect
oversight on an interactive system, one follows a similar set
of steps:

  1. load the aspect-artifact files into the Prolog listener
    in the following order:

    1. <aspect-module>_asserts.qof

    2. <aspect-module>.pl

    3. <aspect-module>

  2. load the modules of the program into the listener


A Whirl Interpreter

Here we develop a working example from the original concept to
one that uses simple aspects to report a trace of program flow
(useful for debugging) to one that has integrated production
aspects that manage cross-cutting concerns. We present a
interpreter. The language
has a terse syntax (with a somewhat richer semantics) that allows
one to complete an interpreter in an afternoon.

The Problem: Debugging

Unfortunately, as it takes many repetitions of instructions and
commands to accomplish anything useful, the task of debugging
using traditional means is rather tedious: alleviating the
burden of the programmer in the debugging is the first task we
put to the aspect methodology. The first iteration of the
interpreter seemed to do everything just fine -- it ran the
simple Whirl programs without any problems. When it came to
the first non-trivial program ("Hello,
it failed unexpectedly (this is for the
source code version before
the release of the Whirl document
mentioned above). Doing a top-level profile analysis of
the program run showed it failed around instruction 379
("Hello, World!" has over 1300 instructions):

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
Total time: 0.05 seconds
Predicate Box Entries = Calls+Redos Time
list_utils:takeout/3 1,506 = 947+559 66.7%
command/3 243 = 106+137 33.3%
switcheroo/2 105 = 105+0 0.0%
interpret_next_instruction/2 1 = 1+0 0.0%
history/4 238 = 238+0 0.0%
initialize_ops_ring/1 1 = 1+0 0.0%
whirl/1 1 = 1+0 0.0%
initialize_program/4 54 = 1+53 0.0%
instruction/3 379 = 379+0 0.0%
rotate/2 141 = 141+0 0.0%
switch_wheel/2 105 = 105+0 0.0%
initialize_memory/1 1 = 1+0 0.0%
execute/3 379 = 379+0 0.0%
history_is/4 378 = 378+0 0.0%
direction/2 171 = 141+30 0.0%
initialize_math_ring/1 1 = 1+0 0.0%
reverse_spin/2 238 = 238+0 0.0%
adjust/2 271 = 141+130 0.0%
arithmetic/4 29 = 29+0 0.0%
execute_command/2 370 = 238+132 0.0%
get_command/3 106 = 106+0 0.0%
spin_reverse/2 358 = 238+120 0.0%
interpret/2 379 = 379+0 0.0%
comparison/4 2 = 1+1 0.0%
syntax:<-/2 1,817 = 1,647+170 0.0%

Because the program fails somewhat in depth, debugging by traditional
(Prolog) methods (setting a spypoint and tracing execution), become
onerous rather quickly:

read_file_to_codes('hello.wr', Hello, []), length(Hello, X), spy(instruction), whirl(Hello).
% Spy point on instruction/3
Call: (10) instruction(_L387, program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...])), _L388) ? leap
Exit: (10) instruction(1, program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...]))) ? leap
Call: (12) instruction(_L564, program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...])), _L565) ? leap
Exit: (12) instruction(1, program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...]))) ? leap
Call: (14) instruction(_L762, program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...])), _L763) ? leap
Exit: (14) instruction(0, program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...]))) ? leap
Call: (16) instruction(_L893, program(ops, history(quescient, last_instruction(0)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], counterclockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([3-0, 2-0, 1-1, 0-1, 4-1, ... -...|...])), _L894) ?

The above trace represents only the first three instructions, and does not
show any of the predicates called to support the execution of those
instructions (which is unreasonable, given that the programmer is trying
to ascertain the cause of failure of instruction interpretation). What we
need is a 'meta-debugger', a system will bring us immediately to the
problem and shows us the root cause. Writing meta-debuggers is easy: the
traditional approach is to hack and slash the code with debugging
statements and assertations. The problem with doing that is that the
meta-debugger is tightly integrated into the production code, causing the
inevitably eventuality -- the debugging code become inseparable from the
production code. This is the host of a whole new set of problems, such
as the debugging code introducing errors into the system (by manipulating
production state, for example) and also the debugging code obscures the
meaning of the production code as all the debugging statements surrounding
the production statements tend to drown the latter in the sea of the
former. Put simply: traditional meta-debugging techniques cause more
problems than they solve.

Debugging: the Solution

Round 1: Full and Fast Trace

Traditional meta-debugger are not the solution, but this is where an
aspect-oriented meta-debugger provide a viable alternative. Our first
aspect-advised system traces program flow over what we suspect to be
the problem area, and when it observes the predicate fail, report that
failure with the current state:

:- use_module(library(aspects), [aspect/3]).

:- aspect(around, interpret(In, _Out),
(format('** going into interpret/2 with ~w~n', [In]),
format('** okay~n', []))).
:- aspect(around, interpret(In, _Out),
format('!! FAILED ON CMD: ~w~n', [In])).

Results (Round 1)

When weaving the aspects into the program, it does the job; unfortunately,
it does its job too well. A log capturing all the debugging statements
generated by the aspect-advised system weighs in at over 2 megabytes of
text, and the fail statement itself 10 kilobytes and occurs at the midpoint
of the debugging log -- impressive? yes; "informative [for those with
the fortitude to wade through the statement]"? maybe; useful? no. For,
although we eliminate the drudgery of manually stepping through the program
one instruction at a time, and we avoid introducing debugging artifacts
into the production code, we do introduce a large product as the output
that must be scanned to locate the problem, and once located the output
statement, in particular, is cumbersome, providing too much information
to be at all useful. We must refine our use of aspects to help us locate
the problem.

Round 2: State Representation

The major generator of text in the debugging output statements is the
program/5 term itself; most of which is historical data we have little
interest in beyond the immediate history. Given this, a major space
saver will be to display only the parts of the program in which we are
interested. The program/5 term has the following structure:

program(Active, History, Rings, Memory, Program)

see the formal language definition
for the meanings of each of program/5's arguments

Of program/5's arguments, Active and History are small enough, and
interesting enough, to retain, and Memory, although possibly infinite in
size, in most cases is compact, so may also be passed along unmolested.
The last two arguments, Rings and Program, do contain data that are not of
interest in the interpretation of the current instruction. For the rings,
they contain 12 immutable commands (the only interesting one is at the
head). For the program, the current instruction is at the head, and all
others are unordered. If we eliminate the tails of these arguments' lists,
we should reduce the size of the output by 99%. We do this by asserting a
portray/1 rule into the aspect definition and changing the format commands
to use the print ('~p') directive instead of the write ('~w') one:

:- use_module(library(aspects), [aspect/3, aspect_declaration/1]).

:- aspect_declaration(asserta((user:portray(ring(Type, [C0|_], Spin, Value)) :-
format('ring(~w, [~w|_], ~w, ~w)',
[Type, C0, Spin, Value])))).
:- aspect_declaration(asserta((user:portray(program(Active, History, [R1, R2],
Memory, program([I0|_]))) :-
format('program(~w, ~w, [~p, ~p], ~w, ~w)',
[Active, History, R1, R2, Memory,

:- aspect(around, interpret(In, _Out),
(format('** going into interpret/2 with ~p~n', [In]),
format('** okay~n', []))).
:- aspect(around, interpret(In, _Out),
format('!! FAILED ON CMD: ~p~n', [In])).

Results (Round 2)

This works much better -- the output log file is
now only 60 kilobytes (not 2M), and the failed message is now only 310
characters (not over 10K, as was the last one). The
smaller failed message is much easier to
review. It indicates that the error was at instruction 209 which was a
math command: the 'neg' command. This is all very well and good,
but we shouldn't be required to dig through a log file to find errors.
A better scenario would be to be presented with a description of the state
when an error occurs.

Round 3: Only Debugging for Errors

A better approach is to build an aspect system that only activates when
an erroneous condition occurs. To do that, we need only modify the
implementation of the aspect definitions (dropping the output on
nominal behavior), so the first aspect/3 clause becomes:

:- aspect(around, interpret(_In, _Out), (prove_goal, !)).

Results (Round 3)

With this modification, the program debugging output occurs where we think
the error is...

!! FAILED ON CMD: program(math, history(quescient, last_instruction(0)), [ring(math, [11-neg|_], clockwise spin, value(3)), ring(ops, [7-dadd|_], counterclockwise spin, value(1))], memory([cell(5, 0), cell(4, value(3)), cell(3, 0), cell(2, value(220)), cell(1, value(44)), cell(0, value(10))]), program(209-0))

...which is the same message we received before, but now we get it
immediately, as opposed to it being buried 200 lines into the debugging
information. It tells us that the instruction that failed was a "0" that
translates into the math command "neg". Examining the math neg command
doesn't seem to indicate any particular reason that it should fail:

command(MathOp, program(math, Hist, Rings, M, P),
program(math, Hist, [ring(math, I, S, value(Value))|Ops], M, P)) :-
MathOp <- [add, mult, div, neg],
takeout(ring(math, I, S, value(A)), Rings, Ops),
M = memory([cell(_, value(B))|_]),
arithmetic(MathOp, A, B, Value).

As the neg command does the same things that the (e.g.) mult command does,
so it seems something else is the issue.

Round 4: Rooting out the real error

Since the neg command seems correct, we look elsewhere in the program
state, and we see irregularities in the memory layout: the memory cell
values should be typed with the value/1 term, but cells 5 and
3 are not typed this way:

memory([cell(5, 0), cell(4, value(3)), cell(3, 0), cell(2, value(220)), cell(1, value(44)), cell(0, value(10))])

In this light, it is not the neg command's failure that concerns us so
much as that a prior command's action that improperly set these cell
values. This is no a failure condition when this happens, but an aspect
can detect this untyped assignment. We do this by replacing the above
aspects with the following one:

:- aspect(after, command(Type, ProgramIn, ProgramOut),
(ProgramIn = program(_, _, _, memory([cell(_, value(_))|_]), _),
ProgramOut = program(_, _, _, memory([cell(Idx, Num)|_]), _),
(number(Num) =>
format('!! Command ~a assigned an untyped number to memory cell ~d~n',
[Type, Idx])))).

The special operator used above declares a ("forward")
9 This
relation is found in the syntax
, and so must be imported and then its syntax declared
in order to be used by the aspect:

:- use_module(library(syntax)).
:- aspect_declaration(use_module(library(syntax))).

:- op(920, xfy, =>).
:- aspect_declaration(op(920, xfy, =>)).

Results (Round 4)

With this new aspect framework, when we run the aspect-advised system,
we observe the following output:

!! Command dadd assigned an untyped number to memory cell 1
!! Command dadd assigned an untyped number to memory cell 1

Examining the implementation for the dadd command, we see the

command(dadd, program(ops, Hist, Rings, memory(Memory), P),
program(ops, Hist, Rings, memory([cell(NewIdx, NewV)|Mem]), P)) :-
% Here we jump to a new memory address, given that that cell exists.
% If the cell doesn't exist, create it and add it to the memory store;
% thereby "lazily" growing the memory store as needed.
% Remember, it's not a memory leak, it's lazy growth of the memory store
% (not in any respect resembling a memory leak ... *cough*).
Memory = [cell(Idx, _)|_],
ring(ops, _, _, value(X)) <- Rings,
floor(X, Y),
NewIdx is Y + Idx,
(takeout(cell(NewIdx, SomeNewV), Memory, SomeMem) ->
NewV = SomeNewV,
Mem = SomeMem
NewV = 0,
Mem = Memory).

The highlighted goal is incorrect: it should wrap the number in the
value/1 term. We make that correction...

NewV = value(0),

... and rerun the system:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!
eeh, Whked!


So, we do print out "Hello, World!" but then it seems the program doesn't
know how to stop.

Round 5: Hello, World!

Looking at how the interpreter works, we see that the
command is executed, and then interpret_next_instruction/2 is called.
We can therefore, after several
,10 develop
an aspect to follow the exit command, to see if there is an issue with
that command:

:- aspect(after, command(Cmd, _, _),
(Cmd = exit =>
(format('Calling interpret_next_instruction/2 with exit:~n', []),

Results (Round 5)

When we run the whirl interpreter with the above aspect advice, we see the
following debug trace:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!
Calling interpret_next_instruction/2 with exit:
% Spy point on interpret_next_instruction/2
Call: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? skip
Fail: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? retry
Call: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? creep
Call: (2,709) program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([]))=program(_G2842026, _G2842027, _G2842028, _G2842029, program([_G2842037-_G2842038|_G2842035])) ? skip
Fail: (2,709) program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([]))=program(_G2842026, _G2842027, _G2842028, _G2842029, program([_G2842037-_G2842038|_G2842035])) ? creep
Fail: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ?

Looking at the implementation of implement_next_instruction/2, the problem
becomes obvious: I have a cut in the main (only) clause, but no "otherwise"
clause to catch failed tests in that predicate.

interpret_next_instruction(P0, P) :-
% Loops until we run out of tokens to interpret
P0 = program(Active, Hist, Rings, Mem, program([Idx - Op|Codes])),
NewIndex is Idx + 1,
(takeout(NewIndex - Code, [Idx - Op|Codes], Program) ->
interpret(program(Active, Hist, Rings, Mem,
program([NewIndex - Code|Program])), P)
P = P0).

My comment for the clause indicated my intent (but not what actually
happens), so it appears to be a slip in rigor. A simple catch-all
secondary clause fixes this issue:

interpret_next_instruction --> [].

Running the new interpreter shows we've found the errors preventing
"Hello, World!" from working:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!

Hello = [49, 49, 48, 48, 49, 49, 49, 48, 48|...]


Debugging: Wrap Up

So, we have a working "Hello, World!" program; all is right with
the world again (after all, a computer without a "Hello, World!"
program would certainly precipitate the final trumpet call). Reviewing
the development of the debugging aspects, it is easy to see their
usefulness, but there are tradeoffs to consider.

In reality, I would begin a comprehensive review of the source code
after round 3. From that review
and from the program state I determined that I was pushing untyped
values into the memory. So the aspects were useful in helping
to fix the interpreter, but there comes a point where developing
a specific debugging aspect is more costly than manual search
and correction.

Also, using aspects do not guarantee correctness or completeness.
The aspects we developed uncovered one obvious error (the insertion
of untyped values into memory) and only hinted at another
(a non-determinate interpret_next_instruction/2 predicate where
a determinate one was needed). There were two more untyped
memory insertions in the code, both occuring on the input modes
of the IO commands, but since "Hello, World!" never exercised those
modes, the aspects did not uncover these errors. Comprehensive unit
testing is still necessary, even with debugging aspects.

The outcome from the above debugging process is a correct (as far as
we can tell) Whirl interpreter.

The Problem: Spacetime

From Debugging Aspects to Production-tuning Aspects

So, an obvious use for aspects are to reduce the effort required
to debug program. Unlike tracing, however, their usefulness does
not end there. Aspects open the implementation of systems: this
is useful, for example, in "consequence-free"
exploration of alternatives in the implementation (prototyping and
We will explore this facet of aspects in this section.

First let us examine the profiling information on the (now working)
"Hello, World!" run:

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
Hello, World!
Total time: 1.81 seconds
Predicate Box Entries = Calls+Redos Time
$garbage_collect/1 6 = 6+0 60.8%
takeout/3 5,384 = 5,384+0 39.2%
quintus:module_expansion_argument/1 3 = 3+0 0.0%
quintus:meta_predicate/1 2 = 2+0 0.0%
quintus:floor/2 648 = 648+0 0.0%
quintus:meta_predicate/2 2 = 2+0 0.0%
lists:member/2 3 = 2+1 0.0%
$start_consult/1 1 = 1+0 0.0%
$style_check/2 2 = 2+0 0.0%
var/1 6 = 6+0 0.0%
$load_file/5 2 = 1+1 0.0%
$dochk_file/4 3 = 1+2 0.0%
$extend_file/3 1 = 1+0 0.0%
$set_predicate_attribute/3 3 = 3+0 0.0%
$execute_directive/2 12 = 4+8 0.0%
preprocessor/2 1 = 1+0 0.0%
with_mutex/2 2 = 2+0 0.0%
load_files/2 1 = 1+0 0.0%
$load_file/6 1 = 1+0 0.0%
$expand_directive/2 4 = 4+0 0.0%
statistics/2 4 = 4+0 0.0%
repeat/0 68 = 2+66 0.0%
$member/2 2 = 1+1 0.0%
atom_concat/3 4 = 4+0 0.0%
assert/1 1 = 1+0 0.0%

Hello = [49, 49, 48, 48, 49, 49, 49, 48, 48|...]


What we see here is the majority of time is taken by the takeout/3
predicate and the resulting garbage collection. The takeout/3
predicate is used pervasively throughout the interpreter, so it
may not be its implementation that needs improvement (as the
profiling information seems to indicate), but it may be how
the data are structured that causes takeout/3 to grab our
attention. In order to make this determination, we should be
clear in the code what we are using takeout/3 for. This requires
some refactoring of the source code to make the intentions

After refactoring the source code, we have
version 0.03 of the
which changes the interpret/2 from indirect recursion (via
interpret_next_instruction/2) to a directly recursive predicate. We
also add a layer of description on top of takeout/3 to explicate the
particular use for the predicate (e.g.: ring selection, command
selection, or instruction selection). After this very simple
refactoring, the profiling information improves dramatically ...

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
Hello, World!
Total time: 0.69 seconds
Predicate Box Entries = Calls+Redos Time
list_utils:takeout/3 3,469 = 3,468+1 86.5%
bump_program_counter/2 1,351 = 1,351+0 2.7%
get_selected_command/3 327 = 327+0 2.7%
history_is/4 1,351 = 1,351+0 2.7%
switch_wheel/2 327 = 327+0 2.7%
$garbage_collect/1 36 = 36+0 2.7%
arithmetic/4 62 = 62+0 0.0%
change_active_ring_value_by/6 64 = 64+0 0.0%
command/3 351 = 327+24 0.0%
initialize_program/4 54 = 1+53 0.0%
whirl/1 1 = 1+0 0.0%
switcheroo/2 328 = 327+1 0.0%
instruction/3 1,350 = 1,350+0 0.0%
rotate/2 583 = 583+0 0.0%
execute_command/2 1,207 = 767+440 0.0%
initialize_ops_ring/1 1 = 1+0 0.0%
get_command/3 583 = 583+0 0.0%
spin_reverse/2 767 = 767+0 0.0%
execute/3 1,350 = 1,350+0 0.0%
reverse_spin/2 767 = 767+0 0.0%
initialize_math_ring/1 1 = 1+0 0.0%
selected_memory_cell/3 124 = 124+0 0.0%
initialize_memory/1 1 = 1+0 0.0%
adjust/2 1,137 = 583+554 0.0%
interpret/2 2 = 1+1 0.0%

Hello = ["110011100111000001111100000001000011111000011111100000000010"|"..."]


... so much so that the "Hello, World!" program is no
longer a viable target to use aspects to improve the interpreter's
.12 We need a
more demanding program to expose obvious optimization opportunities.

100 Bottles of Hefe Weizen of the Wall

Fortunately, there is a larger program than "Hello, World!", by
Kang Seonghoon, from the same author: beer.wr!
This program has approximately 10 times the number of instructions
and uses more commands to accomplish its goal. Running the program
with the profiler shows firstly that it takes an excessive amount
of time to execute, and that, of course, takeout/3 is the culprit:

?- read_file_to_codes('beer.wr', Hefe, []), profile(whirl(Hefe)).
99 bottles of beer on the wall,
99 bottles of beer.
Take one down, pass it around,
98 bottles of beer on the wall.

98 bottles of beer on the wall,
2 bottles of beer on the wall,
2 bottles of beer.
Take one down, pass it around,
1 bottle of beer on the wall.

1 bottle of beer on the wall,
1 bottle of beer.
Take one down, pass it around,
No bottles of beer on the wall.

Total time: 916.43 seconds
Predicate Box Entries = Calls+Redos Time
takeout/3 838,549 = 838,548+1 71.5%
$garbage_collect/1 8,886 = 8,886+0 27.8%
is/2 865,980 = 865,980+0 0.1%
=/2 2,010,585 =2,010,585+0 0.1%
rotate/2 148,885 = 148,885+0 0.1%
bump_program_counter/2 330,798 = 330,798+0 0.1%
interpret/2 2 = 1+1 0.1%
<-/2 622,614 = 622,612+2 0.1%
update_ring/3 340,101 = 340,101+0 0.0%
reverse_spin/2 181,912 = 181,912+0 0.0%
execute_command/2 287,739 = 181,912+105,827 0.0%
history_is/4 330,798 = 330,798+0 0.0%
get_active_ring/3 341,419 = 341,419+0 0.0%
command/3 88,575 = 76,085+12,490 0.0%
put/1 10,889 = 10,889+0 0.0%
</2 148,885 = 148,885+0 0.0%
instruction/3 330,797 = 330,797+0 0.0%
history/4 181,912 = 181,912+0 0.0%
get_selected_command/3 76,085 = 76,085+0 0.0%
adjust/2 294,806 = 148,885+145,921 0.0%
execute/3 330,797 = 330,797+0 0.0%
switch_wheel/2 76,085 = 76,085+0 0.0%
initialize_program/4 7,222 = 1+7,221 0.0%
quintus:floor/2 178,538 = 178,538+0 0.0%
direction/2 148,885 = 148,885+0 0.0%

Hefe = ["This amazing implementation of "99 Bottles of Beer" was done"|"..."]


Wow! That profile amassed quite a bit of information; information that
does not tell us directly which takeout/3 user is consuming most of the
resources. There are two traditional approaches: diving right in,
which we do not cover
,13 and "working
harder, not smarter". That is, find all of takeout/3's callers, and
map the numbers to the operations.

Those are the traditional approaches; aspects afford an alternative.
The takeout/3 predicate can be wrapped in an aspect as well as its
callers. The callers can establish a specific activation which the
takeout/3 aspect then completes.

Without the profiling information, the aspect approach would be the
best one, but given the wealth of information provided by the profile,
we need not take the extra step of developing an aspect system that
would be much like the one we used to debug the previous version of
the interpreter.

Coalescing the profiling information shows the following takeout/3 or
<-/2 usage:

instruction (program) manipulation
bump_program_counter - 330,798
ring changes
update_ring - 340,101
rotate - 148,885
get_active_ring - 341,419
get_command (not recorded)
get_selected_command - 76,085
command - 88,575

So, the biggest user of takeout/3 or <-/2 are the ring predicates; it is fortuitous that the ring structure is simple and therefore can be easily changed by aspects to prototype improvements. We will now develop a change to the ring structure and predicates using aspects to affect this change.

Minor Surgery

Since we are changing the structure that holds the rings, that means we
are changing part of the program state data structure. This change
requires we introduce a slight change in the interpreter to construct
that structure from a builder predicate instead of manually. In this way,
we can write an aspect around the builder predicate to prototype a term
structure for the rings container. So, the first whirl/1 rule changes

whirl([Whirl|Instructions]) :-
initialize_program([Whirl|Instructions], 0, Program, []),
interpret(program(ops, history(quescient, last_instruction(1)),
[Ops, Math], Memory, program(Program)), _).


whirl([Whirl|Insts]) :-
initialize_program([Whirl|Insts], 0, Opcodes, []),
build_program_state(ops, history(quescient, last_instruction(1)),
Ops, Math, Memory, Opcodes, Program),
interpret(Program, _).

build_program_state(Active, Hist, Ops, Math, Mem, Opcodes,
program(Active, Hist, [Ops, Math], Mem, program(Opcodes))).

Now that we have this change in place, we must also make sure every
ring selection uses the get_active_ring/3 predicate (there remain several
places the active ring is extracted from the Rings list directly). With
this change and the above program state builder predicate, we have
version 0.04 of the interpreter.
To change the representation of the rings data structure, we write aspects
around the program state builder predicate, and all predicates using that
structure directly. It turns out that there are only two such predicates:
get_active_ring/3 and update_ring/3, so the task of writing the aspects
is rather simple.

:- use_module(library(aspects), [aspect/3, aspect_declaration/1]).

:- use_module(library(ring_utils)).
:- aspect_declaration(use_module(library(ring_utils))).

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
Program = program(A, H, rings(Ops, Math), Mem, program(Opcodes))).

:- aspect(around, get_active_ring(A, Rings, Ring),
get_active_ring_proxy(A, Rings, Ring)).

:- aspect(around, update_ring(Ring, Old, New),
update_ring_proxy(Ring, Old, New)).

The above aspects change the implementation of the rings structure from
a list to a rings/2 term where the Ops ring is the first argument and the
Math ring the second. Since each ring proxy predicate has more than one
clause, we provide the implementation in a utility module:

:- module(rings_utils, [get_active_ring_proxy/3, update_ring_proxy/3]).

get_active_ring_proxy(ops, rings(Ops, _), Ops).
get_active_ring_proxy(math, rings(_, Math), Math).

update_ring_proxy(ring(ops, I, S, V), rings(_, Math),
rings(ring(ops, I, S, V), Math)).
update_ring_proxy(ring(math, I, S, V), rings(Ops, _),
rings(Ops, ring(math, I, S, V))).

With the aspect definition file
( and the utility file
containing the ring data access proxies
(, we weave the aspects into
the interpreter and profile the running system:

Total time: 916.18 seconds
Predicate Box Entries = Calls+Redos Time
list_utils:takeout/3 498,447 = 498,447+0 71.6%
$garbage_collect/1 8,863 = 8,863+0 27.6%
is/2 865,980 = 865,980+0 0.1%
bump_program_counter/2 330,798 = 330,798+0 0.1%
=/2 2,010,586 =2,010,586+0 0.1%
get_active_ring/3 447,157 = 447,157+0 0.1%
tintin:around_user_get_active_ring_aspect/3447,157 = 447,157+0 0.0%
rotate/2 148,885 = 148,885+0 0.0%
ring_utils:update_ring_proxy/3 340,102 = 340,101+1 0.0%
command/3 88,575 = 76,085+12,490 0.0%
interpret/2 2 = 1+1 0.0%
reverse_spin/2 181,912 = 181,912+0 0.0%
syntax:<-/2 175,455 = 175,455+0 0.0% history_is/4 330,798 = 330,798+0 0.0% instruction/3 330,797 = 330,797+0 0.0% execute/3 330,797 = 330,797+0 0.0% adjust/2 294,806 = 148,885+145,921 0.0% execute_command/2 287,739 = 181,912+105,827 0.0% tintin:around_user_update_ring_aspect/3340,101 = 340,101+0 0.0% ring_utils:get_active_ring_proxy/3 447,157 = 447,157+0 0.0% spin_reverse/2 181,912 = 181,912+0 0.0% update_ring/3 340,101 = 340,101+0 0.0% switch_wheel/2 76,085 = 76,085+0 0.0% get_selected_command/3 76,085 = 76,085+0 0.0% get_command/3 148,885 = 148,885+0 0.0%

How disappointing! The "optimization" we made, replacing ring selection
from list element
access to term argument access, had no noticeable effect on program
runtime, even with the large numbers of calls to these predicates.
The list contains only two elements, however, and this may be the reason
that this "optimization" yielded no results, in spite of the fact that
term argument access occurs in constant time and list element access
occurs in linear time. Although this was a mistaken effort, it was not
a destructive one (as it would be if aspects were not used to develop it).
If we choose, we can simply discard the aspect and utility file, and the
target source code "automagically" reverts to its original implementation.
Without aspects, these changes would need to be removed from the target
source code manually, either by reverting to the previous revision under
source code control or by user intervention. Aspects create a safety net:
the previous version is immediately present once the aspects are removed.

Major Surgery...

There are larger lists being processed by the interpreter: the program,
itself, can be large (in the case of
beer.wr, there are over 15k instructions), and
each ring has a set of 12 command. Program instruction access occurs
more often then command access, and the instructions can be deep in the
list, so we will prototype changes to the program instruction handler
next. This change is not so trivial; the
Whirl interpreter document recommends
that we compile the instructions into a set of instruction/2 facts,
instead of storing the instructions in a list, as the interpreter does.
This change will allow Prolog's speedier indexing of clauses in a
predicate to improve runtime performance. This change,
like the one for ring access, requires the
same steps: modifying the program state data structure and then modifying
clauses that access the instructions from the state, but it also requires
that we change the instruction scanner so that it pushes the instructions
as facts into the Prolog database. As assert/1 is an expensive operation,
doing so in bulk, by compiling these facts from a file, is probably the
best approach, this compilation is also a change from the previous
optimization attempt.

As before, we need to modify the target source code slightly so that
we can use aspects to modify the system externally. Specifically,
we need to externalize the point when the instruction scanning routines
push the instructions onto the internal representation:

initialize_program([Op|Codes], Index) -->
({ Code is Op - 48,
Code <- [0, 1] } ->
[Index - Code],
{ NewIndex is Index + 1 }
{ NewIndex = Index }),
initialize_program(Codes, NewIndex).
initialize_program([], _) --> [-1 - error].


initialize_program([Op|Codes], Index) -->
({ Code is Op - 48,
Code <- [0, 1] } ->
push_instruction(Index, Code),
{ NewIndex is Index + 1 }
{ NewIndex = Index }),
initialize_program(Codes, NewIndex).
initialize_program([], _) --> push_instruction(-1, error).

push_instruction(Index, Opcode) --> [Index - Opcode].

This change and by rolling the previous ring
modification into the interpreter
begets version
of the interpreter, which we can modify with aspects to compile
a Whirl's program instructions into memory. The first set of aspects
change how the initialize_program/4 predicate stores the scanned

:- aspect(after, initialize_memory(_),
write('% auto-generated: instruction/2 contains the whirl program'),
nl, nl)).

:- aspect(before, build_program_state(_, _, _, _, _, _, _),
(told, compile(the_whirl_program))).

:- aspect(around, push_instruction(Index, Opcode, A, B),
(format('instruction(~d, ~w).~n', [Index, Opcode]), A = B)).

Interestingly, none of these aspects even work with the
initialize_program/4 predicate directly: the after and before aspects
manage the file into which the instruction/2 facts are written, and the
around aspect uses the newly created push_instruction/4 helper predicate
to do the task of writing those instructions into the file. Since the
instructions are stored in memory now, not in a list, we need to modify
the program state data structure to match this change...

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
Program = program(A, H, rings(Ops, Math), Mem, instruction_index(0))).

... and how instructions are obtained (including manipulating the program

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
Program = program(A, H, rings(Ops, Math), Mem, instruction_index(0))).

:- aspect(around, bump_program_counter(P0, P1),
(P0 = program(A, H, R, M, instruction_index(Index)),
NewIndex is Index + 1,
P1 = program(A, H, R, M, instruction_index(NewIndex)))).

:- dynamic instruction/2.
:- aspect_declaration(dynamic instruction/2).

:- aspect(around, instruction(Instruction, P, P),
(P = program(A, H, R, M, instruction_index(Index)),
instruction(Index, Instruction))).

Finally, we change any commands that manipulate the program counter.
The Whirl language only has two such commands: exit and padd.

:- aspect(around, command(Command, P0, P1),
(Command = exit,
P1 = program(_, _, _, _, instruction_index(-999)))).

:- ensure_loaded(library(math)).
:- aspect_declaration(ensure_loaded(library(math))).

:- aspect(around, command(Command, P0, P1),
(Command = padd,
P0 = program(ops, H, Rings, M, instruction_index(Index)),
get_active_ring(ops, Rings, ring(_, _, _, value(X))),
floor(X, Y),
NewIndex is Index + Y,
(instruction(NewIndex, _) -> true;
raise_exception(sigsegv(no_address(NewIndex), from(Index)))),
NewerIndex is NewIndex - 1,
P1 = program(ops, H, Rings, M, instruction_index(NewerIndex)))).

:- aspect(around, command(_, _, _), prove_goal).

We may now weave these instruction
in place and profile the new interpreter:

?- read_file_to_codes('beer.wr', Hefe, []), profile(whirl(Hefe)).
% the_whirl_program compiled into insies 0.14 sec, 1,002,980 bytes
99 bottles of beer on the wall,
Take one down, pass it around,
No bottles of beer on the wall.

Total time: 9.37 seconds
Predicate Box Entries = Calls+Redos Time
$garbage_collect/1 208 = 208+0 29.9%
list_utils:takeout/3 167,055 = 167,055+0 13.9%
=/2 2,179,320 =2,179,320+0 7.4%
is/2 865,981 = 865,981+0 3.7%
rotate/2 148,885 = 148,885+0 3.5%
insies:around_user_bump_program_counter_aspect/2330,797 = 330,797+0 3.3%
insies:instruction/2 331,096 = 331,096+0 2.9%
reverse_spin/2 181,912 = 181,912+0 2.9%
syntax:<-/2 175,453 = 175,453+0 2.9% interpret/2 2 = 1+1 2.7% history_is/4 330,797 = 330,797+0 2.7% update_ring/3 340,101 = 340,101+0 2.7% get_active_ring/3 447,157 = 447,157+0 2.2% instruction/3 330,798 = 330,798+0 2.0% bump_program_counter/2 330,797 = 330,797+0 2.0% insies:around_user_instruction_aspect/3330,798 = 330,798+0 1.4% history/4 181,912 = 181,912+0 1.4% execute/3 330,797 = 330,797+0 1.2% get_command/3 148,885 = 148,885+0 1.2% execute_command/2 287,739 = 181,912+105,827 1.0% adjust/2 294,806 = 148,885+145,921 1.0% command_aspect_target/3 88,473 = 75,985+12,488 1.0% read_clause/2 15,558 = 15,558+0 1.0% $record_clause/3 15,557 = 15,557+0 1.0% insies:around_user_command_aspect/3 228,353 = 76,085+152,268 0.6% Hefe = ["This amazing implementation of "99 Bottles of Beer" was done"|"..."] Yes ?-

Now this is the kind of return on our efforts that we are looking for --
we sped up the running Whirl program by a factor of nearly 100 fold.
This code improvement, prototyped using aspects, can be rolled into the
source code base, as this implementation choice seems to be a clear
winner for the interpreter.

... and a bit of patch-up

Reviewing the most recent runtime profile information, we see that
the percentages have dropped considerably from the first iteration of the
interpreter. This is good: in general, optimization should only be
attempted on predicates that consume more than 10% of the runtime
resources. There is still one area that could be improved: the commands
are stored in a list and use takeout/3 and <-/2 for selection and
rotation. Converting the list representation to a set of facts in a
predicate will improve the runtime efficiency of the interpreter. This
effort is justified, as it is a more modest effort than that of the
program instruction reorganization.

As before, we roll the aspect prototype code into the target source
code, which gives us version 0.06
of the Whirl interpreter. With this new version, we can apply our
changes, prototyping them with aspects. Up to now, the commands were
stored as a list in each ring where the command at the head of the list
is the selected one. We will change this representation: the commands
will be ordered not numerically, but by relation in a set of facts in
a predicate:

:- module(command_utils, [command/2]).

:- op(200, xfx, ::).

% the circle of ops commands:
command(ops::noop, ops::exit).
command(ops::exit, ops::one).
command(ops::one, ops::zero).
command(ops::zero, ops::load).
command(ops::load, ops::store).
command(ops::store, ops::padd).
command(ops::padd, ops::dadd).
command(ops::dadd, ops::logic).
command(ops::logic, ops::if).
command(ops::if, ops::intIO).
command(ops::intIO, ops::ascIO).
command(ops::ascIO, ops::noop).

% the circle of math commands:
command(math::noop, math::load).
command(math::load, math::store).
command(math::store, math::add).
command(math::add, math::mult).
command(math::mult, math::div).
command(math::div, math::zero).
command(math::zero, math::'<').
command(math::'<', math::'>').
command(math::'>', math::'=').
command(math::'=', math::not).
command(math::not, math::neg).
command(math::neg, math::noop).

(We adhere to the ring-command separator syntax used in the
Whirl programming community

This new utility module allows us to replace the unordered-but-indexed
commands stored in a list in each ring with aspects that use the above
relations to select the appropriate command. First, we will change the
structure of the rings storing the command set; they will now store the
command selected:

:- op(200, xfx, ::).
:- aspect_declaration(op(200, xfx, ::)).

:- op(200, yf, spin).
:- aspect_declaration(op(200, yf, spin)).

:- aspect(around, initialize_ops_ring(Ring),
Ring = ring(ops, ops::noop, clockwise spin, value(0))).
:- aspect(around, initialize_math_ring(Ring),
Ring = ring(math, math::noop, clockwise spin, value(0))).

Once we change the ring structure, we need to change how predicates
interact with them, specifically when choosing or manipulating
commands. It turns out that there are only two
predicates that do this (get_selected_command/3 and

:- use_module(library(command_utils), [rotate_in_direction/3]).
:- aspect_declaration(use_module(library(command_utils), [rotate_in_direction/3])).

:- aspect(around, rotate(P0, P1),
(P0 = program(Active, Hist, Rings0, Mem, Index),
P1 = program(Active, Hist, Rings1, Mem, Index),
get_active_ring(Active, Rings0, ring(_, Active::Cmd, Spin, Val)),
rotate_in_direction(Spin, Active::Cmd, NewCmd),
update_ring(ring(Active, NewCmd, Spin, Val), Rings0, Rings1))).

:- aspect(around, get_selected_command(Cmd, P0, P1),
(P0 = program(Active, _, Rings, _, _),
P1 = P0,
get_active_ring(Active, Rings, ring(_, Active::Cmd, _, _)))).

The rotate/2 aspect calls for something more than a simple command/2
relation expressed in the command_utils
(I am referring to the rotate_in_direction/3
call), so we add a few DCGs to the command module to finish out this

rotate_in_direction(clockwise spin) --> clockwise_command.
rotate_in_direction(counterclockwise spin) --> counterclockwise_command.

clockwise_command --> command.
counterclockwise_command(A, B) :- command(B, A).

... and we are done! The profile of the running system yields the
following results:

Total time: 7.13 seconds
Predicate Box Entries = Calls+Redos Time
$garbage_collect/1 163 = 163+0 32.0%
list_utils:takeout/3 18,170 = 18,170+0 10.1%
bump_program_counter/2 330,797 = 330,797+0 6.9%
=/2 2,048,350 =2,048,350+0 6.3%
interpret/2 2 = 1+1 4.0%
instruction/2 331,096 = 331,096+0 4.0%
is/2 419,326 = 419,326+0 3.4%
command_utils:command/2 148,885 = 148,885+0 3.4%
chain:around_user_rotate_aspect/2 148,885 = 148,885+0 2.9%
instruction/3 330,798 = 330,798+0 2.6%
reverse_spin/2 181,912 = 181,912+0 2.1%
spin_reverse/2 181,912 = 181,912+0 2.1%
execute/3 330,797 = 330,797+0 1.9%
history_is/4 330,797 = 330,797+0 1.9%
rotate/2 148,885 = 148,885+0 1.6%
chain:around_user_get_selected_command_aspect/376,085 = 76,085+0 1.3%
execute_command/2 287,739 = 181,912+105,827 1.3%
update_ring/3 340,101 = 340,101+0 1.3%
command/3 88,573 = 76,085+12,488 1.3%
get_selected_command/3 76,085 = 76,085+0 1.3%
format/2 15,558 = 15,558+0 1.3%
command_utils:rotate_in_direction/3 148,885 = 148,885+0 1.3%
get_active_ring/3 447,157 = 447,157+0 1.1%
switch_wheel/2 76,085 = 76,085+0 1.1%
history/4 181,912 = 181,912+0 0.8%

This improvement shaved off a second or two from the runtime execution,
but, more importantly, the "weight" of the execution is now more evenly
distributed throughout the interpreter.

Taking Out takeout/3

The only predicate above 10% in the profile now is our long-standing
friend, takeout/3. Is there a way to eliminate it entirely from the
interpreter? Yes, we transform the representation of memory from a
list to a set of dynamic predicates. Let's do that. First, we roll
the changes from above into our interpreter, giving us
version 0.07. In so doing, we
observe that some predicates have become superfluous, so we discard them
(specifically, the different initialize_<type>_ring/1 commands are
reduced to one initialize_ring/2 fact, direction/2 is replaced by
rotate_in_dirction/3, and get_command/3 and adjust/2 are discarded).
Next, we replace the list representation of memory/1 in the program/5
term with a dynamic memory/2 predicate using aspects...

:- dynamic memory/2.
:- aspect_declaration(dynamic memory/2).

:- aspect(around, initialize_memory(Mem),
(Mem = memory_index(0),
retractall(memory(_, _)),
assert(memory(0, value(0))))).

:- aspect(around, selected_memory_cell(Val, P0, P1),
(P0 = program(A, H, R, memory_index(X), P),
memory(X, value(Val)),
P1 = P0)).

...and all the commands that change the memory layout (which are defined
in With that
we are ready to test the takeout/3-free system:

Total time: 6.17 seconds
Predicate Box Entries = Calls+Redos Time
$garbage_collect/1 142 = 142+0 31.0%
=/2 1,963,567 =1,963,567+0 6.8%
bump_program_counter/2 330,797 = 330,797+0 5.2%
rotate/2 148,885 = 148,885+0 4.7%
interpret/2 2 = 1+1 4.7%
instruction/2 331,096 = 331,096+0 4.2%
update_ring/3 340,101 = 340,101+0 3.9%
instruction/3 330,798 = 330,798+0 3.7%
command_utils:command/2 148,885 = 148,885+0 3.4%
history_is/4 330,797 = 330,797+0 2.9%
get_active_ring/3 447,157 = 447,157+0 2.9%
execute_command/2 287,739 = 181,912+105,827 2.9%
mem:around_user_command_aspect/3 321,536 = 76,085+245,451 2.4%
reverse_spin/2 181,912 = 181,912+0 2.4%
is/2 419,326 = 419,326+0 2.4%
execute/3 330,797 = 330,797+0 2.1%
read_clause/2 15,558 = 15,558+0 2.1%
command_utils:rotate_in_direction/3 148,885 = 148,885+0 1.8%
format/2 15,558 = 15,558+0 1.8%
command/3 76,284 = 76,284+0 1.3%
history/4 181,912 = 181,912+0 1.0%
command_aspect_target/3 58,002 = 56,796+1,206 1.0%
switch_wheel/2 76,085 = 76,085+0 0.8%
expand_term/2 46,674 = 15,558+31,116 0.8%
selected_memory_cell/3 17,594 = 17,594+0 0.5%

That did it! This implementation is nearly 150 times faster than the
first working revision
(version 0.02). Rolling in the
changes, including eliminating that pesky takeout/3, gives us an improved
version, using asserts.

Coda: takeout/3

We've "solved" the "problem" of takeout/3, but by introducing
extra-logical features into the interpreter (assert/1, retract/1,
and retractall/1). Time is time, but, if possible, I would prefer to
avoid using facilities that make the program more unwieldy. So, instead
of using dynamic facts, let's explore alternatives. The first, obvious,
one is to replace takeout/3 with the nearly equivalent delete/3. This
approach seems good: as delete/3 is implemented by the Prolog
implementation, it should be faster, taking advantage of
lower-level code available to the Prolog implementation. Replacing
takeout/3 itself with an aspect, however, is not so easy, as the aspects
module itself uses takeout/3 in its implementation, meaning that the
aspects load and compile takeout/3 before they can (in this case, cannot)
modify the predicate. For this special case, a manual search and replace
is required, but, fortunately, as there is only one call to takeout/3 left
in the source code (in version
), this task is not too difficult (resulting in
version 0.07 using delete/3
instead of takeout/3

Profiling the running executable shows surprising results...

Total time: 7.38 seconds
Predicate Box Entries = Calls+Redos Time
$garbage_collect/1 162 = 162+0 29.9%
lists:delete/3 18,170 = 18,170+0 16.8%
=/2 1,692,663 =1,692,663+0 8.9%
bump_program_counter/2 330,797 = 330,797+0 4.1%
instruction/2 331,096 = 331,096+0 3.3%
instruction/3 330,798 = 330,798+0 3.3%
interpret/2 2 = 1+1 3.3%
is/2 419,326 = 419,326+0 3.3%
update_ring/3 340,101 = 340,101+0 3.0%
history_is/4 330,797 = 330,797+0 2.8%
reverse_spin/2 181,912 = 181,912+0 2.3%
execute_command/2 287,739 = 181,912+105,827 2.0%
get_active_ring/3 447,157 = 447,157+0 2.0%
rotate/2 148,885 = 148,885+0 2.0%
command_utils:command/2 148,885 = 148,885+0 1.8%
history/4 181,912 = 181,912+0 1.5%
command_utils:rotate_in_direction/3 148,885 = 148,885+0 1.5%
read_clause/2 15,558 = 15,558+0 1.3%
command/3 106,549 = 76,085+30,464 1.0%
switch_wheel/2 76,085 = 76,085+0 0.8%
format/2 15,558 = 15,558+0 0.8%
initialize_program/2 7,222 = 1+7,221 0.5%
execute/3 330,797 = 330,797+0 0.5%
spin_reverse/2 181,912 = 181,912+0 0.5%
flag/3 15,572 = 15,572+0 0.5%

... execution time actually increased over the takeout/3
implementation! Moral of the story: when working with lists, use
takeout/3; and avoid delete/3.

Another approach is to change the representation of the data structure
storing the memory cells. For the takeout/3 implementation, they are
stored in a list where the head cell is the currently selected one and
the other cells are indexed but unordered. Storing the cells in a list
give O(n), or linear, search time characteristics.
Trees, such as
binary, 2-3, or AVL trees, give better results, attaining O(log n) search
.16 For small,
constantly changing data sets, trees may actually take longer, due to the
extra housekeeping involved. The payoff occurs with larger data sets.
The takeout/3 implementation's profile indicates takeout/3 was called over
18,000 times, so let's see how storing the memory cells in an
AVL tree does in comparison to the takeout/3
implementation. Once again, we change the representation of the program/5
state term. This time, as the root node of the AVL tree is not necessarily
the currently selected node, we need to add an index argument to the
memory store. An aspect serves this need:

:- aspect(around, initialize_memory(Mem),
avl_put(Empty, 0, value(0), AVL),
Mem = memory(0, AVL))).

Now, wherever we access or change memory, we wrap that behavior with an
aspect replacing list access with AVL access:

:- use_module(library(avl), [
:- aspect_declaration(use_module(library(avl), [
:- aspect(around, selected_memory_cell(Datum, P0, P1),
(P0 = program(_, _, _, memory(Index, AVL), _),
avl_get(AVL, Index, value(Datum)),
P1 = P0)).

... and replace the implementations of the commands that access memory
(as we did for the asserts
). This aspect implementation can be found in Weaving these aspects
into the takeout/3
gives us the following profiled result:

Total time: 6.44 seconds
Predicate Box Entries = Calls+Redos Time
$garbage_collect/1 140 = 140+0 27.5%
=/2 1,985,908 =1,985,908+0 9.6%
instruction/2 331,096 = 331,096+0 6.5%
update_ring/3 340,101 = 340,101+0 5.9%
command_utils:command/2 148,885 = 148,885+0 4.8%
is/2 419,778 = 419,778+0 4.5%
interpret/2 2 = 1+1 4.2%
bump_program_counter/2 330,797 = 330,797+0 3.7%
history_is/4 330,797 = 330,797+0 3.4%
reverse_spin/2 181,912 = 181,912+0 3.4%
rotate/2 148,885 = 148,885+0 2.8%
execute/3 330,797 = 330,797+0 2.5%
command/3 76,284 = 76,284+0 1.7%
format/2 15,558 = 15,558+0 1.7%
command_utils:rotate_in_direction/3 148,885 = 148,885+0 1.4%
avli:around_user_command_aspect/3 321,536 = 76,085+245,451 1.1%
spin_reverse/2 181,912 = 181,912+0 1.1%
switch_wheel/2 76,085 = 76,085+0 1.1%
instruction/3 330,798 = 330,798+0 1.1%
initialize_program/2 7,222 = 1+7,221 1.1%
history/4 181,912 = 181,912+0 0.8%
get_active_ring/3 447,157 = 447,157+0 0.8%
command_aspect_target/3 58,002 = 56,796+1,206 0.8%
execute_command/2 287,739 = 181,912+105,827 0.8%
read_clause/2 15,558 = 15,558+0 0.8%

It has a runtime execution speed better than the takeout/3 implementation,
and comparable to the assert/retract version. Since the two
divergent systems are of comparable speed, we are left with a choice:
choose the
memory-as-facts implementation
and lose visible runtime
details and lose confidence that the system is (provably) correct; or,
choose the memory-as-AVL
tree implementation
with an immediate debugging advantage and
knowledge that the model is logically consistent "all the way down". Of
course the way to go is the latter, but it does highlight that
alternatives exist: often software engineers "give in" to compromising
system fundamentals for expediency. This compromise need not be made:
improving algorithmic choices pays off both in code clarity and in
runtime results.

Spacetime Summary

This extended example demonstrated prototyping a series of optimizations
on a working system, iteratively, and safely, rolling in the
improvements until we obtained a new working system significantly
smaller in its memory footprint and significantly faster in its runtime
execution. We saw in this example that aspects do not necessarily
guide one to a correct approach, but we also saw that deadends could be
discarded without the usual recovery costs involved. So, aspects not only
act as excellent meta-debuggers, but also are a workable way to change the
implementation of a working system safely.

Production Aspect Survey

We've seen how aspects can be used in product development from the
perspectives of debugging nascent code and then prototypting optimizations
for working code (that are eventually rolled into the source base when
verified). There is also another use for aspects -- production aspects,
or aspects advising delivered code that are meant to be free-standing.
These aspects have received less attention from the
,17 and,
due to restrictions on systems I advise with aspects, I can only offer
general descriptions, broken down into categories:

  • Security

    Ken Pelletier describes how he eliminated vunerabilities from a content
    rendering system for a web browser in
    "An AOP success story
    from the real world"

  • Notification

    In a rule-based expert system, there is a reporting requirement that
    results in a rule finding based on anomalies detected into the input
    raw data set to the rule manager as it is being parsed into an internal
    representation. As the parse phase preceeds the rule-finding phase,
    implementing this special
    functionality for this one rule using traditional methods (i.e.:
    non-aspect-oriented methods) needlessly
    complicates the entire system, but using a production aspect that observes
    the parse phase and inserts the rule finding into the result set
    keeps the rest of the system simple and clean.

  • Simulation

    It is often the case that working models must be delivered in the context
    of incomplete information or assets. The usual approach (which takes a
    distant second to the most-exercised option: not delivering) is to stand
    up proxies that interact as dummies for the missing resources. This works
    for cleanly-separated sub-systems, but for systems that engage in complex
    dialogues with other assets, the proxy code becomes entangled with the
    production code. Production aspects provide a venue by which proxies may
    be built without emmeshing their simulated functionality into the
    production system.

Another good source for ideas on production aspects is the
, particularly the
Production Aspects section of that manual, which discusses change
monitoring, context-passing, and protocol enforcement.


AOP is liberating! In this document we provided the protocol of the
aspects module and then demonstrated how to use them effectively in
debugging and prototyping with an extended example that yielded
extraordinary improvements to a non-trivial system. Also, the production
aspects we described are essential to the operations of in-production
systems and for other mission critical systems being build from first
principles. AOP is gathering momentum in the software engineering
community using imperative programming languages, with good reason.
Aspects also have relevance and utility with logic programming
languages, and this document and associated module provide a means to
use aspects effectively in rule-based systems.


0 This document, along with the aspects module and
example files are available as a compressed archive at
1 Java seems to be the lead-off technology for
. The first aspect-oriented compiler for Java was created
at Xerox Parc, and then later merged into the Eclipse effort
(Eclipse is a Java IDE) at IBM. AOP's home page is
2 "Open implementation" (oi) was the rallying cry that
launched (what became) the AOP effort at Xerox Parc (The oi home
page is at
). The
manifesto of oi is [Ki96], a delightful
little paper that clearly makes the case for oi -- it should be
required reading for anyone wishing to understand the utility of
3 No surprise there: one of the authors of
[KRB91], Gregor Kiczales, initialed the
AOP movement with his manifesto on oi [Ki96].
The metaobject protocol is a strange, beautiful and powerful
thing -- besides the first reference above [S90],
§ 28.1.8 shows its utility from a reference manual's
4 Programming languages with traditional
(covariant) object-oriented systems
vary a method's
behavior based on the class of the object to which that method
belongs (i.e.: methods are polymorphic based on its containing
class). In contravariant object-oriented systems, method
polymorphism is based on all the participating objects
(contravariance is also known as parametric polymorphism
because method invocation is based on the cartesian product of
the (actual or formal) types of all of its arguments). In
comparing contravariance to covariance, it is possible to
formally speaking contravariance is a
generalisation of covariance.
5 There are several immediate advantages that
gives the programmer (a more esoteric one is
that the metaobject protocol allows users to control method
selection), perhaps the most noticeable advantage over covariance
is that method behavior may be refined by clients simply
by adding, or removing, methods from the generic
function. So, for example, addition (the '+' operator) may be
extended to arrays by adding a '+(<array> a, <array> b)'
method to the '+' generic function. See, e.g.,
for the
adding-arrays example played out, or [F97],
§ 6.1 for a fully-developed and -explained example (for
6 Although Bruce, Cardelli, et al, indicate in
that the surest approach to resolving the types of multiple
arguments is covariant double-dispatch (the paper applauded the
developers of Smalltalk in discovering this approach and using it
to good effect), in practice programmers rarely use
double-dispatch, and in contravariant systems the arguments'
types are usually resolved without difficulty. Besides,
double-dispatch requires thought and effort on the programmer's
part to avoid logic errors (it is facile to juxtapose argument
order mistakenly when they are eventually used); the problems of
juxtaposition do not arise in contravariant languages as the
dispatch is hidden in the language semantics.
7 This is even correct in like of Java 1.5's
addition of annotations
to the language specification.
Annotations are, by their nature, external to (an
object-oriented) language. They also require significantly more
work to implement significantly less functionality that what
aspects provide; so one is lead to wonder at the decision by the
Java team to select annotations over aspects as a language
feature to advise primary constructs of the language...
8 That the "Hello, World!" program is non-trivial
in Whirl should be a very strong hint at the level of effort needed
to do just about anything in the programming language...
9 The implication operator for Prolog (':-', representing
← or ⊂) is actually reverse implication, which for organizing
like clauses together works well. The forward implication operator
used here ('=>', representing → or ⊃) is more often seen in
works using predicate logic.
10 At first I did an aspect around
interpret_next_instruction/2 with the traditional if-then syntax
('->'/2), which worked fine with two clauses for that aspect, each
clause using prove_goal/0 as its last statement. That obviously
should be refectored to a before aspect, but that didn't work
because the test looked for any exit command, even if it wasn't
executed, so I finally switched to an after aspect on command/3 and
used a spypoint/1 to leap to the interpret_next_instruction/2
predicate. Writing aspects is like writing "ordinary" code; it has
its own developement process.
11 I say "consequence-free" because in normal iterative
development one goes from a working (less optimal) system to a working
(more optimal) system often by breaking the implementation. Mistakes
in this iterative development cycle are very costly because it usually
entails rewriting the system's code back to functionality from
scratch. By allowing aspects to modify the implementation, the
developer earns a free safety net:
the underlying implementation works and its source is untouched through
the developement cycle. If an aspect prototype or optimization breaks
the system, it can simply be discarded and the underlying, working,
implementation will resume its role.
12 Optimizing sub-second programs is not something
I need concern myself with. What is unfortunate about the success of the
refactoring is not that it shows how refactoring, qua
refactoring, improves code efficiency, but, instead, it shows that SWI
Prolog handles indirect and direct recursion differently. Converting
indirect recursion to computed jumps is a solved problem (see, e.g.
[Ba95], which inspired the development a complete
Scheme system; but, more to the point,
[Ka99], pp. 56 on, cover Last Call Optimization
(LCO) for Prolog in detail), so it is disappointing that I must code
around what should be a non-issue.
13 It seems that in the software engineering field
this particular approach has received particular attention -- not so much
rigorous theoretical study, but there are large numbers of software
engineers, including this one, who are dillegently applying their best
efforts to explore every possibility of this approach.
14 This change is a good one. Although it does not
improve the runtime execution speed, it does convert the algorithm of ring
access from list processing (which implies all elements are
accessed progressively) to term argument access (which better
fits the non-deterministic (whereever did the word "random" come to be
associated with this kind of element access) nature of accessing the
rings). In the introduction to the
list_utils manual I make this very
recommendation, so it is a good day, indeed, when I follow my own
15 The housekeeping part of the interpreter, which
comprises most of the system, has changed very little through these
iterative improvements.
I attribute this effect of small changes
to the code to accomodate sweeping changes in how the system works to
DCGs, as they allow me to code in the style of expressing intention
without concern for the underlying implementation.
16 Trees are discussed at length in any book on
[Br2001], chapters 9 and 10,
implements these structures in Prolog. I found a better implementation
of the AVL tree in [RL1999], § 5.9.
17 The slow adaptation of production aspects is
after 15 years of research in the functional
programming community, it has taken another 5 years for the general
programmer to become aware of, and to begin to use, aspects. Production
aspects, being the highest risk, require some familiarity with aspects
of lower risk (debugging and prototyping) before being essayed.

Works Consulted

[Ba95] Henry Baker, "CONS Should Not CONS Its Arguments, Part II: Cheney
on the M.T.A.", ACM Sigplan Notices, 30, 9, Nimble Computer
Corp, California, 1995.
[Br2001] Prolog Programming for Artificial Intelligence, 3rd ed., Ivan
Bratko, Addison-Wesley, Great Britain, 2001.
[BC95] Kim Bruce, Luca Cardelli, Giuseppe Castagna, The Hopkins
Object Group, Gary T. Leavens, Benjamin Pierce, "On Binary
Methods", _, _, _, 1995.
[F97] Dylan Programming: An Object-Oriented and Dynamic
, Neal Feinberg, Sonya E. Keene, Robert O. Mathews,
P. Tucker Withington, Addison-Wesley, Massachusetts, 1997.
[Ka99] Warren's Abstract Machine: a Tutorial Reconstruction,
Hassan Aït-Kaci, MIT Press, Massachusetts, 1999,
[Ki96] Gregor Kiczales, "Beyond the Black Box: Open Implementation",
IEEE Software, _, January 1996
[KRB91] The Art of the Metaobject Protocol, Gregor Kiczales,
Jim des Rivières, Daniel G. Bobrow, MIT Press,
Massachusetts, 1991.
[RL1999] Algorithms: A Functional Programming Approach,
Fethi Rabhi, Guy Lapalme, Addison-Wesley, Great Britain, 1999.
[S90] Common Lisp: the Language, 2nd ed., Guy L. Steele,
Jr. Digital Press, Massachusetts, 1990.

author: Douglas M. Auclair (dauclair at hotmail dot com)
date:December 14, 2005
Copyright © 2005, Cotillion Group, Inc. All rights

No comments:

Post a Comment