Monday, December 26, 2005

Archived News: 2005-12

2005-12

`27`:	Mercury compiler for Sparc architecture Solaris 2.8 systems available for download. See the installation instructions to get Mercury running on your SPARC.
`24`:	Mercury compiler for PowerPC architecture Mac OS-X.4 systems available for download. See the installation instructions to get Mercury running on your Mac.

Saturday, December 17, 2005

AVL Module Users' Manual

Purpose

This document⁰ covers the data structure of and the protocol for the module that implements AVL trees .¹ AVL trees are always-balanced binary trees, and, as such, offer better random search-time characteristics than lists. Depending on the implementation and need, AVL trees have comparable access times to that of direct memory access, without incurring the troublesome cost of using extralogical procedures (see Appendix A for implementation options). Quintus Prolog offers an AVL tree module (which is called, module(avl), as is this module) with several differences: it has no documentation (but the source is comprehensible), it embeds the -1,0,+1 height difference into nodes instead of the height of the node, and the Quintus version is somewhat slower in its operations than this module's. This module was developed by Salvador Fandiño on March 21, 2005 under the GPL.

Types and Modality

Throughout this document one will see type and mode declarations. The syntax and semantics of these declarations should be familiar to the Prolog programmer, but for precise meanings, please see the Mercury programming language reference manual [Merc], particularly § 3.2 (user-defined types) for information regarding type, and § 4 (Predicate and function mode declarations) for a discussion of mode and the semantics of modality.

Most of the tokens should have clear meanings, a brief (and imprecise) overview of the determinism categories is as follows:

Maximum number of solutions
Can fail? 0 1 > 0

no erroneous det multi

yes failure semidet nondet

	Maximum number of solutions
Can fail?	0	1	> 0
no	erroneous	det	multi
yes	failure	semidet	nondet

This table is lifted directly from § 6.1 (Determinism categories), which goes into detail on this topic.

Functionality Not Implemented

This AVL module's protocol covers the commonly-used predicates when working with AVL trees. There is some functionality not implemented here, for the reason that this functionality is either an academic curiosity or is rarely used. We mention this functionality here and cover some implementations in Appendix B.

Traversal Options

This module has one nondeterministic approach to collecting elements (which resolves to a minimum index to a maximum index scan of the tree), and one predicate that converts the AVL tree to a list (again, with a smallest-to-largest index ordering). There are other ways to traverse the AVL tree: if one wishes to go from the largest index to the smallest, one need not incur the cost of reverse/2 (a predicate in O(N) time). This alternate traversal protocol gives the user iteration options over the AVL tree:

left_to_right_traversal/2: pseudonym for avl_to_list/2, min-to-max index traversal.
right_to_left_traversal/2: max-to-min index traversal.
top_down_traversal/2: the academic curiosity predicate, a breath-first traversal of the tree.

Element selection

Besides the avl_get/3 and the avl_gen/3 predicates, there are no options for the user to direct the selection of elements. Of course, one could convert the AVL tree to a list and then select any desired element using, e.g. the operations from the list_utils module or from the set_utils module, but then one pays the penalty in space (creating another data structure, a list, to hold the node) and in time (in the transformation from the AVL tree structure to the list structure). The following predicates would extend the module's functionality without the expenses outlined above:

top/2: given a non-empty AVL tree, returns the value of the top node. As an AVL tree is a recursive data structure, this predicate can be used to iterate over a tree, depth-first, nondeterministically.
first/2: given a non-empty AVL tree, returns the left-most node's value in depth that is not a terminal. Returns the value of the smallest key in the tree in O(logN) time.
last/2: given a non-empty AVL tree, returns the right-most node's value.

Functional Operations

As it is with lists, it may be necessary to perform a one-to-one transformation of the elements of an AVL tree, or, to distill the AVL tree into a single result (such as by, e.g., element summation). The former operation is know in the functional programming language community as the map function; the latter, fold. Implementing these functions as predicates would also require implementing the lambda calculus to permit any kind of generality, or utility, of the resulting predicate. Fortunately, as I have an implementation available, both in the lambda module and in the combinators module (which is an accident resulting from my development of set-theory predicates for use by Prolog in the set_utils module), providing this functionality devolves to a derivative exercise:

map_tree/3: takes a lambda-term or phi-term (an anonymous functional) and transforms the input AVL tree to a transformed output AVL tree of the same size and shape.

Dismissal

Although the functionality described above may appear interesting or useful (so much so that I've developed it all in my own AVL module), I've only used one of the above predicates: top/2. And that one only for the very simple reason that I have need of the currently-available value (the top node), and do not wish to have my implementation code become emmeshed with the AVL tree's implementation code (after all, a protocol should encapsulate implementation details). Futhermore, this survey was not comprehesive: I did not include the scorned (for good reason) avl_delete/[3,4] predicate. Whenever this predicate is mentioned in the literature, it is mentioned only in passing, as an exercise for the reader, and I have had no need for this predicate. In sum, the AVL module's protocol is written well: it exposes necessary functionality, and avoids cluttering the namespace with frivolous functionality.

:- module(avl).

Data Structure

An AVL Tree, as used by this module, is a binary type of the following structure...

:- type avl_tree ---> empty_avl_tree | non_empty_avl_tree.

where:

:- type empty_avl_tree ---> t.
:- type non_empty_avl_tree ---> t(key, value, avl_tree, avl_tree, height).

and:

:- type key == comparable.
:- type value == any.
:- type height == integer.

Another type used in this document is the following:

:- type assoc_list ---> [] | [assoc|assoc_list].

where:

:- type assoc ---> key - value.

An AVL tree may be created from a list using the sorted_list_to_avl/2 call or list_to_avl/2 call or by transforming a nascent AVL tree (obtained from the avl_empty/1) with calls to avl_put/4. Elements may be accessed or changed with the predicates that operate on elements of AVL trees (avl_get/3 or avl_replace/4) or by translating the AVL tree back to an ordered list of the elements (avl_to_list/2).

Exported Predicates

The avl module exports the following predicates:

avl_empty/1
avl_to_list/2
avl_put/[3,4]
avl_replace/[3,4]
avl_get/3
avl_has/2
sorted_list_to_avl/2
list_to_avl/2
avl_gen/3
avl_dump/1

Exported Operators

The avl module does not define any operators.

avl_empty/1

Synopsis:

avl_empty(?AVLTree)

Arguments:

AVLTree

<avl_empty_tree>

An empty AVL tree

Description:

Gives the empty AVL tree.

Backtracking:

:- mode avl_empty(out) is det.
:- mode avl_empty(in) is semidet.

Example:

This predicate is normally used to generate a seed AVL tree if items are being inserted non-deterministically (instead of via a list, for example). This example gives the runtime representation of the initial state of memory for a Whirl program:

initialize_memory(memory(0, AVL)) :-
  avl_empty(AVL0),
  avl_put(AVL0, 0, value(0), AVL).

avl_to_list/2

Synopsis: avl_to_list(+AVLTree, ?List)

Arguments:

`AVLTree`	<avl_tree>	The AVL tree to convert to a sorted list
`List`	<assoc_list>	A sorted list of key/value pairs

Description: Converts an AVL tree to a sorted list with Key-Value pairs.

Backtracking:

:- mode avl_to_list(in, out) is det.
:- mode avl_to_list(in, in) is semidet.

See also: sorted_list_to_avl/2
list_to_avl/2

avl_put/[3, 4]

Synopsis: avl_put(+In, +Key, -Out)
avl_put(+In, +Key, +Value, -Out)

Arguments:

`In`	<avl_tree>	The AVL tree to add `Value` to
`Key`	<comparable>	The node's index; must give a comparator as an argument to `compare/3`
`Value`	<term>	The value to insert into the AVL tree at index `Key`
`Out`	<avl_non_empty_tree>	The resulting AVL tree containing node `Key`/`Value`

Description: Inserts a Key/Value pair on an AVL tree. Fails if an element with the same Key already exists on the tree (use avl_replace/4 instead).

Backtracking:

:- mode avl_put(in, in, out) is semidet.
:- mode avl_put(in, in, in, out) is semidet.

Example:

This example demonstrates a lazy memory structure for a Whirl program -- cells are added when accessed, and not before:

  get_active_ring(ops, Rings, ring(ops, _, _, value(X))),
  floor(X, Y),
  NewIndex is Y + Index,
  (avl_has(AVL0, NewIndex) ->
     AVL1 = AVL0
   ;
     avl_put(AVL0, NewIndex, value(0), AVL1)).

I currently have not needed to use avl_put/3; so have no examples available for it.

avl_replace/[3,4]

Synopsis: avl_replace(+In, +Key, -Out)
avl_replace(+In, +Key, +Value, -Out)

Arguments:

`In`	<avl_tree>	The AVL tree to change the value at `Key` to `Value`
`Key`	<comparable>	The node's index; must give a comparator as an argument to `compare/3`
`Value`	<term>	The value to replace in the AVL tree at `Key`
`Out`	<avl_non_empty_tree>	The resulting AVL tree containing new `Value` at `Key`

Description:

Inserts a Key/Value pair on an AVL tree. If an element with the same Key already exists on the tree, it is replaced.

N.B.:
Despite its unfortunate semantic denotation, this predicate is a strict superior to avl_put/[3,4]: it puts in new elements and also changes elements present in the AVL tree. Notationally:
some [AVL0, AVL1, Key, Value0, Value1] (((~ avl_get (AVL0, Key, Value0)) => (avl_put(AVL0, Key, Value1, AVL1) <=> avl_replace(AVL0, Key, Value1, AVL1))) /\ (avl_get (AVL0, Key, Value0) => (avl_replace(AVL0, Key, Value1, AVL1) <=> (~ avl_put(AVL0, Key, Value1, AVL1))))) where: (~ Value0 = Value1), avl_to_list(AVL0, Set0), avl_to_list(AVL1, Set1), Set0 c Set1, (~ Set1 c Set0)
This relationship between this predicate and avl_put/[3,4] and an implementation that shows no advantage of the latter over the former in its execution shows that avl_put/[3,4] is entirely redundant functionally (but may be useful in the declarative sense).

Backtracking:

:- mode avl_replace(in, in, out) is det.
:- mode avl_replace(in, in, in, out) is det.

Examples:

The following example overwrites the current memory cell with the value in the accumulator in a Whirl program:

  get_active_ring(Active, Rings, ring(_, _, _, Value)),
  avl_replace(AVL0, Index, Value, AVL1).

The next example does the same for a user-entered integral value, given that the accumulator is null:

  get_active_ring(ops, Rings, ring(ops, _, _, value(0))),
  !,
  read_term(N, []),
  (integer(N) -> true; raise_exception(integral_expected(received(N)))),
  avl_replace(AVL0, Index, value(N), AVL1).

As is for avl_put/3, I have no examples to demonstrate avl_replace/3.

avl_get/3

Synopsis: avl_get(+AVLTree, +Key, ?Value)

Arguments:

`AVLTree`	<avl_non_empty_tree>	The AVL tree
`Key`	<comparable>	The node's index; must give a comparator as an argument to `compare/3`
`Value`	<term>	The value obtained from the AVL tree at `Key`

Description:

Retrieves Value associated to Key. Predicate fails if no element with Key is found on the tree.

N.B.:
As is the relationship between avl_replace/[3,4] and avl_put/[3,4], avl_get/3 is strictly superior to avl_has/2. The proof in this case is much simpler:

some [AVL, Key, Value]

(avl_get(AVL, Key, Value) <=> avl_has(AVL, Key))

Backtracking:

:- mode avl_get(in, in, in) is semidet.
:- mode avl_get(in, in, out) is semidet.

Example:

Of course, AVL trees shine with regard to "random" element access, for:

avl_get(MemAVL, Index, value(Datum))

has O(logN) performance characteristics, whereas:

(Index - value(Datum)) <- MemList

obtains Datum in O(N) time.

See also: compare/3
avl_has/2
avl_gen/3

avl_has/2

Synopsis: avl_has(+AVLTree, +Key)

Arguments:

`AVLTree`	<avl_non_empty_tree>	The AVL tree
`Key`	<comparable>	The node's index; must give a comparator as an argument to `compare/3`

Description: Checks whether the AVLTree contains an element with the given Key.

Backtracking:

:- mode avl_has(in, in) is semidet.

Example: (see the example for avl_put/4)

sorted_list_to_avl/2

Synopsis: sorted_list_to_avl(+List, -AVLTree)

Arguments:

`List`	<assoc_list>	A sorted list of key/value pairs with the smallest key at the head; all keys must be unique
`AVLTree`	<avl_tree>	The AVL tree resulting from the sorted list

Description: Converts a sorted list of key/value pairs without duplicate keys to an AVL tree in O(N), or linear, time.

Backtracking:

:- mode sorted_list_to_avl(in, out) is det.

See also: avl_to_list/2
list_to_avl/2

list_to_avl/2

Synopsis: list_to_avl(+List, -AVLTree)

Arguments:

`List`	<assoc_list>	An unordered list of key/value pairs (duplicate keys allowed)
`AVLTree`	<avl_tree>	The AVL tree resulting from the list; duplicates overwritten

Description: Converts a list of key/value pairs to an AVL tree in O(N logN) time. As duplicates are encountered in List, the elements are replaced in AVLTree.

Backtracking:

:- mode list_to_avl(in, out) is det.

See also: avl_to_list/2
sorted_list_to_avl/2

avl_gen/3

Synopsis: avl_gen(+AVLTree, ?Key, ?Value)

Arguments:

`AVLTree`	<avl_non_empty_tree>	The AVL tree
`Key`	<comparable>	The node's index; must give a comparator as an argument to `compare/3`
`Value`	<term>	The value obtained from the AVL tree at `Key`

Description:

Enumerates via backtracking the elements on the AVL tree.

N.B.:
This predicate is strictly superior to avl_get/3, as it reduces to that functionality when Key is ground. Q.E.D.

When Key is ground, do use avl_get/3 instead as it does not create choice points.

Backtracking:

:- mode avl_gen(in, in, in) is semidet.
:- mode avl_gen(in, in, out) is semidet.
:- mode avl_gen(in, out, in) is nondet.
:- mode avl_gen(in, out, out) is multi.

See also: compare/3
avl_get/3
avl_has/2

avl_dump/1

Synopsis: avl_dump(+AVLTree)

Argument:

AVLTree

<avl_tree>

The AVL tree to print

Description:

Prints an human friendly representation of the tree to the current stream.

N.B.:
The internal comment says it all:
TODO: use portray/1 instead.

Backtracking:

:- mode avl_dump(in) is det.

Appendix A: Implementation Choices

Different algorithm books give different approaches to implementing AVL trees. We will examine several different approaches in turn.

AVL Trees as a Functional Data Structure

[RL1999] shows the now-standard single/double rotation one-step rebalancing algorithm, but (re)computes the height of every node on demand. This leads to gains over straight list processing, but the gains are dampened by the continuous height recomputation on each rebalance. [B2001] eagerly computes the heights of the nodes, so that the height information is simply a lookup.

I first implemented the AVL tree type in Prolog from [RL1999] and then incorporated immediate height information into the nodes of the tree. This new implementation of the AVL module resulted in a 30-fold gain in execution speed. Its profile characteristics when run through a count test are as follows:

=====================================================================
Total time: 1.18 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                        9 =        9+0        46.0%
avl_tree:splice/4                     6,000 =    6,000+0         9.5%
is/2                                 89,861 =   89,861+0         7.9%
avl_tree:height/2                   149,686 =  149,686+0         6.3%
avl_tree:simple_compare/4           267,300 =  139,644+127,656   6.3%
=/2                                 235,462 =  235,462+0         4.8%
@</2                                127,656 =  127,656+0         3.2%
avl_tree:grapht/4                    23,969 =    6,000+17,969    3.2%
count/5                                   2 =        1+1         1.6%
@>/2                                  5,988 =    5,988+0         1.6%
avl_tree:get_branch/3                69,822 =   69,822+0         1.6%
avl_tree:replace_branches/3           5,977 =    5,977+0         1.6%
avl_tree:balance_in_direction/5      17,969 =   17,969+0         1.6%
avl_tree:add_element/3                6,001 =    6,001+0         1.6%
avl_tree:simply_replace_branch/4     45,876 =   45,876+0         1.6%
=</2                                  6,002 =    6,002+0         0.0%
</2                                  53,891 =   53,891+0         0.0%
avl_tree:rotate/3                     5,988 =    5,988+0         0.0%
avl_tree:replace_in_direction/5       6,000 =    6,000+0         0.0%
avl_tree:add_element_aux/5           69,822 =   69,822+0         0.0%
avl_tree:replace_branch_recomputing_height/423,946 =   23,946+0         0.0%
avl_tree:then_rotate/4                5,988 =    5,988+0         0.0%
avl_tree:incr_max/3                  71,844 =   35,922+35,922    0.0%
avl_tree:simple_balance/6            17,969 =   17,969+0         0.0%
avl_tree:replace_element/4           63,834 =   63,834+0         0.0%

Table 1: the author's module(avl_tree); 6000 nodes

This profile was obtained by iteratively inserting and modifying 6000 key/value pairs into a tree. At some number over 6000 elements SWI Prolog runs out of global stack.

Quintus Prolog's AVL Module

I was confident that this approach was the best implementation, but then I observed Quintus Prolog has an (undocumented) AVL module. Its approach is quite different, embedding the rebalancing algorithm directly into the predicate that does node insertion or transformation, and using the compare/3 test to obtain the comparator atom to dispatch to the appropriate rebalancing routine (instead of using the @</2 comparator against indices to determine the branch to follow, as I implemented). Also, it uses a height difference algorithm: the height information in a node of a balanced tree is encoded as +1, 0 or -1. After reading the source code, I built the same test for this module. The performance gain for the same-sized test was impressive...

=====================================================================
Total time: 0.46 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
avl:avl_change/10                     6,000 =    6,000+0        25.0%
avl:avl_store/10                     69,834 =    6,000+63,834   25.0%
is/2                                169,621 =  169,621+0        25.0%
=:=/2                                69,822 =   69,822+0        10.0%
$garbage_collect/1                       17 =       17+0         5.0%
qui_count/5                               2 =        1+1         0.0%
avl:avl_store/5                      75,823 =   75,823+0         0.0%
avl:avl_store/4                       6,001 =    6,001+0         0.0%
avl:avl_change/5                      6,000 =    6,000+0         0.0%
=</2                                  6,002 =    6,002+0         0.0%
=/2                                  81,810 =   81,810+0         0.0%
compare/3                           133,656 =  133,656+0         0.0%
>=/2                                  5,988 =    5,988+0         0.0%

Table 2: Richard O'Keefe's module(avl); 6000 nodes.
Module copyright © 1989, Quintus Computer Systems, Inc.

... but that was not the only gain: the Quintus version could work with many more nodes before it ran into global stack issues -- I was able to test up to 120,000 nodes:

=====================================================================
Total time: 38.39 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      462 =      462+0        73.8%
is/2                              4,417,861 =4,417,861+0         7.0%
avl:avl_store/10                  1,908,962 =  120,000+1,788,962 6.3%
avl:avl_change/10                   120,000 =  120,000+0         5.1%
=:=/2                             1,908,946 =1,908,946+0         1.4%
compare/3                         3,697,908 =3,697,908+0         1.0%
=/2                               2,148,930 =2,148,930+0         0.9%
avl:avl_change/5                    120,000 =  120,000+0         0.5%
qui_count/5                               2 =        1+1         0.4%
avl:avl_store/5                   2,028,947 =2,028,947+0         0.3%
=</2                                120,002 =  120,002+0         0.1%
avl:avl_store/4                     120,001 =  120,001+0         0.0%
>=/2                                119,984 =  119,984+0         0.0%

Table 3: Richard O'Keefe's module(avl); 120,000 nodes.

Salvador Fandiño's AVL Module

The Quintus module shows excellent performance as compared to lazy AVL trees, but there is a cost of computing relative height difference for each node. Salvador Fandiño reverted to using absolute node heights, saving that computation times. It seems to pay off, for the 6000 node test is an improvement over the other AVL tree implementations:

=====================================================================
Total time: 0.41 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
=/2                                 267,300 =  267,300+0        23.8%
avl:avl_put/9                        57,853 =    6,000+51,853   14.3%
$garbage_collect/1                       13 =       13+0        14.3%
avl:avl_balance_right/5              35,938 =   17,969+17,969    9.5%
is/2                                 47,927 =   47,927+0         9.5%
avl:avl_replace/9                    63,834 =    6,000+57,834    4.8%
avl:avl_cmp_depth/3                  20,957 =   20,957+0         4.8%
avl:avl_put/4                        75,823 =   75,823+0         4.8%
ther_count/5                              2 =        1+1         0.0%
avl:avl_replace/4                    63,834 =   63,834+0         0.0%
=</2                                  6,002 =    6,002+0         0.0%
compare/3                           133,656 =  133,656+0         0.0%

Table 4: Salvador Fandiño's module(avl); 6000 nodes.
Module copyright © 2005, Salvador Fandiño under the GPL

But, of course, comparing execution speeds of subsecond systems is a pointless exercise. This profile does show fewer garbage collections and fewer redos on avl_store/avl_put, hinting at a more efficient algorithm in this implementation. This hint is bourne out on the larger test:

=====================================================================
Total time: 28.71 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      305 =      305+0        70.6%
=/2                               7,395,800 =7,395,800+0         5.6%
avl:avl_put/9                     1,668,985 =  120,000+1,548,985 4.7%
avl:avl_replace/9                 1,788,962 =  120,000+1,668,962 4.4%
avl:avl_balance_right/5             719,922 =  359,961+359,961   1.8%
is/2                                959,907 =  959,907+0         1.8%
compare/3                         3,697,908 =3,697,908+0         1.6%
ther_count/5                              2 =        1+1         0.8%
avl:avl_cmp_depth/3                 419,945 =  419,945+0         0.3%
avl:avl_replace/4                 1,788,962 =1,788,962+0         0.3%
avl:avl_put/4                     2,028,947 =2,028,947+0         0.1%
=</2                                120,002 =  120,002+0         0.1%

Table 5: Salvador Fandiño's module(avl); 120,000 nodes.

The above test shows this module fully 10 seconds faster, or a 25% improvement over the Quintus implementation.

Direct Memory Access

So, Salvador Fandiño's AVL module performs best in the simple tests given to the various AVL implementations. How does it compare to direct memory access? By using assert/1 and retract/1, is there a significant gain in runtime execution speed?

It turns out that direct memory access has a clear advantage over the AVL implementations, for a 6000-element system, it is the fastest. Again, for sub-second systems, comparisons are rather silly, but for form's sake the results from running the count test are presented here...

=====================================================================
Total time: 0.06 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
assert/1                             12,001 =   12,001+0        50.0%
retract/1                             6,000 =    6,000+0        50.0%
$garbage_collect/1                        3 =        3+0         0.0%
is/2                                 18,001 =   18,001+0         0.0%
=</2                                  6,002 =    6,002+0         0.0%
memory/2                              6,000 =    6,000+0         0.0%
mem_count/5                               2 =        1+1         0.0%

Table 6: Memory as dynamic facts; 6000 nodes

The next level of comparison is at 120,000 elements, but, in this case, SWI Prolog balks at more than 80,000 dynamic facts. Even at this "limitation", the execution speed is unmatched with 80,000 nodes:

=====================================================================
Total time: 0.88 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
retract/1                            80,000 =   80,000+0        35.0%
mem_count/5                               2 =        1+1        30.0%
$garbage_collect/1                       45 =       45+0        15.0%
is/2                                240,001 =  240,001+0        15.0%
assert/1                            160,001 =  160,001+0         5.0%
=</2                                 80,002 =   80,002+0         0.0%
memory/2                             80,000 =   80,000+0         0.0%

Table 7: Memory as dynamic facts; 80,000 nodes

The profiles of Quintus' and Salva's AVL modules are pitiful in comparison to the above results...

=====================================================================
Total time: 13.96 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      216 =      216+0        56.0%
avl:avl_store/10                  1,228,962 =   80,000+1,148,962 12.7%
is/2                              2,857,861 =2,857,861+0        11.6%
avl:avl_change/10                    80,000 =   80,000+0         7.1%
qui_count/5                               2 =        1+1         1.9%
compare/3                         2,377,908 =2,377,908+0         1.4%
=/2                               1,388,930 =1,388,930+0         1.3%
=:=/2                             1,228,946 =1,228,946+0         1.0%
avl:avl_store/4                      80,001 =   80,001+0         0.4%
avl:avl_change/5                     80,000 =   80,000+0         0.4%
>=/2                                 79,984 =   79,984+0         0.3%
avl:avl_store/5                   1,308,947 =1,308,947+0         0.1%
=</2                                 80,002 =   80,002+0         0.0%

Table 8: Richard O'Keefe's AVL module; 80,000 nodes

=====================================================================
Total time: 10.34 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      133 =      133+0        44.7%
=/2                               4,755,800 =4,755,800+0        11.3%
avl:avl_replace/9                 1,148,962 =   80,000+1,068,962 8.9%
avl:avl_put/9                     1,068,983 =   80,000+988,983   6.4%
is/2                                639,911 =  639,911+0         4.5%
compare/3                         2,377,908 =2,377,908+0         4.4%
avl:avl_balance_right/5             479,926 =  239,963+239,963   2.9%
salva_count/5                             2 =        1+1         0.9%
avl:avl_replace/4                 1,148,962 =1,148,962+0         0.9%
avl:avl_cmp_depth/3                 279,947 =  279,947+0         0.5%
=</2                                 80,002 =   80,002+0         0.4%
avl:avl_put/4                     1,308,947 =1,308,947+0         0.4%

Table 9: Salvador Fandiño's AVL module; 80,000 nodes

...but these results are not actually showing these modules' protocols in the best light. Both modules have predicates that create AVL trees from sorted lists; since we know in advance the elements to be manipulated, we can call these predicates instead. But, here again, we run into a memory overflow issue with SWI Prolog: lists containing over 20,000 elements overflow the local stack.² Throttling back the number of elements to 20,000 nodes, we end up comparing the AVL modules' replacement predicates against SWI Prolog's retract/1 and assert/1, with just one more change: as we do not test the sorted-lists-to-AVL-trees predicates, we should not test the initial assertations; so we now make the memory predicate an external file:

:- dynamic memory/2.

memory(0, value(0)).
memory(1, value(1)).
memory(2, value(2)).
...
memory(20000, value(20000)).

With this new framework, the count test profile is unsurprisingly still sub-second...

=====================================================================
Total time: 0.15 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
retract/1                            20,000 =   20,000+0        42.9%
$garbage_collect/1                        1 =        1+0        28.6%
is/2                                 20,000 =   20,000+0        21.4%
mem_ord/3                                 1 =        1+0         7.1%
assert/1                             20,000 =   20,000+0         0.0%

Table 10: Direct memory access; 20,000 nodes

... and the test for the Quintus AVL module still trails behind:

=====================================================================
Total time: 0.69 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                        4 =        4+0        50.0%
avl:avl_change/10                    20,000 =   20,000+0        36.0%
compare/3                           267,248 =  267,248+0         6.0%
qui_ord/3                                 1 =        1+0         6.0%
avl:avl_change/5                     20,000 =   20,000+0         2.0%
is/2                                 20,000 =   20,000+0         0.0%

Table 11: Quintus' AVL module; 20,000 nodes

... as well as the test for Salva's AVL module:

=====================================================================
Total time: 0.77 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                        3 =        3+0        31.0%
avl:avl_replace/9                   267,248 =   20,000+247,248  25.4%
=/2                                 494,496 =  494,496+0        14.1%
compare/3                           267,248 =  267,248+0         5.6%
avl:avl_replace/4                   267,248 =  267,248+0         1.4%
is/2                                 20,000 =   20,000+0         1.4%
salva_ord/3                               1 =        1+0         1.4%

Table 12: Salva's AVL Module; 20,000 nodes

Note that in these sets of tests the Quintus AVL module performs better than Salva's, owing to the better implementation of the avl_change/5 predicate. Perhaps Salva or an enterprising reader has a way to improve the avl_replace/4 implementation?

Reflections

We've demonstrated that of the AVL implementations Salvador Fandiño's consistently performs better. We have also observed that for any sized data sets direct memory access using assert/1 and retract/1 do even better. Do I then recommend direct memory access over storing elements in an AVL tree? For almost all cases,³ the answer is no. In my experience, using the former reduces programs to intractable messes.⁴ Furthermore, the performance gain, for most cases, is insignificant.⁵ Using AVL trees give a significant gain over list processing, and incurs none of the costs of using extralogical features (such as assert/1 and retract/1) required by direct memory access.

Appendix B: The "Missing" Predicates

Implementations of Traversal Predicates

`left_to_right_traversal/2`

This predicate converts this AVL tree to a sorted list with the "smallest" element (i.e. the leftmost node in the tree) at the front of the list. Note that this implementation uses DCGs; don't you agree that it's prettier this way?

left_to_right_traversal(AVL, List) :-
  left_to_right_traversal(AVL, List, []).

left_to_right_traversal(t(Key, Value, LeftTree, RightTree, _)) -->
  { left_to_right_traversal(LeftTree, Left, []),
    left_to_right_traversal(RightTree, Right, []) },
  Left, [Key-Value|Right].
left_to_right_traversal(t) --> [].

`right_to_left_traversal/2`

This predicate starts with the maximum element and descends to the minimum one.

right_to_left_traversal(AVL, List) :-
  right_to_left_traversal(AVL, List, []).

right_to_left_traversal(t(Key, Value, L, R, _)) -->
  { right_to_left_traversal(R, Right, []),
    right_to_left_traversal(L, Left, []) },
  Right, [Key-Value|Left].
right_to_left_traversal(t) --> [].

`top_down_traversal/2`

The implementation strategy here is to give the top node, then recurse over the list of top nodes of this node's children, etc. There are two approaches to go about the induction: append/3 the new nodes to the list of nodes to traverse or use difference lists and concatenation. The former approach is straightforward, but incurs a recursive linear cost (which can grow exponentially); the latter, more complex, but has only a straight linear cost. We present both implementations (choosing the difference list implementation in the interface predicate).

top_down_traversal(AVL, List) :-
  top_down_traversal_concatenating([AVL|T] - T, List, []).

Top-down traversal using concatenation

This implementation shows the shape of the tree by putting the top-most element first, and continuing in a breath-first, left-to-right traversal. This implementation uses concatenation (concat/3 is a predicate with constant-time access) to update the seed list of branches. Note that the atom '_|_' represents the functional term ⊥ [pronounced: "bottom"] which is the functional programming community's equivalent to fail/0: being able to unify with ⊥ is a contradiction, showing that unified variable is actually free; this test protects the system from falling into infinite regression when unifying the last-element pointer with a term including that same variable.

top_down_traversal_concatenating(List - List) -->
  { \+ \+ List = '_|_' }.
top_down_traversal_concatenating([t(Key, Value, t, t, _)|T] - Q) -->
  [Key-Value],
  top_down_traversal_concatenating(T - Q).
top_down_traversal_concatenating([t(Key, Value, t, Right, _)|T] - Q) -->
  { concat(T - Q, [Right|X] - X, NewRight) },
  [Key-Value],
  top_down_traversal_concatenating(NewRight).
top_down_traversal_concatenating([t(Key, Value, Left, t, _)|T] - Q) -->
  { concat(T - Q, [Left|X] - X, NewLeft) },
  [Key-Value],
  top_down_traversal_concatenating(NewLeft).
top_down_traversal_concatenating([t(Key, Value, Left, Right, _)|T] - Q) -->
  { concat(T - Q, [Left, Right|X] - X, Down1) },
  [Key-Value],
  top_down_traversal_concatenating(Down1).

concat(List - List) -->  { \+ \+ List = '_|_' }.
concat(A1 - Z1, Z1 - Z2, A1 - Z2).

Top-down traversal using `append/3`

This approach is a simpler implementation but updates the branches using append/3 which incurs a linear-time cost.

top_down_traversal_appending([t(Key, Value, t, t, _)|T]) -->
  [Key-Value],
  top_down_traversal_appending(T).
top_down_traversal_appending([t(Key, Value, t, Right, _)|T]) -->
  { append(T, [Right], NewRight) },
  [Key-Value],
  top_down_traversal_appending(NewRight).
top_down_traversal_appending([t(Key, Value, Left, t, _)|T]) -->
  { append(T, [Left], NewLeft) },
  [Key-Value],
  top_down_traversal_appending(NewLeft).
top_down_traversal_appending([t(Key, Value, Left, Right, _)|T]) -->
  { append(T, [Left, Right], Down1) },
  [Key-Value],
  top_down_traversal_appending(Down1).
top_down_traversal_appending([]) --> [].

Element Selection Predicates

`top/2`

This fact simply gives the Key-Value pair of the current node.

top(t(Key, Value, _, _, _), Key - Value).

`first/2`

We "left-cdr" down the tree, retaining the most-recently-visited node until we reach a terminal, returning the last node.

first(t(Key, Value, Left, _, _), Ans) :-
  first(Left, Key-Value, Ans).

first(t(Key, Value, Left, _, _), _, Ans) :- first(Left, Key-Value, Ans).
first(t) --> [].

`last/2`

The same work as first/2, except we "right-cdr" down the tree.

last(t(Key, Value, _, Right, _), Ans) :-
  last(Right, Key-Value, Ans).

last(t(Key, Value, _, Right, _), _, Ans) :- last(Right, Key-Value, Ans).
last(t) --> [].

Functional Tree Transformations

`map_tree/3`

This predicate modifies the Values of the tree based on the phi-transformation. For example, to copy the tree exactly, one would execute the following query:

map_tree(i, In, Out)

Either of the following queries increments each of the tree's values:

map_tree(lambda(X, X + 1), In, Out)
or
map_tree("`+1", In, Out)

The implementation of map_tree/3 depends on the combinators module and simply traverses the tree and copies the transformed elements:

map_tree(Fn) -->
  { phi_equivalent(Fn, Phi) },
  map_aux(Phi).

map_aux(Phi, t(Key, Value, Left, Right, H),
        t(Key, NewValue, NewLeft, NewRight, H)) :-
  compose(Phi, Value, NewValue),
  map_aux(Phi, Left, NewLeft),
  map_aux(Phi, Right, NewRight).
map_aux(_Phi, t, t).

Endnotes

⁰	This document, Salvador Fandiño's AVL module and supporting (testing) files are archived and available at prolog-avl-20051225.tgz.
¹	Any book on algorithms and data structures provides introductory material on AVL trees. For example, [RL1999], § 5.9 or [B2001], § 10.2.
²	I am aware that SWI Prolog allows the user to tune the stack sizes on invocation. I also believe that it is now appropriate to revisit the default values as modern computers come ready to handle much heavier loads. Or, perhaps it is the case that I am always tackling larger-than-average problems. At any rate, I find it a chore, as the user, always being required to inform the system what sizes its stacks should be.
³	The one exception that comes to mind for a valid use of `assert/1` is for a meta-compiler I wrote to implement aspects for Prolog. Actually, for that system, one could avoid using `assert/1` entirely by implementing the system in an explicit two-pass compilation process where the first pass would create an intermediate file that would direct the second stage of compilation. For this case, `assert/1` was simpler, and its use was obvious. I haven't found a good use for `retract/1` to date.
⁴	Intractable mess (noun): the strong sense of befuddlement that overcomes one when staring at a 400,000-line expert system where every third line is either an `assert/1` statement or a query against a dynamically asserted fact. This is a real system, by the way. Oh, yes, after a the system rendered a decision, a block of `retractall/1` calls purged the database of all the assertations made to obtain the result. It is a good thing Prolog has automatic garbage collection; that way one doesn't need to write more than 100,000 lines of allocation and deallocation code unnecessarily. It is a bad thing that word of Prolog's automatic garbage collection has not yet reached all of its users.
⁵	Firstly, in systems that I work with, it is not uncommon to exceed the 80,000 transactions that would cause a stack overflow if using `assert/1` and `retract/1`. Secondly, even given the high volume of transactions in such a system, switching to dynamic facts yields only subsecond gains for the entire system. The Whirl interpreter used as the extended example in the Aspects manual only saw 0.1 second difference between memory as dynamic facts verses memory as an AVL tree. Thought for the day: `assert/1` and `retract/1` -- avoid them.

Works Consulted

[B2001]	Prolog Programming for Artificial Intelligence, 3rd ed., Ivan Bratko, Addison-Wesley, Reading, Massachusetts, 2001.
[Merc]	Mercury Language Reference Manual, Version 0.12.1, Fergus Henderson, Thomas Conway, Zoltan Somogyi, David Jeffery, Peter Schachte, Simon Taylor, Chris Speirs, Tyson Dowd, Ralph Becket, University of Melbourne (Australia), http://www.cs.mu.oz.au/research/mercury. No date of publication given.
[RL1999]	Algorithms: A Functional Programming Approach, 2nd ed., Fethi Rabhi, Guy Lapalme, Addison-Wesley, Reading, Massachusetts, 1999.

Copyrights

This manual copyright © 2005, Cotillion Group, Inc. under the GNU Free Document License, version 1.2
avl.pl copyright © 2005, Salvador Fandiño under the GPL
Quintus Prolog copyright © 2004, SICS AB. All rights reserved.
Quintus module(avl) copyright © 1989, Quintus Computer Systems, Inc. All rights reserved.
SWI Prolog copyright © 1987 under the LGPL
Documents referenced (linked) by this document maintain their own copyrights.

Author:	Douglas M. Auclair (dauclair at hotmail dot com)
Created:	December 17, 2005
Last Modified:	December 25, 2005; MERRY CHRISTMAS!

Wednesday, December 14, 2005

Aspect-Oriented Programming in Prolog

Abstract

The concept of aspect-oriented programming (AOP) has recently
gathered steam in the general software engineering community as a
novel approach to problem-solving. Although aspects are useful
for functional and imperative object-oriented programming, do
they add value to rule-based programming systems? We argue that
developing systems in rule-based systems expose similar issues
that aspects address well. This manual
reviews AOP's
history and utility, describes the
practice of programming with aspects in Prolog (using the
protocol of the aspects module),
demonstrates the
system with an extended example, and finally
surveys
production AOP usage in
general.⁰

Introduction

Recently there has been a groundswell of interest in a new style
of programming called Aspect-Oriented Programming
(AOP).¹
Contrary to the prevalent popular "wisdom" that claims that
encapsulation is an essential practice for software engineering,
AOP claims that an "open implementation" is a
key developing useable software.² AOP originated from a subset of
Common Lisp's metaobject
protocol³ that
allows methods of objects to be intercepted so that auxiliary
code could be executed before, after or around the method's
execution. These auxiliary methods allow "cross-cutting"
concerns, such as debugging, to be collected into one area,
separate from the method body. This allows the programmer to
focus on implementing the functionality without the need to
consider housekeeping issues.

As mentioned above, aspects have their origin from Common Lisp.
Common Lisp's object system, unlike those of traditional
object-oriented programming languages, is
contravariant⁴ -- methods are not tied to a particular class, but
belong to their own general class of methods (this class is called
'generic-function'), this change allows aspects to interact with
methods freely, adding advice or altering method behavior as
needed. Java, and other programming languages with traditional,
covariant, object-oriented systems do not
have this built-in capability;⁷ so the AOP movement was launched to address this
need.

Just as AOP has its roots in Common Lisp's contravariant
metaobject protocol, so, too, contravariance has its own origins.
Contravariance was developed to provide object-oriented
functional programming languages the ability to express
alternates and induction much in the style of predicate logic,
and, to a large degree, it does this task well. There is,
however, a much simpler, more powerful, and more direct way to
express these features: predicates. Logic, or rule-based,
programming languages solve problems using predicate logic
directly. This begs the question: since aspects are a subset of
contravariance which, itself, is a subset of predicate logic, is
there any need for aspects in rule-based languages?

We argue in the affirmative -- rule-based system face the same
design challenges as other systems: these systems have concerns
that cut across multiple areas of functionality, and
implementation decisions (even ones made implicitly or by
default) correct for one resolution among competing goals are
not always the best decision when new goals are introduced or old
ones change. By introducing aspects into rule-based languages
(specifically Prolog for this manual), implementations become
even more declarative and flexible in nature. We demonstrate
this simplification in the example
following the protocol
specification.

Practice

The specific implementation of aspects for Prolog flows in the
following manner: code is developed as usual. When the
programmer identifies a cross-cutting concern, that programmer
develops the aspect using the specification
below. An aspect
should be declared as immediate (i.e.: aspect/3's
goal should be called immediately with :-/1, not
implied in a clause's proof) and in the user
module: usually it is best to have all aspects defined in one
file for a library or executable. If the aspects themselves
depend on user-defined syntax or require some initialization at
compilation-time, then the programmer should use the immediate
aspect_declaration/1.

Once the program and aspects are defined, the programmer weaves
the aspects into the program by using the aopc
compiler (see program/1 for the
artifacts created).

The programmer may then opt to run the program without aspects by
compiling the modules of the program with qpc
without aspect advice and linking with qld without
the aspect assertions. To
compile in the aspects, the program modules are compiled with
qpc with the -i <aspect-module>_term_expansion.pl
initialization flag, and linked with the
aspects and the
aspect assertions with
qld.

Protocol

Now that we have a description of utility of aspects and the
practice of use for aspects in Prolog, we now specify their
protocol.

:-
aspect_declaration(+Declaration)
declaration

where:

Declaration <callable> A declaration to insert into the aspect-artifact file

Description:

This predicate provides any
additional information needed to be able to compile the aspect
artifacts. So, for example, any op/3 declarations
used in declaring the aspects should also be captured in an
aspect_declaration/1.

:- aspect(+Type, +ForGoal, +Aspect) declaration

where:

Type <atom> one-of([before, after, around])

ForGoal <term> the goal to be advised by this aspect

Aspect <callable> The goal to execute at the point of
Type in
ForGoal

Description:

This declares aspects for the
program. When the time at Type
for the goal ForGoal is
reached in the program, then
Aspect is called. For
before and after aspects,
ForGoal is automatically
called at the appropriate time (for a before aspect, it is
called after Aspect exits;
for after aspects,
Aspect is called after
ForGoal exits).

Around aspects replace the call to
ForGoal entirely, but
this is not always desireable, so around aspects
have two special forms that may be used in the body of
Aspect:

prove_goal/0

that calls ForGoal
with its original arguments; and

prove_goal/N

that calls ForGoal
with the arguments specified. prove_goal/N must agree
with the arity of
ForGoal.

program(+Module)

where:

Module <atom> The name used to create the three aspect-artifact
files

Description:

Takes the aspects and aspect-declarations loaded into memory
and weaves them into the predicates that are their subjects
by producting three artifact files:

<aspect-module>_term_expansion.pl: an
initialization file for qpc that rewrites all modules
that are affected by aspects; and

<aspect-module>_asserts.qof
: a series of
assertions that redirect all direct predicate calls to their
corresponding aspect-managed predicates; and

<aspect-module>.pl:
the translation of the
aspect declarations to inlined- (normal-) predicates.

For the program to behave normally (that is, without aspect
oversight) the program modules are compiled and linked without
these artifacts. To activate the aspects the program's
modules are now compiled with the
-i <aspect-module>_term_expansion.pl
initialization flag and the executable is linked with the
<aspect-module>_asserts.qof and the
<aspect-module>.pl files. For aspect
oversight on an interactive system, one follows a similar set
of steps:

load the aspect-artifact files into the Prolog listener
in the following order:

<aspect-module>_asserts.qof

<aspect-module>.pl

<aspect-module>_term_expansion.pl

load the modules of the program into the listener

Example

A Whirl Interpreter

Here we develop a working example from the original concept to
one that uses simple aspects to report a trace of program flow
(useful for debugging) to one that has integrated production
aspects that manage cross-cutting concerns. We present a
Whirl
interpreter. The language
has a terse syntax (with a somewhat richer semantics) that allows
one to complete an interpreter in an afternoon.

The Problem: Debugging

Unfortunately, as it takes many repetitions of instructions and
commands to accomplish anything useful, the task of debugging
using traditional means is rather tedious: alleviating the
burden of the programmer in the debugging is the first task we
put to the aspect methodology. The first iteration of the
interpreter seemed to do everything just fine -- it ran the
simple Whirl programs without any problems. When it came to
the first non-trivial program ("Hello,
World!"),⁸
it failed unexpectedly (this is for the
source code version before
the release of the Whirl document
mentioned above). Doing a top-level profile analysis of
the program run showed it failed around instruction 379
("Hello, World!" has over 1300 instructions):

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
=====================================================================
Total time: 0.05 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
list_utils:takeout/3                  1,506 =      947+559      66.7%
command/3                               243 =      106+137      33.3%
switcheroo/2                            105 =      105+0         0.0%
interpret_next_instruction/2              1 =        1+0         0.0%
history/4                               238 =      238+0         0.0%
initialize_ops_ring/1                     1 =        1+0         0.0%
whirl/1                                   1 =        1+0         0.0%
initialize_program/4                     54 =        1+53        0.0%
instruction/3                           379 =      379+0         0.0%
rotate/2                                141 =      141+0         0.0%
switch_wheel/2                          105 =      105+0         0.0%
initialize_memory/1                       1 =        1+0         0.0%
execute/3                               379 =      379+0         0.0%
history_is/4                            378 =      378+0         0.0%
direction/2                             171 =      141+30        0.0%
initialize_math_ring/1                    1 =        1+0         0.0%
reverse_spin/2                          238 =      238+0         0.0%
adjust/2                                271 =      141+130       0.0%
arithmetic/4                             29 =       29+0         0.0%
execute_command/2                       370 =      238+132       0.0%
get_command/3                           106 =      106+0         0.0%
spin_reverse/2                          358 =      238+120       0.0%
interpret/2                             379 =      379+0         0.0%
comparison/4                              2 =        1+1         0.0%
syntax:<-/2                           1,817 =    1,647+170       0.0%

Because the program fails somewhat in depth, debugging by traditional
(Prolog) methods (setting a spypoint and tracing execution), become
onerous rather quickly:

read_file_to_codes('hello.wr', Hello, []), length(Hello, X), spy(instruction), whirl(Hello).
% Spy point on instruction/3
  Call: (10) instruction(_L387, program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...])), _L388) ? leap
  Exit: (10) instruction(1, program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...]))) ? leap
  Call: (12) instruction(_L564, program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...])), _L565) ? leap
  Exit: (12) instruction(1, program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...]))) ? leap
  Call: (14) instruction(_L762, program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...])), _L763) ? leap
  Exit: (14) instruction(0, program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...]))) ? leap
  Call: (16) instruction(_L893, program(ops, history(quescient, last_instruction(0)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], counterclockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([3-0, 2-0, 1-1, 0-1, 4-1, ... -...|...])), _L894) ?

The above trace represents only the first three instructions, and does not
show any of the predicates called to support the execution of those
instructions (which is unreasonable, given that the programmer is trying
to ascertain the cause of failure of instruction interpretation). What we
need is a 'meta-debugger', a system will bring us immediately to the
problem and shows us the root cause. Writing meta-debuggers is easy: the
traditional approach is to hack and slash the code with debugging
statements and assertations. The problem with doing that is that the
meta-debugger is tightly integrated into the production code, causing the
inevitably eventuality -- the debugging code become inseparable from the
production code. This is the host of a whole new set of problems, such
as the debugging code introducing errors into the system (by manipulating
production state, for example) and also the debugging code obscures the
meaning of the production code as all the debugging statements surrounding
the production statements tend to drown the latter in the sea of the
former. Put simply: traditional meta-debugging techniques cause more
problems than they solve.

Debugging: the Solution

Round 1: Full and Fast Trace

Traditional meta-debugger are not the solution, but this is where an
aspect-oriented meta-debugger provide a viable alternative. Our first
aspect-advised system traces program flow over what we suspect to be
the problem area, and when it observes the predicate fail, report that
failure with the current state:

:- use_module(library(aspects), [aspect/3]).

:- aspect(around, interpret(In, _Out),
         (format('** going into interpret/2 with ~w~n', [In]),
          prove_goal,
          !,
          format('** okay~n', []))).
:- aspect(around, interpret(In, _Out),
   format('!! FAILED ON CMD: ~w~n', [In])).

Results (Round 1)

When weaving the aspects into the program, it does the job; unfortunately,
it does its job too well. A log capturing all the debugging statements
generated by the aspect-advised system weighs in at over 2 megabytes of
text, and the fail statement itself 10 kilobytes and occurs at the midpoint
of the debugging log -- impressive? yes; "informative [for those with
the fortitude to wade through the statement]"? maybe; useful? no. For,
although we eliminate the drudgery of manually stepping through the program
one instruction at a time, and we avoid introducing debugging artifacts
into the production code, we do introduce a large product as the output
that must be scanned to locate the problem, and once located the output
statement, in particular, is cumbersome, providing too much information
to be at all useful. We must refine our use of aspects to help us locate
the problem.

Round 2: State Representation

The major generator of text in the debugging output statements is the
program/5 term itself; most of which is historical data we have little
interest in beyond the immediate history. Given this, a major space
saver will be to display only the parts of the program in which we are
interested. The program/5 term has the following structure:

program(Active, History, Rings, Memory, Program)

see the formal language definition
for the meanings of each of program/5's arguments

Of program/5's arguments, Active and History are small enough, and
interesting enough, to retain, and Memory, although possibly infinite in
size, in most cases is compact, so may also be passed along unmolested.
The last two arguments, Rings and Program, do contain data that are not of
interest in the interpretation of the current instruction. For the rings,
they contain 12 immutable commands (the only interesting one is at the
head). For the program, the current instruction is at the head, and all
others are unordered. If we eliminate the tails of these arguments' lists,
we should reduce the size of the output by 99%. We do this by asserting a
portray/1 rule into the aspect definition and changing the format commands
to use the print ('~p') directive instead of the write ('~w') one:

:- use_module(library(aspects), [aspect/3, aspect_declaration/1]).

:- aspect_declaration(asserta((user:portray(ring(Type, [C0|_], Spin, Value)) :-
                               format('ring(~w, [~w|_], ~w, ~w)',
                                      [Type, C0, Spin, Value])))).
:- aspect_declaration(asserta((user:portray(program(Active, History, [R1, R2],
                                             Memory, program([I0|_]))) :-
                               format('program(~w, ~w, [~p, ~p], ~w, ~w)',
                                      [Active, History, R1, R2, Memory,
                                       program(I0)])))).

:- aspect(around, interpret(In, _Out),
         (format('** going into interpret/2 with ~p~n', [In]),
          prove_goal,
    !,
          format('** okay~n', []))).
:- aspect(around, interpret(In, _Out),
   format('!! FAILED ON CMD: ~p~n', [In])).

Results (Round 2)

This works much better -- the output log file is
now only 60 kilobytes (not 2M), and the failed message is now only 310
characters (not over 10K, as was the last one). The
smaller failed message is much easier to
review. It indicates that the error was at instruction 209 which was a
math command: the 'neg' command. This is all very well and good,
but we shouldn't be required to dig through a log file to find errors.
A better scenario would be to be presented with a description of the state
when an error occurs.

Round 3: Only Debugging for Errors

A better approach is to build an aspect system that only activates when
an erroneous condition occurs. To do that, we need only modify the
implementation of the aspect definitions (dropping the output on
nominal behavior), so the first aspect/3 clause becomes:

:- aspect(around, interpret(_In, _Out), (prove_goal, !)).

Results (Round 3)

With this modification, the program debugging output occurs where we think
the error is...

!! FAILED ON CMD: program(math, history(quescient, last_instruction(0)), [ring(math, [11-neg|_], clockwise spin, value(3)), ring(ops, [7-dadd|_], counterclockwise spin, value(1))], memory([cell(5, 0), cell(4, value(3)), cell(3, 0), cell(2, value(220)), cell(1, value(44)), cell(0, value(10))]), program(209-0))

...which is the same message we received before, but now we get it
immediately, as opposed to it being buried 200 lines into the debugging
information. It tells us that the instruction that failed was a "0" that
translates into the math command "neg". Examining the math neg command
doesn't seem to indicate any particular reason that it should fail:

command(MathOp, program(math, Hist, Rings, M, P),
       program(math, Hist, [ring(math, I, S, value(Value))|Ops], M, P)) :-
 MathOp <- [add, mult, div, neg],
 takeout(ring(math, I, S, value(A)), Rings, Ops),
 M = memory([cell(_, value(B))|_]),
 arithmetic(MathOp, A, B, Value).

As the neg command does the same things that the (e.g.) mult command does,
so it seems something else is the issue.

Round 4: Rooting out the real error

Since the neg command seems correct, we look elsewhere in the program
state, and we see irregularities in the memory layout: the memory cell
values should be typed with the value/1 term, but cells 5 and
3 are not typed this way:

memory([cell(5, 0), cell(4, value(3)), cell(3, 0), cell(2, value(220)), cell(1, value(44)), cell(0, value(10))])

In this light, it is not the neg command's failure that concerns us so
much as that a prior command's action that improperly set these cell
values. This is no a failure condition when this happens, but an aspect
can detect this untyped assignment. We do this by replacing the above
aspects with the following one:

:- aspect(after, command(Type, ProgramIn, ProgramOut),
         (ProgramIn = program(_, _, _, memory([cell(_, value(_))|_]), _),
          ProgramOut = program(_, _, _, memory([cell(Idx, Num)|_]), _),
          (number(Num) =>
    format('!! Command ~a assigned an untyped number to memory cell ~d~n',
                   [Type, Idx])))).

The special operator used above declares a ("forward")
implication.⁹ This
relation is found in the syntax
module, and so must be imported and then its syntax declared
in order to be used by the aspect:

:- use_module(library(syntax)).
:- aspect_declaration(use_module(library(syntax))).

:- op(920,  xfy,  =>).
:- aspect_declaration(op(920,  xfy,  =>)).

Results (Round 4)

With this new aspect framework, when we run the aspect-advised system,
we observe the following output:

!! Command dadd assigned an untyped number to memory cell 1
!! Command dadd assigned an untyped number to memory cell 1

Examining the implementation for the dadd command, we see the
following:

command(dadd, program(ops, Hist, Rings, memory(Memory), P),
       program(ops, Hist, Rings, memory([cell(NewIdx, NewV)|Mem]), P)) :-
%    Here we jump to a new memory address, given that that cell exists.
%    If the cell doesn't exist, create it and add it to the memory store;
%    thereby "lazily" growing the memory store as needed.
%
%    Remember, it's not a memory leak, it's lazy growth of the memory store
%    (not in any respect resembling a memory leak ... *cough*).
 Memory = [cell(Idx, _)|_],
 ring(ops, _, _, value(X)) <- Rings,
 floor(X, Y),
 NewIdx is Y + Idx,
 (takeout(cell(NewIdx, SomeNewV), Memory, SomeMem) ->
    NewV = SomeNewV,
    Mem = SomeMem
  ;
    NewV = 0,
    Mem = Memory).

The highlighted goal is incorrect: it should wrap the number in the
value/1 term. We make that correction...

NewV = value(0),

... and rerun the system:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!
eeh, Whked!
;QXX[$J[^XQT$JTWQ@

o
?-

So, we do print out "Hello, World!" but then it seems the program doesn't
know how to stop.

Round 5: Hello, World!

Looking at how the interpreter works, we see that the
command is executed, and then interpret_next_instruction/2 is called.
We can therefore, after several
iterations,¹⁰ develop
an aspect to follow the exit command, to see if there is an issue with
that command:

:- aspect(after, command(Cmd, _, _),
         (Cmd = exit =>
              (format('Calling interpret_next_instruction/2 with exit:~n', []),
               spy(user:interpret_next_instruction)))).

Results (Round 5)

When we run the whirl interpreter with the above aspect advice, we see the
following debug trace:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!
Calling interpret_next_instruction/2 with exit:
% Spy point on interpret_next_instruction/2
  Call: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? skip
  Fail: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? retry
[retry]
  Call: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? creep
  Call: (2,709) program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([]))=program(_G2842026, _G2842027, _G2842028, _G2842029, program([_G2842037-_G2842038|_G2842035])) ? skip
  Fail: (2,709) program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([]))=program(_G2842026, _G2842027, _G2842028, _G2842029, program([_G2842037-_G2842038|_G2842035])) ? creep
  Fail: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ?

Looking at the implementation of implement_next_instruction/2, the problem
becomes obvious: I have a cut in the main (only) clause, but no "otherwise"
clause to catch failed tests in that predicate.

interpret_next_instruction(P0, P) :-
%    Loops until we run out of tokens to interpret
 P0 = program(Active, Hist, Rings, Mem, program([Idx - Op|Codes])),
 NewIndex is Idx + 1,
 (takeout(NewIndex - Code, [Idx - Op|Codes], Program) ->
    interpret(program(Active, Hist, Rings, Mem,
                      program([NewIndex - Code|Program])), P)
  ;
    P = P0).

My comment for the clause indicated my intent (but not what actually
happens), so it appears to be a slip in rigor. A simple catch-all
secondary clause fixes this issue:

interpret_next_instruction --> [].

Running the new interpreter shows we've found the errors preventing
"Hello, World!" from working:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!

Hello = [49, 49, 48, 48, 49, 49, 49, 48, 48|...]

Yes
?-

Debugging: Wrap Up

So, we have a working "Hello, World!" program; all is right with
the world again (after all, a computer without a "Hello, World!"
program would certainly precipitate the final trumpet call). Reviewing
the development of the debugging aspects, it is easy to see their
usefulness, but there are tradeoffs to consider.

In reality, I would begin a comprehensive review of the source code
after round 3. From that review
and from the program state I determined that I was pushing untyped
values into the memory. So the aspects were useful in helping
to fix the interpreter, but there comes a point where developing
a specific debugging aspect is more costly than manual search
and correction.

Also, using aspects do not guarantee correctness or completeness.
The aspects we developed uncovered one obvious error (the insertion
of untyped values into memory) and only hinted at another
(a non-determinate interpret_next_instruction/2 predicate where
a determinate one was needed). There were two more untyped
memory insertions in the code, both occuring on the input modes
of the IO commands, but since "Hello, World!" never exercised those
modes, the aspects did not uncover these errors. Comprehensive unit
testing is still necessary, even with debugging aspects.

The outcome from the above debugging process is a correct (as far as
we can tell) Whirl interpreter.

The Problem: Spacetime

From Debugging Aspects to Production-tuning Aspects

So, an obvious use for aspects are to reduce the effort required
to debug program. Unlike tracing, however, their usefulness does
not end there. Aspects open the implementation of systems: this
is useful, for example, in "consequence-free"
exploration of alternatives in the implementation (prototyping and
optimization).¹¹
We will explore this facet of aspects in this section.

First let us examine the profiling information on the (now working)
"Hello, World!" run:

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
Hello, World!
=====================================================================
Total time: 1.81 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                        6 =        6+0        60.8%
takeout/3                             5,384 =    5,384+0        39.2%
quintus:module_expansion_argument/1       3 =        3+0         0.0%
quintus:meta_predicate/1                  2 =        2+0         0.0%
quintus:floor/2                         648 =      648+0         0.0%
quintus:meta_predicate/2                  2 =        2+0         0.0%
lists:member/2                            3 =        2+1         0.0%
$start_consult/1                          1 =        1+0         0.0%
$style_check/2                            2 =        2+0         0.0%
var/1                                     6 =        6+0         0.0%
$load_file/5                              2 =        1+1         0.0%
$dochk_file/4                             3 =        1+2         0.0%
$extend_file/3                            1 =        1+0         0.0%
$set_predicate_attribute/3                3 =        3+0         0.0%
$execute_directive/2                     12 =        4+8         0.0%
preprocessor/2                            1 =        1+0         0.0%
with_mutex/2                              2 =        2+0         0.0%
load_files/2                              1 =        1+0         0.0%
$load_file/6                              1 =        1+0         0.0%
$expand_directive/2                       4 =        4+0         0.0%
statistics/2                              4 =        4+0         0.0%
repeat/0                                 68 =        2+66        0.0%
$member/2                                 2 =        1+1         0.0%
atom_concat/3                             4 =        4+0         0.0%
assert/1                                  1 =        1+0         0.0%

Hello = [49, 49, 48, 48, 49, 49, 49, 48, 48|...]

Yes
?-

What we see here is the majority of time is taken by the takeout/3
predicate and the resulting garbage collection. The takeout/3
predicate is used pervasively throughout the interpreter, so it
may not be its implementation that needs improvement (as the
profiling information seems to indicate), but it may be how
the data are structured that causes takeout/3 to grab our
attention. In order to make this determination, we should be
clear in the code what we are using takeout/3 for. This requires
some refactoring of the source code to make the intentions
explicit.

After refactoring the source code, we have
version 0.03 of the
interpreter,
which changes the interpret/2 from indirect recursion (via
interpret_next_instruction/2) to a directly recursive predicate. We
also add a layer of description on top of takeout/3 to explicate the
particular use for the predicate (e.g.: ring selection, command
selection, or instruction selection). After this very simple
refactoring, the profiling information improves dramatically ...

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
Hello, World!
=====================================================================
Total time: 0.69 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
list_utils:takeout/3                  3,469 =    3,468+1        86.5%
bump_program_counter/2                1,351 =    1,351+0         2.7%
get_selected_command/3                  327 =      327+0         2.7%
history_is/4                          1,351 =    1,351+0         2.7%
switch_wheel/2                          327 =      327+0         2.7%
$garbage_collect/1                       36 =       36+0         2.7%
arithmetic/4                             62 =       62+0         0.0%
change_active_ring_value_by/6            64 =       64+0         0.0%
command/3                               351 =      327+24        0.0%
initialize_program/4                     54 =        1+53        0.0%
whirl/1                                   1 =        1+0         0.0%
switcheroo/2                            328 =      327+1         0.0%
instruction/3                         1,350 =    1,350+0         0.0%
rotate/2                                583 =      583+0         0.0%
execute_command/2                     1,207 =      767+440       0.0%
initialize_ops_ring/1                     1 =        1+0         0.0%
get_command/3                           583 =      583+0         0.0%
spin_reverse/2                          767 =      767+0         0.0%
execute/3                             1,350 =    1,350+0         0.0%
reverse_spin/2                          767 =      767+0         0.0%
initialize_math_ring/1                    1 =        1+0         0.0%
selected_memory_cell/3                  124 =      124+0         0.0%
initialize_memory/1                       1 =        1+0         0.0%
adjust/2                              1,137 =      583+554       0.0%
interpret/2                               2 =        1+1         0.0%

Hello = ["110011100111000001111100000001000011111000011111100000000010"|"..."]

Yes
?-

... so much so that the "Hello, World!" program is no
longer a viable target to use aspects to improve the interpreter's
task.¹² We need a
more demanding program to expose obvious optimization opportunities.

100 Bottles of Hefe Weizen of the Wall

Fortunately, there is a larger program than "Hello, World!", by
Kang Seonghoon, from the same author: beer.wr!
This program has approximately 10 times the number of instructions
and uses more commands to accomplish its goal. Running the program
with the profiler shows firstly that it takes an excessive amount
of time to execute, and that, of course, takeout/3 is the culprit:

?- read_file_to_codes('beer.wr', Hefe, []), profile(whirl(Hefe)).
99 bottles of beer on the wall,
99 bottles of beer.
Take one down, pass it around,
98 bottles of beer on the wall.

98 bottles of beer on the wall,
[...]
2 bottles of beer on the wall,
2 bottles of beer.
Take one down, pass it around,
1 bottle of beer on the wall.

1 bottle of beer on the wall,
1 bottle of beer.
Take one down, pass it around,
No bottles of beer on the wall.

=====================================================================
Total time: 916.43 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
takeout/3                           838,549 =  838,548+1        71.5%
$garbage_collect/1                    8,886 =    8,886+0        27.8%
is/2                                865,980 =  865,980+0         0.1%
=/2                               2,010,585 =2,010,585+0         0.1%
rotate/2                            148,885 =  148,885+0         0.1%
bump_program_counter/2              330,798 =  330,798+0         0.1%
interpret/2                               2 =        1+1         0.1%
<-/2                                622,614 =  622,612+2         0.1%
update_ring/3                       340,101 =  340,101+0         0.0%
reverse_spin/2                      181,912 =  181,912+0         0.0%
execute_command/2                   287,739 =  181,912+105,827   0.0%
history_is/4                        330,798 =  330,798+0         0.0%
get_active_ring/3                   341,419 =  341,419+0         0.0%
command/3                            88,575 =   76,085+12,490    0.0%
put/1                                10,889 =   10,889+0         0.0%
</2                                 148,885 =  148,885+0         0.0%
instruction/3                       330,797 =  330,797+0         0.0%
history/4                           181,912 =  181,912+0         0.0%
get_selected_command/3               76,085 =   76,085+0         0.0%
adjust/2                            294,806 =  148,885+145,921   0.0%
execute/3                           330,797 =  330,797+0         0.0%
switch_wheel/2                       76,085 =   76,085+0         0.0%
initialize_program/4                  7,222 =        1+7,221     0.0%
quintus:floor/2                     178,538 =  178,538+0         0.0%
direction/2                         148,885 =  148,885+0         0.0%

Hefe = ["This amazing implementation of "99 Bottles of Beer" was done"|"..."]

Yes
?-

Wow! That profile amassed quite a bit of information; information that
does not tell us directly which takeout/3 user is consuming most of the
resources. There are two traditional approaches: diving right in,
which we do not cover
here,¹³ and "working
harder, not smarter". That is, find all of takeout/3's callers, and
map the numbers to the operations.

Those are the traditional approaches; aspects afford an alternative.
The takeout/3 predicate can be wrapped in an aspect as well as its
callers. The callers can establish a specific activation which the
takeout/3 aspect then completes.

Without the profiling information, the aspect approach would be the
best one, but given the wealth of information provided by the profile,
we need not take the extra step of developing an aspect system that
would be much like the one we used to debug the previous version of
the interpreter.

Coalescing the profiling information shows the following takeout/3 or
<-/2 usage:

instruction (program) manipulation
 bump_program_counter  - 330,798
ring changes
 update_ring           - 340,101
 rotate                - 148,885
 get_active_ring       - 341,419
commands
 get_command (not recorded)
 get_selected_command  -  76,085
 command               -  88,575

So, the biggest user of takeout/3 or <-/2 are the ring predicates; it is fortuitous that the ring structure is simple and therefore can be easily changed by aspects to prototype improvements. We will now develop a change to the ring structure and predicates using aspects to affect this change.

Minor Surgery

Since we are changing the structure that holds the rings, that means we
are changing part of the program state data structure. This change
requires we introduce a slight change in the interpreter to construct
that structure from a builder predicate instead of manually. In this way,
we can write an aspect around the builder predicate to prototype a term
structure for the rings container. So, the first whirl/1 rule changes
from:

whirl([Whirl|Instructions]) :-
 initialize_ops_ring(Ops),
 initialize_math_ring(Math),
 initialize_memory(Memory),
 initialize_program([Whirl|Instructions], 0, Program, []),
 interpret(program(ops, history(quescient, last_instruction(1)),
                   [Ops, Math], Memory, program(Program)), _).

to:

whirl([Whirl|Insts]) :-
 initialize_ops_ring(Ops),
 initialize_math_ring(Math),
 initialize_memory(Memory),
 initialize_program([Whirl|Insts], 0, Opcodes, []),
 build_program_state(ops, history(quescient, last_instruction(1)),
                     Ops, Math, Memory, Opcodes, Program),
 interpret(Program, _).

build_program_state(Active, Hist, Ops, Math, Mem, Opcodes,
                   program(Active, Hist, [Ops, Math], Mem, program(Opcodes))).

Now that we have this change in place, we must also make sure every
ring selection uses the get_active_ring/3 predicate (there remain several
places the active ring is extracted from the Rings list directly). With
this change and the above program state builder predicate, we have
version 0.04 of the interpreter.
To change the representation of the rings data structure, we write aspects
around the program state builder predicate, and all predicates using that
structure directly. It turns out that there are only two such predicates:
get_active_ring/3 and update_ring/3, so the task of writing the aspects
is rather simple.

:- use_module(library(aspects), [aspect/3, aspect_declaration/1]).

:- use_module(library(ring_utils)).
:- aspect_declaration(use_module(library(ring_utils))).

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
         Program = program(A, H, rings(Ops, Math), Mem, program(Opcodes))).

:- aspect(around, get_active_ring(A, Rings, Ring),
         get_active_ring_proxy(A, Rings, Ring)).

:- aspect(around, update_ring(Ring, Old, New),
         update_ring_proxy(Ring, Old, New)).

The above aspects change the implementation of the rings structure from
a list to a rings/2 term where the Ops ring is the first argument and the
Math ring the second. Since each ring proxy predicate has more than one
clause, we provide the implementation in a utility module:

:- module(rings_utils, [get_active_ring_proxy/3, update_ring_proxy/3]).

get_active_ring_proxy(ops, rings(Ops, _), Ops).
get_active_ring_proxy(math, rings(_, Math), Math).

update_ring_proxy(ring(ops, I, S, V), rings(_, Math),
                 rings(ring(ops, I, S, V), Math)).
update_ring_proxy(ring(math, I, S, V), rings(Ops, _),
                 rings(Ops, ring(math, I, S, V))).

With the aspect definition file
(ring_aspects.pl) and the utility file
containing the ring data access proxies
(ring_utils.pl), we weave the aspects into
the interpreter and profile the running system:

=====================================================================
Total time: 916.18 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
list_utils:takeout/3                498,447 =  498,447+0        71.6%
$garbage_collect/1                    8,863 =    8,863+0        27.6%
is/2                                865,980 =  865,980+0         0.1%
bump_program_counter/2              330,798 =  330,798+0         0.1%
=/2                               2,010,586 =2,010,586+0         0.1%
get_active_ring/3                   447,157 =  447,157+0         0.1%
tintin:around_user_get_active_ring_aspect/3447,157 =  447,157+0         0.0%
rotate/2                            148,885 =  148,885+0         0.0%
ring_utils:update_ring_proxy/3      340,102 =  340,101+1         0.0%
command/3                            88,575 =   76,085+12,490    0.0%
interpret/2                               2 =        1+1         0.0%
reverse_spin/2                      181,912 =  181,912+0         0.0%
syntax:<-/2                         175,455 =  175,455+0         0.0% history_is/4                        330,798 =  330,798+0         0.0% instruction/3                       330,797 =  330,797+0         0.0% execute/3                           330,797 =  330,797+0         0.0% adjust/2                            294,806 =  148,885+145,921   0.0% execute_command/2                   287,739 =  181,912+105,827   0.0% tintin:around_user_update_ring_aspect/3340,101 =  340,101+0         0.0% ring_utils:get_active_ring_proxy/3  447,157 =  447,157+0         0.0% spin_reverse/2                      181,912 =  181,912+0         0.0% update_ring/3                       340,101 =  340,101+0         0.0% switch_wheel/2                       76,085 =   76,085+0         0.0% get_selected_command/3               76,085 =   76,085+0         0.0% get_command/3                       148,885 =  148,885+0         0.0%

How disappointing! The "optimization" we made, replacing ring selection
from list element
access to term argument access, had no noticeable effect on program
runtime, even with the large numbers of calls to these predicates.
The list contains only two elements, however, and this may be the reason
that this "optimization" yielded no results, in spite of the fact that
term argument access occurs in constant time and list element access
occurs in linear time. Although this was a mistaken effort, it was not
a destructive one (as it would be if aspects were not used to develop it).
If we choose, we can simply discard the aspect and utility file, and the
target source code "automagically" reverts to its original implementation.
Without aspects, these changes would need to be removed from the target
source code manually, either by reverting to the previous revision under
source code control or by user intervention. Aspects create a safety net:
the previous version is immediately present once the aspects are removed.

Major Surgery...

There are larger lists being processed by the interpreter: the program,
itself, can be large (in the case of
beer.wr, there are over 15k instructions), and
each ring has a set of 12 command. Program instruction access occurs
more often then command access, and the instructions can be deep in the
list, so we will prototype changes to the program instruction handler
next. This change is not so trivial; the
Whirl interpreter document recommends
that we compile the instructions into a set of instruction/2 facts,
instead of storing the instructions in a list, as the interpreter does.
This change will allow Prolog's speedier indexing of clauses in a
predicate to improve runtime performance. This change,
like the one for ring access, requires the
same steps: modifying the program state data structure and then modifying
clauses that access the instructions from the state, but it also requires
that we change the instruction scanner so that it pushes the instructions
as facts into the Prolog database. As assert/1 is an expensive operation,
doing so in bulk, by compiling these facts from a file, is probably the
best approach, this compilation is also a change from the previous
optimization attempt.

As before, we need to modify the target source code slightly so that
we can use aspects to modify the system externally. Specifically,
we need to externalize the point when the instruction scanning routines
push the instructions onto the internal representation:

initialize_program([Op|Codes], Index) -->
 ({ Code is Op - 48,
    Code <- [0, 1] } ->
      [Index - Code],
      { NewIndex is Index + 1 }
    ;
      { NewIndex = Index }),
 initialize_program(Codes, NewIndex).
initialize_program([], _) --> [-1 - error].

becomes:

initialize_program([Op|Codes], Index) -->
 ({ Code is Op - 48,
    Code <- [0, 1] } ->
      push_instruction(Index, Code),
      { NewIndex is Index + 1 }
    ;
      { NewIndex = Index }),
 initialize_program(Codes, NewIndex).
initialize_program([], _) --> push_instruction(-1, error).

push_instruction(Index, Opcode) --> [Index - Opcode].

This change and by rolling the previous ring
modification into the interpreter,¹⁴ begets version
0.05 of the interpreter, which we can modify with aspects to compile
a Whirl's program instructions into memory. The first set of aspects
change how the initialize_program/4 predicate stores the scanned
instructions:

:- aspect(after, initialize_memory(_),
         (tell('the_whirl_program.pl'),
          write('% auto-generated: instruction/2 contains the whirl program'),
          nl, nl)).

:- aspect(before, build_program_state(_, _, _, _, _, _, _),
         (told, compile(the_whirl_program))).

:- aspect(around, push_instruction(Index, Opcode, A, B),
         (format('instruction(~d, ~w).~n', [Index, Opcode]), A = B)).

Interestingly, none of these aspects even work with the
initialize_program/4 predicate directly: the after and before aspects
manage the file into which the instruction/2 facts are written, and the
around aspect uses the newly created push_instruction/4 helper predicate
to do the task of writing those instructions into the file. Since the
instructions are stored in memory now, not in a list, we need to modify
the program state data structure to match this change...

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
         Program = program(A, H, rings(Ops, Math), Mem, instruction_index(0))).

... and how instructions are obtained (including manipulating the program
counter):

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
         Program = program(A, H, rings(Ops, Math), Mem, instruction_index(0))).

:- aspect(around, bump_program_counter(P0, P1),
         (P0 = program(A, H, R, M, instruction_index(Index)),
          NewIndex is Index + 1,
          P1 = program(A, H, R, M, instruction_index(NewIndex)))).

:- dynamic instruction/2.
:- aspect_declaration(dynamic instruction/2).

:- aspect(around, instruction(Instruction, P, P),
         (P = program(A, H, R, M, instruction_index(Index)),
          instruction(Index, Instruction))).

Finally, we change any commands that manipulate the program counter.
The Whirl language only has two such commands: exit and padd.

:- aspect(around, command(Command, P0, P1),
         (Command = exit,
          !,
          P1 = program(_, _, _, _, instruction_index(-999)))).

:- ensure_loaded(library(math)).
:- aspect_declaration(ensure_loaded(library(math))).

:- aspect(around, command(Command, P0, P1),
         (Command = padd,
          !,
          P0 = program(ops, H, Rings, M, instruction_index(Index)),
          get_active_ring(ops, Rings, ring(_, _, _, value(X))),
          floor(X, Y),
          NewIndex is Index + Y,
          (instruction(NewIndex, _) -> true;
               raise_exception(sigsegv(no_address(NewIndex), from(Index)))),
          NewerIndex is NewIndex - 1,
          P1 = program(ops, H, Rings, M, instruction_index(NewerIndex)))).

:- aspect(around, command(_, _, _), prove_goal).

We may now weave these instruction
aspects in place and profile the new interpreter:

?- read_file_to_codes('beer.wr', Hefe, []), profile(whirl(Hefe)).
% the_whirl_program compiled into insies 0.14 sec, 1,002,980 bytes
99 bottles of beer on the wall,
[...]
Take one down, pass it around,
No bottles of beer on the wall.

=====================================================================
Total time: 9.37 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      208 =      208+0        29.9%
list_utils:takeout/3                167,055 =  167,055+0        13.9%
=/2                               2,179,320 =2,179,320+0         7.4%
is/2                                865,981 =  865,981+0         3.7%
rotate/2                            148,885 =  148,885+0         3.5%
insies:around_user_bump_program_counter_aspect/2330,797 =  330,797+0         3.3%
insies:instruction/2                331,096 =  331,096+0         2.9%
reverse_spin/2                      181,912 =  181,912+0         2.9%
syntax:<-/2                         175,453 =  175,453+0         2.9% interpret/2                               2 =        1+1         2.7% history_is/4                        330,797 =  330,797+0         2.7% update_ring/3                       340,101 =  340,101+0         2.7% get_active_ring/3                   447,157 =  447,157+0         2.2% instruction/3                       330,798 =  330,798+0         2.0% bump_program_counter/2              330,797 =  330,797+0         2.0% insies:around_user_instruction_aspect/3330,798 =  330,798+0         1.4% history/4                           181,912 =  181,912+0         1.4% execute/3                           330,797 =  330,797+0         1.2% get_command/3                       148,885 =  148,885+0         1.2% execute_command/2                   287,739 =  181,912+105,827   1.0% adjust/2                            294,806 =  148,885+145,921   1.0% command_aspect_target/3              88,473 =   75,985+12,488    1.0% read_clause/2                        15,558 =   15,558+0         1.0% $record_clause/3                     15,557 =   15,557+0         1.0% insies:around_user_command_aspect/3 228,353 =   76,085+152,268   0.6%  Hefe = ["This amazing implementation of "99 Bottles of Beer" was done"|"..."]   Yes ?-

Now this is the kind of return on our efforts that we are looking for --
we sped up the running Whirl program by a factor of nearly 100 fold.
This code improvement, prototyped using aspects, can be rolled into the
source code base, as this implementation choice seems to be a clear
winner for the interpreter.

... and a bit of patch-up

Reviewing the most recent runtime profile information, we see that
the percentages have dropped considerably from the first iteration of the
interpreter. This is good: in general, optimization should only be
attempted on predicates that consume more than 10% of the runtime
resources. There is still one area that could be improved: the commands
are stored in a list and use takeout/3 and <-/2 for selection and
rotation. Converting the list representation to a set of facts in a
predicate will improve the runtime efficiency of the interpreter. This
effort is justified, as it is a more modest effort than that of the
program instruction reorganization.

As before, we roll the aspect prototype code into the target source
code, which gives us version 0.06
of the Whirl interpreter. With this new version, we can apply our
changes, prototyping them with aspects. Up to now, the commands were
stored as a list in each ring where the command at the head of the list
is the selected one. We will change this representation: the commands
will be ordered not numerically, but by relation in a set of facts in
a predicate:

:- module(command_utils, [command/2]).

:- op(200, xfx, ::).

% the circle of ops commands:
command(ops::noop, ops::exit).
command(ops::exit, ops::one).
command(ops::one, ops::zero).
command(ops::zero, ops::load).
command(ops::load, ops::store).
command(ops::store, ops::padd).
command(ops::padd, ops::dadd).
command(ops::dadd, ops::logic).
command(ops::logic, ops::if).
command(ops::if, ops::intIO).
command(ops::intIO, ops::ascIO).
command(ops::ascIO, ops::noop).

% the circle of math commands:
command(math::noop, math::load).
command(math::load, math::store).
command(math::store, math::add).
command(math::add, math::mult).
command(math::mult, math::div).
command(math::div, math::zero).
command(math::zero, math::'<').
command(math::'<', math::'>').
command(math::'>', math::'=').
command(math::'=', math::not).
command(math::not, math::neg).
command(math::neg, math::noop).

(We adhere to the ring-command separator syntax used in the
Whirl programming community)

This new utility module allows us to replace the unordered-but-indexed
commands stored in a list in each ring with aspects that use the above
relations to select the appropriate command. First, we will change the
structure of the rings storing the command set; they will now store the
command selected:

:- op(200, xfx, ::).
:- aspect_declaration(op(200, xfx, ::)).

:- op(200, yf, spin).
:- aspect_declaration(op(200, yf, spin)).

:- aspect(around, initialize_ops_ring(Ring),
       Ring = ring(ops, ops::noop, clockwise spin, value(0))).
:- aspect(around, initialize_math_ring(Ring),
       Ring = ring(math, math::noop, clockwise spin, value(0))).

Once we change the ring structure, we need to change how predicates
interact with them, specifically when choosing or manipulating
commands. It turns out that there are only two
predicates that do this (get_selected_command/3 and
rotate/2):¹⁵

:- use_module(library(command_utils), [rotate_in_direction/3]).
:- aspect_declaration(use_module(library(command_utils), [rotate_in_direction/3])).

:- aspect(around, rotate(P0, P1),
         (P0 = program(Active, Hist, Rings0, Mem, Index),
     P1 = program(Active, Hist, Rings1, Mem, Index),
     get_active_ring(Active, Rings0, ring(_, Active::Cmd, Spin, Val)),
     rotate_in_direction(Spin, Active::Cmd, NewCmd),
     update_ring(ring(Active, NewCmd, Spin, Val), Rings0, Rings1))).

:- aspect(around, get_selected_command(Cmd, P0, P1),
    (P0 = program(Active, _, Rings, _, _),
     P1 = P0,
     get_active_ring(Active, Rings, ring(_, Active::Cmd, _, _)))).

The rotate/2 aspect calls for something more than a simple command/2
relation expressed in the command_utils
module (I am referring to the rotate_in_direction/3
call), so we add a few DCGs to the command module to finish out this
iteration...

rotate_in_direction(clockwise spin) --> clockwise_command.
rotate_in_direction(counterclockwise spin) --> counterclockwise_command.

clockwise_command --> command.
counterclockwise_command(A, B) :- command(B, A).

... and we are done! The profile of the running system yields the
following results:

=====================================================================
Total time: 7.13 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      163 =      163+0        32.0%
list_utils:takeout/3                 18,170 =   18,170+0        10.1%
bump_program_counter/2              330,797 =  330,797+0         6.9%
=/2                               2,048,350 =2,048,350+0         6.3%
interpret/2                               2 =        1+1         4.0%
instruction/2                       331,096 =  331,096+0         4.0%
is/2                                419,326 =  419,326+0         3.4%
command_utils:command/2             148,885 =  148,885+0         3.4%
chain:around_user_rotate_aspect/2   148,885 =  148,885+0         2.9%
instruction/3                       330,798 =  330,798+0         2.6%
reverse_spin/2                      181,912 =  181,912+0         2.1%
spin_reverse/2                      181,912 =  181,912+0         2.1%
execute/3                           330,797 =  330,797+0         1.9%
history_is/4                        330,797 =  330,797+0         1.9%
rotate/2                            148,885 =  148,885+0         1.6%
chain:around_user_get_selected_command_aspect/376,085 =   76,085+0         1.3%
execute_command/2                   287,739 =  181,912+105,827   1.3%
update_ring/3                       340,101 =  340,101+0         1.3%
command/3                            88,573 =   76,085+12,488    1.3%
get_selected_command/3               76,085 =   76,085+0         1.3%
format/2                             15,558 =   15,558+0         1.3%
command_utils:rotate_in_direction/3 148,885 =  148,885+0         1.3%
get_active_ring/3                   447,157 =  447,157+0         1.1%
switch_wheel/2                       76,085 =   76,085+0         1.1%
history/4                           181,912 =  181,912+0         0.8%

This improvement shaved off a second or two from the runtime execution,
but, more importantly, the "weight" of the execution is now more evenly
distributed throughout the interpreter.

Taking Out takeout/3

The only predicate above 10% in the profile now is our long-standing
friend, takeout/3. Is there a way to eliminate it entirely from the
interpreter? Yes, we transform the representation of memory from a
list to a set of dynamic predicates. Let's do that. First, we roll
the changes from above into our interpreter, giving us
version 0.07. In so doing, we
observe that some predicates have become superfluous, so we discard them
(specifically, the different initialize_<type>_ring/1 commands are
reduced to one initialize_ring/2 fact, direction/2 is replaced by
rotate_in_dirction/3, and get_command/3 and adjust/2 are discarded).
Next, we replace the list representation of memory/1 in the program/5
term with a dynamic memory/2 predicate using aspects...

:- dynamic memory/2.
:- aspect_declaration(dynamic memory/2).

:- aspect(around, initialize_memory(Mem),
       (Mem = memory_index(0),
     retractall(memory(_, _)),
     assert(memory(0, value(0))))).

:- aspect(around, selected_memory_cell(Val, P0, P1),
         (P0 = program(A, H, R, memory_index(X), P),
     memory(X, value(Val)),
     P1 = P0)).

...and all the commands that change the memory layout (which are defined
in memory_aspects.pl). With that
we are ready to test the takeout/3-free system:

=====================================================================
Total time: 6.17 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      142 =      142+0        31.0%
=/2                               1,963,567 =1,963,567+0         6.8%
bump_program_counter/2              330,797 =  330,797+0         5.2%
rotate/2                            148,885 =  148,885+0         4.7%
interpret/2                               2 =        1+1         4.7%
instruction/2                       331,096 =  331,096+0         4.2%
update_ring/3                       340,101 =  340,101+0         3.9%
instruction/3                       330,798 =  330,798+0         3.7%
command_utils:command/2             148,885 =  148,885+0         3.4%
history_is/4                        330,797 =  330,797+0         2.9%
get_active_ring/3                   447,157 =  447,157+0         2.9%
execute_command/2                   287,739 =  181,912+105,827   2.9%
mem:around_user_command_aspect/3    321,536 =   76,085+245,451   2.4%
reverse_spin/2                      181,912 =  181,912+0         2.4%
is/2                                419,326 =  419,326+0         2.4%
execute/3                           330,797 =  330,797+0         2.1%
read_clause/2                        15,558 =   15,558+0         2.1%
command_utils:rotate_in_direction/3 148,885 =  148,885+0         1.8%
format/2                             15,558 =   15,558+0         1.8%
command/3                            76,284 =   76,284+0         1.3%
history/4                           181,912 =  181,912+0         1.0%
command_aspect_target/3              58,002 =   56,796+1,206     1.0%
switch_wheel/2                       76,085 =   76,085+0         0.8%
expand_term/2                        46,674 =   15,558+31,116    0.8%
selected_memory_cell/3               17,594 =   17,594+0         0.5%

That did it! This implementation is nearly 150 times faster than the
first working revision
(version 0.02). Rolling in the
changes, including eliminating that pesky takeout/3, gives us an improved
version, using asserts.

Coda: takeout/3

We've "solved" the "problem" of takeout/3, but by introducing
extra-logical features into the interpreter (assert/1, retract/1,
and retractall/1). Time is time, but, if possible, I would prefer to
avoid using facilities that make the program more unwieldy. So, instead
of using dynamic facts, let's explore alternatives. The first, obvious,
one is to replace takeout/3 with the nearly equivalent delete/3. This
approach seems good: as delete/3 is implemented by the Prolog
implementation, it should be faster, taking advantage of
lower-level code available to the Prolog implementation. Replacing
takeout/3 itself with an aspect, however, is not so easy, as the aspects
module itself uses takeout/3 in its implementation, meaning that the
aspects load and compile takeout/3 before they can (in this case, cannot)
modify the predicate. For this special case, a manual search and replace
is required, but, fortunately, as there is only one call to takeout/3 left
in the source code (in version
0.07), this task is not too difficult (resulting in
version 0.07 using delete/3
instead of takeout/3).

Profiling the running executable shows surprising results...

=====================================================================
Total time: 7.38 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      162 =      162+0        29.9%
lists:delete/3                       18,170 =   18,170+0        16.8%
=/2                               1,692,663 =1,692,663+0         8.9%
bump_program_counter/2              330,797 =  330,797+0         4.1%
instruction/2                       331,096 =  331,096+0         3.3%
instruction/3                       330,798 =  330,798+0         3.3%
interpret/2                               2 =        1+1         3.3%
is/2                                419,326 =  419,326+0         3.3%
update_ring/3                       340,101 =  340,101+0         3.0%
history_is/4                        330,797 =  330,797+0         2.8%
reverse_spin/2                      181,912 =  181,912+0         2.3%
execute_command/2                   287,739 =  181,912+105,827   2.0%
get_active_ring/3                   447,157 =  447,157+0         2.0%
rotate/2                            148,885 =  148,885+0         2.0%
command_utils:command/2             148,885 =  148,885+0         1.8%
history/4                           181,912 =  181,912+0         1.5%
command_utils:rotate_in_direction/3 148,885 =  148,885+0         1.5%
read_clause/2                        15,558 =   15,558+0         1.3%
command/3                           106,549 =   76,085+30,464    1.0%
switch_wheel/2                       76,085 =   76,085+0         0.8%
format/2                             15,558 =   15,558+0         0.8%
initialize_program/2                  7,222 =        1+7,221     0.5%
execute/3                           330,797 =  330,797+0         0.5%
spin_reverse/2                      181,912 =  181,912+0         0.5%
flag/3                               15,572 =   15,572+0         0.5%

... execution time actually increased over the takeout/3
implementation! Moral of the story: when working with lists, use
takeout/3; and avoid delete/3.

Another approach is to change the representation of the data structure
storing the memory cells. For the takeout/3 implementation, they are
stored in a list where the head cell is the currently selected one and
the other cells are indexed but unordered. Storing the cells in a list
give O(n), or linear, search time characteristics.
Trees, such as
binary, 2-3, or AVL trees, give better results, attaining O(log n) search
times.¹⁶ For small,
constantly changing data sets, trees may actually take longer, due to the
extra housekeeping involved. The payoff occurs with larger data sets.
The takeout/3 implementation's profile indicates takeout/3 was called over
18,000 times, so let's see how storing the memory cells in an
AVL tree does in comparison to the takeout/3
implementation. Once again, we change the representation of the program/5
state term. This time, as the root node of the AVL tree is not necessarily
the currently selected node, we need to add an index argument to the
memory store. An aspect serves this need:

:- aspect(around, initialize_memory(Mem),
         (avl_empty(Empty),
          avl_put(Empty, 0, value(0), AVL),
          Mem = memory(0, AVL))).

Now, wherever we access or change memory, we wrap that behavior with an
aspect replacing list access with AVL access:

:- use_module(library(avl), [
    avl_get/3,
    avl_replace/4,
    avl_put/4,
    avl_has/2,
    avl_empty/1
]).
:- aspect_declaration(use_module(library(avl), [
    avl_get/3,
    avl_replace/4,
    avl_put/4,
    avl_has/2,
    avl_empty/1
])).
:- aspect(around, selected_memory_cell(Datum, P0, P1),
         (P0 = program(_, _, _, memory(Index, AVL), _),
          avl_get(AVL, Index, value(Datum)),
          P1 = P0)).

... and replace the implementations of the commands that access memory
(as we did for the asserts
version). This aspect implementation can be found in
avl_aspects.pl. Weaving these aspects
into the takeout/3
implementation gives us the following profiled result:

=====================================================================
Total time: 6.44 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
$garbage_collect/1                      140 =      140+0        27.5%
=/2                               1,985,908 =1,985,908+0         9.6%
instruction/2                       331,096 =  331,096+0         6.5%
update_ring/3                       340,101 =  340,101+0         5.9%
command_utils:command/2             148,885 =  148,885+0         4.8%
is/2                                419,778 =  419,778+0         4.5%
interpret/2                               2 =        1+1         4.2%
bump_program_counter/2              330,797 =  330,797+0         3.7%
history_is/4                        330,797 =  330,797+0         3.4%
reverse_spin/2                      181,912 =  181,912+0         3.4%
rotate/2                            148,885 =  148,885+0         2.8%
execute/3                           330,797 =  330,797+0         2.5%
command/3                            76,284 =   76,284+0         1.7%
format/2                             15,558 =   15,558+0         1.7%
command_utils:rotate_in_direction/3 148,885 =  148,885+0         1.4%
avli:around_user_command_aspect/3   321,536 =   76,085+245,451   1.1%
spin_reverse/2                      181,912 =  181,912+0         1.1%
switch_wheel/2                       76,085 =   76,085+0         1.1%
instruction/3                       330,798 =  330,798+0         1.1%
initialize_program/2                  7,222 =        1+7,221     1.1%
history/4                           181,912 =  181,912+0         0.8%
get_active_ring/3                   447,157 =  447,157+0         0.8%
command_aspect_target/3              58,002 =   56,796+1,206     0.8%
execute_command/2                   287,739 =  181,912+105,827   0.8%
read_clause/2                        15,558 =   15,558+0         0.8%

It has a runtime execution speed better than the takeout/3 implementation,
and comparable to the assert/retract version. Since the two
divergent systems are of comparable speed, we are left with a choice:
choose the
memory-as-facts implementation and lose visible runtime
details and lose confidence that the system is (provably) correct; or,
choose the memory-as-AVL
tree implementation with an immediate debugging advantage and
knowledge that the model is logically consistent "all the way down". Of
course the way to go is the latter, but it does highlight that
alternatives exist: often software engineers "give in" to compromising
system fundamentals for expediency. This compromise need not be made:
improving algorithmic choices pays off both in code clarity and in
runtime results.

Spacetime Summary

This extended example demonstrated prototyping a series of optimizations
on a working system, iteratively, and safely, rolling in the
improvements until we obtained a new working system significantly
smaller in its memory footprint and significantly faster in its runtime
execution. We saw in this example that aspects do not necessarily
guide one to a correct approach, but we also saw that deadends could be
discarded without the usual recovery costs involved. So, aspects not only
act as excellent meta-debuggers, but also are a workable way to change the
implementation of a working system safely.

Production Aspect Survey

We've seen how aspects can be used in product development from the
perspectives of debugging nascent code and then prototypting optimizations
for working code (that are eventually rolled into the source base when
verified). There is also another use for aspects -- production aspects,
or aspects advising delivered code that are meant to be free-standing.
These aspects have received less attention from the
community,¹⁷ and,
due to restrictions on systems I advise with aspects, I can only offer
general descriptions, broken down into categories:

Security

Ken Pelletier describes how he eliminated vunerabilities from a content
rendering system for a web browser in
"An AOP success story
from the real world".

Notification

In a rule-based expert system, there is a reporting requirement that
results in a rule finding based on anomalies detected into the input
raw data set to the rule manager as it is being parsed into an internal
representation. As the parse phase preceeds the rule-finding phase,
implementing this special
functionality for this one rule using traditional methods (i.e.:
non-aspect-oriented methods) needlessly
complicates the entire system, but using a production aspect that observes
the parse phase and inserts the rule finding into the result set
keeps the rest of the system simple and clean.

Simulation

It is often the case that working models must be delivered in the context
of incomplete information or assets. The usual approach (which takes a
distant second to the most-exercised option: not delivering) is to stand
up proxies that interact as dummies for the missing resources. This works
for cleanly-separated sub-systems, but for systems that engage in complex
dialogues with other assets, the proxy code becomes entangled with the
production code. Production aspects provide a venue by which proxies may
be built without emmeshing their simulated functionality into the
production system.

Another good source for ideas on production aspects is the
AspectJ
Programmers'
Guide, particularly the
Production Aspects section of that manual, which discusses change
monitoring, context-passing, and protocol enforcement.

Conclusion

AOP is liberating! In this document we provided the protocol of the
aspects module and then demonstrated how to use them effectively in
debugging and prototyping with an extended example that yielded
extraordinary improvements to a non-trivial system. Also, the production
aspects we described are essential to the operations of in-production
systems and for other mission critical systems being build from first
principles. AOP is gathering momentum in the software engineering
community using imperative programming languages, with good reason.
Aspects also have relevance and utility with logic programming
languages, and this document and associated module provide a means to
use aspects effectively in rule-based systems.

Endnotes

⁰	This document, along with the aspects module and example files are available as a compressed archive at prolog-aspects-20051221.tgz.
¹	Java seems to be the lead-off technology for AOP. The first aspect-oriented compiler for Java was created at Xerox Parc, and then later merged into the Eclipse effort (Eclipse is a Java IDE) at IBM. AOP's home page is http://eclipse.org/aspectj.
²	"Open implementation" (oi) was the rallying cry that launched (what became) the AOP effort at Xerox Parc (The oi home page is at http://ww2.parc.com/csl/groups/sda/projects/oi). The manifesto of oi is [Ki96], a delightful little paper that clearly makes the case for oi -- it should be required reading for anyone wishing to understand the utility of aspects.
³	No surprise there: one of the authors of [KRB91], Gregor Kiczales, initialed the AOP movement with his manifesto on oi [Ki96]. The metaobject protocol is a strange, beautiful and powerful thing -- besides the first reference above [S90], § 28.1.8 shows its utility from a reference manual's perspective.
⁴	Programming languages with traditional (covariant) object-oriented systems vary a method's behavior based on the class of the object to which that method belongs (i.e.: methods are polymorphic based on its containing class). In contravariant object-oriented systems, method polymorphism is based on all the participating objects (contravariance is also known as parametric polymorphism because method invocation is based on the cartesian product of the (actual or formal) types of all of its arguments). In comparing contravariance to covariance, it is possible to highlight advantages⁵ and disadvantages;⁶ formally speaking contravariance is a generalisation of covariance.
⁵	There are several immediate advantages that contravariance gives the programmer (a more esoteric one is that the metaobject protocol allows users to control method selection), perhaps the most noticeable advantage over covariance is that method behavior may be refined by clients simply by adding, or removing, methods from the generic function. So, for example, addition (the '+' operator) may be extended to arrays by adding a '+(<array> a, <array> b)' method to the '+' generic function. See, e.g., http://www.cotilliongroup.com/code/dylan_programmer for the adding-arrays example played out, or [F97], § 6.1 for a fully-developed and -explained example (for time-offsets).
⁶	Although Bruce, Cardelli, et al, indicate in [BC95] that the surest approach to resolving the types of multiple arguments is covariant double-dispatch (the paper applauded the developers of Smalltalk in discovering this approach and using it to good effect), in practice programmers rarely use double-dispatch, and in contravariant systems the arguments' types are usually resolved without difficulty. Besides, double-dispatch requires thought and effort on the programmer's part to avoid logic errors (it is facile to juxtapose argument order mistakenly when they are eventually used); the problems of juxtaposition do not arise in contravariant languages as the dispatch is hidden in the language semantics.
⁷	This is even correct in like of Java 1.5's addition of annotations to the language specification. Annotations are, by their nature, external to (an object-oriented) language. They also require significantly more work to implement significantly less functionality that what aspects provide; so one is lead to wonder at the decision by the Java team to select annotations over aspects as a language feature to advise primary constructs of the language...
⁸	That the "Hello, World!" program is non-trivial in Whirl should be a very strong hint at the level of effort needed to do just about anything in the programming language...
⁹	The implication operator for Prolog (':-', representing ← or ⊂) is actually reverse implication, which for organizing like clauses together works well. The forward implication operator used here ('=>', representing → or ⊃) is more often seen in works using predicate logic.
¹⁰	At first I did an aspect around interpret_next_instruction/2 with the traditional if-then syntax ('->'/2), which worked fine with two clauses for that aspect, each clause using prove_goal/0 as its last statement. That obviously should be refectored to a before aspect, but that didn't work because the test looked for any exit command, even if it wasn't executed, so I finally switched to an after aspect on command/3 and used a spypoint/1 to leap to the interpret_next_instruction/2 predicate. Writing aspects is like writing "ordinary" code; it has its own developement process.
¹¹	I say "consequence-free" because in normal iterative development one goes from a working (less optimal) system to a working (more optimal) system often by breaking the implementation. Mistakes in this iterative development cycle are very costly because it usually entails rewriting the system's code back to functionality from scratch. By allowing aspects to modify the implementation, the developer earns a free safety net: the underlying implementation works and its source is untouched through the developement cycle. If an aspect prototype or optimization breaks the system, it can simply be discarded and the underlying, working, implementation will resume its role.
¹²	Optimizing sub-second programs is not something I need concern myself with. What is unfortunate about the success of the refactoring is not that it shows how refactoring, qua refactoring, improves code efficiency, but, instead, it shows that SWI Prolog handles indirect and direct recursion differently. Converting indirect recursion to computed jumps is a solved problem (see, e.g. [Ba95], which inspired the development a complete Scheme system; but, more to the point, [Ka99], pp. 56 on, cover Last Call Optimization (LCO) for Prolog in detail), so it is disappointing that I must code around what should be a non-issue.
¹³	It seems that in the software engineering field this particular approach has received particular attention -- not so much rigorous theoretical study, but there are large numbers of software engineers, including this one, who are dillegently applying their best efforts to explore every possibility of this approach.
¹⁴	This change is a good one. Although it does not improve the runtime execution speed, it does convert the algorithm of ring access from list processing (which implies all elements are accessed progressively) to term argument access (which better fits the non-deterministic (whereever did the word "random" come to be associated with this kind of element access) nature of accessing the rings). In the introduction to the list_utils manual I make this very recommendation, so it is a good day, indeed, when I follow my own advice.
¹⁵	The housekeeping part of the interpreter, which comprises most of the system, has changed very little through these iterative improvements. I attribute this effect of small changes to the code to accomodate sweeping changes in how the system works to DCGs, as they allow me to code in the style of expressing intention without concern for the underlying implementation.
¹⁶	Trees are discussed at length in any book on algorithms. [Br2001], chapters 9 and 10, implements these structures in Prolog. I found a better implementation of the AVL tree in [RL1999], § 5.9.
¹⁷	The slow adaptation of production aspects is understandable: after 15 years of research in the functional programming community, it has taken another 5 years for the general programmer to become aware of, and to begin to use, aspects. Production aspects, being the highest risk, require some familiarity with aspects of lower risk (debugging and prototyping) before being essayed.

Works Consulted

[Ba95]	Henry Baker, "CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A.", ACM Sigplan Notices, 30, 9, Nimble Computer Corp, California, 1995.
[Br2001]	Prolog Programming for Artificial Intelligence, 3rd ed., Ivan Bratko, Addison-Wesley, Great Britain, 2001.
[BC95]	Kim Bruce, Luca Cardelli, Giuseppe Castagna, The Hopkins Object Group, Gary T. Leavens, Benjamin Pierce, "On Binary Methods", _, _, _, 1995.
[F97]	Dylan Programming: An Object-Oriented and Dynamic Language, Neal Feinberg, Sonya E. Keene, Robert O. Mathews, P. Tucker Withington, Addison-Wesley, Massachusetts, 1997.
[Ka99]	Warren's Abstract Machine: a Tutorial Reconstruction, Hassan Aït-Kaci, MIT Press, Massachusetts, 1999, http://www.vanx.org/archive/wam/wam.html.
[Ki96]	Gregor Kiczales, "Beyond the Black Box: Open Implementation", IEEE Software, _, January 1996
[KRB91]	The Art of the Metaobject Protocol, Gregor Kiczales, Jim des Rivières, Daniel G. Bobrow, MIT Press, Massachusetts, 1991.
[RL1999]	Algorithms: A Functional Programming Approach, Fethi Rabhi, Guy Lapalme, Addison-Wesley, Great Britain, 1999.
[S90]	Common Lisp: the Language, 2nd ed., Guy L. Steele, Jr. Digital Press, Massachusetts, 1990.

author:	Douglas M. Auclair (dauclair at hotmail dot com)
date:	December 14, 2005
version:	1.0
Copyright © 2005, Cotillion Group, Inc. All rights reserved.

Pages

Monday, December 26, 2005

Saturday, December 17, 2005

AVL Module Users' Manual

Purpose

Types and Modality

Functionality Not Implemented

Traversal Options

Element selection

Functional Operations

Dismissal

:- module(avl).

Data Structure

Exported Predicates

Exported Operators

Appendix A: Implementation Choices

AVL Trees as a Functional Data Structure

Quintus Prolog's AVL Module

Salvador Fandiño's AVL Module

Direct Memory Access

Reflections

Appendix B: The "Missing" Predicates

Implementations of Traversal Predicates

Top-down traversal using concatenation

Top-down traversal using append/3

Element Selection Predicates

Functional Tree Transformations

Endnotes

Works Consulted

Copyrights

Wednesday, December 14, 2005

Aspect-Oriented Programming in Prolog

Abstract

A Whirl Interpreter

The Problem: Debugging

Debugging: the Solution

Round 1: Full and Fast Trace

Results (Round 1)

Round 2: State Representation

Results (Round 2)

Round 3: Only Debugging for Errors

Results (Round 3)

Round 4: Rooting out the real error

Results (Round 4)

Round 5: Hello, World!

Results (Round 5)

Debugging: Wrap Up

The Problem: Spacetime

From Debugging Aspects to Production-tuning Aspects

100 Bottles of Hefe Weizen of the Wall

Major Surgery...

... and a bit of patch-up

Taking Out takeout/3

Coda: takeout/3

Spacetime Summary

Conclusion

Endnotes

Works Consulted

Top-down traversal using `append/3`