AVL Module Users' Manual
Purpose
This document[0] covers the data structure of, and the protocol for, the module that implements AVL trees.[1]
AVL trees are self-balancing binary search trees and, as such, offer better random search-time characteristics than lists. Depending on the implementation and the need, AVL trees offer access times comparable to direct memory access, without incurring the troublesome cost of extralogical procedures (see Appendix A for implementation options). Quintus Prolog offers an AVL tree module (named module(avl), as is this module) with several differences: it has no documentation (although the source is comprehensible), it embeds the -1/0/+1 height difference into the nodes instead of the node's height, and its operations are somewhat slower than this module's. This module was developed by Salvador Fandiño on March 21, 2005 under the GPL.
Types and Modality
Throughout this document one will see type and mode declarations. The syntax and semantics of these declarations should be familiar to the Prolog programmer, but for precise meanings, please see the Mercury programming language reference manual [Merc], particularly § 3.2 (User-defined types) for information regarding type, and § 4 (Predicate and function mode declarations) for a discussion of mode and the semantics of modality.
Most of the tokens should have clear meanings; a brief (and imprecise) overview of the determinism categories follows:
| Can fail? | 0 solutions | 1 solution | > 1 solution |
|-----------|-------------|------------|--------------|
| no        | erroneous   | det        | multi        |
| yes       | failure     | semidet    | nondet       |
This table is lifted directly from § 6.1 (Determinism categories), which goes into detail on this topic.
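For instance, two declarations that appear later in this document read as follows (they are documentation conventions borrowed from Mercury, not directives a standard Prolog system interprets); the comments are explanatory glosses:

% A type declaration: a height is an integer.
:- type height == integer.

% A mode declaration: called with both arguments bound (in),
% avl_has/2 succeeds at most once or fails -- it is semidet.
:- mode avl_has(in, in) is semidet.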
Functionality Not Implemented
This AVL module's protocol covers the commonly-used predicates for working with AVL trees. Some functionality is not implemented here, because it is either an academic curiosity or rarely used. We mention that functionality here and cover some implementations in Appendix B.
Traversal Options
This module has one nondeterministic approach to collecting elements (which resolves to a minimum-index-to-maximum-index scan of the tree), and one predicate that converts the AVL tree to a list (again, in smallest-to-largest index order). There are other ways to traverse the AVL tree: if one wishes to go from the largest index to the smallest, one need not incur the cost of reverse/2 (an O(N) predicate). The following alternative traversal protocol would give the user iteration options over the AVL tree:
- left_to_right_traversal/2: a pseudonym for avl_to_list/2; min-to-max index traversal.
- right_to_left_traversal/2: max-to-min index traversal.
- top_down_traversal/2: the academic-curiosity predicate; a breadth-first traversal of the tree.
Element selection
Besides the avl_get/3 and avl_gen/3 predicates, there are no options for the user to direct the selection of elements. Of course, one could convert the AVL tree to a list and then select any desired element using, e.g., the operations from the list_utils module or from the set_utils module, but then one pays a penalty in space (creating another data structure, a list, to hold the nodes) and in time (in the transformation from the AVL tree structure to the list structure). The following predicates would extend the module's functionality without the expenses outlined above:
- top/2: given a non-empty AVL tree, returns the value of the top node. As an AVL tree is a recursive data structure, this predicate can be used to iterate over a tree, depth-first, nondeterministically.
- first/2: given a non-empty AVL tree, descends to the left-most non-terminal node and returns its value; that is, the value of the smallest key in the tree, in O(log N) time.
- last/2: given a non-empty AVL tree, returns the right-most node's value; that is, the value of the largest key in the tree.
Functional Operations
As with lists, it may be necessary to perform a one-to-one transformation of the elements of an AVL tree, or to distill the AVL tree into a single result (e.g., by element summation). The former operation is known in the functional programming community as the map function; the latter, fold. Implementing these functions as predicates would also require implementing the lambda calculus to permit any kind of generality, or utility, in the resulting predicate. Fortunately, as I have an implementation available, both in the lambda module and in the combinators module (which is an accident resulting from my development of set-theory predicates for use by Prolog in the set_utils module), providing this functionality devolves to a derivative exercise:
- map_tree/3: takes a lambda-term or phi-term (an anonymous functional) and transforms the input AVL tree into an output AVL tree of the same size and shape.
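Although fold is mentioned above, only map_tree/3 is sketched in Appendix B. A fold over the t/5 tree representation described in the Data Structure section might look like the following; fold_tree/4 and sum_value/3 are hypothetical names, and the folding goal is invoked with plain call/4 rather than through the lambda or combinators modules:

% A minimal sketch of an in-order fold over the t/5 tree representation.
% Goal is any predicate callable as call(Goal, Value, Acc0, Acc).
fold_tree(_Goal, t, Acc, Acc).
fold_tree(Goal, t(_Key, Value, Left, Right, _Height), Acc0, Acc) :-
    fold_tree(Goal, Left, Acc0, Acc1),
    call(Goal, Value, Acc1, Acc2),
    fold_tree(Goal, Right, Acc2, Acc).

% For example, summing cells stored as value(N) terms:
sum_value(value(N), Acc0, Acc) :- Acc is Acc0 + N.
% ?- fold_tree(sum_value, Tree, 0, Sum).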
Dismissal
Although the functionality described above may appear interesting or useful (so much so that I've developed it all in my own AVL module), I've only used one of the above predicates: top/2. And that one only for the very simple reason that I need the currently-available value (the top node) and do not wish to have my implementation code become enmeshed with the AVL tree's implementation code (after all, a protocol should encapsulate implementation details). Furthermore, this survey was not comprehensive: I did not include the scorned (for good reason) avl_delete/[3,4] predicate. Whenever this predicate is mentioned in the literature, it is mentioned only in passing, as an exercise for the reader, and I have had no need for it. In sum, the AVL module's protocol is well written: it exposes the necessary functionality and avoids cluttering the namespace with frivolous functionality.
:- module(avl).
Data Structure
An AVL Tree, as used by this module, is a binary type of the following structure...
:- type avl_tree ---> empty_avl_tree | non_empty_avl_tree.

where:

:- type empty_avl_tree ---> t.

and:

:- type non_empty_avl_tree ---> t(key, value, avl_tree, avl_tree, height).

:- type key == comparable.
:- type value == any.
:- type height == integer.

Another type used in this document is the following:

:- type assoc_list ---> [] | [assoc|assoc_list].

where:

:- type assoc ---> key - value.
An AVL tree may be created from a list with the sorted_list_to_avl/2 or list_to_avl/2 calls, or by transforming a nascent AVL tree (obtained from avl_empty/1) with calls to avl_put/4. Elements may be accessed or changed with the predicates that operate on elements of AVL trees (avl_get/3 or avl_replace/4), or by translating the AVL tree back to an ordered list of its elements (avl_to_list/2).
Exported Predicates
The avl module exports the following predicates:
- avl_empty/1
- avl_to_list/2
- avl_put/[3,4]
- avl_replace/[3,4]
- avl_get/3
- avl_has/2
- sorted_list_to_avl/2
- list_to_avl/2
- avl_gen/3
- avl_dump/1
Exported Operators
The avl module does not define any operators.
avl_empty/1
Synopsis: avl_empty(?AVLTree)

Arguments:
- AVLTree: an AVL tree.

Description: Gives the empty AVL tree.

Backtracking:
:- mode avl_empty(out) is det.
:- mode avl_empty(in) is semidet.

Example: This predicate is normally used to generate a seed AVL tree if items are being inserted non-deterministically (instead of via a list, for example). This example gives the runtime representation of the initial state of memory for a Whirl program:

initialize_memory(memory(0, AVL)) :-
    avl_empty(AVL0),
    avl_put(AVL0, 0, value(0), AVL).
avl_to_list/2
Synopsis: avl_to_list(+AVLTree, ?List)

Arguments:
- AVLTree: an AVL tree.
- List: a sorted list of Key-Value pairs.

Description: Converts an AVL tree to a sorted list of Key-Value pairs.

Backtracking:
:- mode avl_to_list(in, out) is det.
:- mode avl_to_list(in, in) is semidet.

See also: sorted_list_to_avl/2, list_to_avl/2
avl_put/[3, 4]
Synopsis:
avl_put(+In, +Key, -Out)
avl_put(+In, +Key, +Value, -Out)

Arguments:
- In: the input AVL tree.
- Key: the key under which Value is stored.
- Value: the value to associate with Key.
- Out: the resulting AVL tree.

Description: Inserts a Key/Value pair into an AVL tree. Fails if an element with the same Key already exists in the tree (use avl_replace/4 instead).

Backtracking:
:- mode avl_put(in, in, out) is semidet.
:- mode avl_put(in, in, in, out) is semidet.

Example: This example demonstrates a lazy memory structure for a Whirl program -- cells are added when accessed, and not before:

get_active_ring(ops, Rings, ring(ops, _, _, value(X))),
floor(X, Y),
NewIndex is Y + Index,
(avl_has(AVL0, NewIndex) -> AVL1 = AVL0 ; avl_put(AVL0, NewIndex, value(0), AVL1)).

I currently have not needed to use avl_put/3.

See also: compare/3, avl_replace/4
avl_replace/[3,4]
Synopsis:
avl_replace(+In, +Key, -Out)
avl_replace(+In, +Key, +Value, -Out)

Arguments:
- In: the input AVL tree.
- Key: the key under which Value is stored.
- Value: the value to associate with Key.
- Out: the resulting AVL tree.

Description: Inserts a Key/Value pair into an AVL tree. If an element with the same Key already exists in the tree, it is replaced.

Backtracking:
:- mode avl_replace(in, in, out) is det.
:- mode avl_replace(in, in, in, out) is det.

Examples: The following example overwrites the current memory cell with the value in the accumulator in a Whirl program:

get_active_ring(Active, Rings, ring(_, _, _, Value)),
avl_replace(AVL0, Index, Value, AVL1).

The next example does the same for a user-entered integral value, given that the accumulator is null:

get_active_ring(ops, Rings, ring(ops, _, _, value(0))),
!,
read_term(N, []),
(integer(N) -> true ; raise_exception(integral_expected(received(N)))),
avl_replace(AVL0, Index, value(N), AVL1).

As with avl_put/3, I currently have not needed to use avl_replace/3.

See also: compare/3, avl_put/[3,4]
avl_get/3
Synopsis: avl_get(+AVLTree, +Key, ?Value)

Arguments:
- AVLTree: an AVL tree.
- Key: the key to look up.
- Value: the value associated with Key.

Description: Retrieves the Value associated with Key. The predicate fails if no element with Key is found in the tree.

Backtracking:
:- mode avl_get(in, in, in) is semidet.
:- mode avl_get(in, in, out) is semidet.

Example: Of course, AVL trees shine with regard to "random" element access, for:

avl_get(MemAVL, Index, value(Datum))

has O(log N) performance characteristics, whereas the corresponding linear scan of a list obtains Datum in O(N) time.

See also: compare/3, avl_has/2, avl_gen/3
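For comparison, here is a hedged sketch of the list-based lookup alluded to above (list_get/3 and the assoc_list layout are illustrative, not part of the module):

% Scanning an assoc_list for an Index costs O(N):
list_get(MemList, Index, Value) :-
    member(Index-Value, MemList).

% ?- list_get([0-value(0), 1-value(7), 2-value(3)], 1, value(Datum)).
% Datum = 7.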
avl_has/2
Synopsis: avl_has(+AVLTree, +Key)

Arguments:
- AVLTree: an AVL tree.
- Key: the key to look for.

Description: Checks whether the AVLTree contains an element with the given Key.

Backtracking:
:- mode avl_has(in, in) is semidet.

Example: (see the example for avl_put/4)

See also: compare/3, avl_get/3
sorted_list_to_avl/2
Synopsis: sorted_list_to_avl(+List, -AVLTree)

Arguments:
- List: a sorted list of Key-Value pairs without duplicate keys.
- AVLTree: the resulting AVL tree.

Description: Converts a sorted list of key/value pairs without duplicate keys to an AVL tree in O(N), or linear, time.

Backtracking:
:- mode sorted_list_to_avl(in, out) is det.

See also: avl_to_list/2, list_to_avl/2
list_to_avl/2
Synopsis: list_to_avl(+List, -AVLTree)

Arguments:
- List: a list of Key-Value pairs.
- AVLTree: the resulting AVL tree.

Description: Converts a list of key/value pairs to an AVL tree in O(N log N) time. As duplicates are encountered in List, the elements are replaced in AVLTree.

Backtracking:
:- mode list_to_avl(in, out) is det.

See also: avl_to_list/2, sorted_list_to_avl/2
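As an illustrative contrast between the two conversion predicates (the data is invented): list_to_avl/2 accepts unsorted input, with later duplicates replacing earlier ones, while sorted_list_to_avl/2 requires sorted, duplicate-free input but runs in linear time:

% ?- list_to_avl([2-value(b), 1-value(a), 2-value(c)], AVL),
%    avl_to_list(AVL, List).
% List = [1-value(a), 2-value(c)].

% ?- sorted_list_to_avl([1-value(a), 2-value(c)], AVL).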
avl_gen/3
Synopsis: avl_gen(+AVLTree, ?Key, ?Value)

Arguments:
- AVLTree: an AVL tree.
- Key: a key in the tree.
- Value: the value associated with Key.

Description: Enumerates, via backtracking, the elements of the AVL tree.

Backtracking:
:- mode avl_gen(in, in, in) is semidet.
:- mode avl_gen(in, in, out) is semidet.
:- mode avl_gen(in, out, in) is nondet.
:- mode avl_gen(in, out, out) is multi.

See also: compare/3, avl_get/3, avl_has/2
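For instance (with an illustratively built tree), the elements can be enumerated on backtracking, smallest key first, as described under Traversal Options:

% ?- list_to_avl([1-value(a), 2-value(b)], AVL),
%    avl_gen(AVL, Key, Value).
% Key = 1, Value = value(a) ;
% Key = 2, Value = value(b).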
avl_dump/1
Synopsis: avl_dump(+AVLTree)

Argument:
- AVLTree: an AVL tree.

Description: Prints a human-friendly representation of the tree to the current stream.

Backtracking:
:- mode avl_dump(in) is det.
Appendix A: Implementation Choices
Different algorithm books give different approaches to implementing AVL trees. We examine several of these approaches in turn.
AVL Trees as a Functional Data Structure
[RL1999] shows the now-standard single/double rotation one-step rebalancing algorithm, but (re)computes the height of every node on demand. This leads to gains over straight list processing, but the gains are dampened by the continuous height recomputation on each rebalance. [B2001] eagerly computes the heights of the nodes, so that the height information is simply a lookup.
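To make the distinction concrete, here is a minimal sketch using the t/5 node representation from the Data Structure section (computed_height/2 is an illustrative name, and the convention that an empty tree has height 0 is assumed): a stored height is recovered by unification, whereas recomputing it walks the whole subtree:

% Height stored eagerly in the node ([B2001]-style): a constant-time lookup.
height(t, 0).
height(t(_Key, _Value, _Left, _Right, Height), Height).

% Height recomputed on demand ([RL1999]-style): O(N) per call.
computed_height(t, 0).
computed_height(t(_Key, _Value, Left, Right, _), Height) :-
    computed_height(Left, HL),
    computed_height(Right, HR),
    Height is max(HL, HR) + 1.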
I first implemented the AVL tree type in Prolog from [RL1999] and then incorporated immediate height information into the nodes of the tree. This new implementation of the AVL module resulted in a 30-fold gain in execution speed. Its profile characteristics when run through a count test are as follows:
===================================================================== Total time: 1.18 seconds ===================================================================== Predicate Box Entries = Calls+Redos Time ===================================================================== $garbage_collect/1 9 = 9+0 46.0% avl_tree:splice/4 6,000 = 6,000+0 9.5% is/2 89,861 = 89,861+0 7.9% avl_tree:height/2 149,686 = 149,686+0 6.3% avl_tree:simple_compare/4 267,300 = 139,644+127,656 6.3% =/2 235,462 = 235,462+0 4.8% @</2 127,656 = 127,656+0 3.2% avl_tree:grapht/4 23,969 = 6,000+17,969 3.2% count/5 2 = 1+1 1.6% @>/2 5,988 = 5,988+0 1.6% avl_tree:get_branch/3 69,822 = 69,822+0 1.6% avl_tree:replace_branches/3 5,977 = 5,977+0 1.6% avl_tree:balance_in_direction/5 17,969 = 17,969+0 1.6% avl_tree:add_element/3 6,001 = 6,001+0 1.6% avl_tree:simply_replace_branch/4 45,876 = 45,876+0 1.6% =</2 6,002 = 6,002+0 0.0% </2 53,891 = 53,891+0 0.0% avl_tree:rotate/3 5,988 = 5,988+0 0.0% avl_tree:replace_in_direction/5 6,000 = 6,000+0 0.0% avl_tree:add_element_aux/5 69,822 = 69,822+0 0.0% avl_tree:replace_branch_recomputing_height/423,946 = 23,946+0 0.0% avl_tree:then_rotate/4 5,988 = 5,988+0 0.0% avl_tree:incr_max/3 71,844 = 35,922+35,922 0.0% avl_tree:simple_balance/6 17,969 = 17,969+0 0.0% avl_tree:replace_element/4 63,834 = 63,834+0 0.0%Table 1: the author's module(avl_tree); 6000 nodes
This profile was obtained by iteratively inserting and modifying 6000 key/value pairs into a tree. At some number over 6000 elements SWI Prolog runs out of global stack.
Quintus Prolog's AVL Module
I was confident that this approach was the best implementation, but then I observed that Quintus Prolog has an (undocumented) AVL module. Its approach is quite different: it embeds the rebalancing algorithm directly into the predicate that does node insertion or transformation, and it uses the compare/3 test to obtain the comparator atom that dispatches to the appropriate rebalancing routine (instead of using the @</2 comparator against indices to determine the branch to follow, as I implemented). Also, it uses a height-difference algorithm: the height information in a node of a balanced tree is encoded as +1, 0 or -1. After reading the source code, I built the same test for this module. The performance gain for the same-sized test was impressive...
===================================================================== Total time: 0.46 seconds ===================================================================== Predicate Box Entries = Calls+Redos Time ===================================================================== avl:avl_change/10 6,000 = 6,000+0 25.0% avl:avl_store/10 69,834 = 6,000+63,834 25.0% is/2 169,621 = 169,621+0 25.0% =:=/2 69,822 = 69,822+0 10.0% $garbage_collect/1 17 = 17+0 5.0% qui_count/5 2 = 1+1 0.0% avl:avl_store/5 75,823 = 75,823+0 0.0% avl:avl_store/4 6,001 = 6,001+0 0.0% avl:avl_change/5 6,000 = 6,000+0 0.0% =</2 6,002 = 6,002+0 0.0% =/2 81,810 = 81,810+0 0.0% compare/3 133,656 = 133,656+0 0.0% >=/2 5,988 = 5,988+0 0.0%Table 2: Richard O'Keefe's module(avl); 6000 nodes.
Module copyright © 1989, Quintus Computer Systems, Inc.
... but that was not the only gain: the Quintus version could work with many more nodes before it ran into global stack issues -- I was able to test up to 120,000 nodes:
===================================================================== Total time: 38.39 seconds ===================================================================== Predicate Box Entries = Calls+Redos Time ===================================================================== $garbage_collect/1 462 = 462+0 73.8% is/2 4,417,861 =4,417,861+0 7.0% avl:avl_store/10 1,908,962 = 120,000+1,788,962 6.3% avl:avl_change/10 120,000 = 120,000+0 5.1% =:=/2 1,908,946 =1,908,946+0 1.4% compare/3 3,697,908 =3,697,908+0 1.0% =/2 2,148,930 =2,148,930+0 0.9% avl:avl_change/5 120,000 = 120,000+0 0.5% qui_count/5 2 = 1+1 0.4% avl:avl_store/5 2,028,947 =2,028,947+0 0.3% =</2 120,002 = 120,002+0 0.1% avl:avl_store/4 120,001 = 120,001+0 0.0% >=/2 119,984 = 119,984+0 0.0%Table 3: Richard O'Keefe's module(avl); 120,000 nodes.
Salvador Fandiño's AVL Module
The Quintus module shows excellent performance compared to lazy AVL trees, but there is a cost to computing the relative height difference for each node. Salvador Fandiño reverted to using absolute node heights, saving that computation time. It seems to pay off, for the 6000-node test is an improvement over the other AVL tree implementations:
===================================================================== Total time: 0.41 seconds ===================================================================== Predicate Box Entries = Calls+Redos Time ===================================================================== =/2 267,300 = 267,300+0 23.8% avl:avl_put/9 57,853 = 6,000+51,853 14.3% $garbage_collect/1 13 = 13+0 14.3% avl:avl_balance_right/5 35,938 = 17,969+17,969 9.5% is/2 47,927 = 47,927+0 9.5% avl:avl_replace/9 63,834 = 6,000+57,834 4.8% avl:avl_cmp_depth/3 20,957 = 20,957+0 4.8% avl:avl_put/4 75,823 = 75,823+0 4.8% ther_count/5 2 = 1+1 0.0% avl:avl_replace/4 63,834 = 63,834+0 0.0% =</2 6,002 = 6,002+0 0.0% compare/3 133,656 = 133,656+0 0.0%Table 4: Salvador Fandiño's module(avl); 6000 nodes.
Module copyright © 2005, Salvador Fandiño under the GPL
But, of course, comparing execution speeds of subsecond systems is a pointless exercise. This profile does show fewer garbage collections and fewer redos on avl_store/avl_put, hinting at a more efficient algorithm in this implementation. This hint is borne out in the larger test:
===================================================================== Total time: 28.71 seconds ===================================================================== Predicate Box Entries = Calls+Redos Time ===================================================================== $garbage_collect/1 305 = 305+0 70.6% =/2 7,395,800 =7,395,800+0 5.6% avl:avl_put/9 1,668,985 = 120,000+1,548,985 4.7% avl:avl_replace/9 1,788,962 = 120,000+1,668,962 4.4% avl:avl_balance_right/5 719,922 = 359,961+359,961 1.8% is/2 959,907 = 959,907+0 1.8% compare/3 3,697,908 =3,697,908+0 1.6% ther_count/5 2 = 1+1 0.8% avl:avl_cmp_depth/3 419,945 = 419,945+0 0.3% avl:avl_replace/4 1,788,962 =1,788,962+0 0.3% avl:avl_put/4 2,028,947 =2,028,947+0 0.1% =</2 120,002 = 120,002+0 0.1%Table 5: Salvador Fandiño's module(avl); 120,000 nodes.
The above test shows this module nearly 10 seconds faster, a 25% improvement over the Quintus implementation.
Direct Memory Access
So, Salvador Fandiño's AVL module performs best in the
simple tests given to the various AVL implementations. How does
it compare to direct memory access? By
using assert/1
and retract/1
, is there
a significant gain in runtime execution speed?
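("Direct memory access" here means storing cells as dynamic memory/2 facts, as shown later in this appendix; a minimal sketch of the update step follows, with set_memory/2 as an illustrative helper name.)

:- dynamic memory/2.

% Replace the cell stored at Index: retract the old fact, assert the new one.
set_memory(Index, Value) :-
    retract(memory(Index, _)),
    assert(memory(Index, Value)).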
It turns out that direct memory access has a clear advantage over the AVL implementations: for a 6000-element system, it is the fastest. Again, for sub-second systems, comparisons are rather silly, but for form's sake the results from running the count test are presented here...
=====================================================================
Total time: 0.06 seconds
=====================================================================
Predicate                      Box Entries =  Calls+Redos       Time
=====================================================================
assert/1                            12,001 =   12,001+0        50.0%
retract/1                            6,000 =    6,000+0        50.0%
$garbage_collect/1                       3 =        3+0         0.0%
is/2                                18,001 =   18,001+0         0.0%
=</2                                 6,002 =    6,002+0         0.0%
memory/2                             6,000 =    6,000+0         0.0%
mem_count/5                              2 =        1+1         0.0%

Table 6: Memory as dynamic facts; 6000 nodes
The next level of comparison is at 120,000 elements, but, in this case, SWI Prolog balks at more than 80,000 dynamic facts. Even at this "limitation", the execution speed is unmatched with 80,000 nodes:
=====================================================================
Total time: 0.88 seconds
=====================================================================
Predicate                      Box Entries =  Calls+Redos       Time
=====================================================================
retract/1                           80,000 =   80,000+0        35.0%
mem_count/5                              2 =        1+1        30.0%
$garbage_collect/1                      45 =       45+0        15.0%
is/2                               240,001 =  240,001+0        15.0%
assert/1                           160,001 =  160,001+0         5.0%
=</2                                80,002 =   80,002+0         0.0%
memory/2                            80,000 =   80,000+0         0.0%

Table 7: Memory as dynamic facts; 80,000 nodes
The profiles of Quintus' and Salva's AVL modules are pitiful in comparison to the above results...
===================================================================== Total time: 13.96 seconds ===================================================================== Predicate Box Entries = Calls+Redos Time ===================================================================== $garbage_collect/1 216 = 216+0 56.0% avl:avl_store/10 1,228,962 = 80,000+1,148,962 12.7% is/2 2,857,861 =2,857,861+0 11.6% avl:avl_change/10 80,000 = 80,000+0 7.1% qui_count/5 2 = 1+1 1.9% compare/3 2,377,908 =2,377,908+0 1.4% =/2 1,388,930 =1,388,930+0 1.3% =:=/2 1,228,946 =1,228,946+0 1.0% avl:avl_store/4 80,001 = 80,001+0 0.4% avl:avl_change/5 80,000 = 80,000+0 0.4% >=/2 79,984 = 79,984+0 0.3% avl:avl_store/5 1,308,947 =1,308,947+0 0.1% =</2 80,002 = 80,002+0 0.0%Table 8: Richard O'Keefe's AVL module; 80,000 nodes===================================================================== Total time: 10.34 seconds ===================================================================== Predicate Box Entries = Calls+Redos Time ===================================================================== $garbage_collect/1 133 = 133+0 44.7% =/2 4,755,800 =4,755,800+0 11.3% avl:avl_replace/9 1,148,962 = 80,000+1,068,962 8.9% avl:avl_put/9 1,068,983 = 80,000+988,983 6.4% is/2 639,911 = 639,911+0 4.5% compare/3 2,377,908 =2,377,908+0 4.4% avl:avl_balance_right/5 479,926 = 239,963+239,963 2.9% salva_count/5 2 = 1+1 0.9% avl:avl_replace/4 1,148,962 =1,148,962+0 0.9% avl:avl_cmp_depth/3 279,947 = 279,947+0 0.5% =</2 80,002 = 80,002+0 0.4% avl:avl_put/4 1,308,947 =1,308,947+0 0.4%Table 9: Salvador Fandiño's AVL module; 80,000 nodes
...but these results are not actually showing these modules' protocols in the best light. Both modules have predicates that create AVL trees from sorted lists; since we know in advance the elements to be manipulated, we can call these predicates instead. But, here again, we run into a memory overflow issue with SWI Prolog: lists containing over 20,000 elements overflow the local stack.[2] Throttling back the number of elements to 20,000 nodes, we end up comparing the AVL modules' replacement predicates against SWI Prolog's retract/1 and assert/1, with just one more change: as we do not test the sorted-list-to-AVL-tree predicates, we should not test the initial assertions either; so we now place the memory predicate in an external file:
:- dynamic memory/2.

memory(0, value(0)).
memory(1, value(1)).
memory(2, value(2)).
...
memory(20000, value(20000)).
With this new framework, the count test profile is unsurprisingly still sub-second...
=====================================================================
Total time: 0.15 seconds
=====================================================================
Predicate                      Box Entries =  Calls+Redos       Time
=====================================================================
retract/1                           20,000 =   20,000+0        42.9%
$garbage_collect/1                       1 =        1+0        28.6%
is/2                                20,000 =   20,000+0        21.4%
mem_ord/3                                1 =        1+0         7.1%
assert/1                            20,000 =   20,000+0         0.0%

Table 10: Direct memory access; 20,000 nodes
... and the test for the Quintus AVL module still trails behind:
=====================================================================
Total time: 0.69 seconds
=====================================================================
Predicate                      Box Entries =  Calls+Redos       Time
=====================================================================
$garbage_collect/1                       4 =        4+0        50.0%
avl:avl_change/10                   20,000 =   20,000+0        36.0%
compare/3                          267,248 =  267,248+0         6.0%
qui_ord/3                                1 =        1+0         6.0%
avl:avl_change/5                    20,000 =   20,000+0         2.0%
is/2                                20,000 =   20,000+0         0.0%

Table 11: Quintus' AVL module; 20,000 nodes
... as well as the test for Salva's AVL module:
=====================================================================
Total time: 0.77 seconds
=====================================================================
Predicate                      Box Entries =  Calls+Redos       Time
=====================================================================
$garbage_collect/1                       3 =        3+0        31.0%
avl:avl_replace/9                  267,248 =   20,000+247,248  25.4%
=/2                                494,496 =  494,496+0        14.1%
compare/3                          267,248 =  267,248+0         5.6%
avl:avl_replace/4                  267,248 =  267,248+0         1.4%
is/2                                20,000 =   20,000+0         1.4%
salva_ord/3                              1 =        1+0         1.4%

Table 12: Salva's AVL Module; 20,000 nodes
Note that in these sets of tests the Quintus AVL module performs better than Salva's, owing to the better implementation of the avl_change/5 predicate. Perhaps Salva or an enterprising reader has a way to improve the avl_replace/4 implementation?
Reflections
We've demonstrated that, of the AVL implementations, Salvador Fandiño's consistently performs best. We have also observed that, for data sets of any size, direct memory access using assert/1 and retract/1 does even better. Do I then recommend direct memory access over storing elements in an AVL tree? For almost all cases,[3] the answer is no. In my experience, using the former reduces programs to intractable messes.[4] Furthermore, the performance gain, for most cases, is insignificant.[5] Using AVL trees gives a significant gain over list processing, and incurs none of the costs of the extralogical features (such as assert/1 and retract/1) required by direct memory access.
Appendix B: The "Missing" Predicates
Implementations of Traversal Predicates
left_to_right_traversal/2
This predicate converts an AVL tree to a sorted list with the "smallest" element (i.e., the leftmost node in the tree) at the front of the list. Note that this implementation uses DCGs; don't you agree that it's prettier this way?
left_to_right_traversal(AVL, List) :-
    left_to_right_traversal(AVL, List, []).

left_to_right_traversal(t(Key, Value, LeftTree, RightTree, _)) -->
    { left_to_right_traversal(LeftTree, Left, []),
      left_to_right_traversal(RightTree, Right, []) },
    Left,
    [Key-Value|Right].
left_to_right_traversal(t) --> [].
right_to_left_traversal/2
This predicate starts with the maximum element and descends to the minimum one.
right_to_left_traversal(AVL, List) :-
    right_to_left_traversal(AVL, List, []).

right_to_left_traversal(t(Key, Value, L, R, _)) -->
    { right_to_left_traversal(R, Right, []),
      right_to_left_traversal(L, Left, []) },
    Right,
    [Key-Value|Left].
right_to_left_traversal(t) --> [].
top_down_traversal/2
The implementation strategy here is to give the top node, then recurse over the list of top nodes of this node's children, and so on. There are two approaches to the induction: append/3 the new nodes to the list of nodes to traverse, or use difference lists and concatenation. The former approach is straightforward, but incurs a linear cost at every step (which accumulates to quadratic behaviour overall); the latter is more complex, but has only a linear overall cost. We present both implementations (choosing the difference-list implementation in the interface predicate).
top_down_traversal(AVL, List) :- top_down_traversal_concatenating([AVL|T] - T, List, []).
Top-down traversal using concatenation
This implementation shows the shape of the tree by putting the top-most element first, and continuing in a breadth-first, left-to-right traversal. It uses concatenation (concat/3 is a constant-time predicate over difference lists) to update the seed list of branches. Note that the atom '_|_' represents the functional term ⊥ [pronounced: "bottom"], which is the functional programming community's equivalent of fail/0: being able to unify with ⊥ is a contradiction, showing that the unified variable is actually free; this test protects the system from falling into infinite regression when unifying the last-element pointer with a term that includes that same variable.
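As a tiny illustration of that test (the queries are illustrative, and assume '_|_' never appears as real data):

% ?- \+ \+ Free = '_|_'.                  % succeeds: Free is an unbound variable
% ?- List = [a|_], \+ \+ List = '_|_'.    % fails: List is already bound to a term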
top_down_traversal_concatenating(List - List) -->
    { \+ \+ List = '_|_' }.
top_down_traversal_concatenating([t(Key, Value, t, t, _)|T] - Q) -->
    [Key-Value],
    top_down_traversal_concatenating(T - Q).
top_down_traversal_concatenating([t(Key, Value, t, Right, _)|T] - Q) -->
    { concat(T - Q, [Right|X] - X, NewRight) },
    [Key-Value],
    top_down_traversal_concatenating(NewRight).
top_down_traversal_concatenating([t(Key, Value, Left, t, _)|T] - Q) -->
    { concat(T - Q, [Left|X] - X, NewLeft) },
    [Key-Value],
    top_down_traversal_concatenating(NewLeft).
top_down_traversal_concatenating([t(Key, Value, Left, Right, _)|T] - Q) -->
    { concat(T - Q, [Left, Right|X] - X, Down1) },
    [Key-Value],
    top_down_traversal_concatenating(Down1).

concat(List - List) --> { \+ \+ List = '_|_' }.
concat(A1 - Z1, Z1 - Z2, A1 - Z2).
Top-down traversal using append/3
This approach is a simpler implementation but updates the
branches using append/3
which incurs a linear-time
cost.
top_down_traversal_appending([t(Key, Value, t, t, _)|T]) -->
    [Key-Value],
    top_down_traversal_appending(T).
top_down_traversal_appending([t(Key, Value, t, Right, _)|T]) -->
    { append(T, [Right], NewRight) },
    [Key-Value],
    top_down_traversal_appending(NewRight).
top_down_traversal_appending([t(Key, Value, Left, t, _)|T]) -->
    { append(T, [Left], NewLeft) },
    [Key-Value],
    top_down_traversal_appending(NewLeft).
top_down_traversal_appending([t(Key, Value, Left, Right, _)|T]) -->
    { append(T, [Left, Right], Down1) },
    [Key-Value],
    top_down_traversal_appending(Down1).
top_down_traversal_appending([]) --> [].
Element Selection Predicates
top/2
This fact simply gives the Key-Value pair of the current node.
top(t(Key, Value, _, _, _), Key - Value).
first/2
We "left-cdr" down the tree, retaining the most-recently-visited node until we reach a terminal, returning the last node.
first(t(Key, Value, Left, _, _), Ans) :-
    first(Left, Key-Value, Ans).

first(t(Key, Value, Left, _, _), _, Ans) :-
    first(Left, Key-Value, Ans).
first(t, Ans, Ans).
last/2
The same work as first/2
, except we "right-cdr" down
the tree.
last(t(Key, Value, _, Right, _), Ans) :-
    last(Right, Key-Value, Ans).

last(t(Key, Value, _, Right, _), _, Ans) :-
    last(Right, Key-Value, Ans).
last(t, Ans, Ans).
Functional Tree Transformations
map_tree/3
This predicate modifies the Values of the tree based on the phi-transformation. For example, to copy the tree exactly, one would execute the following query:
map_tree(i, In, Out)
Either of the following queries increments each of the tree's values:
map_tree(lambda(X, X + 1), In, Out)
or

map_tree("`+1", In, Out)
The implementation of map_tree/3
depends on the
combinators module and
simply traverses the tree and copies the transformed elements:
map_tree(Fn) -->
    { phi_equivalent(Fn, Phi) },
    map_aux(Phi).

map_aux(Phi, t(Key, Value, Left, Right, H), t(Key, NewValue, NewLeft, NewRight, H)) :-
    compose(Phi, Value, NewValue),
    map_aux(Phi, Left, NewLeft),
    map_aux(Phi, Right, NewRight).
map_aux(_Phi, t, t).
Endnotes
[0] This document, Salvador Fandiño's AVL module and supporting (testing) files are archived and available at prolog-avl-20051225.tgz.

[1] Any book on algorithms and data structures provides introductory material on AVL trees. For example, [RL1999], § 5.9 or [B2001], § 10.2.

[2] I am aware that SWI Prolog allows the user to tune the stack sizes on invocation. I also believe that it is now appropriate to revisit the default values, as modern computers come ready to handle much heavier loads. Or perhaps it is the case that I am always tackling larger-than-average problems. At any rate, I find it a chore, as the user, always being required to inform the system what sizes its stacks should be.

[3] The one exception that comes to mind for a valid use of ...

[4] Intractable mess (noun): the strong sense of befuddlement that overcomes one when staring at a 400,000-line expert system where every third line is either an ...

[5] Firstly, in systems that I work with, it is not uncommon to exceed the 80,000 transactions that would cause a stack overflow if using ...
Works Consulted
[B2001] Prolog Programming for Artificial Intelligence, 3rd ed., Ivan Bratko, Addison-Wesley, Reading, Massachusetts, 2001.

[Merc] Mercury Language Reference Manual, Version 0.12.1, Fergus Henderson, Thomas Conway, Zoltan Somogyi, David Jeffery, Peter Schachte, Simon Taylor, Chris Speirs, Tyson Dowd, Ralph Becket, University of Melbourne (Australia), http://www.cs.mu.oz.au/research/mercury. No date of publication given.

[RL1999] Algorithms: A Functional Programming Approach, 2nd ed., Fethi Rabhi, Guy Lapalme, Addison-Wesley, Reading, Massachusetts, 1999.
Copyrights
This manual copyright © 2005, Cotillion Group, Inc., under the GNU Free Documentation License, version 1.2.

avl.pl copyright © 2005, Salvador Fandiño, under the GPL.

Quintus Prolog copyright © 2004, SICS AB. All rights reserved.

Quintus module(avl) copyright © 1989, Quintus Computer Systems, Inc. All rights reserved.

SWI Prolog copyright © 1987, under the LGPL.

Documents referenced (linked) by this document maintain their own copyrights.
Author: Douglas M. Auclair (dauclair at hotmail dot com)
Created: December 17, 2005
Last Modified: December 25, 2005; MERRY CHRISTMAS!