Tuesday, November 29, 2005

Whirl Interpreter

A Whirl Interpreter In Prolog

A Whirlwind Introduction

We present a
Whirl
interpreter. The language,
being [sic:] Turning-complete, presents delightful challenges to
the programmer: anything can be programmed in Whirl, the
real work is to go about finding how it is at all
possible to do this; much like coding is in Java, but, in the
case of Whirl, so much more rewarding.



Whirl, like BF,
is composed of a very small instruction (and command) set that
may operate over a relatively large mutable data array. Unlike
BF, which has instructions that have fixed (static) meaning, the
meaning of Whirl's pair of instructions can only be determined
dynamically based on two layers of state
resolution
.1
Deciphering Whirl program meaning is called "fun".



Given those issues, writing an interpreter for this language may
appear to be a difficult undertaking. This is not the case: any
interpreter can be decomposed into a series of manageable tasks,
and Whirl, although differing in style and substance from most
programming language (like, for example, Prolog, hmmmm...), is
Turing-equivalent,2
so follows similar patterns of interpretation.
These patterns are as follows:




  1. Initialization;

  2. Program scan; and

  3. Interpretation, which:

    1. Fetches the next instruction,
      and then

    2. Resolves and executes the command
    3. Repeating interpretation for any
      remaining instructions





In fact, a complete and correct interpreter for Whirl can be
written in an afternoon. We present this very interpreter here.



Whirl Program Initialization



Whirl can be viewed as a state-heavy language, so this language
requires more than the usual amount house-keeping variables, each
of which must be instantiated to an initial state. These
variables fall into the following categories:



Ring Variables



Each ring has its own commands and states, and the program
carries a variable for each of the two rings. The ring structure
is as follows:



ring(Type, Commands, Spin, Accumulator)


Each ring is initialized to the following states:





























Ring Variable
Ops Math
Type ops math
Commands
[









0- noop,1- exit, 2- one,
3- zero,4- load, 5- store,
6- padd,7- dadd, 8- logic,
9- if,10- intIO, 11- ascIO
]

[





0 - noop,1 - load,2 - store,
3 - add,4 - mult,5 - div,
6 - zero,7 - <,8 - >,
9 - =,10 - not,11 - neg
]
Spin clockwise spin clockwise spin
Accumulator value(0) value(0)



The Prolog code to do the above initialization is
straightforward. First we have the op/3 declaration for the
spin/1 syntax:



:- op(200, yf, spin).


Then the initialization code for each of the rings follows:




initialize_ops_ring(ring(ops, Commands, clockwise spin, value(0))) :-
Commands = [0-noop, 1-exit, 2-one, 3-zero, 4-load, 5-store,
6-padd, 7-dadd, 8-logic, 9-if, 10-intIO, 11-ascIO].

initialize_math_ring(ring(math, Commands, clockwise spin, value(0))) :-
Commands = [0-noop, 1-load, 2-store, 3-add, 4-mult, 5-div,
6-zero, 7-'<', 8-'>', 9-'=', 10-not, 11-neg].



The two rings are combined into an unordered list, and the
program starts with the "ops" ring being selected as active.



Memory Variable



A Whirl Program has access to an infinite amount of memory; we
implement this in the interpreter by defining memory lazily:
a cell is created only when needed, with the exception of the
first memory cell:



initialize_memory(memory([cell(0, value(0))])).


Program Instructions
Variable



We do a modicum of parsing to extract the instructions of a
program from the copious comments accompanying nearly every Whirl
program (we discuss the program scan
below). In doing the one-pass parse, we also decorate the
program instructions with an index to assist the interpreter in
its job of executing the instructions in order. The end result
is a term that contains the index program instructions with the
instruction at the head being the one to be interpreted. The
structure of this term is as follows (using the program
two-commented.wr as an example)...



program([0-0, 1-0, 2-0, 3-1, 4-1|Rest])


... where 'Rest' is the rest of the program's instructions.



History Variable



Command execution is constrained by several factors, and what
effects immediately preceeded this instruction. The history
variable captures this information, and has the following
structure...



history(ExecutionState, last_instruction(Inst))


... where 'ExecutionState' is whether the last selected command
was executed (it has one of the following values: executed or
quescient) and 'Inst' is "0" or "1". This term is initialized to
the following on program start-up:



history(quescient, last_instruction(1))


Program State Variable



All the above variables are encapsulated into a central state
repository, represented by a program state variable which is of
the following form:



program(ActiveRingType,
History,
Rings,
Memory,
Instructions)



The only variable we haven't yet encountered is the
'ActiveRingType': it is the state variable that indicates which
ring is currently selected and its ground states are either 'ops'
or 'math'.



The program state variable is initialized by the following
block:



  initialize_ops_ring(Ops),
initialize_math_ring(Math),
initialize_memory(Memory),
initialize_program(ProgramSourceString, 0, Instructions, []),
History = history(quescient, last_instruction(1)),
Program = program(ops, History, [Ops, Math], Memory, program(Instructions)).



Reflections of Data Structure
Design Choices



A common thread throughout the development of the program's
runtime structure is the use of (singly-linked) lists to represent
groups. This design choice is natural: lists are heavily
emphasized in storing and retrieving data in Prolog programs.
They also give an additional layer of abstraction, for example,
several commands operate on either the ops or the math ring, and
the code can be written generically, retrieving the desired ring
non-deterministically from the (unordered) rings variable. Using
lists as the default data structure allowed the rapid development
of this interpreter.



Using lists may demand a price, and the case of this interpreter,
there is a price to be paid in the runtime: accessing the nth
element occurs in linear time. In many cases, the system already
knows which element it must access --
constant-time
access
3 for
these situations would speed up the runtime execution of Whirl
programs while keeping the memory footprint at a more manageable
size (a list is modified by inserting the changed element to a
fresh copy). Of course, changing something as fundamental as the
structure of the program's runtime would demand comprehensive
changes to the interpreter, so one must balance runtime speed
gains against introducing corresponding complexity.



Whirl Program Scan



We parse in the program source string as a set of indexed tokens
(to avoid the costs of 1. ingesting commented strings and
2. repositioning the program index for jump commands). As with
any parsing task, DCGs (definite clause
grammars
) reduce this task to triviality:



initialize_program([Op|Codes], Index) -->
% Add instruction and index to parse tree and repeat (ignore comments)
({ Code is Op - 48,
Code <- [0, 1] } ->
[Index - Code],
{ NewIndex is Index + 1 }
;
{ NewIndex = Index }),
initialize_program(Codes, NewIndex).





Implementation Note:

The op "<-" is my ASCII equivalent for ε
which is used in set theory to indicate membership, so the
following code --


Code <- [0, 1]

-- tests that the character being examined is a Whirl
instruction. It must, of course, be declared and defined before
we use it:



:- op(200, xfy, '<-').

takeout(H, [H|T], T).
takeout(Elt, [H|T], [H|R]) :-
takeout(Elt, T, R).

Elt <- List :- takeout(Elt, List, _).


The predicate takeout/3 is of general use and will be
used by other parts of the interpreter, as well.






initialize_program([], _) --> [-1 - error].
% throw an error if we fall off the beginning of the program


Implementation Note:

We have the second
clause
(the empty string as represented by the symbol
'[]') because of an implementation detail of the
interpreter: it automatically increments the program index after
any command, even a padd command,
so that command compensates by subtracting one from the
relative jump, possibly putting the program index before the
beginning of the program (and the autoincrement adjusts the index
to the beginning).



Whirl Program Interpretation



Now that we have completed the preliminaries, we turn to
interpreting the program under the
formal language definition. We
divide the task of interpretation into three areas:




  1. instruction fetch,

  2. Command resolution and execution, and

  3. loop for next instruction
    (a.k.a. "lather, rinse, repeat")



In code the above algorithm translates
directly
4
to the following...



interpret -->
instruction(X),
execute(X),
interpret_next_instruction.


... with the interpret/2 predicate being called in the following
manner:




interpret(Program, _)


Fetch Instruction



Thanks to the structure as initialized by the
instructions state
variable
, fetching the current instruction reduces to the
simple rule:



instruction(Bit, P, P) :-
% grabs the next instruction of the program IR
P = program(_, _, _, _, program([_ - Bit|_])).


Command Execution



Actually interpreting the instruction fetched is not so simple a
matter. The instruction must first be
resolved to a command and
then that command executed.



Command Resolution



First things first -- the instruction must be resolved to the
appropriate command. Because we've done the work of declaring
the states and their relationships, command resolution is not a
complicated task.



Resolution for the "1" Instruction


For the "1" instruction, the command resolves
simply to the active ring rotation:



execute(1) -->
rotate,
history_is(quescient, 1).


The supporting predicates for the "1" instruction are as one
would expect and available for review in the
interpreter
sources
.5



Resolution for the "0" Instruction


As for the "0" instruction, it always reverses the spin of the
currently active ring:



execute(0) -->
reverse_spin,
execute_command.


Simple enough; but now comes the task of resolving this
instruction to its resulting command. Command resolution is
dependent firstly on the command history. The history must be
that a command was not executed at the last instruction, which
also must have been "0" in order to execute a command with this
"0" instruction:



execute_command -->
% Given the last instruction was 0 and did nothing, do all the below ...
history(quescient, 0),
!,
get_command(Cmd),
command(Cmd),
switch_wheel,
history_is(executed, 0).


In all other cases, we simply update the history state variable:



execute_command --> history_is(quescient, 0).



So, given that the history is in line with command execution, we
must resolve "0" instruction to the appropriate command. This
devolves into a state table lookup, given that we have already
established and maintained the state variables:



get_command(Command, P, P) :-
% Convert this token to an equivalent command by looking it up on
% the active wheel
P = program(Active, _, Rings, _, _),
ring(Active, [_-Command|_], _, _) <- Rings.


The above code simply consults the first command (the element at
the head of the command list) on the active ring.



Command Execution



Executing a command, for the most part now, is simply mapping the
command token to an executable block. Since Prolog is a language
of relations, this mapping is very easy to implement. There are
the four special commands on the ops ring (logic, if, intIO, and
ascIO) than have one additional layer of state dependency, but
these are binary conditions and are handled as alternate clauses
for each of the four command tokens. The commands fall into three
categories: common (or shared)
commands
, the math-specific
commands
(which are arithmetic and comparative in nature),
and the ops-specific commands.



Common Commands


There are four commands common to both the ops and the math
rings: noop,
zero,
load, and
store.



Implementating the noop command


The noop command is simple enough -- do nothing:



command(noop) --> [].


Implementating the zero command


Next is the zero command, which sets the current ring's
accumulator to 0:



command(zero, program(Active, Hist, Rings, M, P),
program(Active, Hist, [ring(Active, I, Spin, value(0))|Ring], M, P)) :-
takeout(ring(Active, I, Spin, _), Rings, Ring).



Implementation note:


We "takeout" the active ring from the pair of rings,
dump any value in that accumulator and replace it with 0. The
rings are stored as an unordered list so that these common
commands may access either ring without falling into
state-dependent hackery (i.e. the big state switch statement).
This pattern of taking out the active ring, transforming it and
then rejoining it to the ring pair is a common one throughout the
implementation of the commands for this interpreter.




Implementating the load command


The load command is much like the zero command, except that it
also interacts with (the first cell of) the memory:



command(load, program(Active, Hist, Rings, Mem, P),
program(Active, Hist, [ring(Active, I, Spin, Value)|Ring], Mem, P)) :-
Mem = memory([cell(_, Value)|_]),
takeout(ring(Active, I, Spin, _), Rings, Ring).


Implementating the store
command


Finally, the store command moves the data in the opposite
direction of the load command -- from the accumulator to the memory:



command(store, program(Active, Hist, Rings, memory([cell(Idx, _)|Cells]), P),
program(Active, Hist, Rings, memory([cell(Idx, Value)|Cells]), P)) :-
ring(Active, _, _, Value) <- Rings.


Math Ring Commands


The commands for the math ring fall into two categories
-- arithmetic commands and
comparison commands.
Actually, both categories are very similar, the difference
between the two are the outcomes: comparison commands always
return a 1 or 0 result.



Arithmetic commands


The arithmetic commands are add, mult, div, and neg. They all
use the math ring's accumulator and the current memory value
(except, of course, neg, which just uses the accumulator
-- there's always got to be one exception to make life
interetesting) and perform the operation. By abstracting the
operation from the equation, all these commands reduce to one
rule that extracts the actors and the operation and one helper
predicate that then actually performs the arithmetic:



command(MathOp, program(math, Hist, Rings, M, P),
program(math, Hist, [ring(math, I, S, value(Value))|Ops], M, P)) :-
MathOp <- [add, mult, div, neg],
takeout(ring(math, I, S, value(A)), Rings, Ops),
M = memory([cell(_, value(B))|_]),
arithmetic(MathOp, A, B, Value).

arithmetic(add, A, B, Value) :- Value is A + B.
arithmetic(mult, A, B, Value) :- Value is A * B.
arithmetic(div, A, B, Value) :- Value is A / B.
arithmetic(neg, A, _, Value) :- Value is A * -1.


Comparison commands


The comparison commands follow the same pattern as the arithmetic
commands, only calling a comparison/4 predicate that gives a
result of 1 on success and 0 otherwise (which we cleverly reverse
when implementing the 'not' command):



command(CmpOp, program(math, Hist, Rings, M, P),
program(math, Hist, [ring(math, I, S, value(Value))|Ops], M, P)) :-
CmpOp <- ['<', '>', '=', not],
takeout(ring(math, I, S, value(A)), Rings, Ops),
M = memory([cell(_, value(B))|_]),
comparison(CmpOp, A, B, Value).

comparison('<', A, B, 1) :- A < B. comparison('>', A, B, 1) :- A > B.
comparison('=', A, A, 1).
comparison(not, 0, _, 1).
comparison(_, _, _, 0).


Ops Ring Commands


The commands specific to the ops ring are a varied lot; I group
them as two math commands (one
and logic), two (four, actually)
input/output commands (intIO and
ascIO), and four jump commands
(exit,
padd,
dadd, and
if). The math ring commands were
all of the same cloth, greatly simplifying their implementations;
these commands, however, have very little in common with each
other, so each will be discussed individually.



Implementing the exit command


The exit command exploits a feature of the interpreter: the
interpreter stops when it cannot find the index of the next
instruction. If there are no instructions in the instruction
state variable, the interpreter will terminate:



command(exit, _, program(_, _, _, _, program([]))).



Put obscurely: the empty Whirl program is a
quine
.6



Implementing the one command


The one command should look familiar...



command(one, program(ops, Hist, Rings, Mem, P),
program(ops, Hist, [ring(ops, I, Spin, value(1))|Maths], Mem, P)) :-
takeout(ring(ops, I, Spin, _), Rings, Maths).


... it is the same implementation of that of the
zero command. There is a bit
more strictness here -- we ensure that ops is the active
ring.



Implementing the padd command


The padd command is a relative jump instruction, using the
accumulator's value to determine the offset. It then "jumps to
that new program address" by moving that indexed instruction to
the head of the
instructions list. If
the padd command executes a jump that falls off (either) end of
the program the interpreter raises an exception, terminating the
Whirl program.



command(padd, program(ops, Hist, Rings, Mem, program([Idx - Opcode|Opcodes])),
program(ops, Hist, Rings, Mem, program([NewerIdx - Op|Codes]))) :-
% Jumps to program address X, if it exists; SEGVs sinon
ring(ops, _, _, value(X)) <- Rings,
floor(X, Y),
NewIdx is Idx + Y,
(takeout(NewIdx - _, [Idx - Opcode|Opcodes], _) ->
NewerIdx is NewIdx - 1,
takeout(NewerIdx - Op, [Idx - Opcode|Opcodes], Codes)
;
raise_exception(sigsegv(no_address(NewIdx), from(Idx)))).






Implementation note:


This command is unfortunately complicated by the
interpreter loop. The interpreter
automatically increments the program index
to fetch the next instruction, so
the jump command must decrement its offset to compensate. This
offset adjustment requires that the
interpreter put an extra instruction
into the
indexed instructions
in case the offset returns the interpreter to the beginning of
the program.

Sly Observation:


It is interesting to note that most programs avoid
using this command (as well as if).
For example,
fib.wr
supplies more numbers of the sequence by
repeatedly copying the relevant block of code. The lone
standout in the repetoire is
beer.wr
by Kang Seonghoon. Implementing this program without iteration,
albeit possible, is probably not worth the effort (which would
include purchasing an external drive to store the program...).

Optimization opportunity:


It becomes apparent from examining the runtime
profiles of larger programs that the takeout/3 predicate is used
quite a bit. This command uses takeout/3 twice. An alternate
approach is to add an additional term to the program, a program
counter, and convert the HT list to a fully ordered one. Prolog
does have a limit on the size of a term, so for very large Whirl
programs, storing the instructions into a single term will not
work, perhaps precompiling the Whirl Program as a fact-table of
the form (for, e.g., the first five instructions from
two-commented.wr):


instruction(0, 0).
instruction(1, 0).
instruction(2, 0).
instruction(3, 1).
instruction(4, 1).



Implementing the dadd command


The dadd command follows the same pattern as the
padd command, avoiding the
one-off complications:



command(dadd, program(ops, Hist, Rings, memory(Memory), P),
program(ops, Hist, Rings, memory([cell(NewIdx, NewV)|Mem]), P)) :-
% Here we jump to a new memory address, given that that cell exists.
% If the cell doesn't exist, create it and add it to the memory store;
% thereby "lazily" growing the memory store as needed.
%
% Remember, it's not a memory leak, it's lazy growth of the memory store
% (not in any respect resembling a memory leak ... *cough*).
Memory = [cell(Idx, _)|_],
ring(ops, _, _, value(X)) <- Rings, floor(X, Y), NewIdx is Y + Idx, (takeout(cell(NewIdx, SomeNewV), Memory, SomeMem) ->
NewV = SomeNewV,
Mem = SomeMem
;
NewV = value(0),
Mem = Memory).




Implementation note:


Of course, Prolog is a garbage-collected language, so
programmers need not concern themselves with allocating or
freeing memory cells (which would never occur, anyway; this
prompted the sly reference in the code's comment about memory
leaks).

Optimization opportunity:


There are several alternate approaches to the scheme
presented here:



  • It may be better to allocate a certain number of memory
    cells at first to cover the needs of most programs, or

  • do a quick scan of the program to see how memory cells it
    will need, allocate those up front in a static term, or

  • eliminate the concept of stored memory entirely for static
    programs: store all data in local
    variables
    .7






Implementing the logic command


The logic command's behavior depends on the value of the
currently selected memory cell. If the memory cell's value is 0,
it simply assigns the value 0 to the ops' accumulator:



command(logic, program(ops, Hist, Rings, Mem, P),
program(ops, Hist, [ring(ops, Inst, Spin, value(0))|Maths], Mem, P)) :-
Mem = memory([cell(_, value(0))|_]),
!,
takeout(ring(ops, Inst, Spin, _), Rings, Maths).


In all other cases, the logic command performs a bitwise
logic-and of the accumulator -- the accumulator becomes 1 for all
odd values and 0 for all even ones:



command(logic, program(ops, Hist, Rings, Mem, P),
program(ops, Hist, [ring(ops, Inst, Spin, value(Val))|Maths],
Mem, P)) :-
takeout(ring(ops, Inst, Spin, value(X)), Rings, Maths),
floor(X, Y),
Val is Y /\ 1.





Implementation note:


The above implementation is honest to the language
definition, but it may be overly complex. After all, the logic
command can be viewed as a boolean operation:


















Truth table for logic command
Memory Cell Value
Accumulator 0 any other value
00 0
odd0 1
even0 0


So a very simple (two-line) block of code could replace the
above implementation, but ... (see the observation below)



Sly Observation:


... the comment preceeding the logic command
implementation is telling:


% Okay, it would be an interesting coincidence to use the 'logic' opcode,
% but I suppose optimizing compilers would select this command in favor of
% one of the other brute-force assignment opcodes ...

Of the programs all the Whirl programs in existence (which I
believe have a one-to-one correspondence to the Whirl programs
posted on the main
Whirl site
), none use this command, so better
implementations or optimizations are moot until we have
programs that use this command to the point where its runtime
characteristics become an issue.




Implementing the if command


Of course, it is easy to imagine the implementation of the if
command if one thinks of it as a conditional jump
instruction:



command(if, P, P) :-
P = program(ops, _, _, memory([cell(_, value(0))|_]), _),
!.
command(if) --> command(padd).


And, given that implementation, we simply hand off the work to
the padd command.



Implementing the intIO command


This command allows integral entry or display, depending on the
state of accumulator. When the accumulator is 0, the program
reads an integer (raising an exception if the input cannot be
converted into an integer):



command(intIO, program(ops, Hist, Rings, memory([cell(Idx, _)|Cells]), P),
program(ops, Hist, Rings, memory([cell(Idx, value(N))|Cells]), P)) :-
ring(ops, _, _, value(0)) <- Rings, !, read_term(N, []), (integer(N) -> true; raise_exception(integral_expected(received(N)))).



In all other cases, this command outputs the current memory cell
as an integer:



command(intIO, P, P) :-
P = program(ops, _, _, memory([cell(_, value(V))|_]), _),
floor(V, Y),
write(Y).



Sly Observation: (rant, actually)


What exactly is an integer, anyway? Is
'CAFEBABE'8
an
integer? What is the integral value of '10111101100001'? This
command raises a set of questions about representation, but then
gives no ready answers. Then there are the questions about very
different representations, such as the

Church numerals
,
Peano
series
,
spatial
forms
(graphic notation for numerals), or 'MCMXCVII'. There
are also questions about Arabic and Chinese numerals, but we
will leave these pensées for the
ascIO command.



Nothing in the above rant will prevent me from using this
command whenever convenient ... printing numerals in BF or
Lazy
K
is a rather tedious exercise.





Implementing the ascIO command


Of the same cloth as the intIO
command
, except this command accepts all the ASCII
characters, not just numeric input:



command(ascIO, program(ops, Hist, Rings, memory([cell(Idx, _)|Cells]), P),
program(ops, Hist, Rings, memory([cell(Idx, value(C))|Cells]), P)) :-
ring(ops, _, _, value(0)) <- Rings, !, get(C). command(ascIO, P, P) :- P = program(ops, _, _, memory([cell(_, value(V))|_]), _), floor(V, Y), put(Y).



Sly Observation: (rant
continuation
)


ASCII? How 20th century! And, being that one of the
major contributors to the Whirl repository also
uses
Hangeul quite a bit
, it would be nice to provide input and
output across a range of character sets ... unicode does seem
to have settled into some order since its shaky beginnings...



Fetch Next Instruction



Now that we have covered the command resolution and execution in
detail, the only remaining part of the interpreter is
continuation. How do we move to the next instruction? This task
falls to the interpret_next_instruction/2 predicate. The first
clause handles the case where there are more instructions to
interpret:



interpret_next_instruction(P0, P) :-
% Loops until we run out of tokens to interpret
P0 = program(Active, Hist, Rings, Mem, program([Idx - Op|Codes])),
!,
NewIndex is Idx + 1,
(takeout(NewIndex - Code, [Idx - Op|Codes], Program) ->
interpret(program(Active, Hist, Rings, Mem,
program([NewIndex - Code|Program])), P)
;
P = P0).


It does this work by incrementing the program counter (which is
the index of the currently executed instruction; and this
increment does make the implementation of the
padd command interesting),
grabbing the instruction at that new address and intepreting
it.



For the case where there are no more instructions to interpret
(possibly because the exit
command
drained the
instructions state
variable
), the second clause simply does nothing, causing the
Whirl program to exit:



interpret_next_instruction --> [].


Conclusion



What an interesting programming language Whirl is! Even with its
many differences from most traditional (and most
non-traditional) programming languages, it still follows
traditional techniques for interpretation. Building an
interpreter for this language may take an afternoon, but building
a good one, as this paper highlights in several areas, requires
more thought and work. Other (excellent) interpreters are
available from the Whirl
page
; the purpose of this paper was to show the ease in which
an interpreter could be developed for Whirl (and, by extension,
many programming languages), and to show that the declarative
nature of Prolog helps in this development process.






Appendix A: The "Flavor" of Prolog



There are more than several introductions to
Prolog and its programming style
.9
The aim of this appendix is not to cover these
principles, but some code snippets from the
interpreter sources do convey
the flavor of "thinking in Prolog", and we present these here to
illustrate the differences between Prolog and the more
traditional functional/imperative programming style.



Prolog is not a functional programming language; its semantics is
based on the language of the predicate calculus, so statements do
not resolve to a value, they resolve, period. A "function's"
value is not important to the veracity of proof; what is
important is the resolution to provability ("truth"). So, for
example, the rotate/2 predicate has the following code
fragment:



...
adjust(SemiIdx, NewIdx),
takeout(NewIdx-NewCmd, [Idx-Cmd|Insts], NewInsts),
...


A functional programming language's equivalent is not so easy to
obtain, as the calls above use unification (pattern matching is a
weak subset) and implicit backtracking to resolve these goals.
The predicate adjust/2 has a simple enough functional
equivalent...



NewIdx = adjust SemiIdx


... but attempting the same transformation for the takeout/3
predicate would be incorrect, for the following equation...



NewIdx-NewCmd
= takeout [Idx-Cmd|Insts] NewInsts


...does not convey, in the functional programming sense, that
takeout requires NewIdx to obtain NewCmd and also that NewInsts
is returned as well as NewCmd (in fact, any and all of
takeout/3's arguments may be ground terms or free
variables).



A simpler example is from the same predicate:



...
direction(Spin, Offset),
SemiIdx is Idx + Offset,
...


where direction/2 is defined as:



direction(clockwise spin, 1).
direction(counterclockwise spin, -1).




The functional equivalent is a direct translation:



SemiIdx = Idx + direction Spin

with the function direction being:


direction clockwise = 1
direction counterclockwise = -1




This translation is not so simple, however, because the issue now
lies with the Prolog code -- is/2 is an attempt to bring some of
the functional style into Prolog, particularly for arithmetic,
but the predicate itself is viewed with some suspicion by the
logic programming community, as it has extra-logical
ramifications (particularly with non-deterministic modality).
Successors to Prolog have in various ways attempted to excise the
language of extra-logical features, either by attempting to wed
the functional and logical programming style (with various
degrees of compromise and success) or by eliminating
these features altogether. It is a testament to Prolog, warts
and all, that it still towers over its successors as the logic
programming language of choice. One might say it is an
improvement over its successors.





Endnotes































1 We leave a detailed and illustrative description
of the language
to more capable hands. Sean Heber at
http://www.bizaphod.org/whirl
maintains the central repository of the language and its
resources. We do provide a
formal
definition of the language
from which we develop our
interpreter.
2 Whirl does not have a rigorous proof of being
Turing-equivalent, unlike BF (see the proof at

http://www.iwriteiam.nl/Ha_bf_Turing.html
), and with the
dynamic nature of the language (a jump command may return to the
same sequence of instructions, but the semantics may be entirely
different if the state is not first properly restored), a proof may
require a bit more cleverness and effort. At any rate, such a
proof is outside the scope of this manual.
3 One can obtain constant-time access by
creating a static term: accessing a term's argument occurs in
constant-time, whereas an element access in a list occurs in
linear time. Exploiting this difference is widely known,
e.g. [Bratko2001], § 8.5.5
demonstrates using this technique.
4 As a declarative, rule-based language,
Prolog
is particularly well-suited to program language
compilation/interpretation, automating tasks such as parsing
(syntax) and program logic (sematics). I have also found that
it is a good "specification" language allowing requirements to
be translated into code with very little fuss.
5 For the sake of cohesiveness and brevity we
do not present the implementation of these helper predicates in
this document. Appendix A explores
some snippets of note in the implementation.
6 Some have used this common rule, "the empty
program evaluates to itself"
, to write quines in a variety of
languages. This, of course, is bad form. Of equal badness, I
must add, is using the feature of printing the returned value in
functional languages to make very tiny, thoughtless, quines, such
as the quine "42" in Lisp or Smalltalk or Haskell or whatever.
Nothing learnt there. Off the quine page
(http://www.nyx.net/~gthompso/quine.htm)
there are some interesting explorations of developing quines. I
found the research off the INTERCAL page
(
http://www.muppetlabs.com/~breadbox/intercal
) to be
illuminating (particularly
YAPP
and the
INTERCAL quine page
). My favorite is to write a program that
produces a program in that language that produces as output the
input string to the first program. Then, feed that program
itself as the input string. Simple and beautiful.
7 Of course, Prolog variables, once bound, are
immutable!
[Bratko2001], §
8.5.5 demonstrates using lists to represent the history of a
mutable variable. A problem with Whirl programs, like BF
programs, is that everything is global; there are no local
variables. So, one must use data flow analysis to track a
variable's use in the program.
8 'CAFEBABE' is the first 32 bits of every
Java class (binary) file.
9 See, e.g.
http://www.cotilliongroup.com/code/research.htm
along with
other compilations, for a list of both introductory and more
advanced works on Prolog programming.


Works Consulted






[Bratko2001] Prolog Programming for Artificial Intelligence, 3rd ed.,
Ivan Bratko, Addison-Wesley, Reading, Massachusetts, 2001.









author:Douglas M. Auclair
(email: dauclair at hotmail dot com)
date:November 29, 2005
whirl@

http://www.bizaphod.org/whirl
Whirl created by:Sean Heber

Monday, November 28, 2005

Formal Whirl Program Definition




Formal Whirl Program Definition



Formal Whirl Program Definition



The execution model of a Whirl program ("Whirl program" is a term
that means either the execution model or the sequence of
instructions which is a composition of "0" and "1" symbols) depend
on the instructions themselves as well as two levels of state:
one state level determines which table to use to determine the
instruction's command (these tables are called the "ops ring" and
the "math ring"), the other state level determines for that table
a set of conditions that determine which command with which
argument (if any) is to be selected for execution. A Whirl
program also contains "memory" of an indeterminate number of
mutable cells, each cell contains a numeric value initialized to
0.

Program



The above description can be summarized precisely by the
following tuple equivalence:













PWhirl = { Ring-typeactive,
Hist, Rings,
MemHT(indexed),
InstHT(indexed)
}
 where:
 





















PWhirl is the execution model of this Whirl Program ("the
program")
Ring-typeactive one-of([ops, math]) indicates which ring is currently active
Histhist(State, last_instruction(Inst)) the results of interpretation of the previous
instruction
 where: 
 







Stateone-of([quescient, executed]) was a command selected for execution?
 'quescient'
means a command was not selected for execution
 'executed'
means a command was selected and executed
Instone-of(["0", "1"]) the last interpreted instruction

Rings { Ringops,
Ringmath }
the current state of each ring
(rings described below)
MemHT(indexed) is A list of memory cells
(described below); current active
one at the head
InstHT(indexed) is Indexed (unsorted) program instructions; current active
one at the head (structure
described below
).




Instructions



Semantics



Whirl has two instructions "0", and "1". The meanings of these
instructions are captured here:










1

Rotate the currently active ring in its indicated direction
and set the history to "hist(quescient,
last_instruction(1))"



ex 1:









GIVEN the current ring is "ops"
 and its direction is "clockwise spin"
 and
the currectly selected command is "3 - zero"
THEN the ops ring's newly selected command is
"4 - load"


ex 2:









GIVEN the current ring is "math"
 and its direction is "counterclockwise spin"
 and
the currectly selected command is "0 - noop"
THEN the math ring's newly selected command is
"11 - neg"



0




First, Reverse the spin of the currently active ring
And then, perform the test:
 















IF the last instruction was "0",
 and the last instruction did not result in a
command execution,
  (i.e.: the command history is
"hist(quescient, last_instruction(0))"
)
THEN execute the currently selected command
 and activate the inactive ring and deactivate this one (i.e.:
switch rings
)
 and set the history to "hist(executed, last_instruction(0))"
OTHERWISE set the history to
"hist(quescient, last_instruction(0))"








ex 1: 
 










GIVEN the current ring is "math"
 and its direction is "counterclockwise spin"
 and
the command history is
"hist(executed, last_instruction(0))"
THEN the math ring's new direction is
"clockwise spin"
 and the command history is set to
"hist(quescient, last_instruction(0))"

ex 2: 
 

















GIVEN the current ring is "ops"
 and its direction is "clockwise spin"
 and the currently selected command is "2 - one"
 and
the command history is
"hist(quescient, last_instruction(0))"
THEN the ops ring's new direction is
"counterclockwise spin"
 and the "one" command is executed
 
(i.e.: the ops Accumulator is set to 1)
 and the ops ring is deactivated; the math ring, activated
 and the command history is set to
"hist(executed, last_instruction(0))"

ex 3: 
 










GIVEN the current ring is "math"
 and its direction is "clockwise spin"
 and
the command history is
"hist(quescient, last_instruction(1))"
THEN the math ring's new direction is
"counterclockwise spin"
 and the command history is set to
"hist(quescient, last_instruction(0))"





Traversal



Whirl instructions are executed in sequence (a
padd command alters the sequence by a
relative amount for the next instruction to be executed), and to
enforce this discipline, we store the sequence of instructions
decorated with an index (such storage is normally referred to as an
"array", but such a term has a grossly overloaded meaning
(particularly when it comes to element access) in the software
engineering field). Individual elements are of the form:



{ Index, Instruction }






where: 
 Index indicates this instruction is ith in
the program source
 Instruction is one-of(["0", "1"])




These elements are stored in a list with the only ordering being
that the current instruction is at the head of the list. The
next instruction is obtained by incrementing the Index by an
offset (the offset is either 1 or the amount in the ops
accumulator if the command executed is a 'padd') and then moving
that corresponding element to the head of the list.



Program Termination



The program will terminate when any of the following conditions
are met:




  • The exit command is executed; or,

  • The next index sought is not in
    InstHT(indexed); or,

  • The program encounters an error (e.g. the user enters
    non-numeric input on a intIO input
    command)




Rings



Whirl programs interact with two rings, the "ops" ring and the
"math" ring. Each ring has its own
(not necessarily orthogonal) set of commands, a selector pointing
to the command to be executed, an accumulator, and a (reversible)
direction of spin. The general structure of each ring is as
follows:










RingType = { CommandsHT, Direction, Accumulator }
 where:
 







Typeone-of([ops, math])which ring this tuple represents
CommandsHT isList of indexed (unsorted) commands; the command at the head is considered
selected for execution
Directionone-of([clockwise spin, counterclockwise spin]) the direction in which the ring rotations for next command selection
Accumulatorvalue(Number) Number is initially 0 and changes when commands are executed









The indexed commands for the ops ring are: The indexed commands for the math ring are:















0-noop 4-load 8-logic
1-exit 5-store 9-if
2-one 6-padd 10-intIO
3-zero 7-dadd 11-ascIO















0-noop 4-mult 8->
1-load 5-div 9-=
2-store 6-zero 10-not
3-add 7-< 11-neg




Ring Command Semantics



Not only is command selection dependent on a dual state layer from the instruction, but also
the command itself may have meaning dependent on the state. The commands are described as follows:



































































IndexCommandPreconditionsEffect
Ops Ring
0noopn/a(no effect)
1exitn/aprogram halts
2oneAccumulator = any valueAccumulator = 1
3zeroAccumulator = any valueAccumulator = 0
4load Accumulator = any value; Current Memory cell = NumAccumulator = Num
5store Accumulator = Num; Current Memory cell = any valueCurrent Memory cell = Num
6padd Accumulator = Num1; program index = Num2 program index = Num1 + Num2
(relative jump)
7dadd Accumulator = Num1; memory index = Num2 memory index = Num1 + Num2
(relative access)
8logic


current memory index = 0
otherwise (Accumulator = Val)



Accumulator = 0
Accumulator = Val & 1
9if


current memory index = 0
otherwise



noop
padd
10intIO


Accumulator = 0
otherwise



current memory value = input integer
current memory value printed as an integer
11ascIO


Accumulator = 0
otherwise



current memory value = input ASCII character
current memory value printed as an ASCII character
Math Ring
0noop n/a(no effect)
1load Accumulator = any value; Current Memory cell = Num Accumulator = Num
2store Accumulator = Num; Current Memory cell = any value Current Memory cell = Num
3add Accumulator = A; Current Memory cell = B Accumulator = A + B
4mult Accumulator = A; Current Memory cell = B Accumulator = AB
5div Accumulator = A; Current Memory cell = B Accumulator = A / B
6zero Accumulator = any valueAccumulator = 0
7<


Accumulator < current memory value
otherwise



Accumulator = 1
Accumulator = 0
8>


Accumulator > current memory value
otherwise



Accumulator = 1
Accumulator = 0
9=


Accumulator = current memory value
otherwise



Accumulator = 1
Accumulator = 0
10not


Accumulator = 0
otherwise



Accumulator = 1
Accumulator = 0
11neg Accumulator = NumAccumulator = -Num



Memory



The Whirl specification states that Whirl program memory is
infinite; we obey by accumulating memory on an as-needed basis
("lazily"). Memory follows a similar structural and allocation
scheme as with program
instruction storage
: decorated with an index when the head
element being the currently selected one.



The difference with memory is that:




  1. instructions are of the binary form "0" or "1", but memory may
    store any (Rational) numeric value; and,

  2. it starts initially with one tuple: { 0, 0.0 }; and,

  3. when another element is selected (with
    dadd) that is not present, that element
    is added to the memory store, initialized to { Index, 0.0 }



It is implementation-dependent as to how memory values are
converted to integral or to (ASCII) character
near-equivalents.









author:Douglas M. Auclair
date:November 28, 2005
whirl@ href="http://www.bigzaphod.org/whirl">http://www.bigzaphod.org/whirl