Mutable Syntax in the Mercury Programming Language
Introduction
The Mercury programming
language is a compiled, strict, pure, type-safe logical and
functional programming language. Its programming methodology is
based on predicate logic, with syntax and semantics in line with
that of Prolog. Along with logic, it has a fully integrated
Hindley-Milner type system with lambda-terms, very much in the
style of the Haskell programming
language.
The smooth integration of Prolog with Haskell sounds like a
marriage of programming paradigms to release the programmer into
Coding Nirvana, as it were. This is how it is for most cases, as the
language has a consistent design philosophy backed up by
well-researched principles and explained by copious and clear
documentation with numerous practical examples. What remains are
niche constructs, that is: constructs that may be helpful for
specific problems, but are not strictly necessary, nor generally
applicable.
One such niche construct, one that I turn to quite often when
programming in Prolog, is the op/3
declaration, or,
the ability to introduce new syntax into the language so that I
may model the problem more naturally. This document covers
extending the language to include the op/3
declaration in its full breath of functionality.
Alternatives, and Raison d'être
The approach we take is the modify the compiler so that it
accepts the op/3
directive in a module and
thereafter, within only that module, parse the operator
declared with the specification and priority given in the
directive. This may seem like a drastic measure, so we must
consider the alternatives before choosing this course of action.
There are basically four viable, albeit inferior, alternatives:
The Mercury programming language provides the grave
syntactic construct which converts the standard prefixed call
to an infix one:fn(X, Y)
becomesX `fn` Y
See, for example, the
pprint
module, as it used to
make extensive use of this style (until it recently deprecated
this approach to use one of the builtin operators, instead).
Just as the Mercury language developers have discovered, this
approach has at least two drawbacks:- these infix "operators" are clearly marked as
second-class citizens, unnecessarily lengthening what is
supposed to be a concise
representation;1
and, - only binary infix operators are allowed under this syntax; I
often find it useful to type values using a postfix
type.
- these infix "operators" are clearly marked as
One could construct a specialized instance of
theop_table
typeclass from theops
module and thereafter useread_term_with_op_table/4
from theterm_io
module to parse strings at
runtime. See samples/calculator2.m provided with the
distribution for an example program that demonstrates this
approach.This approach also has its own set of associated problems:
- Constructing one's own
op_table
is excessive
when using only a few operators and tedious when introducing
many operators. This manual process steals precious time
away from program development that addresses the problem,
itself. - Until now, there was no "cookbook" approach addressing
the problem of how to create a mutable syntax.
Theops
module and the sample calculator program
are well-documented and provide good examples of how to
implement static syntax, but provide no guidance for
constructing dynamic, mutable, syntax. For this, one had to
design such a framework from first
principles.2 - An user-defined
op_table
instance may only be
used at runtime. The Mercury compiler, as implemented, does
not allow such tables during module compilation.
- Constructing one's own
Third, use one of the available scanners (such as
samples/lex/) or parser generators (such as samples/moose/) to
create a language syntax-aware preprocessor that substitutes
operators and their arguments with the well-formed term
replacement. Problems:- This is highly redundant and fruitless exercise,
as the compiler has its own parse phase that does the same
work, and with the language itself in flux (as is the case for
any living language) changes to the syntax quickly render a
system created by these means obsolete. Parser generators
for other programming languages provide complete grammars
for every version of the host programming language, Mercury
has no parser generator with such grammars, so this task is
left to a user of these kinds of systems. - Furthermore, although the domain-specific languages for
these tools closely follow the Mercury programming language
to do their work, they do have their own languages that
require time and effort to master. When presented with
powerful parsing facilities built right into logic
programming languages (I'm referring specifically to Definite
Clause Grammars (DCGs)), one must weigh the costs of learning
these languages before embarking on such an endeavor.
- This is highly redundant and fruitless exercise,
Worst for last: as with C/C++, create a specific
preprocessor that parses the source file, converting annotated
operators to equivalent Mercury terms by following the
preprocessing directives. This approach requires so much work
(the C preprocesser is a compiler-sized program) and has so many
known pitfalls (such as replacing elements inappropriately (in
a quoted string, for example) and causing an unacceptable
disjunction between the generated executable and the original
source base (confusing debugging and error reporting efforts),
that it should not receive serious
consideration;3
it does not in this document, at any rate.
By embedding the op/3
syntax into the compiler, the
changes we make are hygenic in that they are part of the
language syntax, not external and blindly unaware of it, as is the
case with with C preprocessor and immediate so that they
may be used at compile time in the module in which they are
declared. This implementation also limits the lexical scope
of the operator within the module in which it is
declared,4
preventing these declarations from corrupting modules that
eventually use modules with specialized syntax.
The desired state is to integrate the op/3
declaration fully into the the language, so that, e.g., facts may
be stated in their vernacular and still be compiled into
executable content in the Mercury idiom, as in this real-world
example:
for the open weekly timecard ending date(2006, 1, 6):
employee cgi_emp_001 billed [
3 hours on sunday - date(2006, 1, 1),
16.5 hours on monday - date(2006, 1, 2),
5 hours on tuesday - date(2006, 1, 3),
5 hours on wednesday - date(2006, 1, 4)
] against contract lt_2005_001.
Far from being a contrived pedagogical
example,5 the
above illustrates the various typing uses of op/3
defined syntax, both prefix ('employee cgi_emp_001
')
and postfix ('3 hours
'). The above fact is certainly
"only" a data term (in fact, as well as being a data term, the
above fact also contains op/3
-based data
terms), but fully actualized operators exist as well; the Prolog
syntax
module is rife with such examples. These uses ofop/3
-declared syntax (describing entity
relationships clearly and as activated syntax) are in no way
limited to the rather straightforward problems of accounting, but
are also used in production expert systems handling over 1,000,000
transactions per day; the use of these extensions are tied
directly to rule findings satisfying customer requirements.
In short, op/3
-declared syntax is used extensively in
production systems built using Prolog serving real-world
requirements under heavy demands. With the preexisting extensions
for purity, typing, and functional programming, imagine the
utility and expressivity that could be obtained with Mercury so
extended!
Implementation
Now that we have justification for modifying the compiler,
nothing remains but to get to it. Fortunately, the Mercury
compiler, after some study, yields a straightforward
implementation approach.
First things first: theops
module uses a discriminator (the typecategory
) to choose among
different uses for an operator (e.g. unary '-
'
verses binary '-
'). This discriminator is internal,
and, as we need the same functionality when defining new
operators, so we externalize that type in library/ops.m by moving
the type declaration from the implementation section to the
interface.
Given this type, we now decorate predefined (inflexible) op table
that will permit additional syntax declarations. For this, we
need to index the operator and its category
to the
syntax declaration, and then make this new type anop_table
(typeclass) instance ... we add this type
to the interface of compiler/prog_io_util.m:
:- type op_map == map(pair(string, ops.category), op_info).
:- type mercury_op_map ---> mercury_op_map(ops.table, op_map).
:- instance ops.op_table(mercury_op_map).
To further support the new type, we need information against
which we index, and we need supporting predicates to construct
the information for the parser when encountering the
operator (the declarations for this also go into the interface of
compiler/prog_io_util.m):
:- type op_info ---> op_info(ops.specifier, ops.priority).
:- func op_specifier_from_string(string) = ops.specifier.
:- func op_category_from_specifier(ops.specifier) = ops.category.
The op_specifier_from_string
function simply takes
an input string, e.g. "xfx"
, and coverts it to the
equivalent specifier representation, e.g. the functorxfx
. The op_category_from_specifier
function follows the (implied) convention of the ops
module, which is all prefix specifier
types
(including binary prefix) are the before
category
and all other specifier
types
(one of several different infix and postfix possibilities) are
the after
category
type. The complete
set of changes are enumerated explicitly in
the email on the
implementation.
After we augment the functional of the ops
module,
we need to integrate this into the compiler's parser module
(which is actually called prog_io
). The efficacious
point is where the parser works at the module
level,6 this
occurs, after some initialization in read_module/11
,
in read_all_items/7
. We initialize the op map here
(with a call to init_mercury_op_map
), and then pass
along that nascent syntax map to the calls that parse the items in the
module (by modifying the signatures
of read_first_item/9
and the recursive calls read_items_loop_2/11
and read_items_loop/10
).
So, for example, read_items_loop/9
becomes:
read_items_loop(ModuleName,
SourceFileName, !Msgs,
!Items, !Error, Syn0,
!IO)
... where Syn0 is the new syntax map. This map is
initialized in the new read_all_items/7
before
calling read_items_loop/10
with the goal:
init_mercury_op_map(init_mercury_op_table,
Syntax)
The magic occurs in module parser
'sread_term_with_op_table/5
(called viaread_item/7
) which normally scans and parses the
items in the module. When it encounters an op/3
declaration, however, it eventually resolves to theprocess_decl/8
back in the prog_io
module, which reads the declaration and then adds the syntax
declaration to the op map, enhancing the syntax for the current
module.
When read_all_items/7
completes its iteration on a
module's items, it exits, discarding the op_map
instance and any syntax it accumulated from op/3
declarations in that module, returning the compiler to the base,
Mercury-defined, syntax. So that the "next" module starts fresh
without syntax from other modules polluting the compilation.
Reconsideration
"Worse is Better"7
After some discussion on the Mercury maillist, it was resolved
that dynamic syntactic extensions should be external to the
compiler. So, Logical Types has developed separate compiliation
system that converts modules with syntactic enhancements to
plain-jane Mercury equivalents. For modules with no syntactic
enhancements, `ltq
' is equivalent to `mmc
'. For modules with
--make --infer-allop/3
declarations in the implementation, `ltq
' first
parses the module and writes out all terms canonically. After
this translation, the system compiles the modules into the
resulting executable or library.
Operation
This system reduces rather nicely by using facilities provided
by the Mercury compiler, and another declarative system:make
. `mmc -M
<file>'
discovers file's dependencies and stores these in a
makefile variable $(<file>.ms) in the file<file>.dv
Given this,ltq
simply builds a makefile with the enumerated
dependencies and then calls the system that manages the dynamic
syntax, which then writes out syntactically-enhanced modules in
their canonical form (called `dopp
'). Bothltq
and dopp
are available, along
with samples as
dynamic_ops.tgz.
Conclusion
This study came from my experience with the ease of use of
mallable syntax in Prolog and comments in the Mercury sources
about the need to add op/3
declarations as well as
at least two aborted implementation attempts to do so. In the
ensuing process, where I did implement this solution, quite a
discussion emerged on the maillist on the estetic of allowing the
user to introduce or to change syntax, and how to go about doing
it properly. This implementation is one approach, and is offered
to assist those who wish to add syntactic extensions to their
Mercury systems.
Endnotes
1 | The normal infix operators do not have this
while shackled to the grave syntax:
Note the extra parentheses -- these are now necessary, as the grave syntax does not communicate operator
...but I have no desire to write out the parsed internal representation by hand (it may look like Lisp, circa
There! Isn't the canonical tree syntax so much better than |
2 | This is not all that bad, given the |
3 | This pronouncement in no way prevented this |
4 | In ISO Prolog |
5 | [Bratko2001], :- op(300, xfx, plays). ...but then the textbook quickly redeems itself -- it ruth was the executive director at wncog. I leave the |
6 | Prolog's |
7 | "Worse Is Better" the catchy title of one |
Works Consulted
[Bratko2001] | Prolog Programming for Artificial Intelligence, 3rd ed., Ivan Bratko, Addison-Wesley, Reading, Massachusetts, 2001. |
author: | Douglas M. Auclair (email: dauclair at hotmail dot com) |
---|---|
date: | January 3, 2006 |
This document is copyright © 2006 by Logical Types, LLC. under the GNU FDL, version 1.2 |