## LuaMacro - a macro preprocessor for Lua
|
|
|
|
This is a library and driver script for preprocessing and evaluating Lua code.
|
|
Lexical macros can be defined, which may be simple C-preprocessor style macros or
|
|
macros that change their expansion depending on the context.
|
|
|
|
It is a new, rewritten version of the
|
|
[Luaforge](http://luaforge.net/projects/luamacro/) project of the same name, which
|
|
required the [token filter
|
|
patch](http://www.tecgraf.puc-rio.br/~lhf/ftp/lua/#tokenf) by Luiz Henrique de
|
|
Figueiredo. This patch allowed Lua scripts to filter the raw token stream before
|
|
the compiler stage. Within the limits imposed by the lexical filter approach this
|
|
worked pretty well. However, the token filter patch is unlikely to ever become
|
|
part of mainline Lua, either in its original or
|
|
[revised](http://lua-users.org/lists/lua-l/2010-02/msg00325.html) form. So the most
|
|
portable option becomes precompilation, but Lua bytecode is not designed to be
|
|
platform-independent and in any case changes faster than the surface syntax of the
|
|
language. So using LuaMacro with LuaJIT would have required re-applying the patch,
|
|
and would remain within the ghetto of specialized, experimental use.
|
|
|
|
This implementation uses a [LPeg](http://www.inf.puc-rio.br/~roberto/lpeg.html)
|
|
lexical analyser originally by [Peter
|
|
Odding](http://lua-users.org/wiki/LpegRecipes) to tokenize Lua source, and builds
|
|
up a preprocessed string explicitly, which then can be loaded in the usual way.
|
|
This is not as efficient as the original, but it can be used by anyone with a Lua
|
|
interpreter, whether it is Lua 5.1, 5.2 or LuaJIT 2. An advantage of fully building
|
|
the output is that it becomes much easier to debug macros when you can actually see
|
|
the generated results. (Another example of a LPeg-based Lua macro preprocessor is
|
|
[Luma](http://luaforge.net/projects/luma/))
|
|
|
|
It is not possible to discuss macros in Lua without mentioning Fabien Fleutot's
|
|
[Metalua](http://metalua.luaforge.net/), an alternative Lua compiler which
|
|
supports syntactical macros that can work on the AST (Abstract Syntax Tree) itself
|
|
of Lua. This is clearly a technically superior way to extend Lua syntax, but again
|
|
has the disadvantage of being a direct-to-bytecode compiler. (Perhaps it's also a
|
|
matter of taste, since I find it easier to think about extending Lua on the lexical
|
|
level.)
|
|
|
|
My renewed interest in Lua lexical macros came from some discussions on the Lua
|
|
mailing list about numerically optimal Lua code using LuaJIT. We have been spoiled
|
|
by modern optimizing C/C++ compilers, where hand-optimization is often discouraged,
|
|
but LuaJIT is new and requires some assistance. For instance, unrolling short loops
|
|
can make a dramatic difference, but Lua does not provide the key concept of
|
|
constant value to assist the compiler. So a very straightforward use of a macro
|
|
preprocessor is to provide named constants in the old-fashioned C way. Very
|
|
efficient code can be generated by generalizing the idea of 'varargs' into a
|
|
statically-compiled 'tuple' type.
|
|
|
|
tuple(3) A,B
|
|
|
|
The assignment `A = B` is expanded as:
|
|
|
|
A_1,A_2,A_3 = B_1,B_2,B_3
|
|
|
|
I will show how the expansion can be made context-sensitive, so that the
|
|
loop-unrolling macro `do_` changes this behaviour:
|
|
|
|
do_(i,1,3,
|
|
A = 0.5*B
|
|
)
|
|
|
|
expands to:
|
|
|
|
A_1 = 0.5*B_1
|
|
A_2 = 0.5*B_2
|
|
A_3 = 0.5*B_3
|
|
|
|
Another use is crafting DSLs, particularly for end-user scripting. For instance,
|
|
people may be more comfortable with `forall x in t do` rather than `for _,x in
|
|
ipairs(t) do`; there is less to explain in the first form and it translates
|
|
directly to the second form. Another example comes from this common pattern:
|
|
|
|
some_action(function()
|
|
...
|
|
end)
|
|
|
|
Using the following macro:
|
|
|
|
def_ block (function() _END_CLOSE_
|
|
|
|
we can write:
|
|
|
|
some_action block
|
|
...
|
|
end
|
|
|
|
A criticism of traditional lexical macros is that they don't respect the scoping
|
|
rules of the language itself. Bad experiences with the C preprocessor lead many to
|
|
regard them as part of the prehistory of computing. The macros described here can
|
|
be lexically scoped, and can be as 'hygienic' as necessary, since their expansion
|
|
can be finely controlled with Lua itself.
|
|
|
|
For me, a more serious charge against 'macro magic' is that it can lead to a
|
|
private dialect of the language (the original Bourne shell was written in C
|
|
'skinned' to look like Algol 68.) This often indicates a programmer uncomfortable
|
|
with a language, who wants it to look like something more familiar. Relying on a
|
|
preprocessor may be a sign that programmers need to immerse themselves more in the idioms of
|
|
the new language.
|
|
|
|
That being said, macros can extend a language so that it can be more expressive for
|
|
a particular task, particularly if the users are not professional programmers.
|
|
|
|
### Basic Macro Substitution
|
|
|
|
To install LuaMacro, expand the archive and make a script or batch file that points
|
|
to `luam.lua`, for instance:
|
|
|
|
lua /home/frodo/luamacro/luam.lua $*
|
|
|
|
(Or '%*' if on Windows.) Then put this file on your executable path.
|
|
|
|
Any Lua code loaded with `luam` goes through four distinct steps:
|
|
|
|
* loading and defining macros
|
|
* preprocessing
|
|
* compilation
|
|
* execution
|
|
|
|
The last two steps happen within Lua itself, but always occur, even though the Lua
|
|
compiler is fast enough that we mostly do not bother to save the generated bytecode.
|
|
|
|
For example, consider this `hello.lua`:
|
|
|
|
print(HELLO)
|
|
|
|
and `hello-def.lua`:
|
|
|
|
local macro = require 'macro'
|
|
macro.define 'HELLO "Hello, World!"'
|
|
|
|
To run the program:
|
|
|
|
$> luam -lhello-def hello.lua
|
|
Hello, World!
|
|
|
|
So the module `hello-def.lua` is first loaded (compiled and executed, but not
|
|
preprocessed) and only then `hello.lua` can be preprocessed and then loaded.
|
|
|
|
Naturally, there are easier ways to use LuaMacro, but I want to emphasize the
|
|
sequence of macro loading, preprocessing and script loading. `luam` has a `-d`
|
|
flag, meaning 'dump', which is very useful when debugging the output of the
|
|
preprocessing step:
|
|
|
|
$> luam -d -lhello-def hello.lua
|
|
print("Hello, World!")
|
|
|
|
`hello2.lua` is a more sensible first program:
|
|
|
|
require_ 'hello-def'
|
|
print(HELLO)
|
|
|
|
You cannot use the Lua `require` function at this point, since `require` is only
|
|
executed when the program starts executing and we want the macro definitions to be
|
|
available during the current compilation. `require_` is the macro version, which
|
|
loads the file at compile-time.
|
|
|
|
New with 2.5 is the default @ shortcut available when using `luam`,
|
|
so `require_` can be written `@require`.
|
|
(`@` is itself a macro, so you can redefine it if needed.)
|
|
|
|
There is also `include_/@include`, which is analogous to `#include` in `cpp`. It takes a
|
|
file path in quotes, and directly inserts the contents of the file into the current
|
|
compilation. Although tempting to use, it will not work here because again the
|
|
macro definitions will not be available at compile-time.
|
|
|
|
`hello3.lua` fits much more into the C preprocessor paradigm, which uses the `def_`
|
|
macro:
|
|
|
|
@def HELLO "Hello, World!"
|
|
print(HELLO)
|
|
|
|
(Like `cpp`, such macro definitions end with the line; however, there is no
|
|
equivalent of `\` to extend the definition over multiple lines.)
|
|
|
|
With 2.1, an alternative syntax `def_ (name body)` is also available, which can be
|
|
embedded inside a macro expression:
|
|
|
|
def_ OF_ def_ (of elseif _value ==)
|
|
|
|
Or even extend over several lines:
|
|
|
|
def_ (complain(msg,n)
|
|
for i = 1,n do
|
|
print(msg)
|
|
end
|
|
)
|
|
|
|
`def_` works pretty much like `#define`, for instance, `def_ SQR(x) ((x)*(x))`. A
|
|
number of C-style favourites can be defined, like `assert_` using `_STR_`, which is
|
|
a predefined macro that 'stringifies' its argument.
|
|
|
|
def_ assert_(condn) assert(condn,_STR_(condn))
|
|
|
|
`def_` macros are _lexically scoped_:
|
|
|
|
local X = 1
|
|
if something then
|
|
def_ X 42
|
|
assert(X == 42)
|
|
end
|
|
assert(X == 1)
|
|
|
|
LuaMacro keeps track of Lua block structure - in particular it knows when a
|
|
particular lexical scope has just been closed. This is how the `_END_CLOSE_`
|
|
built-in macro works:
|
|
|
|
def_ block (function() _END_CLOSE_
|
|
|
|
my_fun block
|
|
do_something_later()
|
|
end
|
|
|
|
When the current scope closes with `end`, LuaMacro appends the necessary ')' to
|
|
make this syntax valid.
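
To make the effect concrete, here is roughly what the preprocessor should hand to the Lua compiler for the example above. This is a sketch in plain Lua: `my_fun` and `do_something_later` are stand-ins defined here only so that the code runs on its own, and a `return` has been added to the body so the result is visible. The closing `)` is the part that `_END_CLOSE_` supplies.

```lua
-- Stand-ins so the expanded code can run without LuaMacro (assumptions).
local function do_something_later() return 'done later' end
local function my_fun(callback) return callback() end

-- Expected shape of the expansion of:  my_fun block ... end
local result = my_fun (function()
    return do_something_later()
end)

print(result)
```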
|
|
|
|
A common use of macros in both C and Lua is to inline optimized code for a case.
|
|
The Lua function `assert()` always evaluates its second argument, which is not
|
|
always optimal:
|
|
|
|
def_ ASSERT(condn,expr) if condn then else error(expr) end
|
|
|
|
ASSERT(2 == 1,"damn! ".. 2 .." is not equal to ".. 1)
|
|
|
|
If the message expression is expensive to execute, then this can give better
|
|
performance at the price of some extra code. `ASSERT` is now a statement, not a
|
|
function, however.
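
The difference can be checked in plain Lua by writing out the expansion by hand. In this sketch `expensive_message` is a hypothetical helper that counts how often the message is built; the second half is what `ASSERT(1 == 1, expensive_message())` should expand to according to the definition above.

```lua
local calls = 0

-- Hypothetical helper: counts how often the message is constructed.
local function expensive_message()
    calls = calls + 1
    return 'values differ'
end

-- Plain assert(): the message is built even though the condition holds.
assert(1 == 1, expensive_message())
local eager_calls = calls

-- Hand-written expansion of ASSERT: the message is only built on failure.
calls = 0
if 1 == 1 then else error(expensive_message()) end
local lazy_calls = calls

print(eager_calls, lazy_calls)
```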
|
|
|
|
### Conditional Compilation
|
|
|
|
For this to work consistently, you need to use the `@` shortcut:
|
|
|
|
@include 'test.inc'
|
|
@def A 10
|
|
...
|
|
|
|
This makes macro 'preprocessor' statements stand out more. Conditional compilation
|
|
works as you would expect from C:
|
|
|
|
-- test-cond.lua
|
|
@if A
|
|
print 'A defined'
|
|
@else
|
|
print 'A not defined'
|
|
@end
|
|
@if os.getenv 'P'
|
|
print 'Env P is defined'
|
|
@end
|
|
|
|
Now, what is `A`? It is a Lua expression which is evaluated at _preprocessor_
|
|
time, and if it returns any value except `nil` or `false` it is true, using
|
|
the usual Lua rule. Assuming `A` is just a global variable, how can it be set?
|
|
|
|
$ luam test-cond.lua
|
|
A not defined
|
|
$ luam -VA test-cond.lua
|
|
A defined
|
|
$ export P=1
|
|
$ luam test-cond.lua
|
|
A not defined
|
|
Env P is defined
|
|
|
|
Although this looks very much like the standard C preprocessor, the implementation
|
|
is rather different - `@if` is a special macro which evaluates its argument
|
|
(everything on the rest of the line) as a _Lua expression_
|
|
and skips up to `@end` (or `@else` or `@elseif`) if that condition is false.
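
The evaluate-and-skip idea can be sketched in plain Lua. This is a line-based toy, not LuaMacro's token-based implementation: `eval_condition` and `filter_conditionals` are names made up for this sketch, `@elseif` is left out, and conditions are evaluated in an environment table that plays the role of the preprocessor's globals.

```lua
-- Evaluate a Lua expression in the given environment (Lua 5.1/LuaJIT and 5.2+).
local function eval_condition(expr, env)
    local chunk
    if setfenv then
        chunk = assert(loadstring('return ' .. expr))
        setfenv(chunk, env)
    else
        chunk = assert(load('return ' .. expr, 'condition', 't', env))
    end
    return chunk()
end

-- Keep or drop lines depending on @if / @else / @end directives.
local function filter_conditionals(source, env)
    local out = {}
    local keep = {true}   -- stack of "currently emitting" flags
    for line in source:gmatch('[^\n]+') do
        local cond = line:match('^@if%s+(.+)$')
        if cond then
            local parent = keep[#keep]
            -- usual Lua rule: anything except nil or false is true
            keep[#keep + 1] = parent and (eval_condition(cond, env) and true or false)
        elseif line:match('^@else%s*$') then
            local parent = keep[#keep - 1]
            keep[#keep] = parent and not keep[#keep]
        elseif line:match('^@end%s*$') then
            keep[#keep] = nil
        elseif keep[#keep] then
            out[#out + 1] = line
        end
    end
    return table.concat(out, '\n')
end

local src = "@if A\nprint 'A defined'\n@else\nprint 'A not defined'\n@end"
print(filter_conditionals(src, {}))
print(filter_conditionals(src, {A = true}))
```

Passing `{A = true}` here corresponds loosely to running `luam -VA`.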
|
|
|
|
|
|
### Using macro.define
|
|
|
|
`macro.define` is less convenient than `def_` but much more powerful. The extended
|
|
form allows the substitution to be a _function_ which is called in-place at compile
|
|
time. These definitions must be loaded before they can be used,
|
|
either with `-l` or with `@require`.
|
|
|
|
macro.define('DATE',function()
|
|
return '"'..os.date('%c')..'"'
|
|
end)
|
|
|
|
Any text which is returned will be tokenized and inserted into the output stream.
|
|
The explicit quoting here is needed to ensure that `DATE` will be replaced by the
|
|
string "04/30/11 09:57:53". ('%c' gives you the current locale's version of the
|
|
date; for a proper version of this macro, best to use `os.date` [with more explicit
|
|
formats](http://www.lua.org/pil/22.1.html) .)
|
|
|
|
This function can also return nothing, which allows you to write macro code purely
|
|
for its _side-effects_.
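
Since the returned text is tokenized again as Lua source, a string result has to arrive already quoted. One way to avoid hand-written quote concatenation is `string.format` with `%q`, which also escapes embedded quotes. This is a suggested alternative, not something the `DATE` macro above uses, and `as_lua_string` is a hypothetical helper name.

```lua
-- Hypothetical helper: turn a Lua string value into Lua source text for it.
local function as_lua_string(s)
    return ('%q'):format(s)
end

-- loadstring in Lua 5.1/LuaJIT, load in 5.2 and later
local load_text = loadstring or load

local source = 'return ' .. as_lua_string('say "hi"')
local value = load_text(source)()

print(source)
print(value)
```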
|
|
|
|
Non-operator characters like `@`,`$`, etc can be used as macros. For example, say
|
|
you like shell-like notation `$HOME` for expanding environment variables in your
|
|
scripts.
|
|
|
|
macro.define '$(x) os.getenv(_STR_(x))'
|
|
|
|
A script can now say `$(PATH)` and get the expected expansion, Make-style. But we
|
|
can do better and support `$PATH` directly:
|
|
|
|
macro.define('$',function(get)
|
|
local var = get:iden()
|
|
return 'os.getenv("'..var..'")'
|
|
end)
|
|
|
|
If a macro has no parameters, then the substitution function receives a 'getter'
|
|
object. This provides methods for extracting various token types from the input
|
|
stream. Here the `$` macro must be immediately followed by an identifier.
|
|
|
|
We can do better, and define `$` so that something like `$(pwd)` has the same
|
|
meaning as the Unix shell:
|
|
|
|
macro.define('$',function(get)
|
|
local t,v = get()
|
|
if t == 'iden' then
|
|
return 'os.getenv("'..v..'")'
|
|
elseif t == '(' then
|
|
local rest = get:upto ')'
|
|
return 'os.execute("'..tostring(rest)..'")'
|
|
end
|
|
end)
|
|
|
|
(The getter `get` is callable, and returns the type and value of the next token.)
|
|
|
|
It is probably a silly example, but it illustrates how a macro can be overloaded
|
|
based on its lexical context. Much of the expressive power of LuaMacro comes from
|
|
allowing macros to fetch their own parameters in this way. It allows us to define
|
|
new syntax and go beyond 'pseudo-functions', which is more important for a
|
|
conventional-syntax language like Lua, rather than Lisp where everything looks like
|
|
a function anyway. These kinds of macros are called 'reader' macros in the Lisp world,
|
|
since they temporarily take over reading code.
|
|
|
|
It is entirely possible for macros to create macros; that is what `def_` does.
|
|
Consider how to add the concept of `const` declarations to Lua:
|
|
|
|
const N,M = 10,20
|
|
|
|
Here is one solution:
|
|
|
|
macro.define ('const',function(get)
|
|
get() -- skip the space
|
|
local vars = get:idens '='
|
|
local values = get:list '\n'
|
|
for i,name in ipairs(vars) do
|
|
macro.assert(values[i],'each constant must be assigned!')
|
|
macro.define_scoped(name,tostring(values[i]))
|
|
end
|
|
end)
|
|
|
|
The key to making these constants well-behaved is `define_scoped`, which installs a
|
|
block handler which resets the macro to its original value, which is usually `nil`.
|
|
This test script shows how the scoping works:
|
|
|
|
require_ 'const'
|
|
do
|
|
const N,M = 10,20
|
|
do
|
|
const N = 5
|
|
assert(N == 5)
|
|
end
|
|
assert(N == 10 and M == 20)
|
|
end
|
|
assert(N == nil and M == nil)
|
|
|
|
|
|
If we were designing a DSL intended for non-technical users, then we cannot just
|
|
say to them 'learn the language properly - go read PiL!'. It would be easier to
|
|
explain:
|
|
|
|
forall x in {10,20,30} do
|
|
|
|
than the equivalent generic `for` loop. `forall` can be implemented fairly simply
|
|
as a macro:
|
|
|
|
macro.define('forall',function(get)
|
|
local var = get:iden()
|
|
local t,v = get:next() -- will be 'in'
|
|
local rest = tostring(get:upto 'do')
|
|
return ('for _,%s in ipairs(%s) do'):format(var,rest)
|
|
end)
|
|
|
|
That is, first get the loop variable, skip `in`, grab everything up to `do` and
|
|
output the corresponding `for` statement.
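
The same translation can be imitated on plain text, which is handy for checking what the output should look like. This sketch uses Lua string patterns instead of the getter, so it only handles a `forall` header that sits alone on one line; `expand_forall` is a made-up name and is not part of LuaMacro.

```lua
-- Text-only imitation of the forall macro (assumption: one header per line).
local function expand_forall(line)
    local var, rest = line:match('^%s*forall%s+([%a_][%w_]*)%s+in%s+(.-)%s+do%s*$')
    if not var then
        return line   -- not a forall header; leave untouched
    end
    return ('for _,%s in ipairs(%s) do'):format(var, rest)
end

local expanded = expand_forall('forall x in {10,20,30} do')
print(expanded)
```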
|
|
|
|
Useful macros can often be built using these new forms. For instance, here is a
|
|
simple list comprehension macro:
|
|
|
|
macro.define('L(expr,select) '..
|
|
'(function() local res = {} '..
|
|
' forall select do res[#res+1] = expr end '..
|
|
'return res end)()'
|
|
)
|
|
|
|
For example, `L(x^2,x in t)` will make a list of the squares of all elements in `t`.
|
|
|
|
Why don't we use a long string here? Because we don't wish to insert any extra line
|
|
feeds in the output. `macro.forall` defines more sophisticated `forall` statements
|
|
and list comprehension expressions, but the principle is the same - see 'tests/test-forall.lua'.
|
|
|
|
There is a second argument passed to the substitution function, which is a 'putter'
|
|
object - an object for building token lists. For example, a useful shortcut for
|
|
anonymous functions:
|
|
|
|
M.define ('\\',function(get,put)
|
|
local args = get:idens('(')
|
|
local body = get:list()
|
|
return put:keyword 'function' '(' : idens(args) ')' :
|
|
keyword 'return' : list(body) : space() : keyword 'end'
|
|
end)
|
|
|
|
The `put` object has methods for appending particular kinds of tokens, such as
|
|
keywords and strings, and is also callable for operator tokens. These always return
|
|
the object itself, so the output can be built up with chaining.
|
|
|
|
Consider `\x,y(x+y)`: the `idens` getter grabs a comma-separated list of identifier
|
|
names up to the given token; the `list` getter grabs a general argument list. It
|
|
returns a list of token lists and by default stops at ')'. This 'lambda' notation
|
|
was suggested by Luiz Henrique de Figueiredo as something easily parsed by any
|
|
token-filtering approach - an alternative notation `|x,y| x+y` has been
|
|
[suggested](http://lua-users.org/lists/lua-l/2009-12/msg00071.html) but is
|
|
generally impossible to implement using a lexical scanner, since it would have to
|
|
parse the function body as an expression. The `\` macro also has the advantage
|
|
that the operator precedence is explicit: in the case of `\(42,'answer')` it is
|
|
immediately clear that this is a function of no arguments which returns two values.
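
For reference, these are the plain Lua functions that the two lambda forms mentioned above should turn into, going by the `function ( args ) return body end` shape the putter builds. The variable names `add` and `answer` are only for this sketch.

```lua
-- Expected expansion of  \x,y(x+y)
local add = function(x,y) return x+y end

-- Expected expansion of  \(42,'answer')
local answer = function() return 42,'answer' end

print(add(1, 2))
print(answer())
```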
|
|
|
|
I would not necessarily suggest that lambdas are a good thing in
|
|
production code, but they _can_ be useful in interactive exploration and within tests.
|
|
|
|
Macros with explicit parameters can define a substitution function, but this
|
|
function receives the values themselves, not the getter and putter objects. These
|
|
values are _token lists_ and must be converted into the expected types using the
|
|
token list methods:
|
|
|
|
macro.define('test_(var,start,finish)',function(var,start,finish)
|
|
var,start,finish = var:get_iden(),start:get_number(),finish:get_number()
|
|
print(var,start,finish)
|
|
end)
|
|
|
|
|
|
Since no `put` object is received, such macros need to construct their own:
|
|
|
|
local put = M.Putter()
|
|
...
|
|
return put
|
|
|
|
(They can of course still just return the substitution as text.)
|
|
|
|
### Dynamically controlling macro expansion
|
|
|
|
Consider this loop-unrolling macro:
|
|
|
|
do_(i,1,3,
|
|
y = y + i
|
|
)
|
|
|
|
which will expand as
|
|
|
|
y = y + 1
|
|
y = y + 2
|
|
y = y + 3
|
|
|
|
For each iteration, it needs to define a local macro `i` which expands to 1,2 and 3.
|
|
|
|
macro.define('do_(v,s,f,stat)',function(var,start,finish,statements)
|
|
local put = macro.Putter()
|
|
var,start,finish = var:get_iden(),start:get_number(),finish:get_number()
|
|
macro.push_token_stack('do_',var)
|
|
for i = start, finish do
|
|
-- output `set_ <var> <value> `
|
|
put:iden 'set_':iden(var):number(i):space()
|
|
put:tokens(statements)
|
|
end
|
|
-- output `undef_ <var>`
|
|
put:iden 'undef_':iden(var)
|
|
-- output `_DROP_ 'do_'`
|
|
put:iden '_DROP_':string 'do_'
|
|
return put
|
|
end)
|
|
|
|
Ignoring the macro stack manipulation for a moment, it works by inserting `set_`
|
|
macro assignments into the output. That is, the raw output looks like this:
|
|
|
|
set_ i 1
|
|
y = y + i
|
|
set_ i 2
|
|
y = y + i
|
|
set_ i 3
|
|
y = y + i
|
|
undef_ i
|
|
_DROP_ 'do_'
|
|
|
|
It's important here to understand that LuaMacro does not do _recursive_
|
|
substitution. Rather, the output of macros is pushed out to the stream which is
|
|
then further substituted, etc. So we do need these little helper macros to set the
|
|
loop variable at each point.
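
The final result of the unrolling can be imitated with plain string substitution, which is a convenient way to check what a `do_` call ought to produce. This is a text-based stand-in, not the token-stream mechanism described above: `unroll` is a hypothetical function, and it replaces whole-word occurrences of the loop variable using frontier patterns.

```lua
-- Text-only imitation of do_(var,start,finish,body) (assumption: var is a plain identifier).
local function unroll(var, start, finish, body)
    local out = {}
    -- %f[...] frontiers make sure only the whole identifier is replaced
    local pattern = '%f[%w_]' .. var .. '%f[^%w_]'
    for i = start, finish do
        out[#out + 1] = (body:gsub(pattern, tostring(i)))
    end
    return table.concat(out, '\n')
end

local unrolled = unroll('i', 1, 3, 'y = y + i')
print(unrolled)
```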
|
|
|
|
Using the macro stack allows macros to be aware that they are expanding inside a
|
|
`do_` macro invocation. Consider `tuple`, which is another macro which creates
|
|
macros:
|
|
|
|
tuple(3) A,B
|
|
A = B
|
|
|
|
which would expand as
|
|
|
|
local A_1,A_2,A_3,B_1,B_2,B_3
|
|
A_1,A_2,A_3 = B_1,B_2,B_3
|
|
|
|
But we would like
|
|
|
|
do_(i,1,3,
|
|
A = B/2
|
|
)
|
|
|
|
to expand as
|
|
|
|
A_1 = B_1/2
|
|
A_2 = B_2/2
|
|
A_3 = B_3/2
|
|
|
|
And here is the definition:
|
|
|
|
macro.define('tuple',function(get)
|
|
get:expecting '('
|
|
local N = get:number()
|
|
get:expecting ')'
|
|
get:expecting 'space'
|
|
local names = get:idens '\n'
|
|
for _,name in ipairs(names) do
|
|
macro.define(name,function(get,put)
|
|
local loop_var = macro.value_of_macro_stack 'do_'
|
|
if loop_var then
|
|
local loop_idx = tonumber(macro.get_macro_value(loop_var))
|
|
return put:iden (name..'_'..loop_idx)
|
|
else
|
|
local out = {}
|
|
for i = 1,N do
|
|
out[i] = name..'_'..i
|
|
end
|
|
return put:idens(out)
|
|
end
|
|
end)
|
|
end
|
|
end)
|
|
|
|
The first expansion case happens if we are not within a `do_` macro; a simple list
|
|
of names is output. Otherwise, we know what the loop variable is, and can
|
|
directly ask for its value.
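
The two expansion cases boil down to a small piece of name arithmetic, shown here in plain Lua without the macro machinery. `tuple_expand` is a made-up function for this sketch; its optional third argument plays the role of the current `do_` loop index.

```lua
-- Names produced for a tuple variable, inside or outside a do_ loop.
local function tuple_expand(name, n, loop_idx)
    if loop_idx then
        -- inside do_: a single element, selected by the loop index
        return name .. '_' .. loop_idx
    end
    -- outside do_: the full comma-separated list of elements
    local out = {}
    for i = 1, n do
        out[i] = name .. '_' .. i
    end
    return table.concat(out, ',')
end

print(tuple_expand('A', 3) .. ' = ' .. tuple_expand('B', 3))
print(tuple_expand('A', 3, 2) .. ' = ' .. tuple_expand('B', 3, 2) .. '/2')
```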
|
|
|
|
### Operator Macros
|
|
|
|
You can of course define `@` to be a macro; a new feature allows you to add new
|
|
operator tokens:
|
|
|
|
macro.define_tokens {'##','@-'}
|
|
|
|
which can then be used with `macro.define`, but also now with `def_`. It's now
|
|
possible to define a list comprehension syntax that reads more naturally, e.g.
|
|
`{|x^2| i=1,10}` by making `{|` into a new token.
|
|
|
|
Up to now, making a Lua operator token such as `.` into a macro was not so useful.
|
|
Such a macro may now return an extra value which indicates that the operator should
|
|
simply 'pass through' as is. Consider defining a `with` statement:
|
|
|
|
with A do
|
|
.x = 1
|
|
.y = 2
|
|
end
|
|
|
|
I've deliberately indicated the fields using a dot (a rare case of Visual Basic
|
|
syntax being superior to Delphi). So it is necessary to overload '.' and look at
|
|
the previous token: if it isn't a case like `name.` or `].` then we prepend the
|
|
table. Otherwise, the operator must simply _pass through_, to prevent an
|
|
uncontrolled recursion.
|
|
|
|
M.define('with',function(get,put)
|
|
M.define_scoped('.',function()
|
|
local lt,lv = get:peek(-1,true) -- peek before the period...
|
|
if lt ~= 'iden' and lt ~= ']' then
|
|
return '_var.'
|
|
else
|
|
return nil,true -- pass through
|
|
end
|
|
end)
|
|
local expr = get:upto 'do'
|
|
return 'do local _var = '..tostring(expr)..'; '
|
|
end)
|
|
|
|
Again, scoping means that this behaviour is completely local to the with-block.
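
Written out by hand, the `with A do ... end` example above should come out as the following plain Lua, given that the macro returns `do local _var = <expr>; ` and each leading dot becomes `_var.`. The table `A` is created here only so that the sketch runs on its own.

```lua
local A = {}

-- Expected shape of the expansion of:  with A do .x = 1 .y = 2 end
do local _var = A;
    _var.x = 1
    _var.y = 2
end

print(A.x, A.y)
```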
|
|
|
|
A more elaborate experiment is `cskin.lua` in the tests directory. This translates
|
|
a curly-bracket form into standard Lua, and at its heart is defining '{' and '}' as
|
|
macros. You have to keep a brace stack, because these tokens still have their old
|
|
meaning and the table constructor in this example must still work, while the
|
|
trailing brace must be converted to `end`.
|
|
|
|
if (a > b) {
|
|
t = {a,b}
|
|
}
|
|
|
|
### Pass-Through Macros
|
|
|
|
Normally a macro replaces the name (plus any arguments) with the substitution. It
|
|
is sometimes useful to pass the name through, but not to push the name into the
|
|
token stream - otherwise we will get an endless expansion.
|
|
|
|
macro.define('fred',function()
|
|
print 'fred was found'
|
|
return nil, true
|
|
end)
|
|
|
|
This has absolutely no effect on the preprocessed text ('fred' remains 'fred'), but
|
|
has a side-effect. This happens if the substitution function returns a second
|
|
`true` value. You can look at the immediate lexical environment with `peek`:
|
|
|
|
macro.define('fred',function(get)
|
|
local t,v = get:peek(1)
|
|
if t == 'string' then
|
|
local str = get:string()
|
|
return 'fred_'..str
|
|
end
|
|
return nil,true
|
|
end)
|
|
|
|
Pass-through macros are useful when each macro corresponds to a Lua variable; they
|
|
allow such variables to have a dual role.
|
|
|
|
An example would be Python-style lists. The [Penlight
|
|
List](http://stevedonovan.github.com/Penlight/api/modules/pl.List.html) class has
|
|
the same functionality as the built-in Python list, but does not have any
|
|
syntactical support:
|
|
|
|
> List = require 'pl.List'
|
|
> ls = List{10,20,30}
|
|
> = ls:slice(1,2)
|
|
{10,20}
|
|
> ls:slice_assign(1,2,{10,11,20,21})
|
|
> = ls
|
|
{10,11,20,21,30}
|
|
|
|
It would be cool if we could add a little bit of custom syntax to make this more
|
|
natural. What we first need is a 'macro factory' which outputs the code to create
|
|
the lists, and also suitable macros with the same names.
|
|
|
|
-- list <var-list> [ = <init-list> ]
|
|
M.define ('list',function(get)
|
|
get() -- skip space
|
|
-- 'list' acts as a 'type' followed by a variable list, which may be
|
|
-- followed by initial values
|
|
local values
|
|
local vars,endt = get:idens (function(t,v)
|
|
return t == '=' or (t == 'space' and v:find '\n')
|
|
end)
|
|
-- there is an initialization list
|
|
if endt[1] == '=' then
|
|
values,endt = get:list '\n'
|
|
else
|
|
values = {}
|
|
end
|
|
-- build up the initialization list
|
|
for i,name in ipairs(vars) do
|
|
M.define_scoped(name,list_check)
|
|
values[i] = 'List('..tostring(values[i] or '')..')'
|
|
end
|
|
local lcal = M._interactive and '' or 'local '
|
|
return lcal..table.concat(vars,',')..' = '..table.concat(values,',')..tostring(endt)
|
|
end)
|
|
|
|
Note that this is a fairly re-usable pattern; it requires the type constructor
|
|
(`List` in this case) and a type-specific macro function (`list_check`). The only
|
|
tricky bit is handling the two cases, so the `idens` method finds the end using a
|
|
function, not a simple token. `idens`, like `list`, returns the list and the token
|
|
that ended the list, so we can use `endt` to check.
|
|
|
|
list a = {1,2,3}
|
|
list b
|
|
|
|
becomes
|
|
|
|
local a = List({1,2,3})
|
|
local b = List()
|
|
|
|
unless we are in interactive mode, where `local` is not appropriate!
|
|
|
|
Each of these list macro/variables may be used in several ways:
|
|
|
|
- directly `a` - no action!
|
|
- `a[i]` - plain table index
|
|
- `a[i:j]` - a list slice. Will be `a:slice(i,j)` normally, but must
|
|
be `a:slice_assign(i,j,RHS)` if on the left-hand side of an assignment.
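
The mapping from index syntax to method calls can be sketched on plain strings. This is only an illustration of the three cases, with the defaults of `1` and `-1` for a missing start or finish; `expand_index` is a hypothetical helper, and the real macro works on tokens rather than text.

```lua
-- name   : the list variable
-- inside : the text between '[' and ']'
-- rhs    : the text after '=', or nil when there is no assignment
local function expand_index(name, inside, rhs)
    local start, finish = inside:match('^(.-):(.-)$')
    if not start then
        return name .. '[' .. inside .. ']'       -- plain table index
    end
    if start == '' then start = '1' end           -- sensible defaults
    if finish == '' then finish = '-1' end
    if rhs then
        return ('%s:slice_assign(%s,%s,%s)'):format(name, start, finish, rhs)
    end
    return ('%s:slice(%s,%s)'):format(name, start, finish)
end

print(expand_index('a', 'i'))
print(expand_index('a', '2:3'))
print(expand_index('a', ':2'))
print(expand_index('a', '2:', "{'zwei','twee'}"))
```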
|
|
|
|
The substitution function checks these cases by appropriate look-ahead:
|
|
|
|
function list_check (get,put)
|
|
local t,v = get:peek(1)
|
|
if t ~= '[' then return nil, true end -- pass-through; plain var reference
|
|
get:expecting '['
|
|
local args = get:list(']',':')
|
|
-- it's just plain table access
|
|
if #args == 1 then return '['..tostring(args[1])..']',true end
|
|
|
|
-- two items separated by a colon; use sensible defaults
|
|
M.assert(#args == 2, "slice has two arguments!")
|
|
local start,finish = tostring(args[1]),tostring(args[2])
|
|
if start == '' then start = '1' end
|
|
if finish == '' then finish = '-1' end
|
|
|
|
-- look ahead to see if we're on the left hand side of an assignment
|
|
if get:peek(1) == '=' then
|
|
get:next() -- skip '='
|
|
local rest,eoln = get:upto '\n'
|
|
rest,eoln = tostring(rest),tostring(eoln)
|
|
return (':slice_assign(%s,%s,%s)%s'):format(start,finish,rest,eoln),true
|
|
else
|
|
return (':slice(%s,%s)'):format(start,finish),true
|
|
end
|
|
end
|
|
|
|
This can be used interactively, like so (it requires the Penlight list library.)
|
|
|
|
$> luam -llist -i
|
|
Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
|
|
Lua Macro 2.3.0 Copyright (C) 2007-2011 Steve Donovan
|
|
> list a = {'one','two'}
|
|
> = a:map(\x(x:sub(1,1)))
|
|
{o,t}
|
|
> a:append 'three'
|
|
> a:append 'four'
|
|
> = a
|
|
{one,two,three,four}
|
|
> = a[2:3]
|
|
{two,three}
|
|
> = a[2:2] = {'zwei','twee'}
|
|
{one,zwei,twee,three,four}
|
|
> = a[1:2]..{'five'}
|
|
{one,zwei,five}
|
|
|
|
### Preprocessing C
|
|
|
|
With the 2.2 release, LuaMacro can preprocess C files, by the inclusion of a C LPeg
|
|
lexer based on work by Peter Odding. This may seem a semi-insane pursuit, given
|
|
that C already has a preprocessor (which is widely considered a misfeature).
|
|
However, the macros we are talking about are clever, they can maintain state, and
|
|
can be scoped lexically.
|
|
|
|
One of the irritating things about C is the need to maintain separate include
|
|
files. It would be better if we could write a module like this:
|
|
|
|
|
|
// dll.c
|
|
#include "dll.h"
|
|
|
|
export {
|
|
typedef struct {
|
|
int ival;
|
|
} MyStruct;
|
|
}
|
|
|
|
export int one(MyStruct *ms) {
|
|
return ms->ival + 1;
|
|
}
|
|
|
|
export int two(MyStruct *ms) {
|
|
return 2*ms->ival;
|
|
}
|
|
|
|
and have the preprocessor generate an appropriate header file:
|
|
|
|
|
|
#ifndef DLL_H
|
|
#define DLL_H
|
|
typedef struct {
|
|
int ival;
|
|
} MyStruct;
|
|
|
|
int one(MyStruct *ms) ;
|
|
int two(MyStruct *ms) ;
|
|
#endif
|
|
|
|
The macro `export` is straightforward:
|
|
|
|
|
|
M.define('export',function(get)
|
|
local t,v = get:next()
|
|
local decl,out
|
|
if v == '{' then
|
|
decl = tostring(get:upto '}')
|
|
decl = M.substitute_tostring(decl)
|
|
f:write(decl,'\n')
|
|
else
|
|
decl = v .. ' ' .. tostring(get:upto '{')
|
|
decl = M.substitute_tostring(decl)
|
|
f:write(decl,';\n')
|
|
out = decl .. '{'
|
|
end
|
|
return out
|
|
end)
|
|
|
|
It looks ahead and if it finds a `{}` block it writes the block as text to a file
|
|
stream; otherwise writes out the function signature. `get:upto '}'` will do the
|
|
right thing here since it keeps track of brace level. To allow any other macro
|
|
expansions to take place, `substitute_tostring` is directly called.
|
|
|
|
`tests/cexport.lua` shows how this idea can be extended, so that the generated
|
|
header is only updated when it changes.
|
|
|
|
To preprocess C with `luam`, you need to specify the `-C` flag:
|
|
|
|
luam -C -lcexport -o dll.c dll.lc
|
|
|
|
Have a look at [lc](modules/macro.lc.html) which defines a simplified way to write
|
|
Lua bindings in C. Here is `tests/str.l.c`:
|
|
|
|
// preprocess using luam -C -llc -o str.c str.l.c
|
|
#include <string.h>
|
|
|
|
module "str" {
|
|
|
|
def at (Str s, Int i = 0) {
|
|
lua_pushlstring(L,&s[i-1],1);
|
|
return 1;
|
|
}
|
|
|
|
def upto (Str s, Str delim = " ") {
|
|
lua_pushinteger(L, strcspn(s,delim) + 1);
|
|
return 1;
|
|
}
|
|
|
|
}
|
|
|
|
The result looks like this:
|
|
|
|
// preprocess using luam -C -llc -o str.c str.l.c
|
|
#line 2 "str.lc"
|
|
#include <string.h>
|
|
|
|
#include <lua.h>
|
|
#include <lauxlib.h>
|
|
#include <lualib.h>
|
|
#ifdef WIN32
|
|
#define EXPORT __declspec(dllexport)
|
|
#else
|
|
#define EXPORT
|
|
#endif
|
|
typedef const char *Str;
|
|
typedef const char *StrNil;
|
|
typedef int Int;
|
|
typedef double Number;
|
|
typedef int Boolean;
|
|
|
|
|
|
#line 6 "str.lc"
|
|
static int l_at(lua_State *L) {
|
|
const char *s = luaL_checklstring(L,1,NULL);
|
|
int i = luaL_optinteger(L,2,0);
|
|
|
|
#line 7 "str.lc"
|
|
|
|
lua_pushlstring(L,&s[i-1],1);
|
|
return 1;
|
|
}
|
|
|
|
static int l_upto(lua_State *L) {
|
|
const char *s = luaL_checklstring(L,1,NULL);
|
|
const char *delim = luaL_optlstring(L,2," ",NULL);
|
|
|
|
#line 12 "str.lc"
|
|
|
|
lua_pushinteger(L, strcspn(s,delim) + 1);
|
|
return 1;
|
|
}
|
|
|
|
static const luaL_reg str_funs[] = {
|
|
{"at",l_at},
|
|
{"upto",l_upto},
|
|
{NULL,NULL}
|
|
};
|
|
|
|
EXPORT int luaopen_str (lua_State *L) {
|
|
luaL_register (L,"str",str_funs);
|
|
|
|
return 1;
|
|
}
|
|
|
|
Note the line directives; this makes working with macro-ized C code much easier
|
|
when the inevitable compile and run-time errors occur. `lc` takes away some
|
|
of the more irritating bookkeeping needed in writing C extensions
|
|
(here I only have to mention function names once).
|
|
|
|
`lc` was used for the [winapi](https://github.com/stevedonovan/winapi) project to
|
|
preprocess [this
|
|
file](https://github.com/stevedonovan/winapi/blob/master/winapi.l.c)
|
|
into [standard C](https://github.com/stevedonovan/winapi/blob/master/winapi.c).
|
|
|
|
This used an extended version of `lc` which handled the largely superficial
|
|
differences between the Lua 5.1 and 5.2 API.
|
|
|
|
(The curious thing is that `winapi` is my only project where I've leant on
|
|
LuaMacro, and it's all in C.)
|
|
|
|
### A Simple Test Framework

LuaMacro comes with yet another simple test framework - I apologize for this in
advance, because there are already quite enough. But consider it a demonstration
of how a little macro sugar can make tests more readable, even if you are
uncomfortable with macros in production code (see `tests/test-test.lua`).

    require_ 'assert'
    assert_ 1 == 1
    assert_ "hello" matches "^hell"
    assert_ x.a throws 'attempt to index global'

The last line is more interesting, since it's transparently wrapping
the offending expression in an anonymous function. The expanded output looks
like this:

    T_ = require 'macro.lib.test'
    T_.assert_eq(1 ,1)
    T_.assert_match("hello" ,"^hell")
    T_.assert_match(T_.pcall_no(function() return x.a end),'attempt to index global')

(This is a generally useful pattern - use macros to provide a thin layer of sugar
over the underlying library. The `macro.assert` module is only 75 lines long, with
comments - its job is to format code to make using the implementation easier.)
|
|
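The `throws` expansion above relies on a helper that runs the wrapped function and
hands back its error message. Here is a minimal sketch of how such helpers might
look in plain Lua. The `pcall_no` and `assert_match` defined here are stand-ins
written for illustration; they are not copied from `macro.lib.test`, which may
well behave differently.

```lua
-- Hypothetical stand-ins for the helpers seen in the expanded output;
-- the real `macro.lib.test` may differ.

-- Call `fn`, expecting it to fail; return the error message it raised.
local function pcall_no(fn)
    local ok, err = pcall(fn)
    if ok then
        error("expression was expected to throw an error", 2)
    end
    return err
end

-- Assert that `text` matches the Lua pattern `patt`.
local function assert_match(text, patt)
    if not tostring(text):find(patt) then
        error(("%q does not match pattern %q"):format(tostring(text), patt), 2)
    end
end

-- Wrapping the expression in a function delays its evaluation,
-- so the error is raised inside pcall rather than at the call site.
local msg = pcall_no(function() local t; return t.a end)
assert_match(msg, "attempt to index")
```

The macro's only job is to write the `function() return ... end` wrapper for you;
everything else is ordinary library code.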
|
|
Remember that the predefined meaning of @ is to convert `@name` into `name_`. So we
could just as easily say `@assert 1 == 1` and so forth.

Lua functions often return multiple values or tables:

    two = \(40,2)
    table2 = \({40,2})
    @assert two() == (40,2)
    @assert table2() == {40,2}

For a proper grown-up Lua testing framework
that uses LuaMacro, see [Specl](http://gvvaughan.github.io/specl).
|
|
|
|
|
|
### Implementation

It is not usually necessary to understand the underlying representation of token
lists, but I present it here as a guide to understanding the code.

#### Token Lists

The token list representation of the expression `x+1` is:

    {{'iden','x'},{'+','+'},{'number','1'}}

which is the form returned by the LPeg lexical analyser. Please note that there are
also 'space' and 'comment' tokens in the stream, which is a big difference from the
token-filter standard.

The `TokenList` type defines `__tostring` and some helper methods for these lists.
|
|
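To make the representation concrete, here is a small sketch of a token list with a
`__tostring` metamethod that joins the token values back into source text. It is
illustrative only: the constructor name `new_token_list` is made up, and LuaMacro's
real `TokenList` has more helpers and may be implemented differently.

```lua
-- Illustrative only: a tiny token list type, not LuaMacro's actual TokenList.
local TokenList = {}
TokenList.__index = TokenList

-- Wrap a plain array of {type, value} pairs (hypothetical constructor).
local function new_token_list(tokens)
    return setmetatable(tokens or {}, TokenList)
end

-- Concatenate the value (second element) of every token.
function TokenList.__tostring(tl)
    local parts = {}
    for i, tok in ipairs(tl) do
        parts[i] = tok[2]
    end
    return table.concat(parts)
end

local tl = new_token_list {{'iden','x'},{'space',' '},{'+','+'},{'number','1'}}
print(tostring(tl))   -- prints "x +1"
```

Because 'space' tokens are kept in the list, the original spacing survives the
round trip.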
|
|
The following macro is an example of the lower-level coding needed without the
usual helpers:

    local macro = require 'macro'
    macro.define('qw',function(get,put)
        local append = table.insert
        local t,v = get()            -- the opening '('
        local res = {{'{','{'}}
        t,v = get:next()
        while t ~= ')' do
            if t ~= ',' then
                -- quote each word and follow it with a comma
                append(res,{'string','"'..v..'"'})
                append(res,{',',','})
            end
            t,v = get:next()
        end
        append(res,{'}','}'})
        return res
    end)

We're using the getter `next` method to skip any whitespace, but building up the
substitution without a putter, just manipulating the raw token list. `qw` takes a
plain list of words, separated by spaces (and maybe commas) and makes it into a
list of strings. That is,

    qw(one two three)

becomes

    {"one","two","three",}

which is equivalent to `{'one','two','three'}`, since a trailing comma is allowed
in a Lua table constructor.
|
|
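One way to see what the substitution function produces is to drive it with a
stand-in getter outside LuaMacro. The sketch below assumes only what the text
states: calling the getter returns the next raw token, and its `next` method
skips whitespace first. `make_getter` is a hypothetical helper written for this
example, and the real getter object has more methods.

```lua
-- Hypothetical stand-in for LuaMacro's getter, for experimentation only.
-- Calling it returns the next raw token as (type, value); its `next`
-- method skips 'space' and 'comment' tokens first.
local function make_getter(tokens)
    local i = 0
    local get = {}
    local function raw()
        i = i + 1
        local tok = tokens[i]
        if tok then return tok[1], tok[2] end
    end
    function get:next()
        local t, v = raw()
        while t == 'space' or t == 'comment' do
            t, v = raw()
        end
        return t, v
    end
    return setmetatable(get, {__call = function() return raw() end})
end

-- The same substitution logic as `qw`, without registering it as a macro.
local function qw(get)
    local t, v = get()              -- the opening '('
    local res = {{'{','{'}}
    t, v = get:next()
    while t ~= ')' do
        if t ~= ',' then
            res[#res+1] = {'string', '"'..v..'"'}
            res[#res+1] = {',', ','}
        end
        t, v = get:next()
    end
    res[#res+1] = {'}','}'}
    return res
end

-- Tokens for `(one two)`, in the {type, value} form shown earlier.
local input = {
    {'(','('}, {'iden','one'}, {'space',' '}, {'iden','two'}, {')',')'},
}
local out = {}
for i, tok in ipairs(qw(make_getter(input))) do out[i] = tok[2] end
print(table.concat(out))   -- prints {"one","two",}
```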
|
|
#### Program Structure

The main loop of `macro.substitute` (towards the end of `macro.lua`) summarizes the
operation of LuaMacro:

There are two macro tables, `imacro` for classic name macros, and `smacro` for
operator style macros. Each entry is itself a table describing one macro; it must
have a `subst` field containing the substitution and may have a `parms` field,
which means that the macro must be followed by its arguments in parentheses.

A keywords table is chiefly used to track block scope: `do`, `if`, `function`, etc.
mean 'increase block level' and `end`, `until` mean 'decrease block level'. When
the level decreases, any block handlers defined for that level are evaluated and
removed. These may insert tokens into the stream, like macros. This is how
something like `_END_CLOSE_` is implemented: the `end` causes the block level to
decrease, which fires a block handler which passes `end` through and inserts a
closing `)`.
|
|
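The block-level bookkeeping can be sketched in a few lines of plain Lua. This is
not the code in `macro.lua`; the function `process`, the `handlers` table layout,
the `{'keyword', value}` token shape and the inclusion of `repeat` as an opener
are all assumptions made for the example.

```lua
-- Sketch of block-level tracking with one-shot block handlers.
-- All names and token shapes here are assumptions for illustration.
local opens  = {['do']=true, ['if']=true, ['function']=true, ['repeat']=true}
local closes = {['end']=true, ['until']=true}

-- `handlers[level]` is a list of functions fired once when that level closes;
-- each returns text to insert after the closing keyword.
local function process(tokens, handlers)
    local out, level = {}, 0
    for _, tok in ipairs(tokens) do
        local v = tok[2]
        out[#out+1] = v                 -- the keyword itself passes through
        if opens[v] then
            level = level + 1
        elseif closes[v] then
            local hs = handlers[level]
            if hs then
                handlers[level] = nil   -- handlers are removed after firing
                for _, h in ipairs(hs) do out[#out+1] = h() end
            end
            level = level - 1
        end
    end
    return table.concat(out)
end

-- `f(function() end` with a handler on level 1 that closes the parenthesis.
local toks = {
    {'iden','f'}, {'(','('}, {'keyword','function'}, {'(','('}, {')',')'},
    {'space',' '}, {'keyword','end'},
}
print(process(toks, {[1] = {function() return ')' end}}))  -- prints f(function() end)
```

The handler only fires for the `end` that closes the level it was registered on,
so nested blocks inside the function body would pass through untouched.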
|
|
Any keyword may also have an associated keyword handler, which works rather like a
macro substitution, except that the keyword itself is always passed through first.
(Allowing keywords as regular macros would generally be a bad idea because of the
recursive substitution problem.)

The macro `subst` field may be a token list or a function. If it is a function then
that function is called, with the parameters as token lists if the macro defined
formal parameters, or with getter and putter objects if not. If the result is text
then it is parsed into a token list.
|
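The dispatch on `subst` described above might be sketched as follows. The names
`expand` and `tokenize` and the shape of `args` are assumptions for illustration;
the stand-in tokenizer is deliberately crude, and substituting formal parameters
into a token-list `subst` is left out.

```lua
-- Sketch of dispatching on the type of a macro's `subst` field.
-- `expand`, `tokenize` and the argument shapes are hypothetical.
local unpack = table.unpack or unpack   -- Lua 5.2+ or Lua 5.1

-- Crude stand-in tokenizer: treats the whole text as a single token.
local function tokenize(text)
    return {{'iden', text}}
end

local function expand(m, args, get, put)
    local subst = m.subst
    if type(subst) ~= 'function' then
        return subst                    -- already a token list
    end
    local res
    if m.parms then
        res = subst(unpack(args))       -- one token list per formal parameter
    else
        res = subst(get, put)           -- the macro reads its own arguments
    end
    if type(res) == 'string' then
        res = tokenize(res)             -- text results are parsed into tokens
    end
    return res
end

-- A macro with one formal parameter whose function returns text.
local m = {parms = {'x'}, subst = function(x) return 'double_' .. x[1][2] end}
local out = expand(m, {{{'iden','y'}}})
print(out[1][2])   -- prints double_y
```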