[RFC] Supporting Lua Scripting in LLDB

Hi everyone,

Earlier this year, when I was working on the Python script
interpreter, I thought it would be interesting to see what it would
take to support other scripting languages in LLDB. Lua, being designed
to be embedded, quickly came to mind. The idea remained in the back of
my head, but I never really got around to it, until now.

I was pleasantly surprised to see that it only took me a few hours to
create a basic but working prototype. It supports running single
commands as well as an interactive interpreter and has access to most
of the SB API through bindings generated by SWIG. Of course it's far
from complete.

Before I invest more time in this, I'm curious to hear what the
community thinks about adding support for another scripting language
to LLDB. Do we need both Lua and Python?

Here are some of the reasons off the top of my head as to why the
answer might be "yes":

- The cost for having another scripting language is pretty small. The
Lua script interpreter is very simple and SWIG can reuse the existing
interfaces to generate the bindings.
- LLDB is designed to support multiple script interpreters, but in
reality we only have one. Actually exercising this property ensures
that we don't unintentionally break that design assumption.
- The Python script interpreter is complex. It's hard to figure out
what's really needed to support another language. The Lua script
interpreter on the other hand is pretty straightforward. Common code
can be shared by both.
- Currently Python support is disabled for some targets, like Android
and iOS. Lua could enable scripting for these environments where
having all of Python is overkill or undesirable.
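The sketch below illustrates what "supporting multiple script interpreters" and sharing common code could look like. It is written in Python and is not LLDB's actual `ScriptInterpreter` plugin interface, which is C++. Every name below is hypothetical: `ScriptInterpreter`, `execute_one_line`, `AVAILABLE_BACKENDS` and `create_interpreter`. The two backends are fakes that only echo their input.

What matters is the shape:
- one small abstract interface;
- a table of backends enabled at build time;
- a fallback to Lua when a build (say, for Android or iOS) leaves Python out.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ScriptInterpreter(ABC):
    """Hypothetical minimal interface that every scripting backend implements."""

    language = ""

    @abstractmethod
    def execute_one_line(self, source: str) -> str:
        """Run a single line of script and return what it produced."""


class FakePythonInterpreter(ScriptInterpreter):
    language = "python"

    def execute_one_line(self, source: str) -> str:
        # Placeholder for a real embedded Python; it only echoes the request.
        return f"[python] {source}"


class FakeLuaInterpreter(ScriptInterpreter):
    language = "lua"

    def execute_one_line(self, source: str) -> str:
        # Placeholder for a real embedded Lua state; it only echoes the request.
        return f"[lua] {source}"


# Backends enabled at build time. A build without Python (for example for a
# constrained target) would simply leave that entry out.
AVAILABLE_BACKENDS: dict[str, type[ScriptInterpreter]] = {
    "python": FakePythonInterpreter,
    "lua": FakeLuaInterpreter,
}


def create_interpreter(
    preferred: list[str],
    available: dict[str, type[ScriptInterpreter]] = AVAILABLE_BACKENDS,
) -> ScriptInterpreter:
    """Instantiate the first preferred backend that this build provides."""
    for language in preferred:
        backend = available.get(language)
        if backend is not None:
            return backend()
    raise RuntimeError(f"no scripting backend available among {preferred}")


if __name__ == "__main__":
    # Simulate a Lua-only build: Python is preferred but not compiled in.
    lua_only = {"lua": FakeLuaInterpreter}
    interp = create_interpreter(["python", "lua"], available=lua_only)
    print(interp.language, interp.execute_one_line("print(1)"))  # prints: lua [lua] print(1)
```

Keeping the interface this narrow is what would let common code (command dispatch, option parsing) stay shared while each backend only deals with its own runtime.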

Reasons why the answer might be "no":

- Are our users going to use this?
- Supporting Python is an ongoing pain. Do we really want to risk
burdening ourselves with another scripting language?
- The Python API is very well tested. We'd need to add tests for the
Lua bindings as well. It's unlikely this will match the coverage of
Python, and probably even undesirable, because what's the point of
testing the same thing twice. Also, do we want to risk fragmenting
tests across two scripting languages?

There's probably a bunch more stuff that I didn't even think of. :)

Personally I lean towards "yes" because I feel the benefits outweigh
the costs, but of course that remains to be seen. Please let me know
what you think!

If you're curious about what this looks like, you can find the patches
on my fork on GitHub:
https://github.com/JDevlieghere/llvm-project/tree/lua

Cheers,
Jonas

I think this is great, thanks for working on this! My only concern is that I would prefer if we could limit the Lua tests to just the Lua->C++ calling machinery (e.g., that we handle Lua strings correctly and all that jazz) and not fragment our test suite. Otherwise Lua seems to require far less maintenance work than Python, so I am not worried about the technical debt this adds.

I think this would be a very interesting project, and would allow us to
flesh out the details of the script interpreter interface.

A lot of the complexity in our python code comes from the fact that
python can be (a) embedded into lldb and (b) lldb can be embedded into
python. It's been a while since I worked with lua, but from what I
remember, lua was designed to make (a) easy, and I don't think (b) was
ever a major goal (though it can always be done, of course).

Were you intending to implement both of these directions or just one of
them ((a), I guess)?

The reason I am asking this is because doing only (a) will definitely
make lua support simpler than python, but it will also mean it won't be
a "python-lite".

Both of these options are fine -- I just want to understand where you're
going with this. It also has some impact on the testing strategy, as our
existing python tests are largely using mode (b).
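The sketch below makes the (a)/(b) distinction concrete by modeling direction (a) only. Python's built-in `exec` stands in for the embedded Lua state. `FakeDebugger` and `run_embedded_script` are hypothetical names, not LLDB API.

In this direction the script never imports the debugger. It can only reach whatever the host injected into its globals before running it. That is similar in spirit to how the embedded interpreter behind LLDB's `script` command is handed the current debugger.

```python
from __future__ import annotations

import contextlib
import io
import types


class FakeDebugger:
    """Hypothetical stand-in for the host's debugger object."""

    def __init__(self) -> None:
        self.commands: list[str] = []

    def handle_command(self, command: str) -> None:
        # A real host would forward this to its command interpreter.
        self.commands.append(command)


def run_embedded_script(debugger: FakeDebugger, source: str) -> str:
    """Direction (a): the host owns the interpreter and injects its API.

    The script does not import anything from the host; it only sees the
    "lldb" object placed in its globals, and its printed output is captured.
    """
    host_api = types.SimpleNamespace(debugger=debugger)
    script_globals = {"lldb": host_api}
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(source, script_globals)
    return captured.getvalue()


# Direction (b) is the reverse: the scripting language is the host and loads
# the debugger as a library (in Python, "import lldb" and then creating a
# debugger from the script). This sketch does not model that direction.

if __name__ == "__main__":
    dbg = FakeDebugger()
    script = "lldb.debugger.handle_command('bt')\nprint(len(lldb.debugger.commands))"
    print(repr(run_embedded_script(dbg, script)))  # prints: '1\n'
```

Any test of bindings exposed only through (a) has to enter through the host like this, which is the testing trade-off being discussed here.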

Another question I'm interested in is how deeply will this
multi-interpreter thing go? Will it be a build time option, will it be
selectable at runtime, but we'll have only one script interpreter per
SBDebugger, or will we be able to freely mix'n'match scripting languages?

I think the last option would be best because of data formatters
(otherwise one would have a problem if some of his data formatters are
written in python and others in lua), but it would also create a lot
more new api surface, as one would have to worry about consistency of
the lua and python views of lldb, etc.

Python is in general a no-go for a BSD basesystem (micropython can be an
exception, but Python is so large today that we can reevaluate this
statement at some point). This is why we need to either disable certain
features in LLDB or split LLDB into everything without Python in the
base and the rest (like data formatters) via a 3rd party packaging system.

Once we finish our goals for LLDB/NetBSD, we will work on this
separation in order to include LLDB in the basesystem.

Lua is a part of the NetBSD and FreeBSD basesystem, so all of these scripting
issues would go away. I cannot speak for the Darwin or Windows basesystem, but
I can imagine that Lua is a smaller issue than Python.

Personally, I am interested in another scripting language that is niche
today (Tcl), but adding Lua support should make LLDB easier to script in
general. Adapting the Lua bindings to another language should be simple, so
this is another benefit in my eyes, especially since the Python bindings are
machine generated today.

I think this would be a very interesting project, and would allow us to
flesh out the details of the script interpreter interface.

A lot of the complexity in our python code comes from the fact that
python can be (a) embedded into lldb and (b) lldb can be embedded into
python. It's been a while since I worked with lua, but from what I
remember, lua was designed to make (a) easy, and I don't think (b) was
ever a major goal (though it can always be done, of course).

Were you intending to implement both of these directions or just one of
them ((a), I guess)?

Thanks for pointing this out. Indeed, my goal is only to support (a)
for exactly the reasons you brought up.

The reason I am asking this is because doing only (a) will definitely
make lua support simpler than python, but it will also mean it won't be
a "python-lite".

Both of these options are fine -- I just want to understand where you're
going with this. It also has some impact on the testing strategy, as our
existing python tests are largely using mode (b).

That's part of my motivation for *not* doing (b). I really don't want
to create/maintain another (Lua driven) test suite.

Another question I'm interested in is how deeply will this
multi-interpreter thing go? Will it be a build time option, will it be
selectable at runtime, but we'll have only one script interpreter per
SBDebugger, or will we be able to freely mix'n'match scripting languages?

There is one script interpreter per debugger. As far as I can tell
from the code this is already enforced.

I think the last option would be best because of data formatters
(otherwise one would have a problem if some of his data formatters are
written in python and others in lua), but it would also create a lot
more new api surface, as one would have to worry about consistency of
the lua and python views of lldb, etc.

That's an interesting problem I didn't think of. I'm definitely not
excited about having the same data formatter implemented in both
scripting languages. Mixing scripting languages makes sense when
your LLDB is configured to support both Python and Lua, but what do
you do for people that want only Lua? They might still want to
re-implement some data formatters they care about... Anyway, given
that we don't maintain/ship data formatters in Python ourselves, maybe
this isn't that big of an issue at all?

Given that the response so far has been positive, I've put up the
patches for review:

https://reviews.llvm.org/D71232
https://reviews.llvm.org/D71234
https://reviews.llvm.org/D71235

Jonas

I think this would be a very interesting project, and would allow us to
flesh out the details of the script interpreter interface.

A lot of the complexity in our python code comes from the fact that
python can be (a) embedded into lldb and (b) lldb can be embedded into
python. It's been a while since I worked with lua, but from what I
remember, lua was designed to make (a) easy, and I don't think (b) was
ever a major goal (though it can always be done, of course).

Were you intending to implement both of these directions or just one of
them ((a), I guess)?

Thanks for pointing this out. Indeed, my goal is only to support (a)
for exactly the reasons you brought up.

The reason I am asking this is because doing only (a) will definitely
make lua support simpler than python, but it will also mean it won't be
a "python-lite".

Both of these options are fine -- I just want to understand where you're
going with this. It also has some impact on the testing strategy, as our
existing python tests are largely using mode (b).

That's part of my motivation for *not* doing (b). I really don't want
to create/maintain another (Lua driven) test suite.

I certainly see where you're coming from, but I'm not sure if this will
actually achieve the intended effect. The thing is, not doing (b) does
not really reduce the testing surface that much -- it just makes the
tested APIs harder to reach. If Python didn't have (b), we wouldn't be
able to do "import lldb" in python, but that's about it. The full lldb
python api would still be reachable by starting lldb and typing "script".

What this means is that if lua doesn't support (b) then the lua bindings
will need to be tested by driving lldb from within the lua interpreter
embedded within lldb -- which doesn't exactly sound like a win. I'm not
saying this means we *must* implement (b), or that the alternative
solution will be more complex than testing via (b) (though I'm sure we
could come up with something way simpler than dotest), but I think we
should try to come up with a good testing story very early on.

Speaking of testing, will there be any bot configured to build&test the
lua code?

Another question I'm interested in is how deeply will this
multi-interpreter thing go? Will it be a build time option, will it be
selectable at runtime, but we'll have only one script interpreter per
SBDebugger, or will we be able to freely mix'n'match scripting languages?

There is one script interpreter per debugger. As far as I can tell
from the code this is already enforced.

I think the last option would be best because of data formatters
(otherwise one would have a problem if some of his data formatters are
written in python and others in lua), but it would also create a lot
more new api surface, as one would have to worry about consistency of
the lua and python views of lldb, etc.

That's an interesting problem I didn't think of. I'm definitely not
excited about having the same data formatter implemented in both
scripting languages. Mixing scripting languages makes sense when
your LLDB is configured to support both Python and Lua, but what do
you do for people that want only Lua? They might still want to
re-implement some data formatters they care about...

Well, if they really have a lldb build which only supports one scripting
language, then yes, they'd have to reimplement something -- there isn't
anything else that can be done. But it'd be a pity if someone had lldb
which supports *both* languages and he is forced to choose which data
structures he wants pretty-printed.

Anyway, given that we don't maintain/ship data formatters in Python
ourselves, maybe this isn't that big of an issue at all?

Hard to say without this thing actually being used. I certainly don't
think this is something that we need to solve right now, though I think
it's something that we should be aware of, and not close the door on that
possibility completely.

And BTW we do ship python data formatters right now. Some of the libc++ and
libstdc++ formatters are written in python -- with the choice of
formatters being pretty arbitrary.

pl

This is an aside, but... We originally wrote all the data formatters in Python for dogfooding purposes. Then, quite a while ago, most of the ones relevant to macOS were replaced with C++ ones. This was partly so that there would be formatters for crucial objects that didn't depend on having Python around, and partly as an experiment in the cost of C++ vs. Python data formatters. The former purpose was obviated when ALL data formatters got errantly put under the 'ifdef PYTHON' (though that's been fixed recently). The latter purpose was useful, mostly for showing that the C++ ones weren't noticeably faster... The only libcxx Python data formatter that actually gets added by lldb is the deque one. It looks like a few more get used from the GNU libstdcpp support (deque, vector, map and list). It is kind of nice to have a bunch of sophisticated examples out there for people to copy from, but I don't see any strong reason to be dogmatic about how to implement them.

To which point... I do agree with Pavel that if we support more than one scripting language fully, we should make it possible to load and use more than one script interpreter in a given lldb session. What gets added to lldb by the scripting interface is domain-specific customizations, so in a working ecosystem for this facility, pieces will be coming from a wide variety of sources. If we require only one implementation language at a time, then eventually everyone will ignore one of the scripting languages, because the other has more interesting bits of functionality. Or worse, we'll get divided into "my scripting language is better than yours" camps. While that has been a constant source of fun in the CS community for years now, it ends up not being very productive, and in our case it would reduce the overall utility of lldb.

The initial design of the scripting interpreters in lldb was to allow more than one to coexist. So for instance, "breakpoint command add" takes a scripting language as well as a script function. I don't think there's any fundamental reason why this couldn't be maintained.

Jim
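Jim's point, and Pavel's data formatter concern, can be modeled as follows:
- Instead of one interpreter per debugger, the debugger keeps at most one interpreter per language, created lazily.
- Each breakpoint command records the language it was written in. The same tagging would apply to data formatters.

This is a hypothetical Python model, not LLDB code. `Debugger`, `ScriptCallback`, `add_breakpoint_command`, `on_breakpoint_hit` and `echo_factory` are made-up names, and interpreters are modeled as plain callables.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# An "interpreter" is modeled as a callable that runs a script body and
# returns its output; a factory creates one on demand.
Interpreter = Callable[[str], str]
InterpreterFactory = Callable[[], Interpreter]


@dataclass
class ScriptCallback:
    """A piece of user script tagged with the language it was written in."""

    language: str
    body: str


@dataclass
class Debugger:
    """Hypothetical debugger that can host several script interpreters."""

    # Backends compiled into this build, keyed by language name.
    factories: dict[str, InterpreterFactory]
    interpreters: dict[str, Interpreter] = field(default_factory=dict)
    breakpoint_commands: dict[int, ScriptCallback] = field(default_factory=dict)

    def interpreter_for(self, language: str) -> Interpreter:
        # Create each interpreter lazily, at most once per language.
        if language not in self.interpreters:
            if language not in self.factories:
                raise ValueError(f"scripting language {language!r} is not available")
            self.interpreters[language] = self.factories[language]()
        return self.interpreters[language]

    def add_breakpoint_command(self, bp_id: int, language: str, body: str) -> None:
        # Validate (and create) the interpreter up front so an unsupported
        # language is reported when the command is added, not when it fires.
        self.interpreter_for(language)
        self.breakpoint_commands[bp_id] = ScriptCallback(language, body)

    def on_breakpoint_hit(self, bp_id: int) -> str:
        # Run the callback in whichever interpreter matches its language.
        callback = self.breakpoint_commands[bp_id]
        return self.interpreter_for(callback.language)(callback.body)


def echo_factory(language: str) -> InterpreterFactory:
    """Fake backend: the interpreter just echoes the script it is given."""
    return lambda: (lambda body: f"[{language}] {body}")


if __name__ == "__main__":
    dbg = Debugger(factories={"python": echo_factory("python"), "lua": echo_factory("lua")})
    dbg.add_breakpoint_command(1, "python", "print(frame)")
    dbg.add_breakpoint_command(2, "lua", "print(frame)")
    print(dbg.on_breakpoint_hit(1))  # prints: [python] print(frame)
    print(dbg.on_breakpoint_hit(2))  # prints: [lua] print(frame)
```

Because each callback carries its own language, a Python formatter and a Lua formatter could be loaded into the same session. The cost is the extra API surface Pavel mentions: every such entry point has to carry and validate a language tag.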