Upstream an LLDB language plugin for D and support of custom expressions

Hi llvm-dev,

I'm writing to discuss the addition of a D language plugin to LLDB.
As described in Bugzilla issue #52223, D currently relies on the C/C++
language plugin. This project is part of the Symmetry Autumn of Code
2021 and aims to implement better integration for D in LLDB.

This is a highly requested feature among D developers who use Apple
devices, since setting up GDB there requires extra configuration and
self-signing of binaries.

One possible solution is to write a plugin using the public Python API,
but that has some limitations: AFAIK, custom expressions are not
currently well supported.
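
As a rough illustration of what the Python API route can cover, here is
a minimal sketch of a summary provider for D slices. It assumes the
debug info describes a slice as a struct with `length` and `ptr`
members and that slice type names end in `[]`; both are assumptions
about the compiler output, and `d_slice_summary` is a hypothetical
name. It only formats values and does not touch expression evaluation.

```python
def d_slice_summary(valobj, internal_dict):
    """Summarize a D slice as 'length=N ptr=0x...'.

    Assumes the slice is described as a struct with `length` and `ptr`
    members (an assumption about the emitted debug info).
    """
    length = valobj.GetChildMemberWithName("length")
    ptr = valobj.GetChildMemberWithName("ptr")
    if not length.IsValid() or not ptr.IsValid():
        return "<not a D slice>"
    return "length=%d ptr=0x%x" % (
        length.GetValueAsUnsigned(),
        ptr.GetValueAsUnsigned(),
    )


def __lldb_init_module(debugger, internal_dict):
    # LLDB calls this hook on `command script import`.
    # The type-name regex below is an assumption about how slice
    # types are named in the debug info.
    debugger.HandleCommand(
        'type summary add -x "\\[\\]$" -F ' + __name__ + ".d_slice_summary"
    )
```

Loading the file with `command script import` would register the
summary for matching types.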

More context about the project milestones can be found
[here](lsferreira.net/public/assets/posts/d-saoc-2021-01/milestones.md).

I would like to discuss the possibility of upstreaming a plugin written
in C++ to the official tree, and whether there is anything on the
roadmap to support custom expressions via Python.

+lldb-dev

I think having more language plugins in LLDB is in general a good
thing. Given the past experience with other plugins (that have been
since removed) there are probably a few things that should be
clarified before we merge new plugins:

1. Who is going to maintain it?

In the past we had to remove the Go plugin because we couldn't find a
single person on this planet that wanted to maintain it after the
original author stopped working on it. It would be good to have some
confidence that the plugin doesn't end up without a maintainer in the
near future. Having multiple people involved seems like a good way to
show that this isn't likely to happen. I think having another person
reviewing the patches and being willing to maintain the code seems
also fine IMHO.

2. How is it going to be tested?

The most straightforward way of testing seems to be to just have some
D sources checked in, we call the D compiler to compile them and then
have a normal API test that tests the language-specific parts. But
that also means we would have a dependency on the D compiler to run
the tests which makes maintaining LLDB overall harder. We currently
get away with this approach for C/C++/Obj-C because we have a compiler
for those languages in LLVM itself.

The other solution would be to check in some pre-generated YAML'ified
debug info similar to what we are currently doing in some shell tests.
That means we don't have a dependency on the D compiler to run the
test suite, but those pregenerated tests have a tendency to be hard to
maintain (some existing tests require me to change my local username,
run macOS/Xcode and do a bunch of manual cleanup to end up with the
same output). Also they often clutter the repository with random
strings that show up in grep (I'm pretty sure we have at least one
test that contains the Google-internal network proxy config or
something similar). I think we can get something maintainable with
some simple script that can post-process whatever the D compiler
emits. Requiring people to install the D compiler to regenerate the
tests is IMHO a reasonable requirement as that rarely happens.
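
The kind of post-processing script mentioned above could be sketched
roughly as follows. This is only an illustration: the `scrub` name, the
path pattern, and the placeholder are all hypothetical, and a real
script would likely need to normalize more than home directories.

```python
import re

# Hypothetical pattern: a per-user home directory on macOS or Linux.
HOME_DIR = re.compile(r"/(?:Users|home)/[^/\s'\"]+")


def scrub(text, placeholder="/home/user"):
    """Replace user-specific home directories in dumped debug info
    with a fixed placeholder so regenerated tests are reproducible."""
    return HOME_DIR.sub(placeholder, text)
```

Running the dumped text through `scrub` before checking it in would
keep local usernames out of the repository.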

3. How is this going to be implemented?

Bit of a broad topic, but it would be good to know what's the general
plan for implementing the plugin. From what I can see we need at least
a DWARF parser, a lldb_private::Language-based plugin and a
TypeSystem-based plugin. I believe that's enough to get most of the
functionality in LLDB working (the only exception is the expression
parser), but there's a good chance I forgot something in that list.
The expression evaluator will probably be a whole topic on its own,
but I would expect it to be some small evaluator for simple D
expressions (+ maybe something relying on the Clang expression
evaluator for things like function calls, etc.).

Regarding the custom expression: I don't think there are any plans in
that direction, but I think having support for hooking in custom
expression evaluators seems like a reasonable idea.

- Raphael

> I think having more language plugins in LLDB is in general a good
> thing. Given the past experience with other plugins (that have been
> since removed) there are probably a few things that should be
> clarified before we merge new plugins:

I had the wrong impression about that from looking at the various
plugin removals from the official tree and at the Rust LLVM fork, so
that is good to hear.

> 1. Who is going to maintain it?
>
> In the past we had to remove the Go plugin because we couldn't find a
> single person on this planet that wanted to maintain it after the
> original author stopped working on it. It would be good to have some
> confidence that the plugin doesn't end up without a maintainer in the
> near future. Having multiple people involved seems like a good way to
> show that this isn't likely to happen. I think having another person
> reviewing the patches and being willing to maintain the code seems
> also fine IMHO.

For now I'm committed to working on the plugin as part of SAoC and, in
the future, as long as I keep an interest in LLVM, I can maintain it as
a volunteer. I'm still getting familiar with the LLVM codebase, though.

What concerns me most about my current position is that I have finished
school and am not currently "working", so my free time may vary if I
start doing something full-time.

I think there are also two potential reviewers/co-maintainers who might
be able to help:

1. Martin Kinkelin, part of the dlang organization and the main
developer of LDC, an LLVM-based D compiler.
2. Mathias LANG, part of the dlang organization, who has already made
some contributions to LLVM. He is also my mentor on this project.

I'm going to CC both on this email.

If needed, there are some names here
(https://github.com/dlang/projects/issues/81) that we can also contact.

> 2. How is it going to be tested?
>
> The most straightforward way of testing seems to be to just have some
> D sources checked in, we call the D compiler to compile them and then
> have a normal API test that tests the language-specific parts. But
> that also means we would have a dependency on the D compiler to run
> the tests which makes maintaining LLDB overall harder. We currently
> get away with this approach for C/C++/Obj-C because we have a compiler
> for those languages in LLVM itself.

LDC, an LLVM-based D compiler, currently generates the most reliable
debug info. It is based on the official reference compiler frontend,
with some modifications AFAIK.

Before working extensively on the plugin, I'm fixing some bugs in the
DWARF generation that I think needs improvement. There are also some
things that are not implemented in LLVM, such as the immutable DWARF
tag, although that is not a blocker.

DMD could also be an option, although with DWARF 5 it generates some
errors that are irrelevant for LLDB (they can be ignored) but can cause
trouble in the test suite. I'm heavily improving the DMD backend to
generate better DWARF info, but some pretty bizarre errors occur and
some are beyond my understanding, e.g. overlapping references/range
offsets across ELF sections.

I'm confident that LDC is the best choice here, even though it is not
the official reference compiler.

> The other solution would be to check in some pre-generated YAML'ified
> debug info similar to what we are currently doing in some shell tests.
> That means we don't have a dependency on the D compiler to run the
> test suite, but those pregenerated tests have a tendency to be hard to
> maintain (some existing tests require me to change my local username,
> run macOS/Xcode and do a bunch of manual cleanup to end up with the
> same output). Also they often clutter the repository with random
> strings that show up in grep (I'm pretty sure we have at least one
> test that contains the Google-internal network proxy config or
> something similar). I think we can get something maintainable with
> some simple script that can post-process whatever the D compiler
> emits. Requiring people to install the D compiler to regenerate the
> tests is IMHO a reasonable requirement as that rarely happens.

I think LDC can emit LLVM IR as output, which could help with testing
the plugin.

I'm not aware of that YAML debug info output. Can you elaborate on it
or give me a reference to read more about it? I also have a few
questions:

1. Is this something that can be implemented in the compiler frontend?
2. Can LLVM IR be easily converted to that YAML format?
3. Can ELF binaries with DWARF be easily converted to that YAML format?

> 3. How is this going to be implemented?
>
> Bit of a broad topic, but it would be good to know what's the general
> plan for implementing the plugin. From what I can see we need at least
> a DWARF parser, a lldb_private::Language-based plugin and a
> TypeSystem-based plugin. I believe that's enough to get most of the
> functionality in LLDB working (the only exception is the expression
> parser), but there's a good chance I forgot something in that list.

Yes, I plan to implement a custom TypeSystem and DWARFASTParser for D.
For now, I'm using the Clang ones, since most of the DWARF generated by
D compilers is C/C++ compatible. You can take a look at the
work-in-progress implementation here
(https://github.com/ljmf00/llvm-project/tree/llvm-plugin-d).

> The expression evaluator will probably be a whole topic on its own,
> but I would expect it to be some small evaluator for simple D
> expressions (+ maybe something relying on the Clang expression
> evaluator for things like function calls, etc.).
>
> Regarding the custom expression: I don't think there are any plans in
> that direction, but I think having support for hooking in custom
> expression evaluators seems like a reasonable idea.

I plan to implement an expression evaluator, although I'm not sure how
much work is needed to get decent expression support. I thought about
extending the Clang expression parser, but I'm not sure that is even
possible, since it may be tied to the Clang language plugin. That is my
last milestone for the project.

I don't know if you have read the milestone list, but if you haven't,
you can take a look at it for more context. I plan to support very
simple D expressions as an extension to C/C++ expressions, such as
array slices, e.g. `arr[0 .. 10]`, or `arr[4]` instead of `arr.ptr[4]`
(the C/C++ compatible way).
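
To make that idea concrete, here is a minimal text-level sketch of the
kind of lowering I have in mind. It is not how the plugin would
actually be implemented: a real evaluator would use type information
from the debug info rather than a set of names, and `lower_index`,
`parse_slice` and `slice_vars` are hypothetical names. Nested indexing
such as `arr[idx[0]]` is deliberately not handled.

```python
import re

# An identifier followed by one bracketed, non-nested index expression.
INDEX = re.compile(r"\b([A-Za-z_]\w*)\[([^\[\]]+)\]")


def lower_index(expr, slice_vars):
    """Rewrite `arr[i]` to `arr.ptr[i]` for names known to be D slices.

    Slice expressions (`arr[lo .. hi]`) are left untouched.
    """
    def repl(match):
        name, inner = match.group(1), match.group(2)
        if name in slice_vars and ".." not in inner:
            return "%s.ptr[%s]" % (name, inner)
        return match.group(0)

    return INDEX.sub(repl, expr)


def parse_slice(expr):
    """Split `arr[lo .. hi]` into (name, lo, hi), or return None."""
    match = INDEX.fullmatch(expr.strip())
    if not match or ".." not in match.group(2):
        return None
    lo, hi = (part.strip() for part in match.group(2).split("..", 1))
    return match.group(1), lo, hi
```

For example, `lower_index("arr[4]", {"arr"})` gives `arr.ptr[4]`, which
the existing C/C++ evaluator can already handle, while
`parse_slice("arr[0 .. 10]")` only extracts the bounds and leaves the
question of how to represent the resulting slice open.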

If you have any thoughts on how achievable that is, I would appreciate
hearing them.