Rust support in LLDB, again

Hi,

Last year there was an effort led by Tom Tromey to add Rust language support into LLDB. He had implemented a fairly complete language plugin, however it was not accepted into mainline because of supportability concerns. I guess these concerns had some merit, because this change did not survive even in Rust’s private branch due to the difficulty of rebasing on top of LLVM 9.

I am wondering if there’s a more limited version of this that can be merged into mainline:
In terms of its memory model, Rust is not that far off from C++, so treating Rust types as if they were C++ types basically works. There is only one major problem: currently LLDB cannot deal with tagged unions, which Rust code uses quite heavily. When such a type is encountered, LLDB just emits an empty struct, which makes it impossible to examine the contents.

My tentative proposal is to modify LLDB’s DWARFASTParserClang to handle DW_TAG_variant et al, and create a C++ approximation of these types, e.g. as a polymorphic class, or just an untagged union. This would provide at least a minimal level of functionality for Rust (and possibly other languages) and be a much smaller maintenance burden on the LLDB core team.

What would y’all say?
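
To make the tagged-union problem above concrete, here is a minimal sketch in Rust. `Shape` is an ordinary Rust enum; `ShapeUntagged`, `CirclePayload`, `RectPayload` and `untagged_rect` are hypothetical names invented for illustration, standing in for the kind of "untagged union" approximation a C++-oriented debugger could show. This is not what LLDB or rustc actually generates, only an illustration of the trade-off: every variant becomes visible, but nothing records which one is live.

```rust
use std::mem::size_of;

// An ordinary Rust enum: a tagged union. The compiler stores a
// discriminant alongside the payload.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

// Hypothetical per-variant payload structs with a C-style layout.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C)]
struct CirclePayload {
    radius: f64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C)]
struct RectPayload {
    width: f64,
    height: f64,
}

// Hypothetical "untagged union" approximation: all variants overlap at
// offset 0, and the discriminant has been dropped.
#[derive(Clone, Copy)]
#[repr(C)]
union ShapeUntagged {
    circle: CirclePayload,
    rect: RectPayload,
}

// Build the approximation of a `Shape::Rect` value.
fn untagged_rect(width: f64, height: f64) -> ShapeUntagged {
    ShapeUntagged { rect: RectPayload { width, height } }
}

fn main() {
    let real = Shape::Rect { width: 2.0, height: 3.0 };
    let approx = untagged_rect(2.0, 3.0);

    // Both views are readable, but only one is meaningful, and the union
    // cannot say which. Every bit pattern is a valid f64, so reading the
    // "wrong" member is well defined here; it is just misleading.
    let (as_rect, as_circle) = unsafe { (approx.rect, approx.circle) };
    println!("real value:       {:?}", real);
    println!("viewed as rect:   {:?}", as_rect);
    println!("viewed as circle: {:?}", as_circle);
    println!(
        "enum size {} vs union size {}",
        size_of::<Shape>(),
        size_of::<ShapeUntagged>()
    );
}
```

With the untagged form, the user has to work out from context which member is the real one, which is the "minimal level of functionality" trade-off described above.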

Hi Vadim,

> Hi,
> Last year there was an effort led by Tom Tromey to add Rust language support into LLDB. He had implemented a fairly complete language plugin, however it was not accepted into mainline because of supportability concerns. I guess these concerns had some merit, because this change did not survive even in Rust's private branch due to the difficulty of rebasing on top of LLVM 9.

Unless my memory is failing me, I don't think we ever explicitly
rejected Rust's language plugin. We removed a few other language
plugins (Go, Java) that were not maintained and were becoming an
increasing burden on the community. At the same time we agreed that we
didn't want to make the same mistake again. Some of the things that
come to mind are having a working implementation, testing, CI, etc. If
the Rust community can show that they're dedicated to maintaining Rust
support in LLDB, I wouldn't expect a lot of resistance. I just bring
this up because I don't want to discourage anyone from adding support
for new languages to LLDB.

> I am wondering if there's a more limited version of this that can be merged into mainline:
> In terms of its memory model, Rust is not that far off from C++, so treating Rust types as if they were C++ types basically works. There is only one major problem: currently LLDB cannot deal with tagged unions, which Rust code uses quite heavily. When such a type is encountered, LLDB just emits an empty struct, which makes it impossible to examine the contents.
>
> My tentative proposal is to modify LLDB's DWARFASTParserClang to handle DW_TAG_variant et al, and create a C++ approximation of these types, e.g. as a polymorphic class, or just an untagged union. This would provide at least a minimal level of functionality for Rust (and possibly other languages) and be a much smaller maintenance burden on the LLDB core team.
> What would y'all say?

The people that actually work on this code should answer this, but
personally I don't have strong objections to this. That said, of
course I would prefer to have a (maintained) language plugin instead.

PS: Are there other changes that live downstream that are not Rust
specific and would benefit upstream LLDB and would potentially improve
Rust debugging?

Jonas

+1 to everything that Jonas said.

> Unless my memory is failing me, I don’t think we ever explicitly
> rejected Rust’s language plugin. We removed a few other language
> plugins (Go, Java) that were not maintained and were becoming an
> increasing burden on the community. At the same time we agreed that we
> didn’t want to make the same mistake again. Some of the things that
> come to mind are having a working implementation, testing, CI, etc. If
> the Rust community can show that they’re dedicated to maintaining Rust
> support in LLDB, I wouldn’t expect a lot of resistance. I just bring
> this up because I don’t want to discourage anyone from adding support
> for new languages to LLDB.

Do you have any thoughts on what this support should look like?

Realistically, though, I would expect this to go about as well as the previous two attempts you’ve mentioned. :(

> My tentative proposal is to modify LLDB’s DWARFASTParserClang to handle DW_TAG_variant et al, and create a C++ approximation of these types, e.g. as a polymorphic class, or just an untagged union. This would provide at least a minimal level of functionality for Rust (and possibly other languages) and be a much smaller maintenance burden on the LLDB core team.

I looked at the code in more detail, and unfortunately it looks like the C++ AST is not flexible enough to represent variants as polymorphic classes, so it’ll have to be just untagged unions. But I’d love to hear otherwise from people more familiar with that code.

> PS: Are there other changes that live downstream that are not Rust
> specific and would benefit upstream LLDB and would potentially improve
> Rust debugging?

Nothing major I can think of. The rest of the changes seem to be pretty Rust-specific.

My general feedback is that it would help a lot if LLDB were less C++- (and in particular clang-) centric. For example, right now LLDB converts various debug info formats directly into clang AST. As a result, other languages are forced to re-implement debug info parsing from scratch as soon as they need to customize anything that cannot be mapped to C/C++ concepts. There probably needs to be some sort of language-agnostic layer that abstracts debug info formats for use by language plugins. If this layer supported the DWARF spec in its entirety, I expect that most languages would need little to no customization, at least until you get to implementing a REPL for that language.
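
As a rough illustration of what such a language-agnostic layer might look like, here is a small sketch in Rust. All of the names (`TypeDesc`, `Field`, `Case`, `active_case`, `example_type`) are hypothetical and do not correspond to anything in LLDB. The `Variant` arm loosely mirrors DWARF's variant part (a discriminant member plus one case per discriminant value), and the one-byte discriminant at offset 0 is an assumption made to keep the example short.

```rust
// Hypothetical, format-neutral type description. The Variant arm loosely
// mirrors a DWARF variant part: a discriminant plus one case per value.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TypeDesc {
    Base { name: String, byte_size: usize },
    Struct { name: String, fields: Vec<Field> },
    Variant { name: String, discr_offset: usize, cases: Vec<Case> },
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    offset: usize,
    ty: TypeDesc,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Case {
    discr_value: u8, // assumption: a one-byte discriminant
    name: String,
    payload: TypeDesc,
}

// Read the discriminant byte from raw memory and pick the matching case.
// Returns None for non-variant types, short buffers, or unknown tags.
fn active_case<'a>(ty: &'a TypeDesc, bytes: &[u8]) -> Option<&'a Case> {
    match ty {
        TypeDesc::Variant { discr_offset, cases, .. } => {
            let tag = *bytes.get(*discr_offset)?;
            cases.iter().find(|c| c.discr_value == tag)
        }
        _ => None,
    }
}

// A made-up two-case type ("Maybe") used only for the demonstration.
fn example_type() -> TypeDesc {
    let u32_ty = TypeDesc::Base { name: "u32".to_string(), byte_size: 4 };
    TypeDesc::Variant {
        name: "Maybe".to_string(),
        discr_offset: 0,
        cases: vec![
            Case {
                discr_value: 0,
                name: "Nothing".to_string(),
                payload: TypeDesc::Struct { name: "Nothing".to_string(), fields: vec![] },
            },
            Case {
                discr_value: 1,
                name: "Just".to_string(),
                payload: TypeDesc::Struct {
                    name: "Just".to_string(),
                    fields: vec![Field { name: "0".to_string(), offset: 4, ty: u32_ty }],
                },
            },
        ],
    }
}

fn main() {
    let ty = example_type();
    let memory = [1u8, 0, 0, 0, 42, 0, 0, 0];
    match active_case(&ty, &memory) {
        Some(case) => println!("active variant: {}", case.name),
        None => println!("unknown discriminant"),
    }
}
```

The point of the sketch is only that a variant-aware description lets a generic consumer pick the live case without knowing anything about the source language.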

A stable ABI for dynamically-loadable language plugins would be the best, of course.

> My general feedback is that it would help a lot if LLDB were less C++- (and in particular clang-) centric. For example, right now LLDB converts various debug info formats directly into clang AST. As a result, other languages are forced to re-implement debug info parsing from scratch as soon as they need to customize anything that cannot be mapped to C/C++ concepts. There probably needs to be some sort of language-agnostic layer that abstracts debug info formats for use by language plugins. If this layer supported the DWARF spec in its entirety, I expect that most languages would need little to no customization, at least until you get to implementing a REPL for that language.

Swift has its own AST, a separate DWARF->SwiftAST parser, and an independent expression evaluator and runtime. Though the Swift support is in a separate repository, all the plugin code to handle an entirely non-C++-like language is also present in the llvm.org sources. That requires building a fair bit of the compiler into lldb - which may bring up license issues for Rust - but it is possible.

> A stable ABI for dynamically-loadable language plugins would be the best, of course.

The interface for language plugins is still evolving, as we move the clang (and swift) dependencies out of generic code and into the plugins (shout out to Alex for his persistence in this effort!!!). And I don't think the language plugins will ever be stable in the way the SB APIs are. They rely on too many llvm constructs, and those are not guaranteed to be stable. In that sense lldb will be like developing with the rest of the llvm infrastructure: stable enough in API form, and factored out well enough, that keeping up with external changes won't be too much of a pain, but still requiring recompiles.

Jim

So if Rust actually uses llvm and clang and Rust is supported by llvm and clang, this shouldn’t be an issue and should already work. But if you are having problems, then I am guessing that you have a compiler that isn’t based on llvm and clang? If that is the case, the best thing you can do is write a new TypeSystem subclass. Everywhere in LLDB, anytime we want to get type information or run an expression, we grab a TypeSystem for a given language enumeration. When we are stopped in a Rust stack frame, we will ask for the type system for the Rust language and hopefully we get something back.

For viewing types in a variable view, you can go the route of letting LLDB convert DWARF into clang AST types and letting that infrastructure display those types. But you can often run into issues, like you have seen with your DW_TAG_variant. If a user then types “p foo->bar”, it will invoke the clang expression parser and it will then play with the types that you have created. Clang has a lot of asserts and other things that can crash your debug session if you do anything too weird in your clang AST context.

So if Rust doesn’t use clang in its compiler

  • create a new TypeSystem for Rust that would convert DWARF into Rust AST types that are native to your Rust compiler that uses as much of the Rust compiler sources as possible
  • write a native Rust expression parser which hopefully uses your Rust compiler sources to evaluate and run your expression

It is good to note how the Swift language decided to do things differently. Swift decided that they would have the compiler/linker generate a blob of data, embedded into the executable or the stand-alone debug information, that contains a serialized AST of the program. The benefit of this approach is that when you debug your program, LLDB will hand this serialized blob back to the compiler. The DWARF information for Swift doesn’t need to encode the full type information in this case. It just has mangled names that uniquely identify the types. LLDB can then pass this mangled name to the compiler and say “please give me the type for ‘_SC3FooS3Bar’”.

The other benefit of this approach is that the compiler can rapidly change language features and the debugger can keep up by recompiling. Any new language features or types are encoded in the data blob and the compiler can then extract them. The serialized Swift AST contexts are not portable between compiler versions though, and this is the drawback of this approach: LLDB must be perfectly in sync with the tools that produce the binaries.

Another benefit of this approach is that the entire AST of all types gets encoded. Many compilers limit the amount of DWARF debug info they emit, which means that they don’t emit every type; they try to emit only the types that are used. DWARF also doesn’t have great template support, so any templates that aren’t used, or code that is only inlined (std::vector::size() for example), won’t be callable in an expression. If you have the entire AST, you can synthesize these inlined functions and use all types that your program knew about when it was compiled. If you convert reduced DWARF into ASTs, you only have the information that is represented by the DWARF itself and nothing more.

All other languages convert DWARF back into clang AST types and then let the clang compiler evaluate expressions using native clang AST types. The C and C++ languages have been pretty stable so this approach works well for C/C++/ObjC and more.

So the right answer depends on what the Rust community wants. Is the language changing rapidly where the debugger must be in sync to take advantage and use the latest language features? Or is it stable?

The other nice thing about creating a new TypeSystem is that it is a plugin and you don’t need to compile it in. cmake can be taught to conditionally compile in your type system if you want it. It would also allow you to have more than one Rust type system if needed for different Rust language versions that each could be exclusively compiled in. Having your sources be in a new TypeSystem plug-in ensures easy merging when it comes to different repositories.

Let us know about which approach sounds better to the Rust community and we can proceed from there!

Hi Greg,

> So if Rust doesn’t use clang in its compiler
>
>   • create a new TypeSystem for Rust that would convert DWARF into Rust AST types that are native to your Rust compiler that uses as much of the Rust compiler sources as possible
>   • write a native Rust expression parser which hopefully uses your Rust compiler sources to evaluate and run your expression

This already exists, but is proving difficult to maintain out-of-tree. (I wish that people who modify plugin-related APIs would spend a few minutes documenting what each of those methods is supposed to do, so plugin maintainers wouldn’t need to reverse-engineer this info! /rant).

> So the right answer depends on what the Rust community wants. Is the language changing rapidly where the debugger must be in sync to take advantage and use the latest language features? Or is it stable?

Debug info is mostly stable, but over time there will be changes, of course. For example, at some point we’ll want to change the symbol mangling scheme.

> The other nice thing about creating a new TypeSystem is that it is a plugin and you don’t need to compile it in. cmake can be taught to conditionally compile in your type system if you want it. It would also allow you to have more than one Rust type system if needed for different Rust language versions that each could be exclusively compiled in. Having your sources be in a new TypeSystem plug-in ensures easy merging when it comes to different repositories.

Understood. I just want to point out that implementing a complete type system plugin just to support one or two type kinds incompatible with C++ is a pretty big burden on language implementors. If LLDB had a default “type system” geared towards representing types expressible in DWARF, not just those found in the C family, it would be usable with other languages without any custom work, at least until one starts getting into fancy features like REPL.

It might also serve as an abstraction layer for supporting different debug info formats. As I understand, right now, in order to support MS PDB, we’d need to implement a custom parser for it in addition to the DWARF one?

> Let us know about which approach sounds better to the Rust community and we can proceed from there!

I guess we’d prefer to upstream the Rust plugin. But I’m not sure how to keep it from breaking without requiring all LLVM devs to install a Rust compiler…

> Understood. I just want to point out that implementing a complete type system plugin just to support one or two type kinds incompatible with C++ is a pretty big burden on language implementors.

It definitely is, but I believe that good debugging tools help people adopt your language more quickly, and if people invested a bit more into the debugging experience we would have more languages to play with.

> If LLDB had a default “type system” geared towards representing types expressible in DWARF, not just those found in the C family, it would be usable with other languages without any custom work, at least until one starts getting into fancy features like REPL.
>
> It might also serve as an abstraction layer for supporting different debug info formats. As I understand, right now, in order to support MS PDB, we’d need to implement a custom parser for it in addition to the DWARF one?

Yeah, this is a tough tradeoff, as this is what GDB does or at least did the last time I was working on it. In most debuggers, they make up their own internal type representation, and have a generic LALR recursive descent parser to evaluate single expression statements. This parser must work for multiple languages and problems arise when one language has a keyword or special operator that others don’t have. In the old days, if you wanted to call a C++ function in the GDB expression parser you would need to put quotes around any qualified names that contained ':' characters because that had been overloaded by the expression parser before C++ to allow getting to a static variable in a file (“g_foo:main.c” or “main.c:g_foo”). So you had to type ‘“foo::bar”()’ in your expression. The type system tries to avoid these issues by allowing each language to define the best functionality for each language.

So one route would be to allow your language to use an internal type system that isn’t based on any compiler back end. But that is going to be just as much work as making a new Rust type system and more problematic to maintain as other languages jump on board.

>> Let us know about which approach sounds better to the Rust community and we can proceed from there!
>
> I guess we’d prefer to upstream the Rust plugin. But I’m not sure how to keep it from breaking without requiring all LLVM devs to install a Rust compiler…

One idea is to add Rust support in its own TypeSystem and require a cmake option to manually enable its compilation. If the cmake flag isn’t supplied, Rust is not compiled in by default. That way you can upstream it, and people can try to enable it if and only if they download the Rust compiler sources. It might be nice to have a Rust compiler revision or hash that people can/should check out, documented in the Rust type system README in the LLDB sources. This revision would be the last known good revision of the Rust compiler sources they should download in order to build the version of Rust that is in LLDB’s sources. This way it would not break if we had a script in the Rust LLDB sources that knew how to check out the right version of Rust, and then users can manually enable Rust. Any Rust-enabled buildbots could ensure the right Rust sources are checked out. Maybe you could integrate this with the LLVM monorepo, where Rust could be a module that can be added, so that when you add the ‘-DLLVM_ENABLE_PROJECTS="clang;libcxx;lldb;rust"’ flag, it would know to build the Rust bits needed for LLDB?

I am happy to help in any way I can. Are the Rust compiler sources set up in a way that the debugger could end up using the Rust AST and other data structures when converting types from DWARF? Could the Rust compiler be used to evaluate expressions with the converted DWARF to Rust AST in a Rust LLDB type system?

Greg

> Yeah, this is a tough tradeoff, as this is what GDB does or at least did the last time I was working on it. In most debuggers, they make up their own internal type representation, and have a generic LALR recursive descent parser to evaluate single expression statements. This parser must work for multiple languages and problems arise when one language has a keyword or special operator that others don’t have. In the old days, if you wanted to call a C++ function in the GDB expression parser you would need to put quotes around any qualified names that contained ':' characters because that had been overloaded by the expression parser before C++ to allow getting to a static variable in a file (“g_foo:main.c” or “main.c:g_foo”). So you had to type ‘“foo::bar”()’ in your expression. The type system tries to avoid these issues by allowing each language to define the best functionality for each language.

Going a bit on a tangent here, but it isn’t obvious that a debugger expression evaluator needs to match full syntax and capabilities of the source language. I’ve been able to get quite a bit of mileage out of a Python “expression evaluator” on top of a wrapper similar to lldb.value. Not optimal by any stretch, but quite acceptable for baseline experience, IMO.

> One idea is to add Rust support in its own TypeSystem and require a cmake option to manually enable its compilation. If the cmake flag isn’t supplied, Rust is not compiled in by default. That way you can upstream it, and people can try to enable it if and only if they download the Rust compiler sources. It might be nice to have a Rust compiler revision or hash that people can/should check out, documented in the Rust type system README in the LLDB sources. This revision would be the last known good revision of the Rust compiler sources they should download in order to build the version of Rust that is in LLDB’s sources. This way it would not break if we had a script in the Rust LLDB sources that knew how to check out the right version of Rust, and then users can manually enable Rust. Any Rust-enabled buildbots could ensure the right Rust sources are checked out. Maybe you could integrate this with the LLVM monorepo, where Rust could be a module that can be added, so that when you add the ‘-DLLVM_ENABLE_PROJECTS="clang;libcxx;lldb;rust"’ flag, it would know to build the Rust bits needed for LLDB?

The Rust plugin is written in C++, so the Rust compiler sources won’t be necessary. However, a binary compiler release would be needed for testing. It is certainly possible to make Rust support conditional on a cmake flag, but would it be getting enough (or any) testing if not enabled by default?

> Going a bit on a tangent here, but it isn’t obvious that a debugger expression evaluator needs to match full syntax and capabilities of the source language. I’ve been able to get quite a bit of mileage out of a Python “expression evaluator” on top of a wrapper similar to lldb.value. Not optimal by any stretch, but quite acceptable for baseline experience, IMO.

FYI: we do have this with “frame variable”. “frame variable foo->my_ptr[12]” will work for all child accesses, treating pointers as arrays, taking address of or dereferencing a pointer, etc. So we do have this functionality already with “frame variable”. So give “frame variable” a try and see how things go.
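
To illustrate why this kind of child-access syntax does not need a full language front end, here is a toy path evaluator in Rust. It is not LLDB's implementation of “frame variable”; `Value`, `Step`, `parse_path`, `eval` and `sample_frame` are hypothetical names, and the sketch only handles member access (`.` and `->` treated alike) and `[N]` indexing, not address-of or dereference.

```rust
// Toy value tree standing in for the variables of a stopped frame.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Struct(Vec<(String, Value)>),
    Array(Vec<Value>),
}

#[derive(Debug, PartialEq)]
enum Step {
    Member(String),
    Index(usize),
}

// Split a path such as `foo->my_ptr[2]` into member and index steps.
// Returns None on malformed input (empty names, bad indices).
fn parse_path(path: &str) -> Option<Vec<Step>> {
    let normalized = path.replace("->", ".");
    let mut steps = Vec::new();
    for part in normalized.split('.') {
        // Each part looks like `name`, `name[3]`, or `name[3][4]`.
        let mut pieces = part.split('[');
        let name = pieces.next()?;
        if name.is_empty() {
            return None;
        }
        steps.push(Step::Member(name.to_string()));
        for idx in pieces {
            let digits = idx.strip_suffix(']')?;
            steps.push(Step::Index(digits.parse().ok()?));
        }
    }
    Some(steps)
}

// Walk the value tree one step at a time; no language semantics needed.
fn eval<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    let mut cur = root;
    for step in parse_path(path)? {
        cur = match (cur, step) {
            (Value::Struct(fields), Step::Member(name)) => {
                &fields.iter().find(|(n, _)| *n == name)?.1
            }
            (Value::Array(items), Step::Index(i)) => items.get(i)?,
            _ => return None,
        };
    }
    Some(cur)
}

// A made-up frame: one local `foo` with an array-like member `my_ptr`.
fn sample_frame() -> Value {
    let my_ptr = Value::Array(vec![Value::Int(10), Value::Int(20), Value::Int(30)]);
    let foo = Value::Struct(vec![("my_ptr".to_string(), my_ptr)]);
    Value::Struct(vec![("foo".to_string(), foo)])
}

fn main() {
    let frame = sample_frame();
    for path in ["foo->my_ptr[2]", "foo.my_ptr[0]", "foo->missing"] {
        println!("{} = {:?}", path, eval(&frame, path));
    }
}
```

Because the walker only needs names, offsets and element counts, the same approach works for any language whose types can be described structurally.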

> The Rust plugin is written in C++, so the Rust compiler sources won’t be necessary. However, a binary compiler release would be needed for testing. It is certainly possible to make Rust support conditional on a cmake flag, but would it be getting enough (or any) testing if not enabled by default?

It really depends on what you want to test. I am sure unit tests could be written that wouldn’t require a compiler using yaml2obj if needed.