Reviving the DebugIR pass

Hi,

I recently had the need to see the correspondence between some Clang
generated LLVM IR and the compiled machine code within a debugger
(lldb in this case). Unfortunately it looks like the functionality to
do this used to be in a pass called 'DebugIR' but was removed due to
the lack of a maintainer [1].

An attempt was made [2] to revive it, but that effort appears to
have stalled.

I am very keen to see this feature land back in LLVM, so I'd either
like to take over [2] (only if the original author is no longer
interested) or rewrite the feature from scratch based on feedback on
what the community needs.

The use cases for this feature I see are:

* Debugging instrumentation inserted by LLVM passes. This is my use
case. I need to debug ASan instrumentation so looking at the original
source code in lldb is pretty useless because I can't see the
instrumentation. Looking at the LLVM IR, on the other hand, is very
useful because I can see the ASan instrumentation at a (slightly)
higher level than the native machine code assembly.

* Debugging JIT'ed code. The LLVM IR that is JIT'ed might not come
from a higher level language and might be generated directly. In this
case the only debug information that makes sense is the LLVM IR
representation of the JIT'ed LLVM IR. This might be a bit tricky
though because there might not be an on disk representation of the
LLVM IR.

* Writing portions of a runtime in LLVM IR. This is not something I
advise doing, but sometimes it's necessary to write parts of a compiler
runtime in LLVM IR when it is difficult to write the equivalent code
in a higher level language. This is something I've had to do in KLEE
before because I couldn't get Clang to emit the LLVM IR in the precise
form that I wanted.

Debug information is not really my expertise but I'm happy to become
the code owner/maintainer of whatever implementation we end up with
(so that it doesn't get removed again) if necessary.

This work could also be used to fix [3], although this probably needs
more thought because there's the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

Any thoughts on this?

[1] rG910f05d1814a
[2] https://reviews.llvm.org/D40778 ([DebugIR] Revive the Debug IR pass)
[3] https://bugs.llvm.org/show_bug.cgi?id=35770

Thanks,
Dan.

Hi Dan,

Having missed this functionality myself in the past I’m excited to see this gain traction. Let me know if there’s anything I can do to help this along.

Cheers,
Jonas

We have implemented a debug pass to enable debugging of JIT'ed functions. The pass dumps the IR of the module at the beginning of the pass, and that dump then becomes the source code for the debug metadata. Our solution targets Windows x86, but I’m sure it would be easy to extend to other platforms.

To load the resulting debug info into Visual Studio, we JIT to an object file and then, offline, use the MSVC linker to create a DLL with debugging symbols. We then load the DLL into the running executable. At that point most debugging facilities are available, such as setting breakpoints and stepping through IR or assembly.

In our workflow we JIT a substantial amount of functionality, and the ability to debug IR has been very useful. I could work on making this open source if there is interest.

Thanks,
Jason
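
The idea Jason describes (dump the module's IR, then treat the dump as the "source file") can be illustrated with a small text-level sketch. This is only an approximation under stated assumptions: the helper name `ir_line_table` and the sample module are hypothetical, and a real pass would walk LLVM `Instruction` objects and attach debug locations to them rather than parse text with regular expressions.

```python
import re

# Hypothetical helper, not part of LLVM: it approximates on plain text what a
# DebugIR-style pass would do with real Instruction objects.
DEFINE_RE = re.compile(r'^define\b.*?@(?P<name>[\w.$-]+)\s*\(')
LABEL_RE = re.compile(r'^[\w.$-]+:')


def ir_line_table(ll_text):
    """Map each function name to a list of (line_number, instruction_text).

    The line numbers refer to the dumped .ll text itself, which is what
    would play the role of the source file in the debug info.
    """
    table = {}
    current = None  # name of the function whose body we are inside, if any
    for lineno, raw in enumerate(ll_text.splitlines(), start=1):
        line = raw.strip()
        match = DEFINE_RE.match(line)
        if match:
            current = match.group("name")
            table[current] = []
        elif line == "}":
            current = None
        elif current is not None and line:
            # Skip comments and basic-block labels; keep instructions only.
            if line.startswith(";") or LABEL_RE.match(line):
                continue
            table[current].append((lineno, line))
    return table


SAMPLE = """\
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
"""

if __name__ == "__main__":
    for name, rows in ir_line_table(SAMPLE).items():
        for lineno, text in rows:
            print(f"{name}: line {lineno}: {text}")
```

With a table like this, a debugger breakpoint on "line 3" of the dump would correspond to the `add` instruction, which is the kind of mapping the pass needs to encode in debug metadata.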

+1 to having this functionality implemented as llvm-as -g or some analog (clang -g foo.ll).

As I outlined in https://reviews.llvm.org/D40778, I think such a solution can be made to work for JIT users who don’t have on-disk *.ll files.

This work could also be used to fix [3], although this probably needs
more thought because there’s the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

If foo.ll contains edited debug info, clang -g shouldn’t silently drop the edits. A warning + no-op seems more appropriate.

Happy to help with code review!

thanks,
vedant

Historically, the (mc) assembler has pretended the `-g` was not present
and simply used the debug info described in the assembler source, with no
diagnostic. I think that would be reasonable behavior for `llvm-as -g`
or `clang -g foo.ll` as well.
--paulr

Please do take over the pass revival. College hasn’t left me with the bandwidth to continue this side project :)

If there are small things here and there, I’d be happy to pitch in. However, whipping the patch into shape is beyond me right now.

Thanks
Siddharth

This work could also be used to fix [3], although this probably needs
more thought because there's the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

If foo.ll contains edited debug info, `clang -g` shouldn't silently drop the
edits. A warning + no-op seems more appropriate.

By "edited" I assume you mean someone manually going in and changing
the debug info metadata?
I don't think there's a simple way for Clang to know that someone has done that.

I imagined that the most appropriate behaviour for `clang -g foo.ll`
(also `llvm-as -g`) would be something like:

* If `foo.ll` already contains debug information, leave it alone
(i.e. use what is already there).
* If `foo.ll` does **not** contain debug information, use `foo.ll` for
debug locations. Possibly also emit a warning when this happens
because this is not a common case.

Thanks for confirming. I won't be able to take this up immediately but
I should have a little bit of free time next week to start looking at
this.

Dan.

Thanks for offering. I suspect that open sourcing this may take quite
some time. So the approach I'd like to take is to revive the existing
patch and add you as a reviewer with the goal of getting the
functionality usable for your use case.
Of course if you think the pass can be upstreamed in a timely manner
I'd happily reconsider.

The other parts you mention sound like they belong as part of the JIT
itself. Given that I don't really work with LLVM's JIT I don't know
how useful it would be to have that support upstream (although it does
sound very useful to me).

Dan.

This work could also be used to fix [3], although this probably needs
more thought because there's the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

If foo.ll contains edited debug info, `clang -g` shouldn't silently drop the
edits. A warning + no-op seems more appropriate.

By "edited" I assume you mean someone manually going in and changing
the debug info metadata?

Yes, or the intrinsics.

I don't think there's a simple way for Clang to know that someone has done that.

Oh, I'm not suggesting you do that. It'd be fine to just check that `M.getNamedMetadata("llvm.dbg.cu")` is non-null before adding any new debug info.

I imagined that the most appropriate behaviour for `clang -g foo.ll`
(also `llvm-as -g`) would be something like:

* If `foo.ll` already contains debug information, leave it alone
(i.e. use what is already there).
* If `foo.ll` does **not** contain debug information, use `foo.ll` for
debug locations. Possibly also emit a warning when this happens
because this is not a common case.

Sgtm. I don't anticipate case 2 being all that uncommon, but I'll leave it up to you, as you're doing the work :).

vedant
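
The behaviour agreed on above (leave existing debug info alone, otherwise generate it from the IR file) could be sketched roughly as follows. This is a text-level approximation, not the proposed implementation: inside LLVM the check would be the `M.getNamedMetadata("llvm.dbg.cu")` call vedant mentions, whereas here a regular expression looks for the corresponding `!llvm.dbg.cu = ...` line in the `.ll` text. The function names and the returned action strings are hypothetical.

```python
import re

# Matches a named-metadata definition such as "!llvm.dbg.cu = !{!0}".
# A line starting with ";" is a comment and therefore does not match.
DBG_CU_RE = re.compile(r'^\s*!llvm\.dbg\.cu\s*=', re.MULTILINE)


def has_debug_info(ll_text):
    """Text-level stand-in for checking that llvm.dbg.cu is present."""
    return DBG_CU_RE.search(ll_text) is not None


def plan_for_dash_g(ll_text):
    """Decide what a hypothetical `-g` on an IR file should do.

    Returns "keep-existing" when the module already carries debug info,
    and "generate-from-ir" when the .ll file itself should be used for
    debug locations (the point where a warning could be emitted).
    """
    if has_debug_info(ll_text):
        return "keep-existing"
    return "generate-from-ir"


if __name__ == "__main__":
    plain = "define void @f() {\n  ret void\n}\n"
    with_dbg = plain + "!llvm.dbg.cu = !{!0}\n"
    print(plan_for_dash_g(plain))
    print(plan_for_dash_g(with_dbg))
```

Keying the decision on the presence of `llvm.dbg.cu` sidesteps the question of detecting hand-edited metadata: any module that already has a compile unit is simply left untouched.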