Reviving the DebugIR pass

dan_liew · March 21, 2018, 6:19pm

Hi,

I recently had the need to see the correspondence between some Clang
generated LLVM IR and the compiled machine code within a debugger
(lldb in this case). Unfortunately it looks like the functionality to
do this used to be in a pass called 'DebugIR' but was removed due to
the lack of a maintainer [1].

An attempt was made [2] to revive this, but it appears to
have stalled.

I am very keen to see this feature land back in LLVM, so I'd either
like to take over [2] (only if the original author is no longer
interested) or rewrite the feature from scratch based on feedback on
what the community needs.

The use cases for this feature I see are:

* Debugging instrumentation inserted by LLVM passes. This is my use
case. I need to debug ASan instrumentation so looking at the original
source code in lldb is pretty useless because I can't see the
instrumentation. Looking at the LLVM IR on the other hand is very
useful because I can see the ASan instrumentation at a (slightly)
higher level than the native machine code assembly.

* Debugging JIT'ed code. The LLVM IR that is JIT'ed might not come
from a higher level language and might be generated directly. In this
case the only debug information that makes sense is the LLVM IR
representation of the JIT'ed LLVM IR. This might be a bit tricky
though because there might not be an on disk representation of the
LLVM IR.

* Writing portions of a runtime in LLVM IR. This is not something I
advise doing but sometimes it's necessary to write parts of a compiler
runtime in LLVM IR when it is difficult to write the equivalent code
in a higher level language. This is something I've had to do in KLEE
before because I couldn't get Clang to emit the LLVM IR in the precise
form that I wanted.

Debug information is not really my expertise but I'm happy to become
the code owner/maintainer of whatever implementation we end up with
(so that it doesn't get removed again) if necessary.

This work could also be used to fix [3]. Although this probably needs
more thought because there's the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

Any thoughts on this?

[1] rG910f05d1814a
[2] https://reviews.llvm.org/D40778 ([DebugIR] Revive the Debug IR pass)
[3] https://bugs.llvm.org/show_bug.cgi?id=35770

Thanks,
Dan.

JDevlieghere · March 21, 2018, 11:51pm

Hi Dan,

Having missed this functionality myself in the past I’m excited to see this gain traction. Let me know if there’s anything I can do to help this along.

Cheers,
Jonas

Jason · March 22, 2018, 12:37am

We have implemented a debug pass to enable debugging of jitted functions. The pass dumps the IR of the module at the beginning of the pass, which then becomes the source code for the debug metadata. Our solution focuses on a windows x86 solution but I’m sure it would be easy to extend to other platforms.
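A minimal sketch of the idea that the dumped IR "becomes the source code", assuming a plain textual dump and using Python for illustration rather than the LLVM API. The helper name `ir_line_locations` and the sample IR are hypothetical; a real pass would attach these line numbers as debug locations on the instructions instead of returning a list.

```python
import re

# Hypothetical helper: pair each instruction line of a textual IR dump with
# its 1-based line number, i.e. the location a DebugIR-style pass would record.
def ir_line_locations(ir_text):
    locations = []
    in_function = False
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("define "):
            in_function = True   # function header, body starts on the next line
            continue
        if stripped == "}":
            in_function = False  # end of the function body
            continue
        # Outside function bodies, blank lines and comments carry no location.
        if not in_function or not stripped or stripped.startswith(";"):
            continue
        # Basic-block labels such as `entry:` are not instructions.
        if re.fullmatch(r"[\w.$-]+:(\s*;.*)?", stripped):
            continue
        locations.append((lineno, stripped))
    return locations

SAMPLE_IR = """\
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
"""

for lineno, text in ir_line_locations(SAMPLE_IR):
    print(lineno, text)
```

The line-based scan is a deliberate simplification; it ignores metadata, attributes and multi-line constructs that a real implementation would have to handle.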

In order to load the resulting debugging info into Visual Studio, we jit to object file then offline use msvc linker to create a dll with debugging symbols. We then load the dll into the running executable. At that point most debugging facilities are available, such as setting breakpoints, stepping through IR or assembly, etc.

With our workflow we jit a substantial amount of functionality and having the ability to debug IR has been very useful. I could work on making this open source if there is interest.

Thanks,
Jason

vedantk · March 22, 2018, 6:42pm

+1 to having this functionality implemented as llvm-as -g or some analog (clang -g foo.ll).

As I outlined in https://reviews.llvm.org/D40778, I think such a solution can be made to work for JIT users who don’t have on-disk *.ll files.

This work could also be used to fix [3]. Although this probably needs
more thought because there’s the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

If foo.ll contains edited debug info, clang -g shouldn’t silently drop the edits. A warning + no-op seems more appropriate.

Happy to help with code review!

thanks,
vedant

pogo59 · March 22, 2018, 7:23pm

This work could also be used to fix [3]. Although this probably needs
more thought because there's the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

If foo.ll contains edited debug info, `clang -g` shouldn't silently drop
the edits. A warning + no-op seems more appropriate.

Happy to help with code review!

thanks,
vedant

Historically, the (mc) assembler has pretended that `-g` was not present
and simply used the debug info described in the assembler source, with no
diagnostic. I think that would be reasonable behavior for `llvm-as -g`
or `clang -g foo.ll` as well.
--paulr

bollu · March 23, 2018, 4:32am

Please do take over the pass revival. College hasn’t left me with the bandwidth to continue this side project.

If there are small things here and there, I’d be happy to pitch in. However, whipping the patch into shape is beyond me right now.

Thanks
Siddharth

dan_liew · March 23, 2018, 1:47pm

This work could also be used to fix [3]. Although this probably needs
more thought because there's the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

If foo.ll contains edited debug info, `clang -g` shouldn't silently drop the
edits. A warning + no-op seems more appropriate.

By "edited" I assume you mean someone manually going in and changing
the debug info metadata?
I don't think there's a simple way for Clang to know that someone has done that.

I imagined that the most appropriate behaviour for `clang -g foo.ll`
(also `llvm-as -g`) would be something like:

* If `foo.ll` already contains debug information, leave it alone
(i.e. use what is already there).
* If `foo.ll` does **not** contain debug information, use `foo.ll` for
debug locations. Possibly also emit a warning when this happens
because this is not a common case.
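
The two cases above can be written down as a small decision function. This is only a sketch of the proposed policy in Python, not existing Clang or `llvm-as` behaviour; the function name `choose_debug_info_action` and the returned strings are made up for illustration.

```python
import warnings

# Hypothetical policy function for `clang -g foo.ll` / `llvm-as -g`.
def choose_debug_info_action(has_debug_info):
    if has_debug_info:
        # Case 1: the input already carries debug info, so use what is there.
        return "keep-existing"
    # Case 2: no debug info, so the .ll file itself becomes the source.
    warnings.warn("no debug info in input; using the IR file for debug locations")
    return "use-ir-as-source"

print(choose_debug_info_action(True))
print(choose_debug_info_action(False))
```

How `has_debug_info` gets computed is left open here.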

dan_liew · March 23, 2018, 1:48pm

Thanks for confirming. I won't be able to take this up immediately but
I should have a little bit of free time next week to start looking at
this.

Dan.

dan_liew · March 23, 2018, 1:54pm

Thanks for offering. I suspect that open sourcing this may take quite
some time. So the approach I'd like to take is to revive the existing
patch and add you as a reviewer with the goal of getting the
functionality usable for your use case.
Of course if you think the pass can be upstreamed in a timely manner
I'd happily reconsider.

The other parts you mention sound like they belong as part of the JIT
itself. Given that I don't really work with LLVM's JIT I don't know
how useful it would be to have that support upstream (although it does
sound very useful to me).

Dan.

vedantk · March 23, 2018, 6:12pm

This work could also be used to fix [3]. Although this probably needs
more thought because there's the question of whether we should preserve
existing debug information in an LLVM IR file or write over it when it
is given to Clang.

If foo.ll contains edited debug info, `clang -g` shouldn't silently drop the
edits. A warning + no-op seems more appropriate.

By "edited" I assume you mean someone manually going in and changing
the debug info metadata?

Yes, or the intrinsics.

I don't think there's a simple way for Clang to know that someone has done that.

Oh, I'm not suggesting you do that. It'd be fine to just check that `M.getNamedMetadata("llvm.dbg.cu")` is non-null before adding any new debug info.
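
For anyone who wants to try the same test on a textual `.ll` file without linking against LLVM, here is a rough Python approximation. It is an assumption-laden sketch: it only looks for a module-level `!llvm.dbg.cu = !{...}` definition in the text, and the name `has_debug_compile_units` is hypothetical. The real check would go through the module API mentioned above.

```python
import re

# Matches a module-level named metadata definition such as `!llvm.dbg.cu = !{!0}`.
_DBG_CU = re.compile(r"^\s*!llvm\.dbg\.cu\s*=\s*!\{", re.MULTILINE)

# Hypothetical text-level stand-in for "llvm.dbg.cu named metadata is present".
def has_debug_compile_units(ir_text):
    return _DBG_CU.search(ir_text) is not None

WITH_DEBUG_INFO = """\
define void @f() {
  ret void
}
!llvm.dbg.cu = !{!0}
"""

WITHOUT_DEBUG_INFO = """\
define void @f() {
  ret void
}
"""

print(has_debug_compile_units(WITH_DEBUG_INFO))
print(has_debug_compile_units(WITHOUT_DEBUG_INFO))
```

A comment line that merely mentions `llvm.dbg.cu` does not match, because the pattern requires the definition to start the line.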

I imagined that the most appropriate behaviour for `clang -g foo.ll`
(also `llvm-as -g`) would be something like:

* If `foo.ll` already contains debug information, leave it alone
(i.e. use what is already there).
* If `foo.ll` does **not** contain debug information, use `foo.ll` for
debug locations. Possibly also emit a warning when this happens
because this is not a common case.

Sgtm. I don't anticipate case 2 being all that uncommon, but I'll leave it up to you, as you're doing the work :).

vedant
