[PROPOSAL] Attach debugging information to LLVM instructions

Hi All,

Today, debugging information is encoded in LLVM IR using various
llvm.dbg intrinsics, such as llvm.dbg.stoppoint. For example,

!1 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
                metadata !"/tmp", metadata !"clang 1.0", i1 true, i1 false,
                metadata !"", i32 0}

  ...
  call void @llvm.dbg.stoppoint(i32 5, i32 5, metadata !1)
  store i32 42, i32* %i
  call void @llvm.dbg.stoppoint(i32 6, i32 5, metadata !1)
  store i32 1, i32* %j.addr
  br label %if.end
  ...

This approach has several disadvantages.
- The llvm.dbg.stoppoint()s act like hurdles to the optimizer. The
LLVM customers expect that the optimizer does not trip over these
hurdles. They expect LLVM to produce the same high-quality code
irrespective of the presence of debug info. It is a tedious and
never-ending task to ensure that the optimizer safely ignores these llvm.dbg
intrinsics.
- The instructions lose original location info when the optimizer
moves them around.
- It is extremely error prone to keep track of lexical scopes and
inlined functions using a pair of llvm.dbg intrinsics.

The proposed solution is to optionally attach debug information to
LLVM instructions directly. A new keyword 'dbg' is used to identify
debugging information associated with an instruction. The debugging
information, if available, is printed after the last instruction
operand. The debugging information entry uses MDNode and it is not
counted as an instruction operand. For example,

!1 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
                metadata !"/tmp", metadata !"clang 1.0", i1 true, i1 false,
                metadata !"", i32 0}
!7 = metadata !{i32 5, i32 5, metadata !1, metadata !1}
!8 = metadata !{i32 6, i32 5, metadata !1, metadata !1}

  ...
  store i32 42, i32* %i, dbg metadata !7
  store i32 1, i32* %j.addr, dbg metadata !8
  br label %if.end, dbg metadata !8
  ...

Now, the optimizer does not need to worry about those llvm.dbg
hurdles. Instructions do not lose their location information when they
are rearranged in the instruction stream. And the stage is set to produce,
preserve and emit accurate debug information for inlined functions.
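
To make the reordering point concrete, here is a toy Python model. It is not
LLVM code: the Inst class and the two helper functions are invented for this
illustration, and a "stoppoint" is modeled as a bare (line, column) tuple in
the instruction stream. It only shows that a location derived from the
preceding marker changes when instructions move, while a location carried by
the instruction itself does not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Loc = Tuple[int, int]  # (line, column), as in the stoppoint operands


@dataclass(frozen=True)
class Inst:
    """Hypothetical instruction; 'loc' stands in for 'dbg metadata !N'."""
    text: str
    loc: Optional[Loc] = None


def locs_via_stoppoints(stream: List[Union[Loc, Inst]]) -> Dict[str, Optional[Loc]]:
    """Intrinsic scheme: an instruction gets the location of the last marker seen."""
    current: Optional[Loc] = None
    found: Dict[str, Optional[Loc]] = {}
    for item in stream:
        if isinstance(item, Inst):
            found[item.text] = current
        else:
            current = item  # a (line, column) marker
    return found


def locs_via_attachment(stream: List[Inst]) -> Dict[str, Optional[Loc]]:
    """Proposed scheme: the location travels with the instruction."""
    return {inst.text: inst.loc for inst in stream}


STORE_I = "store i32 42, i32* %i"
STORE_J = "store i32 1, i32* %j.addr"

# A transform swaps the two stores but leaves the markers where they were.
stoppoint_before = locs_via_stoppoints([(5, 5), Inst(STORE_I), (6, 5), Inst(STORE_J)])
stoppoint_after = locs_via_stoppoints([(5, 5), Inst(STORE_J), (6, 5), Inst(STORE_I)])

# The same swap when each store carries its own location.
attached_before = locs_via_attachment([Inst(STORE_I, (5, 5)), Inst(STORE_J, (6, 5))])
attached_after = locs_via_attachment([Inst(STORE_J, (6, 5)), Inst(STORE_I, (5, 5))])

print(stoppoint_before == stoppoint_after)  # locations changed by the swap
print(attached_before == attached_after)    # locations unaffected by the swap
```

A real optimizer could of course move the markers along with the
instructions; the sketch only illustrates why doing so correctly in every
pass is the tedious part.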

Any thoughts/suggestions/questions?

Sounds good to me.

Hi All,

Today, debugging information is encoded in LLVM IR using various
llvm.dbg intrinsics, such as llvm.dbg.stoppoint. For example,

Right.

This approach has several disadvantages.
- The llvm.dbg.stoppoint()s act like hurdles to the optimizer. The
LLVM customers expect that the optimizer does not trip over these
hurdles. They expect LLVM to produce the same high-quality code
irrespective of the presence of debug info. It is a tedious and
never-ending task to ensure that the optimizer safely ignores these llvm.dbg
intrinsics.

This is not a problem specific to stoppoints. Even after we eliminate stoppoints, we'll still have the same problem for other debug info.

- The instructions lose original location info when the optimizer
moves them around.
- It is extremely error prone to keep track of lexical scopes and
inlined functions using a pair of llvm.dbg intrinsics.

Right.

The proposed solution is to optionally attach debug information to
LLVM instructions directly. A new keyword 'dbg' is used to identify
debugging information associated with an instruction. The debugging
information, if available, is printed after the last instruction
operand. The debugging information entry uses MDNode and it is not
counted as an instruction operand. For example,

!1 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
                metadata !"/tmp", metadata !"clang 1.0", i1 true, i1 false,
                metadata !"", i32 0}
!7 = metadata !{i32 5, i32 5, metadata !1, metadata !1}
!8 = metadata !{i32 6, i32 5, metadata !1, metadata !1}

  ...
  store i32 42, i32* %i, dbg metadata !7
  store i32 1, i32* %j.addr, dbg metadata !8
  br label %if.end, dbg metadata !8
  ...

Instead of 'dbg metadata !7', is it sufficient to have 'dbg !7'?

I think this is a pretty reasonable syntax. We can even get the asmprinter to handle this as a special case and print it as:

store i32 42, i32* %i, dbg metadata !{i32 5, i32 5, metadata !1, metadata !1}

which makes it easier to read.

Now, the optimizer does not need to worry about those llvm.dbg
hurdles. Instructions do not lose their location information when they
are rearranged in the instruction stream. And the stage is set to produce,
preserve and emit accurate debug information for inlined functions.

Sounds nice!

-Chris

So, if we later wanted to attach some other metadata to an instruction it would look something like:

store i32 42, i32* %i, dbg metadata !7, spork !15

or some such? And when you attach the metadata to the instruction how do you plan on making it evident as debug as opposed to spork?
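
One way to picture the distinction being asked about is a per-instruction
table keyed by the kind of attachment. The Python sketch below is only a toy
model: the Inst class, the 'attachments' dictionary and the render() format
are assumptions made for illustration, and nothing here says how the
prototype actually stores or prints the kind.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Inst:
    """Hypothetical instruction with metadata attachments keyed by kind."""
    text: str
    # kind name (e.g. "dbg") -> metadata node reference (e.g. "!7")
    attachments: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Print the instruction followed by 'kind metadata !N' for each attachment."""
        parts = [self.text]
        for kind, node in self.attachments.items():
            parts.append(f"{kind} metadata {node}")
        return ", ".join(parts)


inst = Inst("store i32 42, i32* %i")
inst.attachments["dbg"] = "!7"     # location info, as in the proposal
inst.attachments["spork"] = "!15"  # some other, made-up kind

rendered = inst.render()
print(rendered)
```

With a keyed table, a consumer that only cares about debug info looks up
"dbg" and ignores every other kind.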

-eric

Sounds good.

To ease transition from LLVM 2.6->2.7, could there be a pass that adds
back the llvm.dbg intrinsics based on the metadata on the instructions?
No in-tree pass should need that, but it could help external projects
that rely on stoppoints being present.
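
As a rough idea of what such a compatibility pass would do, here is a toy
Python sketch. It is not an LLVM pass: Inst and reinsert_stoppoints are
made-up names, instructions are plain strings, and the compile unit operand
is passed in as text (defaulting to !1 from the example in the proposal). It
emits a stoppoint before an instruction whenever the attached location
differs from the last one emitted in the block.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Loc = Tuple[int, int]  # (line, column)


@dataclass
class Inst:
    """Hypothetical instruction; 'loc' stands in for the attached dbg metadata."""
    text: str
    loc: Optional[Loc] = None


def reinsert_stoppoints(block: List[Inst], unit: str = "!1") -> List[str]:
    """Rebuild llvm.dbg.stoppoint calls for one basic block from attached locations."""
    out: List[str] = []
    last: Optional[Loc] = None
    for inst in block:
        if inst.loc is not None and inst.loc != last:
            line, col = inst.loc
            out.append(
                f"call void @llvm.dbg.stoppoint(i32 {line}, i32 {col}, metadata {unit})"
            )
            last = inst.loc
        out.append(inst.text)
    return out


block = [
    Inst("store i32 42, i32* %i", (5, 5)),
    Inst("store i32 1, i32* %j.addr", (6, 5)),
    Inst("br label %if.end", (6, 5)),
]
rebuilt = reinsert_stoppoints(block)
for line in rebuilt:
    print(line)
```

On the three instructions from the proposal this reproduces the stoppoint
layout of the original example; instructions without a location simply
inherit whatever stoppoint precedes them.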

Also, would it be possible to have source:line debug info generated for
macros (at least in clang)?
The debug info generated by gcc, llvm-gcc or clang doesn't deal with
macros in a way that would allow single-stepping through them.
-g3 in gcc allows me to expand a macro from gcc, but that's it: as far as
debugging is concerned, the macro acts like a single instruction and is not
single-steppable.

I generally tend to avoid macros that do something non-trivial (i.e.
something that requires debugging), but unfortunately C doesn't support
templates, so in some situations I am forced to use macros instead of
functions.

Best regards,
--Edwin

Yes, I'll send out a proposal to cover this in the next couple of days.

-Chris

It is! In fact, that's what my prototype does.

Devang Patel wrote:

  store i32 42, i32* %i, dbg metadata !7
  store i32 1, i32* %j.addr, dbg metadata !8
  br label %if.end, dbg metadata !8
  ...

Now, the optimizer does not need to worry about those llvm.dbg
hurdles. Instructions do not lose their location information when they
are rearranged in the instruction stream. And the stage is set to produce,
preserve and emit accurate debug information for inlined functions.

I like this.

Would it be very ugly to treat stoppoint like a pseudo op that filled in the metadata for subsequent instructions until the end of the basic block? It might help for backward compatibility.
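
A minimal sketch of that idea, in Python rather than inside the real IR
reader: fold_stoppoints and the STOPPOINT pattern are hypothetical names,
instructions are plain text lines, and the function handles exactly one
basic block, so the "current location" naturally ends with the block. Each
stoppoint is consumed and its operands are attached to every following
instruction until the next stoppoint.

```python
import re
from typing import List, Optional, Tuple

# (line, column, compile-unit reference such as "!1")
Loc = Tuple[int, int, str]

STOPPOINT = re.compile(
    r"call void @llvm\.dbg\.stoppoint\(i32 (\d+), i32 (\d+), metadata (!\d+)\)"
)


def fold_stoppoints(block: List[str]) -> List[Tuple[str, Optional[Loc]]]:
    """Treat each stoppoint as a pseudo-op that sets the location of later instructions."""
    current: Optional[Loc] = None  # nothing is known before the first stoppoint
    out: List[Tuple[str, Optional[Loc]]] = []
    for raw in block:
        text = raw.strip()
        match = STOPPOINT.fullmatch(text)
        if match:
            current = (int(match.group(1)), int(match.group(2)), match.group(3))
            continue  # the stoppoint itself is dropped
        out.append((text, current))
    return out


old_style = [
    "  call void @llvm.dbg.stoppoint(i32 5, i32 5, metadata !1)",
    "  store i32 42, i32* %i",
    "  call void @llvm.dbg.stoppoint(i32 6, i32 5, metadata !1)",
    "  store i32 1, i32* %j.addr",
    "  br label %if.end",
]
folded = fold_stoppoints(old_style)
for text, loc in folded:
    print(text, loc)
```

Run on the old-style example from the proposal, both the second store and
the branch pick up line 6, which matches the !8 attachments in the new-style
example.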

-Rich

Now, the optimizer does not need to worry about those llvm.dbg
hurdles. Instructions do not lose their location information when they
are rearranged in the instruction stream. And the stage is set to produce,
preserve and emit accurate debug information for inlined functions.

Any thoughts/suggestions/questions?

Sounds good.

To ease transition from LLVM 2.6->2.7, could there be a pass that adds
back the llvm.dbg intrinsics based on the metadata on the instructions?

I'd prefer to not overload llvm.dbg intrinsics, if possible.

No in-tree pass should need that, but it could help external projects
that rely on stoppoints being present.

... just during the transition, or forever?

So, if we later wanted to attach some other metadata to an instruction it would look something like:

For example, we may want to attach a list of typeinfos to invokes,
representing the catch clauses.

Ciao,

Duncan.

Hooray! I like this and the suggestions others have brought forward to
enhance it.

                               -Dave