Notes from dbg.value coffee chat

I chatted with Jeremy Morse, Orlando, and Stephen Tozer at the dev meeting, and wanted to summarize the conversation for our benefit, and to share it with others. I aim to be brief, so I apologize if these notes aren’t as helpful as they could be to folks who weren’t present.

Three project ideas, by priority (plus a bonus):

  1. Address https://llvm.org/pr34136, improving quality of variable location info for variables in memory, by getting the frontend to pre-annotate assignments.
  2. Prototype “stop points”, or statement markers in the instruction stream. Use Dexter or other tools to measure potential improvements in stepping behavior, consider productionizing.
  3. Move all debug info intrinsics out of the instruction stream by adding a new all-powerful instruction (maybe dbg_point, dbg_label?) that essentially multiplexes one llvm::Instruction into multiple debug instructions.
  4. (bonus) Old idea, low-priority: Prototype a mode that models stop points as having side effects. Start the function by escaping all non-temporary local variables. Variable values should be accurate and writable at all statement start PCs.

Idea 1: Local variables in memory

Have clang emit dbg.value instructions directly after every assignment. We discussed the similarities of this idea to the “Key Instructions” idea from Caroline Tice’s thesis, but I can’t claim this idea is totally faithful to it. For assignments to memory locations that are not local variables (*p = v, this->m = v), replace the local variable metadata argument with the value of the store destination, using ValueAsMetadata. Standard cleanup passes (instcombine, the inliner?) should rewrite dbg.values whose memory destination points into an alloca to use the local variable corresponding to that alloca.

This allows passes that delete stores other than mem2reg (DSE, instcombine, GVN, anything using MemorySSA) to not worry about producing dbg.values, because they already exist: the frontend has provided them. That was the fundamental reason lowerDbgDeclare is called in instcombine, so we could remove that call, keep the dbg.declare instructions (or something equivalent), and greatly expand the range over which the variable is known to live in stack memory. Variables which never participate in dead store elimination (hopefully many) are more likely to be entirely described by a memory location, and to not need a DWARF location list. They will be writable as well.
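As a rough sketch of what the frontend might emit (the intrinsic shape here is illustrative, not settled; the second form, with a ValueAsMetadata destination in place of the variable, is the new part):

; x = v;  assignment to local variable x
store i32 %v, i32* %x.addr
call void @llvm.dbg.value(metadata i32 %v, metadata !DILocalVariable(name: “x”), metadata !DIExpression())

; *p = v;  the destination is not a local variable, so describe the
; destination SSA value via ValueAsMetadata instead of a DILocalVariable
store i32 %v, i32* %p
call void @llvm.dbg.value(metadata i32 %v, metadata i32* %p, metadata !DIExpression())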

Idea 2: Stop points

This is an old idea: LLVM used to have stop point intrinsics before it had debug location instruction attachments. Given the new goals around profiling accuracy that we’ve declared for location information, perhaps we should reconsider the merits of the old design. We could introduce a new intrinsic, perhaps dbg.stmt, that behaves like dbg.value in that it produces no value, remains in the instruction stream, and is not removed by standard dead code elimination. It could be lowered to drive the is_stmt bit on .loc directives in the DWARF line tables. We could also have a mode where the only information used to fill in the line tables comes from these instructions. Some data flow passes would be required to propagate the current location into blocks during codegen, similar to some of the existing debug value passes.
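As a sketch (the intrinsic name and shape are hypothetical), each statement boundary would carry a marker whose !dbg location feeds the line table:

; x = a + b;
call void @llvm.dbg.stmt(), !dbg !10
%x = add i32 %a, %b
; use(x);
call void @llvm.dbg.stmt(), !dbg !11
call void @use(i32 %x)

In the stricter mode described above, only locations reaching codegen via these markers would populate the line table, with is_stmt set.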

Idea 3: dbg_point

This is a representational change that is mostly meant to make LLVM more efficient. I don’t have data, but we believe that runs of dbg.value instructions slow down passes because they must be iterated over during optimization. We also believe that they are memory inefficient. A new representation would address that by allowing us to coalesce multiple logically distinct dbg.value operations into one llvm::Instruction. This instruction could be extended to contain all types of debug info instructions: dbg.label, dbg.value, dbg.declare, dbg.stmt, or anything else. Having just watched the MLIR tutorial, I’m reminded of MLIR regions.
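For illustration only (the syntax below is invented), a run of intrinsics such as:

call void @llvm.dbg.value(metadata i32 %a, metadata !1, metadata !DIExpression())
call void @llvm.dbg.value(metadata i32 %b, metadata !2, metadata !DIExpression())
call void @llvm.dbg.label(metadata !3)

could be multiplexed into a single llvm::Instruction:

dbg_point [value(i32 %a, !1, !DIExpression()), value(i32 %b, !2, !DIExpression()), label(!3)]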

Idea 4: Not much to say


Hi Reid,

Thanks for sharing this. I plan to work on improving debug-info for variables living in memory as my next “project”, so I am very interested to hear what others have to say about “Idea 1”.

There is one part of the idea that confuses me. You say we could "keep the dbg.declare instructions", but I don’t see where dbg.declare instructions - at least with their current semantics - fit into the design if the frontend is emitting dbg.values after every assignment. Could you please expand on this part a little?

Thanks,

Orlando

I think what I meant is that we need to keep the association between the alloca and the local variable somewhere. The current implementation of dbg.declare is not what we want: when we have one, it overrides any dbg.values, and the alloca is used as the variable location for the full scope. Still, we do need an association between the alloca and the variable+scope somewhere. The dbg.values in the design as we discussed it would not contain the alloca, but the SSA value which is stored to the alloca. Maybe if we augmented the dbg.values with the store destination, we could get by without tracking that information separately.
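For example, the augmented form might look like this (the extra destination operand is hypothetical):

store i32 %v, i32* %x.addr
call void @llvm.dbg.value(metadata i32 %v, metadata i32* %x.addr, metadata !DILocalVariable(name: “x”), metadata !DIExpression())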

The main design goal here was to have the variable location information be correct even when DSE occurs, without updating every pass that deletes stores, because there are many. Thinking about it today, I’m not sure this design is complete yet. Even if the frontend effectively emits two stores for every assignment - a real store and a dbg.value - the backend needs to determine whether the real store survived optimization. If it did, then the variable value lives in memory. If it did not, then the variable value is the value which would’ve been stored, if it is available at this program point or somewhere nearby. Maybe that’s acceptable, but it seems difficult. Then again, maybe that’s similar to what we already do for dbg.value uses that precede definitions.

The alternative to this design would be to say that stores to static allocas are special, and cannot be deleted without updating debug info. This is basically https://llvm.org/pr34136#c25. Thinking about it again today, maybe this approach is feasible. We could build tooling to audit for passes that delete these special “assigning stores”. Maybe that’s better than bending over backwards just to make it easy for the optimizers.

> I think what I meant is that we need to keep the association between the alloca and the local variable somewhere. The current implementation of dbg.declare is not what we want: when we have one, it overrides any dbg.values, and the alloca is used as the variable location for the full scope. Still, we do need an association between the alloca and the variable+scope somewhere. The dbg.values in the design as we discussed it would not contain the alloca, but the SSA value which is stored to the alloca. Maybe if we augmented the dbg.values with the store destination, we could get by without tracking that information separately.

> The main design goal here was to have the variable location information be correct even when DSE occurs, without updating every pass that deletes stores, because there are many. Thinking about it today, I’m not sure this design is complete yet. Even if the frontend effectively emits two stores for every assignment - a real store and a dbg.value - the backend needs to determine whether the real store survived optimization. If it did, then the variable value lives in memory. If it did not, then the variable value is the value which would’ve been stored, if it is available at this program point or somewhere nearby. Maybe that’s acceptable, but it seems difficult. Then again, maybe that’s similar to what we already do for dbg.value uses that precede definitions.

That makes sense, thank you. I had also thought about extending dbg.value to keep the store destination. One step further could be to introduce a mechanism to refer to the store itself instead of the store destination (possibly similar to Jeremy’s recent MIR debug instruction referencing work [1], but for this situation in IR). If the referenced store has been removed (e.g. by DSE) then we know to use the value instead, which may also have been salvaged or be undef at this point. And if the store has not been removed then we can grab the destination from the store itself.

I’m starting to form the opinion that the ideal solution will provide correctness by default without passes having to do anything, but allow the onus of improving coverage to fall upon those passes. We’ve recently taken a turn in this direction in D80264 [2] whereby we now replace an instruction’s ValueAsMetadata uses with undef on deletion. This provides correctness by default for debug intrinsics using the instruction, and the coverage may be improved by a pass appropriately calling salvageDebugInfo beforehand.
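As a small example of that behaviour (sketch):

%x = add i32 %a, 5
call void @llvm.dbg.value(metadata i32 %x, metadata !1, metadata !DIExpression())

; If a pass deletes %x without salvaging, the ValueAsMetadata use becomes
; undef - correct by default, but the location is lost:
call void @llvm.dbg.value(metadata i32 undef, metadata !1, metadata !DIExpression())

; If the pass calls salvageDebugInfo first, coverage is preserved by
; folding the deleted computation into the expression:
call void @llvm.dbg.value(metadata i32 %a, metadata !1, metadata !DIExpression(DW_OP_plus_uconst, 5))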

> The alternative to this design would be to say that stores to static allocas are special, and cannot be deleted without updating debug info. This is basically https://llvm.org/pr34136#c25. Thinking about it again today, maybe this approach is feasible. We could build tooling to audit for passes that delete these special “assigning stores”. Maybe that’s better than bending over backwards just to make it easy for the optimizers.

The “static allocas are special” [3] approach is appealing because it seems like it may give promising results without too much work (compared with these other ideas, anyway). It does rely on passes to provide debug-info correctness, but tooling may be an appropriate stand-in for correctness by default if we decide to go down this route. Did you have anything in mind for this?

Thanks,

Orlando

[1] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html

[2] https://reviews.llvm.org/D80264

[3] https://llvm.org/pr34136#c25

All I had in mind was something like:

  • add a pass that finds all “local variable stores” (stores to a static alloca used by a dbg.declare), and inserts markers (llvm.donothing?) carrying the local variable metadata after every such store
  • after optimization, check that every marker is preceded by either a store to the local variable or a dbg.value for the same local variable. Any marker with no dbg.value or store before it is a bug, unless the variable is available nowhere (I forget how that is represented)
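In IR, that might look something like this (llvm.donothing with an attached custom metadata kind is just a strawman encoding):

; after the marker pass
store i32 %v, i32* %x.addr ; %x.addr is used by a dbg.declare for “x”
call void @llvm.donothing(), !assign.marker !DILocalVariable(name: “x”)

; post-optimization audit: each marker must still be preceded by a store
; to “x” or a dbg.value for “x”; otherwise report a bug.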

Hi Reid,

Sorry for the long delay in getting back to you. I’ve been thinking about this problem a lot and I have some thoughts on the existing ideas, and have come up with another partial solution.

Thoughts on existing ideas

Currently I prefer the idea you proposed earlier in this thread (the “Coffee Chat” method, which I summarise later) to the “Bugzilla” method [1], on the grounds that there’s less chance of us generating wrong debug info: as you mentioned, most optimisation passes won’t need to worry about updating any debug info when messing with stores.

Additionally, regarding tooling for the Bugzilla method, you said:

> All I had in mind was something like:
>
>   • add a pass that finds all “local variable stores” (stores to a static alloca used by a dbg.declare), and inserts markers (llvm.donothing?) carrying the local variable metadata after every such store
>   • after optimization, check that every marker is preceded by either a store to the local variable or a dbg.value for the same local variable. Any marker with no dbg.value or store before it is a bug, unless the variable is available nowhere (I forget how that is represented)

This itself sounds similar to the Coffee Chat method implementation, which makes the Bugzilla method less appealing IMO since we’d almost be implementing both solutions. Summarising the version of the Coffee Chat method from this email [2] as I understand it:

  1. Remove LowerDbgDeclare.

  2. Frontends emit a new kind of debug intrinsic (dbg.assign?) after every store, tracking both the stored value and store destination.

  3. When mem2reg promotes allocas it converts dbg.assign(value, dest) → dbg.value(value).

  4. If any dbg.assign intrinsics remain when we get to ISel then we know the alloca is still used for at least some of the variable’s lifetime. We must choose to convert each remaining dbg.assign(value, dest) to either:

     A) DBG_VALUE value

     B) DBG_VALUE dest, DIExpression(DW_OP_deref)

where B is preferable if we know the dest address holds the correct value.
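For step 3, mem2reg’s rewrite would look roughly like this (sketch, using the strawman dbg.assign syntax from the example below):

dbg.assign(%token, %v, %dest, “x”, DIExpression()) → dbg.value(%v, “x”, DIExpression())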

To be able to choose correctly in step 4 we need a way of knowing if a specific store has been deleted or moved. To keep the desirable property of allowing optimisation passes to remain ignorant of that requirement, this needs to be handled automatically in the background. This would involve linking the store instruction to the dbg.assign, and I believe we can (and should) take advantage of existing infrastructure to do this. For instance, we could change the store instruction to return a special DebugToken type value which dbg.assign intrinsics take as an argument. If the store is deleted, the store-token argument gets RAUW(undef) as per D80264, and then in step 4 we simply look to see if that argument is undef or not. We can also reason about movement of the store relative to the position of the dbg.assign safely in step 4 because we know exactly which store the dbg.assign is associated with.

Here is an example showing what that process looks like with DSE:

=== IR snippet from the frontend ===

%dest = alloca i32
%one = …
%two = …
%token.first = store i32 %one, i32* %dest
dbg.assign(%token.first, %one, %dest, “my_var”, DIExpression())
%token.second = store i32 %two, i32* %dest
dbg.assign(%token.second, %two, %dest, “my_var”, DIExpression())

=== DSE the first store ===

%dest = alloca i32
%one = …
%two = …
; ### First store DSE’d, token RAUW(undef)
dbg.assign(undef, %one, %dest, “my_var”, DIExpression())
%token.second = store i32 %two, i32* %dest
dbg.assign(%token.second, %two, %dest, “my_var”, DIExpression())

=== Lower to MIR ===

DBG_VALUE %one, “my_var”, DIExpression() ; Use the value component because the token argument is undef.
MOV32mr %dest, %two
DBG_VALUE %dest, “my_var”, DIExpression(DW_OP_deref) ; Use the dest component + deref because the token argument is not undef.

Finally, imagine that %one also gets DCE’d after the store using it is removed. Its metadata use in the dbg.assign will get RAUW(undef), and when we lower to MIR we’ll insert a DBG_VALUE $noreg just like we would for a dbg.value(undef, …), correctly indicating that there is no location for the variable available here.
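Extending the example (sketch):

; ### %one DCE’d after the first store was removed, value RAUW(undef)
dbg.assign(undef, undef, %dest, “my_var”, DIExpression())

=== Lower to MIR ===

DBG_VALUE $noreg, “my_var”, DIExpression() ; No location available here.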

An alternative idea

I also hoped to get your opinion on an idea for an alternative solution. I was trying to come up with something that would involve less work than these other ideas. It works in a similar way to your Coffee Chat idea, but doesn’t require that link back to stores. The idea is basically to use stores to track assignments to alloca variables, and solve the DSE problem by inserting a new intrinsic dbg.store(value, dest) every time a store is deleted.

When mem2reg promotes allocas it converts the dbg.declare to dbg.values as per usual, and also inserts dbg.values after each dbg.store using the alloca. Later, if the alloca hasn’t been promoted, we can determine whether a dbg.declare using it is still valid for the entire lifetime of the variable by looking for dbg.stores using the alloca. If there are none then no stores have been removed and we know the stack location is valid. Otherwise, we convert the dbg.stores to dbg.values and insert dbg.value+deref after the remaining (real) stores.

  1. Remove LowerDbgDeclare.

  2. Keep dbg.declare and dbg.value semantics as they are now.

  3. When a store is deleted, insert a dbg.store in its place:

store value, dest → dbg.store(value, dest, DIExpression())

  4. mem2reg keeps its existing pipeline for dbg.declare → dbg.value when promoting, with this addition:

    • For each dbg.store(value, dest, expr) using the alloca, insert a dbg.value(value, var, expr) before the dbg.store, with var taken from the dbg.declare.
    • Delete all dbg.store(_, dest, _) using the alloca.

  5. For each dbg.declare(dest, var, expr) that exists when we get to ISel:

    • If there are no dbg.store(_, dest) using dest, emit a frame-index variable.
    • Else:
      • For each dbg.store(value, dest, expr) using dest, insert a DBG_VALUE(value, var, expr) before the dbg.store.
      • For each (real) store to dest, insert DBG_VALUE(dest, var, DIExpression(DW_OP_deref)) before the store.
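Replaying the earlier DSE example under this scheme (sketch):

%dest = alloca i32
dbg.declare(%dest, “my_var”, DIExpression())
%one = …
%two = …
; ### First store DSE’d, replaced with a dbg.store
dbg.store(%one, %dest, DIExpression())
store i32 %two, i32* %dest

=== Lower to MIR ===

DBG_VALUE %one, “my_var”, DIExpression() ; From the dbg.store.
MOV32mr %dest, %two
DBG_VALUE %dest, “my_var”, DIExpression(DW_OP_deref) ; After the surviving (real) store.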

This solves the DSE problem, and gives us better location coverage than LowerDbgDeclare for the same reasons that the Coffee Chat and Bugzilla methods do: we’d end up with dbg.value+deref after stores, and dbg.values after removed dead stores, instead of what we get now, which is dbg.values after both stores and removed stores.

The advantage of this method is that it shouldn’t involve too much implementation work. The biggest downside is that it doesn’t account for code motion moving stores, which is a regression from LowerDbgDeclare and is a problem that the Coffee Chat method doesn’t suffer from. If both dbg.stores and (real) stores represent assignments to alloca variables, moving stores around causes the program location of the assignments to differ from those in the source code. Unfortunately, I don’t think this solution is viable unless we can adjust it to account for that. What do you think?

Many thanks,

Orlando

[1] Bugzilla method: https://bugs.llvm.org/show_bug.cgi?id=34136#c25
[2] Coffee Chat method: https://groups.google.com/g/llvm-dev/c/HX8HL1f3sw0/m/9ylo8-CnDAAJ