RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

Debug info today handles two cases reasonably well:

  1. At -O0, dbg.declare does a good job describing variables that live at some known stack offset
  2. With optimizations, variables promoted to SSA can be described with dbg.value

This leaves behind a large hole in our optimized debug info: variables that cannot be promoted, typically because they are address-taken. This is https://llvm.org/pr34136, and this RFC is mostly about addressing that.

The status today is that instcombine removes all dbg.declares and heuristically inserts dbg.values where it can identify the value of the variable in question. This prevents us from having misleading debug info, but it throws away information about the variable’s location in memory.

Part of the reason that instcombine discards dbg.declares is that we can’t mix and match dbg.value with dbg.declare. If the backend sees a dbg.declare, it accepts that information as more reliable and discards all DBG_VALUE instructions associated with that variable. So, we need something we can mix. We need a way to say, the variable lives in memory at this program point, and it might live somewhere else later on. I propose that we introduce DW_OP_LLVM_memory for this purpose, and then we transition from dbg.declare to dbg.value+DW_OP_LLVM_memory.

Initially I believed that DW_OP_deref was the way to say this with existing DWARF expression opcodes, but I implemented that in https://reviews.llvm.org/D37311 and learned more about how DWARF expressions work. When a debugger begins evaluating a DWARF expression, it assumes that the resulting value will be a pointer to the variable in memory. For a debugger, this makes sense, because debug builds put things in memory and even after optimization many variables must be spilled. Only the special DW_OP_regN and DW_OP_stack_value expression opcodes change the location of the value from memory to register or stack value.

LLVM SSA values obviously do not have an address that we can take and they don’t live in registers, so neither the default memory location model nor DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically repurpose DW_OP_stack_value to indicate that the SSA value passed to llvm.dbg.value is the variable’s value, and if the expression lacks DW_OP_stack_value, it must be a the address of the value. However, that is backwards incompatible and it seems like quite a stretch.

DW_OP_LLVM_memory would be very similar to DW_OP_stack_value, though. It would only be valid at the end of a DIExpression. The backend will always remove it because the debugger will assume the variable lives in memory unless it is told otherwise.

For the original problem of improving optimized debug info while avoiding inaccurate information in the presence of dead store elimination, consider this C example:
int x = 42; // Can DSE
dostuff(x); // Can propagate 42
x = computation(); // Post-dominates x = 42 store
escape(&x);

We should be able to do this:
int x; // eliminate x = 42 store
dbg.value(!x, 42, !DIExpression()) // mark x as the constant 42 in debug info
dostuff(42); // propagate 42
dbg.value(!x, &x, !DIExpression(DW_OP_LLVM_memory)) // x is in memory again
x = computation();
escape(&x);

Passes that delete stores would be responsible for checking if the store destination is part of an alloca with associated dbg.value instructions. They would emit a new dbg.value instruction for that variable with the stored value, and clone the dbg.value instruction that puts the variable back in memory before the killing store. If the store is dead because variable lifetime is ending, the second dbg.value is unnecessary.

This will also allow us to fix debug info for px in this example:
void attribute((optnone, noinline)) usevar(int *x) {}
int main(int argc, char **argv) {
int x = 42;
int *px = &x;
usevar(&x);
if (argc) usevar(px);
}

Today, we emit a location for px like DW_OP_breg7 RSP+12, which gives it the incorrect value 42. This is because our DBG_VALUE instruction for px’s location uses a frame index, which we assume is in memory. This is not the case, px is not in memory, it’s value is a stack object pointer.

Please reply if you have any thoughts on this proposal. Adrian and I hashed this out over Bugzilla, IRC, and in person, so it shouldn’t be too surprising. Let me know if you want to be CC’d on the patches.

Hi Reid, thanks for writing such a clear summary of the problem that
this RFC addresses. I was wondering if there is any sort of
methodology for quantifying the "quality" of debug information? e.g.
the gap between the current debug information and 'ideal' debug info?

Thanks,

Alex

Debug info today handles two cases reasonably well:

  1. At -O0, dbg.declare does a good job describing variables that live at some known stack offset
  2. With optimizations, variables promoted to SSA can be described with dbg.value

This leaves behind a large hole in our optimized debug info: variables that cannot be promoted, typically because they are address-taken. This is https://llvm.org/pr34136, and this RFC is mostly about addressing that.

The status today is that instcombine removes all dbg.declares and heuristically inserts dbg.values where it can identify the value of the variable in question. This prevents us from having misleading debug info, but it throws away information about the variable’s location in memory.

Part of the reason that instcombine discards dbg.declares is that we can’t mix and match dbg.value with dbg.declare. If the backend sees a dbg.declare, it accepts that information as more reliable and discards all DBG_VALUE instructions associated with that variable. So, we need something we can mix. We need a way to say, the variable lives in memory at this program point, and it might live somewhere else later on. I propose that we introduce DW_OP_LLVM_memory for this purpose, and then we transition from dbg.declare to dbg.value+DW_OP_LLVM_memory.

Initially I believed that DW_OP_deref was the way to say this with existing DWARF expression opcodes, but I implemented that in https://reviews.llvm.org/D37311 and learned more about how DWARF expressions work. When a debugger begins evaluating a DWARF expression, it assumes that the resulting value will be a pointer to the variable in memory. For a debugger, this makes sense, because debug builds put things in memory and even after optimization many variables must be spilled. Only the special DW_OP_regN and DW_OP_stack_value expression opcodes change the location of the value from memory to register or stack value.

LLVM SSA values obviously do not have an address that we can take and they don’t live in registers, so neither the default memory location model nor DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically repurpose DW_OP_stack_value to indicate that the SSA value passed to llvm.dbg.value is the variable’s value, and if the expression lacks DW_OP_stack_value, it must be a the address of the value. However, that is backwards incompatible and it seems like quite a stretch.

Seems like a stretch in what sense? The backwards incompatibility is certainly something to consider (though we went through that with DW_OP_bit_piece too), but this seems like the design I’d go to first so I’d like to better understand why it’s not the path forward if there’s some more detail about that aspect of the design choice here.

I guess you described this already, but talking it through for myself/maybe others will find this useful:

So since we don’t have DW_OP_regN for LLVM registers, we could sort of assume the implicit first value on the stack is a pseudo-OP_regN of the LLVM SSA register.

To support that, all existing uses would need no changes to match the DWARF model of registers being implicitly direct values.

Code that wanted to describe the register as containing the memory address of the interesting thing would use DW_OP_stack_value to say “this location description that is a register is really an address you should follow to find the value, not a direct value itself”?

But code that wanted to describe a variable as being 3 bytes ahead of a pointer in an LLVM SSA register would only have “plus 3” in the expression stack, since then it’s no longer a direct value but is treated as a pointer to the value. I guess this is where the ambiguity would come in - currently how does “plus 3” get interpreted when seen in LLVM IR, I guess that’s meant to describe reg value + 3 as being the immediate value of the variable? (so it’s implicitly OP_stack_value? & OP_stack_value is added somewhere in the DWARF backend?)

Thanks,

  • Dave

Hi Alex,

The usual oracle for execution-related debug info is "what you get at -O0."
More abstractly, you would like to be able to stop at any source statement,
and have any in-scope variable's value available for display (and possibly
for modification).

At my previous employer we built a tool that compared the set of breakpoint
locations at -O0 to -O1/-O2, and also compared the ranges where a variable
had a defined location to the range of code for its lexical scope. This
gives you computable metrics such that you can readily tell when things have
gotten "better" or "worse" compared to some previous compiler version (or
competitor compiler).

There's no open-source tool that does this at the moment, that I'm aware of,
although we are working on publishing our DIVA tool (see our lightning talk
at EuroLLVM earlier this year).
--paulr

Hi Alex,

The usual oracle for execution-related debug info is "what you get at -O0."
More abstractly, you would like to be able to stop at any source statement,
and have any in-scope variable's value available for display (and possibly
for modification).

At my previous employer we built a tool that compared the set of breakpoint
locations at -O0 to -O1/-O2, and also compared the ranges where a variable
had a defined location to the range of code for its lexical scope. This
gives you computable metrics such that you can readily tell when things have
gotten "better" or "worse" compared to some previous compiler version (or
competitor compiler).

That sounds great. It would also be interesting to compare these
metrics between different targets, which might indicate whether a
target-specific optimisation is damaging debug info quality vs another
target. Of course this would only give a very rough comparison.

There's no open-source tool that does this at the moment, that I'm aware of,
although we are working on publishing our DIVA tool (see our lightning talk
at EuroLLVM earlier this year).

I'd seen the slides from that talk, and eagerly await the open sourcing of DIVA!

Thanks,

Alex

It’s worth remembering that there are two syntactically similar but semantically different kinds of “expression” in DWARF.

A DWARF expression computes a value; if the available value is a pointer, you add DW_OP_deref to express the pointed-to value. A DWARF location expression computes a location, and adds various operators to express locations that a (value) expression cannot, such as DW_OP_regx. You also have DW_OP_stack_value to say “just kidding, this location expression is a value expression.”

So, whether we want to start throwing around deref or stack_value or regx (implicit or explicit) really depends on whether we are going to be using value expressions or location expressions. Let’s not start mixing them up, it will just make the discussion more confusing.

–paulr

It’s worth remembering that there are two syntactically similar but semantically different kinds of “expression” in DWARF.

A DWARF expression computes a value; if the available value is a pointer, you add DW_OP_deref to express the pointed-to value. A DWARF location expression computes a location, and adds various operators to express locations that a (value) expression cannot, such as DW_OP_regx. You also have DW_OP_stack_value to say “just kidding, this location expression is a value expression.”

So, whether we want to start throwing around deref or stack_value or regx (implicit or explicit) really depends on whether we are going to be using value expressions or location expressions. Let’s not start mixing them up, it will just make the discussion more confusing.

Where do non-location DWARF expressions appear?

So, whether we want to start throwing around deref or stack_value or
regx (implicit or explicit) really depends on whether we are going to
be using value expressions or location expressions. Let's not start
mixing them up, it will just make the discussion more confusing.

Where do non-location DWARF expressions appear?

For things that might or might not be constants. I have used non-location
DWARF expressions to describe bounds for variable-size arrays. I can
imagine using them for constexpr situations as well.
--paulr

So, whether we want to start throwing around deref or stack_value or
regx (implicit or explicit) really depends on whether we are going to
be using value expressions or location expressions. Let’s not start
mixing them up, it will just make the discussion more confusing.

Where do non-location DWARF expressions appear?

For things that might or might not be constants. I have used non-location
DWARF expressions to describe bounds for variable-size arrays. I can
imagine using them for constexpr situations as well.

Fair enough. The only things in this discussion are location expressions, I believe - good to have the terminology to be clear about that.

  • Dave

LLVM SSA values obviously do not have an address that we can take and
they don’t live in registers, so neither the default memory location model
nor DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically
repurpose DW_OP_stack_value to indicate that the SSA value passed to
llvm.dbg.value *is* the variable’s value, and if the expression lacks
DW_OP_stack_value, it must be a the address of the value. However, that is
backwards incompatible and it seems like quite a stretch.

Seems like a stretch in what sense? The backwards incompatibility is
certainly something to consider (though we went through that with
DW_OP_bit_piece too), but this seems like the design I'd go to first so I'd
like to better understand why it's not the path forward if there's some
more detail about that aspect of the design choice here.

I guess you described this already, but talking it through for
myself/maybe others will find this useful:

So since we don't have DW_OP_regN for LLVM registers, we could sort of
assume the implicit first value on the stack is a pseudo-OP_regN of the
LLVM SSA register.

Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
dbg.value and DBG_VALUE instructions are a register-like value that gets
pushed onto the expression stack. The DWARF asmprinter does some expression
folding to clean things up, but that's the model.

To support that, all existing uses would need no changes to match the
DWARF model of registers being implicitly direct values.

Code that wanted to describe the register as containing the memory address
of the interesting thing would use DW_OP_stack_value to say "this location
description that is a register is really an address you should follow to
find the value, not a direct value itself"?

But code that wanted to describe a variable as being 3 bytes ahead of a
pointer in an LLVM SSA register would only have "plus 3" in the expression
stack, since then it's no longer a direct value but is treated as a pointer
to the value. I guess this is where the ambiguity would come in - currently
how does "plus 3" get interpreted when seen in LLVM IR, I guess that's
meant to describe reg value + 3 as being the immediate value of the
variable? (so it's implicitly OP_stack_value? & OP_stack_value is added
somewhere in the DWARF backend?)

Our model today is inconsistent. In LLVM IR today, the SSA value of the
dbg.value *is* the interesting value, it is not the address, and we
typically use empty DIExpressions. If the value is ultimately register
allocated and the DIExpression is empty, we will emit a DW_OP_regN location
expression. If the value is spilled, we usually don't need to append
DW_OP_stack_value because the location is now a memory location, which can
be described by DW_OP_[f]breg.

Today, passes that want to add "plus 3" to a DIExpression go out of their
way to add DW_OP_stack_value to the DIExpression because the backend won't
do it for us, even though dbg.value normally describes the value, not an
address.

To explore the alternative DW_OP_stack_value model, here's how I'd go about
it:
1. Replace llvm.dbg.value with new intrinsic, llvm.dbg.loc, to make the
semantic change clear. It can express both an address or a value, depending
on the DIExpression.
2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value to
the DIExpression argument of the intrinsic.
3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression
alone. The LHS of llvm.dbg.declare is already the address of the variable.
4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
DBG_VALUES are now expressed with a DIExpression that lacks
DW_OP_stack_value at the end.
5. Teach our DWARF expression emitter to combine the new expressions as
necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
with physical register operands. They just use DW_OP_regN, which is
implicitly a value location.
6. Teach all passes that spill virtual registers used by DBG_VALUE to
remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
appropriate.

This should be equivalent to DW_OP_LLVM_memory, and more inline with DWARF
location expression semantics, but it has a large migration cost.

How about adding dbg.loc without eliminating dbg.value? Then you can distinguish the cases without abusing DW_OP_stack_value so horribly.

–paulr

Inconsistent between what and what? LLVM and DWARF? Yeah, I guess there’s some mismatch between the semantics, though I’m still having trouble wrapping my head around it.

Don’t think I’ve got any mental model of what you mean by this phrase (‘chain of loads’) - could you provide an example or the like?

When would that not be valid today/without LLVM_memory? Sorry, again - it’s all a bit fuzzy in my head.

There’d be some canonicalization opportunities, but not seeing the correctness issues with being able to prepend onto the location list - seems like that might be true with LLVM_memory too… maybe?

nod my worry is ending up with 3 different representations - DWARF, CodeView, and the increasingly divergent IRDWARF (especially since it’s “sort of like DWARF” which makes the few divergences more costly/difficult).

What would that look like?

Would have to think some more - maybe there’s a way to avoid the rename? But yeah, don’t have a problem with llvm.dbg.loc - as you say/implied, it’d match the new semantics better.

But really, your original proposal’s probably OK/close enough to go ahead. I don’t feel that strongly.

  • Dave

I guess you described this already, but talking it through for
myself/maybe others will find this useful:

So since we don't have DW_OP_regN for LLVM registers, we could sort of
assume the implicit first value on the stack is a pseudo-OP_regN of the
LLVM SSA register.

Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
dbg.value and DBG_VALUE instructions are a register-like value that gets
pushed onto the expression stack. The DWARF asmprinter does some expression
folding to clean things up, but that's the model.

To support that, all existing uses would need no changes to match the
DWARF model of registers being implicitly direct values.

Code that wanted to describe the register as containing the memory
address of the interesting thing would use DW_OP_stack_value to say "this
location description that is a register is really an address you should
follow to find the value, not a direct value itself"?

But code that wanted to describe a variable as being 3 bytes ahead of a
pointer in an LLVM SSA register would only have "plus 3" in the expression
stack, since then it's no longer a direct value but is treated as a pointer
to the value. I guess this is where the ambiguity would come in - currently
how does "plus 3" get interpreted when seen in LLVM IR, I guess that's
meant to describe reg value + 3 as being the immediate value of the
variable? (so it's implicitly OP_stack_value? & OP_stack_value is added
somewhere in the DWARF backend?)

Our model today is inconsistent.

Inconsistent between what and what? LLVM and DWARF? Yeah, I guess there's
some mismatch between the semantics, though I'm still having trouble
wrapping my head around it.

I mean LLVM's model is internally inconsistent. We have bugs like the RVO
one that you filed (https://llvm.org/pr34513), where we forget if the debug
value is an address or a value.

In LLVM IR today, the SSA value of the dbg.value *is* the interesting

value, it is not the address, and we typically use empty DIExpressions. If
the value is ultimately register allocated and the DIExpression is empty,
we will emit a DW_OP_regN location expression. If the value is spilled, we
usually don't need to append DW_OP_stack_value because the location is now
a memory location, which can be described by DW_OP_[f]breg.

Today, passes that want to add "plus 3" to a DIExpression go out of their
way to add DW_OP_stack_value to the DIExpression because the backend won't
do it for us, even though dbg.value normally describes the value, not an
address.

To explore the alternative DW_OP_stack_value model, here's how I'd go
about it:
1. Replace llvm.dbg.value with new intrinsic, llvm.dbg.loc, to make the
semantic change clear. It can express both an address or a value, depending
on the DIExpression.
2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value
to the DIExpression argument of the intrinsic.
3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression
alone. The LHS of llvm.dbg.declare is already the address of the variable.
4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
DBG_VALUES are now expressed with a DIExpression that lacks
DW_OP_stack_value at the end.
5. Teach our DWARF expression emitter to combine the new expressions as
necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
with physical register operands. They just use DW_OP_regN, which is
implicitly a value location.
6. Teach all passes that spill virtual registers used by DBG_VALUE to
remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
appropriate.

This should be equivalent to DW_OP_LLVM_memory, and more inline with
DWARF location expression semantics, but it has a large migration cost.

---

I think part of the reason I wanted to move in the DW_OP_LLVM_memory
direction is that I originally wanted to add a memory offset operand to it.
Our actual use cases for complex DWARF expressions typically come from
things like safestack, ASan, and blocks. What these all have in common is
that they gather up a number of variables and move them off into a struct
in heap memory. This is very similar to what happens when we spill a
virtual register: instead of describing a register, we modify the
expression to load the value from some FP register with an offset. I think
the right representation for these transforms is basically a "chain of
loads".

Don't think I've got any mental model of what you mean by this phrase
('chain of loads') - could you provide an example or the like?

Suppose you have a captured variable with __block shared storage, and then
suppose you compile it with ASan and safestack, and then the safestack
pointer is spilled. To compute the value, the debugger starts from a
register, goes to an offset, and loads a pointer, repeating the process
until it finds the value. As we proceed through codegen, we effectively
build up the chain.

1. To implement __block in Clang, we use dbg.declare(%block_descriptor,
DW_OP_deref, DW_OP_constu_plus $offset)
2. Assuming the block descriptor lived in an alloca (which it doesn't, but
assume it does for argument's sake), asan will move that alloca onto the
heap to implement use-after-return detection. It will prepend "DW_OP_deref,
DW_OP_constu, $offset" to the DIExpression.
3. If ASan put its value in an alloca and safestack wanted to move that
alloca to the safe stack (again bear with me), it would do the same:
prepend deref+offset.
4. Finally, spilling the safe stack pointer to the control stack would mean
prepending deref+offset.

This seems like a really common pattern. Right now this offsetting and
loading has to be expressed as separate location expression opcodes. The
DW_OP_deref opcode functions like a load sequencing operation that can only
appear between two offsets, although an offset could be zero, in which case
there would be no DW_OP_constu_plus opcode. I'm suggesting we move to a
representation where the offset and the deref are one. Think "semicolon as
sequencing operator" vs. "semicolon as statement terminator". When we use
DW_OP_LLVM_memory or DW_OP_deref, the variable must always live in memory,
we're always doing address calculation. We should try to make our
representation more closely match the set of things we actually want to do
and support.

That's kind of the gist of what I had in mind. I didn't think it was worth
it, which is why I pared the proposal down to just "the opposite of
DW_OP_stack_value".

I was imagining that DW_OP_LLVM_memory with an offset would be that load

chain link.

The idea behind this representation is that it should make it easy for
spilling transforms to prepend a load chain onto the expression, rather
than them having to individually discover if DW_OP_deref is needed, or call
some common helper like DIExpression::prepend. It should always be valid to
push on a new load with an offset.

When would that not be valid today/without LLVM_memory? Sorry, again -
it's all a bit fuzzy in my head.

There'd be some canonicalization opportunities, but not seeing the
correctness issues with being able to prepend onto the location list -
seems like that might be true with LLVM_memory too... maybe?

The correctness issue with today's prepending of offsets and deref is that
it's hard to know when to insert deref, because we don't know if an
expression describes an address or a value. We have code like this in
buildDbgValueForSpill:
  // If the DBG_VALUE already was a memory location, add an extra
  // DW_OP_deref. Otherwise just turning this from a register into a
  // memory/indirect location is sufficient.
  if (IsIndirect)
    Expr = DIExpression::prepend(Expr, DIExpression::WithDeref);

We modify DBG_VALUEs for spills in several other places in codegen and they
don't all correctly insert DW_OP_deref. The load chain representation
should make it easy to just modify the offset on the front or add a new
load depending on what's being done.

It also has the advantage that it will be easier to translate to CodeView

than arbitrary DWARF expressions, which we are currently canonicalizing
into a load chain and then attempting to emit.

*nod* my worry is ending up with 3 different representations - DWARF,
CodeView, and the increasingly divergent IRDWARF (especially since it's
"sort of like DWARF" which makes the few divergences more costly/difficult).

Does that make sense? I'm starting to feel like I should either pursue
the more ambitious load chain design,

What would that look like?

Just `DW_OP_LLVM_memory, 8, DW_OP_LLVM_memory, 20, ...` in DIExpression
through IR. It's OK to insert more DWARF opcodes between the links, it's
just non-canonical if they are pointer offsetting opcodes that could be
folded into the memory opcode. The DWARF expression backend would fold it
into the same location expressions we have today.

or consistently apply DW_OP_stack_value to llvm.dbg.loc (alternative names

welcome).

Would have to think some more - maybe there's a way to avoid the rename?
But yeah, don't have a problem with llvm.dbg.loc - as you say/implied, it'd
match the new semantics better.

I don't think so. =/ I think the rename is the only safe way to maintain
bitcode compatibility.

But really, your original proposal's probably OK/close enough to go ahead.
I don't feel that strongly.

Makes sense. That's basically where I ended up, but now I'm reconsidering
the dbg.loc+DW_OP_stack_value thing, to bring DIExpressions closer to DWARF.

Feels to me like bugs rather than inconsistencies (I’d think of an inconsistency as “we do X over here intentionally and Y over here intentionally but they’re in contradiction to one another”)

Guessing it’s the other way (constu_plus(offset), deref)? But in any case…

Seems like a size optimization more than anything? Which I could get behind if there is data to support the optimization as being significantly valuable (& ideally I’d probably favor the general representation first, etc - but I understand if the special-cased representation has other side benefits (like reducing the cost to add support, etc) it might be valuable, but trades off with the “more special cases/non-DWARFian things in our pseudo-DWARF IR”)

Sure - this goes back to my adding the “indirect” flag to support C++ non-trivial pass by value in C++, before we had all the more general expression support, etc. (hilariously, what was there before was even more awesome: we encoded the type of the parameter in the DWARF as T& instead of T… literally changing the signature of the function… )

Either LLVM_memory or stack_value approaches would remove the prepending issues & I agree getting rid of the ad-hoc/separate ‘indirect’ flag is good, just haggling over how best to do that.

But I /think/ a stack_value based approach would allow that too, right? I’m probably missing something.

nod I’m not sure either way, maybe need to come back to a summary of this again after going through these discussions.

Feels to me like bugs rather than inconsistencies (I'd think of an
inconsistency as "we do X over here intentionally and Y over here
intentionally but they're in contradiction to one another")

The DBG_VALUE MachineInstr actually already has a way to indicate that the
computed value is an address, so I can accept that the backend design is
logically consistent with some bugs.

The middle-end, though, needs a new design for optimized debug info for
variables that live in memory. Right now LowerDbgDeclare creates dbg.value
instructions that use the *address* of a variable, rather than its value,
when it encounters a call that directly uses the variable's alloca. This is
inconsistent, and we need a way to say "address" not "value".

I would like whatever we do between the backend and the middle end to be
somewhat consistent.

In LLVM IR today, the SSA value of the dbg.value *is* the interesting

value, it is not the address, and we typically use empty DIExpressions. If
the value is ultimately register allocated and the DIExpression is empty,
we will emit a DW_OP_regN location expression. If the value is spilled, we
usually don't need to append DW_OP_stack_value because the location is now
a memory location, which can be described by DW_OP_[f]breg.

Today, passes that want to add "plus 3" to a DIExpression go out of
their way to add DW_OP_stack_value to the DIExpression because the backend
won't do it for us, even though dbg.value normally describes the value, not
an address.

To explore the alternative DW_OP_stack_value model, here's how I'd go
about it:
1. Replace llvm.dbg.value with new intrinsic, llvm.dbg.loc, to make the
semantic change clear. It can express both an address or a value, depending
on the DIExpression.
2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append
DW_OP_stack_value to the DIExpression argument of the intrinsic.
3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the
DIExpression alone. The LHS of llvm.dbg.declare is already the address of
the variable.
4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
DBG_VALUES are now expressed with a DIExpression that lacks
DW_OP_stack_value at the end.
5. Teach our DWARF expression emitter to combine the new expressions as
necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
with physical register operands. They just use DW_OP_regN, which is
implicitly a value location.
6. Teach all passes that spill virtual registers used by DBG_VALUE to
remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
appropriate.

This should be equivalent to DW_OP_LLVM_memory, and more inline with
DWARF location expression semantics, but it has a large migration cost.

---

I think part of the reason I wanted to move in the DW_OP_LLVM_memory
direction is that I originally wanted to add a memory offset operand to it.
Our actual use cases for complex DWARF expressions typically come from
things like safestack, ASan, and blocks. What these all have in common is
that they gather up a number of variables and move them off into a struct
in heap memory. This is very similar to what happens when we spill a
virtual register: instead of describing a register, we modify the
expression to load the value from some FP register with an offset. I think
the right representation for these transforms is basically a "chain of
loads".

Don't think I've got any mental model of what you mean by this phrase
('chain of loads') - could you provide an example or the like?

Suppose you have a captured variable with __block shared storage, and
then suppose you compile it with ASan and safestack, and then the safestack
pointer is spilled. To compute the value, the debugger starts from a
register, goes to an offset, and loads a pointer, repeating the process
until it finds the value. As we proceed through codegen, we effectively
build up the chain.

1. To implement __block in Clang, we use dbg.declare(%block_descriptor,
DW_OP_deref, DW_OP_constu_plus $offset)

Guessing it's the other way (constu_plus(offset), deref)? But in any
case...

We'd have a deref if %block_descriptor was actually an alloca containing a
pointer, which is what I'm assuming for this example. Deref loads the
pointer, then we go to some offset, which is the final address. This is a
dbg.declare call, so this should compute an address.

2. Assuming the block descriptor lived in an alloca (which it doesn't, but

assume it does for argument's sake), asan will move that alloca onto the
heap to implement use-after-return detection. It will prepend "DW_OP_deref,
DW_OP_constu, $offset" to the DIExpression.
3. If ASan put its value in an alloca and safestack wanted to move that
alloca to the safe stack (again bear with me), it would do the same:
prepend deref+offset.
4. Finally, spilling the safe stack pointer to the control stack would
mean prepending deref+offset.

This seems like a really common pattern. Right now this offsetting and
loading has to be expressed as separate location expression opcodes. The
DW_OP_deref opcode functions like a load sequencing operation that can only
appear between two offsets, although an offset could be zero, in which case
there would be no DW_OP_constu_plus opcode. I'm suggesting we move to a
representation where the offset and the deref are one.

Seems like a size optimization more than anything? Which I could get
behind if there is data to support the optimization as being significantly
valuable (& ideally I'd probably favor the general representation first,
etc - but I understand if the special-cased representation has other side
benefits (like reducing the cost to add support, etc) it might be valuable,
but trades off with the "more special cases/non-DWARFian things in our
pseudo-DWARF IR")

It's supposed to be a canonicalization, a representation shift designed to
make it easier to think about what it is that we're actually doing. We
could replace LLVM IR GEP with something lower-level like ADD and MUL, but
we have the high-level thing because it makes it easier for compiler
writers to think about it.

If this design isn't helping you understand what's happening, though, it's
probably not useful. I'm coming at it from the angle of: DWARF location
expressions seem terribly general and unstructured. Pattern matching them
to codeview is hard. If we limited the representation to only represent the
kinds of expressions we care about, it would be easier to translate and
manipulate them.

Anyway, I'm pretty convinced we aren't going to pursue this design.

Think "semicolon as sequencing operator" vs. "semicolon as statement

terminator". When we use DW_OP_LLVM_memory or DW_OP_deref, the variable
must always live in memory, we're always doing address calculation. We
should try to make our representation more closely match the set of things
we actually want to do and support.

That's kind of the gist of what I had in mind. I didn't think it was
worth it, which is why I pared the proposal down to just "the opposite of
DW_OP_stack_value".

I was imagining that DW_OP_LLVM_memory with an offset would be that load

chain link.

The idea behind this representation is that it should make it easy for
spilling transforms to prepend a load chain onto the expression, rather
than them having to individually discover if DW_OP_deref is needed, or call
some common helper like DIExpression::prepend. It should always be valid to
push on a new load with an offset.

When would that not be valid today/without LLVM_memory? Sorry, again -
it's all a bit fuzzy in my head.

There'd be some canonicalization opportunities, but not seeing the
correctness issues with being able to prepend onto the location list -
seems like that might be true with LLVM_memory too... maybe?

The correctness issue with today's prepending of offsets and deref is
that it's hard to know when to insert deref, because we don't know if an
expression describes an address or a value. We have code like this in
buildDbgValueForSpill:
  // If the DBG_VALUE already was a memory location, add an extra
  // DW_OP_deref. Otherwise just turning this from a register into a
  // memory/indirect location is sufficient.
  if (IsIndirect)
    Expr = DIExpression::prepend(Expr, DIExpression::WithDeref);

Sure - this goes back to my adding the "indirect" flag to support C++
non-trivial pass by value in C++, before we had all the more general
expression support, etc. (hilariously, what was there before was even more
awesome: we encoded the type of the parameter in the DWARF as T& instead of
T... literally changing the signature of the function... )

Awesome, Bob just did the same thing for CodeView for NRVO in r312034,
because we don't have DW_OP_deref.

Either LLVM_memory or stack_value approaches would remove the prepending
issues & I agree getting rid of the ad-hoc/separate 'indirect' flag is
good, just haggling over how best to do that.

We modify DBG_VALUEs for spills in several other places in codegen and
they don't all correctly insert DW_OP_deref. The load chain representation
should make it easy to just modify the offset on the front or add a new
load depending on what's being done.

But I /think/ a stack_value based approach would allow that too, right?
I'm probably missing something.

These designs should be equivalent in power, the difference is supposed to
be in how easy they are to implement.

Oh, hey, right…

Is this what MSVC does? That does seem unfortunate.

They don't do NRVO in -O0 builds.

... and they don't describe variables affected by NRVO in optimized builds.

Our struggle is that Clang does NRVO at -O0 (that's good), and there is no
CV record capable of representing what we want to do in -O0, so we do the
reference thing.

Sorry for the delay in replying to this, I’ve been out sick for the last couple of days.

Debug info today handles two cases reasonably well:

  1. At -O0, dbg.declare does a good job describing variables that live at some known stack offset
  2. With optimizations, variables promoted to SSA can be described with dbg.value

This leaves behind a large hole in our optimized debug info: variables that cannot be promoted, typically because they are address-taken. This is https://llvm.org/pr34136, and this RFC is mostly about addressing that.

The status today is that instcombine removes all dbg.declares and heuristically inserts dbg.values where it can identify the value of the variable in question. This prevents us from having misleading debug info, but it throws away information about the variable’s location in memory.

Part of the reason that instcombine discards dbg.declares is that we can’t mix and match dbg.value with dbg.declare. If the backend sees a dbg.declare, it accepts that information as more reliable and discards all DBG_VALUE instructions associated with that variable. So, we need something we can mix. We need a way to say, the variable lives in memory at this program point, and it might live somewhere else later on. I propose that we introduce DW_OP_LLVM_memory for this purpose, and then we transition from dbg.declare to dbg.value+DW_OP_LLVM_memory.

Initially I believed that DW_OP_deref was the way to say this with existing DWARF expression opcodes, but I implemented that in https://reviews.llvm.org/D37311 and learned more about how DWARF expressions work. When a debugger begins evaluating a DWARF expression, it assumes that the resulting value will be a pointer to the variable in memory.

That is an oversimplification. DWARF distinguishes at least three different kinds of locations: register locations (for when a variable is in a particular register and only there, i.e., the debugger may write to the register to modify the variable’s value; think K&R C register variables), memory locations (which behave as you describe and allow the debugger to write to memory), implicit locations (DW_OP_stack_value, constants, …, where the debugger can’t write to modify). In LLVM we don’t distinguish these cases as much as we should.

For a debugger, this makes sense, because debug builds put things in memory and even after optimization many variables must be spilled. Only the special DW_OP_regN and DW_OP_stack_value expression opcodes change the location of the value from memory to register or stack value.

See above.

LLVM SSA values obviously do not have an address that we can take and they don’t live in registers, so neither the default memory location model nor DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically repurpose DW_OP_stack_value to indicate that the SSA value passed to llvm.dbg.value is the variable’s value, and if the expression lacks DW_OP_stack_value, it must be a the address of the value. However, that is backwards incompatible and it seems like quite a stretch.

I don’t think we should burden ourselves with backwards compatibility too much for this, if we have a good solution, I’m fine with having an upgrade that “looses” all incompatible dbg.value intrinsics once.
More to your point, in DWARF, DW_OP_stack_value means something different (see above) and I’d prefer to use the DWARF semantics for operators in DIEpressions and introduce custom DW_OP_LLVM_* ones where the semantics differ.

DW_OP_LLVM_memory would be very similar to DW_OP_stack_value, though. It would only be valid at the end of a DIExpression. The backend will always remove it because the debugger will assume the variable lives in memory unless it is told otherwise.

For the original problem of improving optimized debug info while avoiding inaccurate information in the presence of dead store elimination, consider this C example:
int x = 42; // Can DSE
dostuff(x); // Can propagate 42
x = computation(); // Post-dominates x = 42 store
escape(&x);

We should be able to do this:
int x; // eliminate x = 42 store
dbg.value(!x, 42, !DIExpression()) // mark x as the constant 42 in debug info
dostuff(42); // propagate 42
dbg.value(!x, &x, !DIExpression(DW_OP_LLVM_memory)) // x is in memory again
x = computation();
escape(&x);

Passes that delete stores would be responsible for checking if the store destination is part of an alloca with associated dbg.value instructions. They would emit a new dbg.value instruction for that variable with the stored value, and clone the dbg.value instruction that puts the variable back in memory before the killing store. If the store is dead because variable lifetime is ending, the second dbg.value is unnecessary.

This will also allow us to fix debug info for px in this example:
void attribute((optnone, noinline)) usevar(int *x) {}
int main(int argc, char **argv) {
int x = 42;
int *px = &x;
usevar(&x);
if (argc) usevar(px);
}

Today, we emit a location for px like DW_OP_breg7 RSP+12, which gives it the incorrect value 42. This is because our DBG_VALUE instruction for px’s location uses a frame index, which we assume is in memory. This is not the case, px is not in memory, it’s value is a stack object pointer.

Please reply if you have any thoughts on this proposal. Adrian and I hashed this out over Bugzilla, IRC, and in person, so it shouldn’t be too surprising. Let me know if you want to be CC’d on the patches.

I’m going to read through all the other replies now, before commenting on the concrete proposal.

thanks for writing this up,
Adrian

It is difficult. One thing we can do is measure the delta between the debug info quality of two compiler version (https://reviews.llvm.org/D36627 adds an option to collect various metrics to that end to llvm-dwarfdump).

– adrian