Defining the DIExpression operator DW_OP_LLVM_arg

Currently, two patches under review each want to add a new operator to LLVM's DIExpression: D70642[0] and D82363[1]. Although both use essentially the same name, the operators have different meanings. In the former, it has the form DW_OP_LLVM_argN, where N is in [0-7], and represents the Nth argument of the containing intrinsic; its purpose is to enable the new intrinsic @llvm.dbg.derefval, which represents implicit pointers. In the latter, it has the form DW_OP_LLVM_arg, N, where N >= 0, and represents the Nth value in a variadic debug value, which is added in the same patch. Functionally the two are quite similar; the main difference is that the latter picks the Nth value from a subset of the intrinsic's operands (its variadic debug values), while the former picks the Nth value from the entire containing intrinsic's operand list.
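
To make the difference concrete, here is a rough sketch of the two spellings inside a DIExpression (illustrative only; the exact syntax may differ slightly from the patches):

  !DIExpression(DW_OP_LLVM_arg0)    ; D70642: the index is fused into the operator name
  !DIExpression(DW_OP_LLVM_arg, 0)  ; D82363: the index is an explicit operand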

My personal opinion (possibly biased, as the author of D82363) is that the best solution is to remove DW_OP_LLVM_argN from the implicit pointer work altogether. The operator introduced in that patch is more general-purpose than necessary; it is only ever used as DW_OP_LLVM_arg0, and only in conjunction with @llvm.dbg.derefval, an intrinsic added in the same patch. It seems at a glance that the functionality could be folded into that new intrinsic and operator pair by having them automatically refer to arg0. While it's possible that this general-purpose operator could have value in the future, I think it is better to have specific operators for each use case, rather than one operator that could refer to a machine value, a local variable, or even the DIExpression that contains it. Also of note, the main feature this operator would be useful for implementing is precisely what D82363 adds: referencing multiple machine values in a debug value. The difference is that the version of the operator in the more recent patch is explicitly used to represent machine values; it cannot refer to anything else.

Finally, the implicit pointer patches haven’t received an update in a while: the last comment or update was on January 10th. It is possible at this point that the work will not make it in for a very long time, and may be substantially changed when it does - I imagine there will be very little cost associated with using a different operator in that case.

I’m mainly looking to confirm that there are no objections to this, and also ensure that this change doesn’t sneak past anyone if it is accepted.

[0] https://reviews.llvm.org/D70642

[1] https://reviews.llvm.org/D82363

Thanks for bringing this up!

To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.
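
For example (a rough sketch, writing DWARF operators inline even though DIExpression doesn't accept all of them today, and assuming a DBG_VALUE's two operands are implicitly pushed in order with arg0 at the bottom of the stack), computing arg1 - arg0 looks like this in the two styles:

  ; explicit-argument form
  !DIExpression(DW_OP_LLVM_arg, 1, DW_OP_LLVM_arg, 0, DW_OP_minus, DW_OP_stack_value)
  ; implicit-push form: the stack is [arg0, arg1] with arg1 on top, so swap before
  ; subtracting (DW_OP_minus subtracts the top entry from the second entry)
  !DIExpression(DW_OP_swap, DW_OP_minus, DW_OP_stack_value)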

By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.

Finally, to completely derail this discussion — once we have multi-arg support, we may be able to simplify parts of the debug intrinsic handling by combining multiple DW_OP_LLVM_fragment intrinsics describing the same variable into one multi-argument expression with multiple DW_OP_LLVM_fragments.
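
Roughly, something like this (hypothetical; today a fragment operator has to be the last element of its expression, so the combined form is not valid yet):

  ; today: two dbg.values, each carrying one fragment of the same variable
  !DIExpression(DW_OP_LLVM_fragment, 0, 32)
  !DIExpression(DW_OP_LLVM_fragment, 32, 32)
  ; hypothetically: one multi-argument debug value with one combined expression
  !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_fragment, 0, 32,
                DW_OP_LLVM_arg, 1, DW_OP_LLVM_fragment, 32, 32)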

-- adrian

To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.

I'd also add that since `DW_OP_pick N` picks the Nth argument from the top of the stack rather than the bottom, using it would require us to know how many elements are on the stack at each point in the expression, and update the pick depth if this changes. Also, for the sake of not bloating the final DWARF we wouldn't want to actually push the args onto the stack first and then pick them, so we'd basically be treating the final DW_OP_pick operators as DW_OP_LLVM_arg either way.
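
For example (a sketch under the same implicit-push assumption, with operands [arg0, arg1] on the stack and DW_OP_pick written inline even though DIExpression doesn't accept it today), computing arg0 + 8:

  ; picking arg0 while only [arg0, arg1] are on the stack
  !DIExpression(DW_OP_pick, 1, DW_OP_constu, 8, DW_OP_plus, DW_OP_stack_value)
  ; the same value with the constant pushed first; the stack is now [arg0, arg1, 8],
  ; so the pick depth has to be bumped to 2
  !DIExpression(DW_OP_constu, 8, DW_OP_pick, 2, DW_OP_plus, DW_OP_stack_value)
  ; with an explicit operator the reference is stable in both cases
  !DIExpression(DW_OP_constu, 8, DW_OP_LLVM_arg, 0, DW_OP_plus, DW_OP_stack_value)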

By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.

I think what you're saying here is that we could have both use the new operator, assigning it a different meaning depending on the intrinsic it's used in? If so, I think it's unnecessary - the derefval case does not (as far as I understand it) need any value other than arg0, and will always use arg0, so we lose no information by omitting it. Having the meaning of the operator vary depending on the intrinsic that contains it is also something I think we should avoid if possible; it would make more sense if derefval were changed to be variadic (not an unreasonable possibility, I think) and its first argument were passed as a variadic argument.

As for the operator, I'm not attached to either implementation; DW_OP_LLVM_argN saves memory during compilation (though the size of the DWARF output is identical), while `DW_OP_LLVM_arg N` is easier to work with in code, as we don't need any helper functions to extract N from the operator or to build the operator from a given N. The other question is whether we might actually need more than 8 arguments at some point; I suspect that for the vast majority of cases the answer will be "no", but it's hard to be sure exactly what cases may open up over time. Although...

Finally, to completely derail this discussion — once we have multi-arg support, we may be able to simplify parts of the debug intrinsic handling by combining multiple DW_OP_LLVM_fragment intrinsics describing the same variable into one multi-argument expression with multiple DW_OP_LLVM_fragments.

This makes for a good example of how a future change might require us to support more arguments in an intrinsic than initially seems necessary.

To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.

I'd also add that since `DW_OP_pick N` picks the Nth argument from the top of the stack rather than the bottom, using it would require us to know how many elements are on the stack at each point in the expression, and update the pick depth if this changes. Also, for the sake of not bloating the final DWARF we wouldn't want to actually push the args onto the stack first and then pick them, so we'd basically be treating the final DW_OP_pick operators as DW_OP_LLVM_arg either way.

Agreed; I think everyone can get behind this operator as a useful addition.

By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.

I think what you're saying here is that we could have them both use the new operator, assigning it a different meaning depending on the intrinsic it's used in? If so, I think it's unnecessary - the derefval case does not (as far as I understand it) need any value other than arg0, and will also always use arg0; we lose no information by omitting it. Having the meaning of the operand vary depending on the intrinsic that contains it is also something I think we should avoid if possible; if derefval was changed to be variadic (not an unreasonable possibility I think) and the first argument was passed as a variadic argument then it'd make more sense.

Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all DIExpressions referenced by debug intrinsics? From an ergonomics perspective it would bother me if the first argument of a 2-argument debug intrinsic is pushed implicitly and the second one isn't. However, transitioning dbg.value to a first-arg-is-explicit form is difficult if we don't want to break LTO and bitcode compatibility. The cleanest way of doing this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade the old dbg.value.
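
A sketch of what such an auto-upgrade might produce (the new intrinsic name and exact shape here are purely illustrative; !7 stands in for the variable's DILocalVariable):

  ; old form: the value operand is implicitly pushed before the expression runs
  call void @llvm.dbg.value(metadata i32 %x, metadata !7,
      metadata !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value))
  ; hypothetical explicit form after auto-upgrade: the same operand is now
  ; referred to explicitly
  call void @llvm.dbg.value.v2(metadata i32 %x, metadata !7,
      metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_plus_uconst, 1, DW_OP_stack_value))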

As for the operator, I'm not attached to either implementation; DW_OP_LLVM_argN saves memory during compilation (though the size of the DWARF output is identical), while `DW_OP_LLVM_arg N` is easier to work with in code as we don't need any helper functions to extract N, or to generate one from a given N value. The other question is whether we might actually need more than 8 arguments at some point; I suspect that for the vast majority of cases the answer will be "no", but it's hard to be sure exactly what cases may open up over time. Although...

Taking a step back, I think it is possible to separate the in-memory encoding from the visual representation. There is no reason why we couldn't encode DW_OP_LLVM_arg N in one 64-bit integer field. The DIExpression iterator can hide this detail. More generally, we could encode *all* fields in DIExpressions as, e.g., ULEB values in memory and hide that behind the iterator. With that perspective I think DW_OP_LLVM_arg N is the way to go, and if we are really bothered by the storage size, we can address this separately.

thanks,
adrian

>
>> To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.
>
> I'd also add that since `DW_OP_pick N` picks the Nth argument from the top of the stack rather than the bottom, using it would require us to know how many elements are on the stack at each point in the expression, and update the pick depth if this changes. Also, for the sake of not bloating the final DWARF we wouldn't want to actually push the args onto the stack first and then pick them, so we'd basically be treating the final DW_OP_pick operators as DW_OP_LLVM_arg either way.

Agreed. I think everyone can agree that this operator is a useful addition.

>
>> By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.
>
> I think what you're saying here is that we could have them both use the new operator, assigning it a different meaning depending on the intrinsic it's used in? If so, I think it's unnecessary - the derefval case does not (as far as I understand it) need any value other than arg0, and will also always use arg0; we lose no information by omitting it. Having the meaning of the operand vary depending on the intrinsic that contains it is also something I think we should avoid if possible; if derefval was changed to be variadic (not an unreasonable possibility I think) and the first argument was passed as a variadic argument then it'd make more sense.

Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all DIExpressions referenced by debug intrinsics? From an ergonomics perspective it would bother me if the first argument of a 2-argument debug intrinsic is pushed implicitly and the second one isn't. However, transition dbg.value to a first-arg-is-explicit form is difficult, if we don't want to break LTO and bitcode compatibility. The cleanest way of doing this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade the old dbg.value.

Not sure I'm quite following this bit - what are the particular
challenges of detecting that we're parsing old bitcode that relies on
the implicit style and upgrading it to the explicit style?

To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.

I'd also add that since `DW_OP_pick N` picks the Nth argument from the top of the stack rather than the bottom, using it would require us to know how many elements are on the stack at each point in the expression, and update the pick depth if this changes. Also, for the sake of not bloating the final DWARF we wouldn't want to actually push the args onto the stack first and then pick them, so we'd basically be treating the final DW_OP_pick operators as DW_OP_LLVM_arg either way.

Agreed. I think everyone can agree that this operator is a useful addition.

By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.

I think what you're saying here is that we could have them both use the new operator, assigning it a different meaning depending on the intrinsic it's used in? If so, I think it's unnecessary - the derefval case does not (as far as I understand it) need any value other than arg0, and will also always use arg0; we lose no information by omitting it. Having the meaning of the operand vary depending on the intrinsic that contains it is also something I think we should avoid if possible; if derefval was changed to be variadic (not an unreasonable possibility I think) and the first argument was passed as a variadic argument then it'd make more sense.

Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all DIExpressions referenced by debug intrinsics? From an ergonomics perspective it would bother me if the first argument of a 2-argument debug intrinsic is pushed implicitly and the second one isn't. However, transition dbg.value to a first-arg-is-explicit form is difficult, if we don't want to break LTO and bitcode compatibility. The cleanest way of doing this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade the old dbg.value.

Not sure I'm quite following this bit - what are the particular
challenges of detecting that we're parsing old bitcode that relies on
the implicit style and upgrading it to the explicit style?

To implement the "detecting that we're parsing old bitcode" bit, we have to set a bit somewhere to distinguish between old and new format, and I thought that somewhere could be the *name* of the intrinsic. However, writing this made me realize that DIExpressions also appear outside of debug intrinsics, namely in DIGlobalVariableExpression(). There we run into this ambiguity already, since this form

  @i = global i32 0, !dbg !0
  !0 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr: !DIExpression())

implicitly pushes the first arg, but this form

  !0 = distinct !DICompileUnit(..., globals: !{!1}, ...)
  !1 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr: !DIExpression(DW_OP_constu, 42, DW_OP_stack_value))

doesn't. So considering that, we may have to move the bit into the expression itself (e.g., by adding a version number to the bitcode for DIExpression).

-- adrian

>
>>
>>
>>
>>>
>>>> To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.
>>>
>>> I'd also add that since `DW_OP_pick N` picks the Nth argument from the top of the stack rather than the bottom, using it would require us to know how many elements are on the stack at each point in the expression, and update the pick depth if this changes. Also, for the sake of not bloating the final DWARF we wouldn't want to actually push the args onto the stack first and then pick them, so we'd basically be treating the final DW_OP_pick operators as DW_OP_LLVM_arg either way.
>>
>> Agreed. I think everyone can agree that this operator is a useful addition.
>>
>>>
>>>> By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.
>>>
>>> I think what you're saying here is that we could have them both use the new operator, assigning it a different meaning depending on the intrinsic it's used in? If so, I think it's unnecessary - the derefval case does not (as far as I understand it) need any value other than arg0, and will also always use arg0; we lose no information by omitting it. Having the meaning of the operand vary depending on the intrinsic that contains it is also something I think we should avoid if possible; if derefval was changed to be variadic (not an unreasonable possibility I think) and the first argument was passed as a variadic argument then it'd make more sense.
>>
>> Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all DIExpressions referenced by debug intrinsics? From an ergonomics perspective it would bother me if the first argument of a 2-argument debug intrinsic is pushed implicitly and the second one isn't. However, transition dbg.value to a first-arg-is-explicit form is difficult, if we don't want to break LTO and bitcode compatibility. The cleanest way of doing this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade the old dbg.value.
>
> Not sure I'm quite following this bit - what are the particular
> challenges of detecting that we're parsing old bitcode that relies on
> the implicit style and upgrading it to the explicit style?

To implement the "detecting that we're parsing old bitcode" bit, we have to set a bit somewhere to distinguish between old and new format, and I thought that somewhere could be the *name* of the intrinsic.

Why would we need that? Wouldn't it be in the bitcode elsewhere? (I
guess not, not without revving bitcode itself, which isn't a thing
that'd be done for this - my mistake)

Yeah, if it were in the DICompileUnit hierarchy, I guess we could
address it with a "Debug Version" rev, but since it's an intrinsic in
the IR, we'd either have to rev the IR itself (expensive) or use a new name.

However, writing this made me realize that DIExpressions also appear outside of debug intrinsics, namely in DIGlobalVariableExpression(). There we run into this ambiguity already, since this form

  @i = global i32, !dbg !0
  !0 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr: !DIExpression())

implicitly pushes the first arg, but this form

!0 = distinct !DICompileUnit(..., globals: !{!1}, ...)
!1 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr: !DIExpression(DW_OP_constu, 42, DW_OP_stack_value))

Ah.

doesn't. So considering that, we may have to move the bit into the expression itself (e.g., by adding a version number to the bitcode for DIExpression).

Hrm, fair enough. Thanks for the context!

- Dave

Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all DIExpressions referenced by debug intrinsics? From an ergonomics perspective it would bother me if the first argument of a 2-argument debug intrinsic is pushed implicitly and the second one isn't. However, transition dbg.value to a first-arg-is-explicit form is difficult, if we don't want to break LTO and bitcode compatibility. The cleanest way of doing this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade the old dbg.value.

To clarify, the implicit DW_OP_LLVM_arg 0 is only for the non-variadic intrinsics, i.e. dbg.value and dbg.derefval. The version-2.0 intrinsic is more or less what I'm doing right now with dbg.value.list: a new version of dbg.value that uses explicit arguments and can have more than one of them. This means the intrinsic itself is used to distinguish between implicit and explicit, and it's my opinion that dbg.value.list may at some point be suitable to completely replace dbg.value. With that said, I haven't tried to do so in this stack of patches, as I suspect that trying to replace the intrinsic would result in a lengthy discussion that shouldn't need to block the introduction of the new intrinsic.
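
For reference, a sketch of the kind of thing dbg.value.list is meant to express (the exact way operands are bundled may differ in the final patch; !7 stands in for the DILocalVariable): a variable whose value is %a + %b, with both operands referenced explicitly:

  call void @llvm.dbg.value.list(metadata i32 %a, metadata i32 %b, metadata !7,
      metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1,
                             DW_OP_plus, DW_OP_stack_value))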

Taking a step back, I think it is possible to separate the in-memory encoding from the visual representation. There is no reason why we couldn't encode DW_OP_LLVM_arg N in one 64-bit integer field. The DIExpression iterator can hide this detail. More generally, we could encode *all* fields in DIExpressions as, e.g, ULEB values in memory and hide that behind the iterator. With that perspective I think DW_OP_LLVM_arg N is the way to go, and if we are really bothered by the storage size, we can address this separately.

Agreed, I'm fine with either representation.

To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.

I'd also add that since `DW_OP_pick N` picks the Nth argument from the top of the stack rather than the bottom, using it would require us to know how many elements are on the stack at each point in the expression, and update the pick depth if this changes. Also, for the sake of not bloating the final DWARF we wouldn't want to actually push the args onto the stack first and then pick them, so we'd basically be treating the final DW_OP_pick operators as DW_OP_LLVM_arg either way.

Agreed. I think everyone can agree that this operator is a useful addition.

By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.

I think what you're saying here is that we could have them both use the new operator, assigning it a different meaning depending on the intrinsic it's used in? If so, I think it's unnecessary - the derefval case does not (as far as I understand it) need any value other than arg0, and will also always use arg0; we lose no information by omitting it. Having the meaning of the operand vary depending on the intrinsic that contains it is also something I think we should avoid if possible; if derefval was changed to be variadic (not an unreasonable possibility I think) and the first argument was passed as a variadic argument then it'd make more sense.

Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all DIExpressions referenced by debug intrinsics? From an ergonomics perspective it would bother me if the first argument of a 2-argument debug intrinsic is pushed implicitly and the second one isn't. However, transition dbg.value to a first-arg-is-explicit form is difficult, if we don't want to break LTO and bitcode compatibility. The cleanest way of doing this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade the old dbg.value.

Not sure I'm quite following this bit - what are the particular
challenges of detecting that we're parsing old bitcode that relies on
the implicit style and upgrading it to the explicit style?

To implement the "detecting that we're parsing old bitcode" bit, we have to set a bit somewhere to distinguish between old and new format, and I thought that somewhere could be the *name* of the intrinsic.

Why would we need that? Wouldn't it be in the bitcode elsewhere? (I
guess not, not without reving bitcode itself, which isn't a thing
that'd be done for this - my mistake)

Yeah, if it was in the DICompileUnit hierarchy, I guess we could
address it with a "Debug Version" rev, but since it's an intrinsic in
the IR it's either rev the IR itself (expensive) or use a new name.

Debug info metadata bitcode changes are currently implemented on a node-by-node basis. A global IR version would be a nightmare for anyone maintaining their own stable branch, since it would make cherry-picking bugfixes that need bitcode changes dangerous, and consequently the version number meaningless. Imagine your branch was cut at version 1, upstream has bugfixes introducing version 2 and 3, and you need to cherry-pick bugfix 3. What version number do you pick?

Instead, each DINode has either feature flags or a version number (I know, that has the same general problem, but it's more localized!) in its bitcode representation.

-- adrian

>
>>
>>
>>
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>> To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.
>>>>>
>>>>> I'd also add that since `DW_OP_pick N` picks the Nth argument from the top of the stack rather than the bottom, using it would require us to know how many elements are on the stack at each point in the expression, and update the pick depth if this changes. Also, for the sake of not bloating the final DWARF we wouldn't want to actually push the args onto the stack first and then pick them, so we'd basically be treating the final DW_OP_pick operators as DW_OP_LLVM_arg either way.
>>>>
>>>> Agreed. I think everyone can agree that this operator is a useful addition.
>>>>
>>>>>
>>>>>> By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.
>>>>>
>>>>> I think what you're saying here is that we could have them both use the new operator, assigning it a different meaning depending on the intrinsic it's used in? If so, I think it's unnecessary - the derefval case does not (as far as I understand it) need any value other than arg0, and will also always use arg0; we lose no information by omitting it. Having the meaning of the operand vary depending on the intrinsic that contains it is also something I think we should avoid if possible; if derefval was changed to be variadic (not an unreasonable possibility I think) and the first argument was passed as a variadic argument then it'd make more sense.
>>>>
>>>> Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all DIExpressions referenced by debug intrinsics? From an ergonomics perspective it would bother me if the first argument of a 2-argument debug intrinsic is pushed implicitly and the second one isn't. However, transition dbg.value to a first-arg-is-explicit form is difficult, if we don't want to break LTO and bitcode compatibility. The cleanest way of doing this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade the old dbg.value.
>>>
>>> Not sure I'm quite following this bit - what are the particular
>>> challenges of detecting that we're parsing old bitcode that relies on
>>> the implicit style and upgrading it to the explicit style?
>>
>> To implement the "detecting that we're parsing old bitcode" bit, we have to set a bit somewhere to distinguish between old and new format, and I thought that somewhere could be the *name* of the intrinsic.
>
> Why would we need that? Wouldn't it be in the bitcode elsewhere? (I
> guess not, not without reving bitcode itself, which isn't a thing
> that'd be done for this - my mistake)
>
> Yeah, if it was in the DICompileUnit hierarchy, I guess we could
> address it with a "Debug Version" rev, but since it's an intrinsic in
> the IR it's either rev the IR itself (expensive) or use a new name.

Debug info metadata bitcode changes are currently implemented on a node-by-node basis. A global IR version would be a nightmare for anyone maintaining their own stable branch, since it would make cherry-picking bugfixes that need bitcode changes dangerous, and consequently the version number meaningless. Imagine your branch was cut at version 1, upstream has bugfixes introducing version 2 and 3, and you need to cherry-pick bugfix 3. What version number do you pick?

So the "Debug Info Version" module metadata ( !{i32 2, !"Debug Info
Version", i32 3}) is essentially unused/saved for some really big
changes only? *nod*

To summarize my understanding: Neither of these operators is strictly necessary, since the same effect can be achieved by implicitly pushing all operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can get very long and unwieldy and it is easier to generate efficient DWARF from DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting the number of arguments to a hardcoded value.

I'd also add that since `DW_OP_pick N` picks the Nth argument from the top of the stack rather than the bottom, using it would require us to know how many elements are on the stack at each point in the expression, and update the pick depth if this changes. Also, for the sake of not bloating the final DWARF we wouldn't want to actually push the args onto the stack first and then pick them, so we'd basically be treating the final DW_OP_pick operators as DW_OP_LLVM_arg either way.

Agreed. I think everyone can agree that this operator is a useful addition.

By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly pushed to the stack or not — in those new intrinsics, arguments always have to be referred to explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of the space saving aspect.

I think what you're saying here is that we could have them both use the new operator, assigning it a different meaning depending on the intrinsic it's used in? If so, I think it's unnecessary - the derefval case does not (as far as I understand it) need any value other than arg0, and will also always use arg0; we lose no information by omitting it. Having the meaning of the operand vary depending on the intrinsic that contains it is also something I think we should avoid if possible; if derefval was changed to be variadic (not an unreasonable possibility I think) and the first argument was passed as a variadic argument then it'd make more sense.

Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all DIExpressions referenced by debug intrinsics? From an ergonomics perspective it would bother me if the first argument of a 2-argument debug intrinsic is pushed implicitly and the second one isn't. However, transition dbg.value to a first-arg-is-explicit form is difficult, if we don't want to break LTO and bitcode compatibility. The cleanest way of doing this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade the old dbg.value.

Not sure I'm quite following this bit - what are the particular
challenges of detecting that we're parsing old bitcode that relies on
the implicit style and upgrading it to the explicit style?

To implement the "detecting that we're parsing old bitcode" bit, we have to set a bit somewhere to distinguish between old and new format, and I thought that somewhere could be the *name* of the intrinsic.

Why would we need that? Wouldn't it be in the bitcode elsewhere? (I
guess not, not without reving bitcode itself, which isn't a thing
that'd be done for this - my mistake)

Yeah, if it was in the DICompileUnit hierarchy, I guess we could
address it with a "Debug Version" rev, but since it's an intrinsic in
the IR it's either rev the IR itself (expensive) or use a new name.

Debug info metadata bitcode changes are currently implemented on a node-by-node basis. A global IR version would be a nightmare for anyone maintaining their own stable branch, since it would make cherry-picking bugfixes that need bitcode changes dangerous, and consequently the version number meaningless. Imagine your branch was cut at version 1, upstream has bugfixes introducing version 2 and 3, and you need to cherry-pick bugfix 3. What version number do you pick?

So the "Debug Info Version" module metadata ( !{i32 2, !"Debug Info
Version", i32 3}) is essentially unused/saved for some really big
changes only? *nod*

That's correct.

-- adrian