Distinction of DIExpression node

phyBrackets · July 3, 2023, 10:19am

Hi Everyone,

There was a small discussion on the Discord about llvm/lib/Bitcode/Reader/MetadataLoader.cpp (llvm/llvm-project at commit 4f065fcb5779a7bcdd02f00789ecc827c7a8e426): do we really need to keep track of distinct for DIExpression? A DIExpression never appears in the module’s top-level metadata list, because it is inlined metadata. I think we can remove the tracking, but I’m not sure whether it serves any other purpose. Does anyone have a better idea of whether removing it would break anything?

My thought is that IsDistinct and its usage may be left over from a more general implementation that handles other metadata nodes, which can be either distinct or inlined. (Also, if I remember correctly, DIExpression used to occupy a separate slot in the module, which it no longer does, so the flag may be a leftover from that time.) I think that for DIExpression it serves no practical purpose and can be removed safely without affecting the correctness of deserialization for DIExpression metadata nodes.

cc @jryans @hnrklssn

hnrklssn · July 3, 2023, 10:48am

If existing tests pass after removing it I’d be inclined to agree. Is the value always true or always false, currently?

phyBrackets · July 3, 2023, 12:25pm

Yeah, I checked: all the test cases pass (although I didn’t find any test case containing distinct !DIExpression()), and it should always be false, as distinct appears in top-level metadata lists and not on inline metadata.

jmorse · July 3, 2023, 4:49pm

For what it’s worth, I’ve been using DIExpressions for a while and have never discovered a scenario where having a distinct DIExpression would be necessary. CC @StephenTozer who’s done a lot more.

StephenTozer · July 4, 2023, 11:06am

I also don’t think there’s a case where we would need a distinct !DIExpression() - since a DIExpression’s only data is a list of uint64_t values, it is trivially always resolved and uniquable; furthermore, if a DIExpression is distinct, that property will be dropped when printing back to IR. @slinder1 created a patch some time ago that would have adjusted the semantics of uniqued and distinct in a way that explicitly disabled distinct for DIExpressions; the discussion on this patch got rather gummed up and it never landed, but the justification for not having distinct DIExpressions is clear.
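To illustrate why a node whose only data is a list of uint64_t values is trivially uniquable, here is a minimal sketch in plain C++. It is not LLVM's implementation: `ToyUniquer` and `OperandList` are hypothetical names, and the operand values used with it are arbitrary. The point is only that equal operand lists can always share one node, so there is nothing left for a distinct flag to distinguish.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for an expression's payload: a flat list of
// 64-bit operands and nothing else.
using OperandList = std::vector<std::uint64_t>;

// Toy uniquing pool (an assumption for illustration, not LLVM code).
// Requesting the same operand list twice yields the same stored object.
class ToyUniquer {
  std::set<OperandList> Pool;

public:
  // Inserts Elts if it is new, then returns a pointer to the pooled copy.
  // std::set never relocates its elements, so the pointer stays valid.
  const OperandList *get(const OperandList &Elts) {
    return &*Pool.insert(Elts).first;
  }

  // Number of different operand lists seen so far.
  std::size_t size() const { return Pool.size(); }
};
```

With this sketch, two calls to `get` with equal lists return the same pointer, and a call with a different list returns a different one.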

phyBrackets · July 6, 2023, 11:53am

Thanks, Stephen, for the pointer to this patch.
I looked at it. As discussed there, although DIExpression doesn’t currently rely on function state, it is primarily used in the context of a function, and switching to explicit function-local metadata for DIExpression could have benefits.

DIExpression being data-driven justifies its always-uniqued nature. The suggestion to reparent DIExpression directly under Metadata instead of inheriting from MDNode sounds like a reasonable approach.

I am not sure why it didn’t land even after the tests were fixed. It would also be great to start a discussion on the last comment on the patch, by @dexonsmith, which I found interesting; I will try to look deeper into the details.

I think @slinder1 and @dexonsmith have a much better understanding of the patch and of why it got lost and didn’t land.

dexonsmith · July 6, 2023, 2:03pm

Making DIExpression not a MDNode sounds fine to me in principle! That was my original plan when I landed the new debug info stuff.

Best to bring in @aprantl and others more involved in maintaining debug info for any changes though.

rnk · July 12, 2023, 8:45pm

I agree we don’t need to support distinct DIExpressions. In fact, does the IR assembly language even support it anymore, now that they are printed inline?

StephenTozer · July 13, 2023, 10:27am

From Scott’s patch mentioned above, the distinct property is legal on a DIExpression; it looks like it will be printed to bitcode, but will not be printed to textual IR.

slinder1 · July 17, 2023, 6:16pm

Correct, it is possible for a DIExpression to be distinct in memory, and in the bitcode format, but we lack a syntax for an “inline distinct” node so in IR text a DIExpression is always uniqued. (Unless, of course, something has changed since I last looked more closely at this). This means “round-tripping” via bitcode differs from “round-tripping” via textual IR, which at the very least feels wrong.

I don’t remember exactly why it stalled; I think I just let perfect be the enemy of good and focused on generalizing the bespoke support for inline nodes and linking it with “non-distinctness” in a comprehensive way. That pushed up against some of my misapprehensions around how metadata actually works, and I never dedicated the time to resolving the issues. I would be happy to see something addressing just the DIExpression issue land!

For DIArgList and any other nodes which are printed inline but can be distinct because of things like RAUW I still don’t know that I have a strong enough understanding to propose a really comprehensive solution. In any event, it seems wrong that serialization via bitcode would fundamentally differ from textual IR, so if that is still the case for some metadata nodes I think something should eventually be done. It might even be that we should just limit the bitcode representation to match the IR limitation, rather than the other way around.

StephenTozer · July 20, 2023, 11:51am

Alright, it sounds like there’s a solid consensus that distinct DIExpressions should not exist; more comprehensive reform of distinct/unique is a bit more complicated, though something that will probably be tackled at some point. @phyBrackets do you (or anyone in the discord) plan to write (or have already written) a patch for this, or are you simply raising it as an issue? I’m happy to take a swing at this in the next few weeks if nobody else is specifically working on it.

phyBrackets · July 20, 2023, 2:20pm

Hi Stephen, feel free to take this up. I would love to explore it myself, but I doubt I’ll have the time.
Thanks

phyBrackets · January 20, 2024, 9:51am

Hi Stephen, have you done any work on this? I might have missed it!

StephenTozer · January 22, 2024, 1:43pm

Hi - apologies, I’d left this one by the wayside while working on other projects. I’ve done some other small metadata rewrites recently (DIArgList and DIAssignID) and so would be quite comfortable picking this up in a few weeks when I’m finished with my current work. It should be a fairly simple patch - fundamentally I don’t believe DIExpression should inherit from MDNode, but if we ignore that part I think there will only be a few places that need to be modified - notably I think it is safe to change the metadata loader to always fetch a uniqued instance and ignore the distinct field.
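As a rough illustration of "always fetch a uniqued instance and ignore the distinct field", here is a toy loader in plain C++. It is not the MetadataLoader code: `ToyLoader`, `ToyExpr`, and `parseExpression` are hypothetical names, and the record layout (low bit of the first word as the distinct flag, operands after it) is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical expression node for this sketch; holds only its operands.
struct ToyExpr {
  std::vector<std::uint64_t> Elements;
};

// Toy loader (illustrative assumption, not LLVM's MetadataLoader).
// Assumed record layout: Record[0] carries a distinct flag in bit 0 and
// other header bits above it; Record[1..] are the expression operands.
class ToyLoader {
  std::map<std::vector<std::uint64_t>, ToyExpr> Uniqued;

public:
  // Returns the uniqued node for the record's operands, or nullptr if the
  // record has no header word. The distinct bit is never consulted.
  const ToyExpr *parseExpression(const std::vector<std::uint64_t> &Record) {
    if (Record.empty())
      return nullptr;
    std::vector<std::uint64_t> Elts(Record.begin() + 1, Record.end());
    auto It = Uniqued.find(Elts);
    if (It == Uniqued.end())
      It = Uniqued.emplace(Elts, ToyExpr{Elts}).first;
    // std::map keeps element addresses stable across later insertions.
    return &It->second;
  }
};
```

In this model, two records that differ only in the flag bit load to the same node, which matches what textual IR already does when it prints every DIExpression as uniqued.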

phyBrackets · January 22, 2024, 1:51pm

If it’s fine, I can look into it. I was looking at the same thing: DIExpression does not necessarily need to inherit from MDNode, or we could remove a couple of its storage types, distinct and temporary.

phyBrackets · January 24, 2024, 5:29pm

Hi @StephenTozer, I have created a patch for this: [DebugInfo] Make DIExpression inherit from Metadata and it always should be unique (llvm/llvm-project Pull Request #79335).

tromey · March 19, 2025, 2:59pm

I found this topic recently. For my use, I think I do need DIExpression (or something similar) to reference other metadata nodes. I’m not completely sure I understand this correctly but I think this means that I’d need this class to continue to derive from MDNode and be “distinct-able”.

I’m working on DWARF support for gnat-llvm, an Ada compiler. In Ada, it’s possible for subrange types to have dynamic bounds, sometimes defined in terms of other variables or expressions. I’ve been looking at how to implement this, and I’ve been thinking of enhancing DIExpression to allow a DW_OP_call, referencing some other DIE.

If needed I can show an Ada test case where the compiler wants to emit an expression like 2 * Variable as an array bound. This would wind up as something like DW_OP_call <Variable's DIE> DW_OP_const2u DW_OP_mul.

I see the patch discussed in this thread never landed, and so maybe my plan is still ok. Otherwise I guess I could make a new node that’s similar to DIExpression.

rnk · March 20, 2025, 8:45pm

I think having DIExpression reference other metadata nodes that eventually refer to Values through ValueAsMetadata wrappers is probably the wrong direction. If the debug info needs to combine multiple machine locations together to compute the variable value or location, I think it would be more consistent with our current design to have #dbg_value records take multiple ValueAsMetadata inputs, or maybe to have some multiplexed ValueAsMetadata node.

One way that I’ve understood #dbg_value/llvm.dbg.value in the past is that it takes the referenced value, puts it on the DWARF expression stack, applies the operations in the DIExpression, and the final value on the stack is the final machine location. Internally, the backend does a fair bit of pattern matching to handle simple expressions with simple DWARF machine location expressions.

Is it possible to extend that model by pushing multiple values onto the DWARF expression stack and making expressions compatible with that?

StephenTozer · March 21, 2025, 9:57am

rnk wrote: “Is it possible to extend that model by pushing multiple values onto the DWARF expression stack and making expressions compatible with that?”

Good opportunity to plug the variadic debug values feature! In the case of dbg_values, instead of using a single value as the first argument, e.g. i32 %a, it is possible to use the DIArgList metadata that can reference multiple values, e.g. !DIArgList(i32 %a, i32 %b). This does not push all of those values onto the DWARF stack, but instead creates a referenceable list for the corresponding DIExpression - so the 0th argument can be inserted into the expression with DW_OP_LLVM_arg 0, and so on for all other arguments.
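The "referenceable list" model can be sketched with a toy stack evaluator in plain C++. This is only an illustration of the idea, not how LLVM lowers debug values: the opcodes below are placeholders rather than the real DWARF or LLVM encodings, and `evaluate` is a hypothetical helper. `OpArg N` plays the role of DW_OP_LLVM_arg N by copying the Nth entry of the argument list onto the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder opcodes for this sketch only; these are NOT the real
// DWARF/LLVM encodings.
enum ToyOp : std::uint64_t { OpArg = 1, OpConst = 2, OpMul = 3, OpPlus = 4 };

// Evaluates a flat expression against a list of argument values.
//   OpArg N   pushes Args[N] (the analogue of DW_OP_LLVM_arg N)
//   OpConst V pushes the literal V
//   OpMul / OpPlus pop two values and push their product / sum
// Returns false on malformed input; on success stores the single
// remaining stack value in Result.
bool evaluate(const std::vector<std::uint64_t> &Expr,
              const std::vector<std::uint64_t> &Args,
              std::uint64_t &Result) {
  std::vector<std::uint64_t> Stack;
  for (std::size_t I = 0; I < Expr.size(); ++I) {
    const std::uint64_t Op = Expr[I];
    if (Op == OpArg || Op == OpConst) {
      if (I + 1 >= Expr.size())
        return false; // missing operand
      const std::uint64_t Operand = Expr[++I];
      if (Op == OpArg) {
        if (Operand >= Args.size())
          return false; // argument index out of range
        Stack.push_back(Args[Operand]);
      } else {
        Stack.push_back(Operand);
      }
    } else if (Op == OpMul || Op == OpPlus) {
      if (Stack.size() < 2)
        return false; // not enough values to combine
      const std::uint64_t Rhs = Stack.back();
      Stack.pop_back();
      const std::uint64_t Lhs = Stack.back();
      Stack.pop_back();
      Stack.push_back(Op == OpMul ? Lhs * Rhs : Lhs + Rhs);
    } else {
      return false; // unknown opcode
    }
  }
  if (Stack.size() != 1)
    return false;
  Result = Stack.back();
  return true;
}
```

For example, the expression `{OpArg, 0, OpConst, 2, OpMul}` with the argument list `{21}` computes 2 * arg0, and an expression can mention several arguments in any order without those arguments being pushed up front.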

@tromey Does this seem like it potentially addresses your requirement?

tromey · March 21, 2025, 1:48pm

Thanks for the pointers.

DIArgList looks promising, but I don’t immediately understand how to make a type reference one of these. I will keep reading.
