My thought is the presence of IsDistinct and its usage may be a leftover from a more general implementation that handles other metadata nodes that can be either distinct or inlined (also if i remember correctly, previously DIExpression used to be in a separate slot in the module but not now so it may be leftover since then). I think in the case of DIExpression, it serves no practical purpose and can be safely removed from the code without affecting the correctness of the deserialization process for DIExpression metadata nodes.
Yeah, I checked, all the test case are passing(although i didn’t find any test case which exists as distinct !DIExpression() ) and it should always be false, as distinct appear in top-level metadata lists, and not on the inline metadata.
For what it’s worth, I’ve been using DIExpressions for a while and have never discovered a scenario where having a distinct DIExpression would be necessary. CC @StephenTozer who’s done a lot more.
I also don’t think there’s a case where we would need a distinct !DIExpression() - since a DIExpression’s only data is a list of uint64_t values, it is trivially always resolved and uniquable; furthermore, if a DIExpression is distinct, that property will be dropped when printing back to IR. @slinder1 created a patch some time ago that would have adjusted the semantics of uniqued and distinct in a way that explicitly disabled distinct for DIExpressions; the discussion on this patch got rather gummed up and it never landed, but the justification for not having distinct DIExpressions is clear.
Thanks Stephen for this patch information.
I looked at it, as discussed in the patch, that DIExpression doesn’t currently rely on function state, it is primarily used in the context of a function and switching to explicit function-local metadata for DIExpression could have benefits.
DIExpression being data-driven justifies its always uniqued nature. The suggestion to reparent DIExpression directly under metadata instead of inheriting from MDNode is sounds like a reasonable approach.
I am not sure for what reason it didn’t get landed even after fixing the tests? And also It would be great to start the discussion on the last comment on the patch by @dexonsmith , I found it interesting, I will try to look much deeper into the details of it.
I think @slinder1 and @dexonsmith have much better understanding about the patch and why it get lost and didn’t land?
I agree we don’t need to support distinct DIExpressions. In fact, does the IR assembly language even support it anymore, now that they are printed inline?
From Scott’s patch mentioned above, the distinct property is legal on a DIExpression; it looks like it will be printed to bitcode, but will not be printed to textual IR.
Correct, it is possible for a DIExpression to be distinct in memory, and in the bitcode format, but we lack a syntax for an “inline distinct” node so in IR text a DIExpression is always uniqued. (Unless, of course, something has changed since I last looked more closely at this). This means “round-tripping” via bitcode differs from “round-tripping” via textual IR, which at the very least feels wrong.
I don’t remembewr exactly why it stalled, I think I just let perfect be the enemy of good and focused on generalizing the bespoke support for inline nodes and linking it with “non-distinctness” in a comprehensive way. That pushed up against some of my misapprehensions around how metadata actually works, and I never dedicated the time to resolving the issues. I would be happy to see something addressing just the DIExpression issue land!
For DIArgList and any other nodes which are printed inline but can be distinct because of things like RAUW I still don’t know that I have a strong enough understanding to propose a really comprehensive solution. In any event, it seems wrong that serialization via bitcode would fundamentally differ from textual IR, so if that is still the case for some metadata nodes I think something should eventually be done. It might even be that we should just limit the bitcode representation to match the IR limitation, rather than the other way around.
Alright, it sounds like there’s a solid consensus that distinct DIExpressions should not exist; more comprehensive reform of distinct/unique is a bit more complicated, though something that will probably be tackled at some point. @phyBrackets do you (or anyone in the discord) plan to write (or have already written) a patch for this, or are you simply raising it as an issue? I’m happy to take a swing at this in the next few weeks if nobody else is specifically working on it.
Hi - apologies, I’d left this one by the wayside while working on other projects. I’ve done some other small metadata rewrites recently (DIArgList and DIAssignID) and so would be quite comfortable picking this up in a few weeks when I’m finished with my current work. It should be a fairly simple patch - fundamentally I don’t believe DIExpression should inherit from MDNode, but if we ignore that part I think there will only be a few places that need to be modified - notably I think it is safe to change the metadata loader to always fetch a uniqued instance and ignore the distinct field.
If it’s fine, I can look over it. I was looking at the same thing that DIExpression does not necessarily inherit MDNode or removing the couple of storage types for it, distinct and temporary.
I found this topic recently. For my use, I think I do need DIExpression (or something similar) to reference other metadata nodes. I’m not completely sure I understand this correctly but I think this means that I’d need this class to continue to derive from MDNode and be “distinct-able”.
I’m working on DWARF support for gnat-llvm, an Ada compiler. In Ada, it’s possible for subrange types to have dynamic bounds, sometimes defined in terms of other variables or expressions. I’ve been looking at how to implement this, and I’ve been thinking of enhancing DIExpression to allow a DW_OP_call, referencing some other DIE.
If needed I can show an Ada test case where the compiler wants to emit an expression like 2 * Variable as an array bound. This would wind up as something like DW_OP_call <Variable's DIE> DW_OP_const2u DW_OP_mul.
I see the patch discussed in this thread never landed, and so maybe my plan is still ok. Otherwise I guess I could make a new node that’s similar to DIExpression.
I think having DIExpression reference other metadata nodes that eventually refer to Values through ValueAsMetadata wrappers is probably the wrong direction. If the debug info needs to combine multiple machine locations together to compute the variable value or location, I think it would be more consistent with our current design to have #dbg_value records take multiple ValueAsMetadata inputs, or maybe to have some multiplexed ValueAsMetadata node.
One way that I’ve understood #dbg_value/llvm.dbg.value in the past is that it takes the referenced value, puts it on the DWARF expression stack, applies the operations in the DIExpression, and the final value on the stack is the final machine location. Internally, the backend does a fair bit of pattern matching to handle simple expressions with simple DWARF machine location expressions.
Is it possible to extend that model by pushing multiple values onto the DWARF expression stack and making expressions compatible with that?
Is it possible to extend that model by pushing multiple values onto the DWARF expression stack and making expressions compatible with that?
Good opportunity to plug the variadic debug values feature! In the case of dbg_values, instead of using a single value as the first argument, e.g. i32 %a, it is possible to use the DIArgList metadata that can reference multiple values, e.g. !DIArgList(i32 %a, i32 %b). This does not push all of those values onto the DWARF stack, but instead creates a referenceable list for the corresponding DIExpression - so the 0th argument can be inserted into the expression with DW_OP_LLVM_arg 0, and so on for all other arguments.
@tromey Does this seem like it potentially addresses your requirement?