"distinct" metadata nodes are ...?

I'm encountering a merge issue whose root cause has to do with "distinct"
metadata nodes. I see that distinct-ness is an intentional concept, but
the explanation in the LLVM Language Reference is not very enlightening.

    distinct nodes are useful when nodes shouldn't be merged based on
    their content.

The notion of "merged" metadata is not discussed elsewhere on the page,
except for Objective-C garbage collection; I'm looking at debug location
metadata, so that's not relevant.

I understand that distinct-ness was invented as a replacement for a
self-reference hack, but that just begs the question. Why is this a
useful concept? What is it used for? Why shouldn't certain nodes be
merged based on their content?

My specific issue has to do with inlined-at chains. If I have
    return inlined_func() + 1;
the inlined-at chain for inlined_func() [and whatever else is inlined
into inlined_func()] terminates in a node that is 'distinct' from the
node for the calling statement, even though they describe the same
source location. This didn't use to be a problem; chasing the chain
ended up with something that compared equal to the calling statement's
source location.

Thanks,
--paulr

http://llvm.org/viewvc/llvm-project?rev=226736&view=rev is the change that
caused this & has some context on why it's necessary.

The issue is that changes in the scope of debuglocs are how we build scopes,
including inline scopes (DW_TAG_inlined_subroutine). If the call site
locations aren't kept distinct, then two calls from the same line to the same
function would have the same location and thus be the same scope, so we'd
only have one DW_TAG_inlined_subroutine instead of two.
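
A toy sketch of that collapse, in Python rather than anything LLVM-specific.
The names (Loc, make_uniqued, make_distinct, count_inlined_scopes) are
hypothetical and only model the idea that an inlined scope is keyed on the
identity of its call-site location node:

```python
# Toy model (not LLVM's API): count the inlined scopes that would be
# created when each scope is keyed on the identity of its call-site node.

class Loc:
    """A call-site location node; compared by identity, like a pointer."""
    def __init__(self, line, col):
        self.line = line
        self.col = col

def make_uniqued(cache, line, col):
    # Uniqued: identical content always returns the one shared node.
    return cache.setdefault((line, col), Loc(line, col))

def make_distinct(line, col):
    # Distinct: every request allocates a fresh node.
    return Loc(line, col)

def count_inlined_scopes(call_sites):
    # One scope per unique node identity.
    return len({id(site) for site in call_sites})

cache = {}
# Two calls expanded from one macro: same line, same column.
uniqued = [make_uniqued(cache, 10, 5), make_uniqued(cache, 10, 5)]
distinct = [make_distinct(10, 5), make_distinct(10, 5)]

uniqued_scopes = count_inlined_scopes(uniqued)    # the two calls collapse
distinct_scopes = count_inlined_scopes(distinct)  # the two calls stay apart
```

With uniqued nodes the two calls share one node and so one scope; with
distinct nodes each call keeps its own.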

Clang worked around this for a while by putting column info on call sites
to help make their locations unique, but this was insufficient (the two
calls could've come from within a macro, in which case they'd be attributed
to the same line/column again).

- David

Aha, okay. I had noticed that the column-info hack went away. So the distinct-ness implies the scope implicit in the inlined call, which later on will be turned into the explicit inlined_subroutine entry. That seems… indirect.

I have to say, the LangRef page’s wording about “merge based on content” is not really to the point. It’s like saying the purpose of a street-corner STOP sign is to make you stop. That’s the mechanism it uses, but it’s not why the sign is there. It would be great if somebody would clarify what distinct-ness is actually good for.

Thanks,

--paulr

+dexonsmith

-- Sean Silva

`MDNode`s have a constant-like mode, where they try to unique themselves.
This never worked all the time, and `MDNode`s don't really behave like
constants anyway (since they can be changed freely after creation,
affecting other `llvm::Module`s in the same `LLVMContext` in potentially
horrible ways). They also have another mode where they're not uniqued,
but it used to require "strange" (actually somewhat common) operations to
get into it, and that mode wasn't preserved when serializing to
bitcode/assembly.

During the `Metadata`/`Value` split I formalized this other mode, called
it "distinct", made it serialize correctly to/from bitcode and assembly,
and added a way to ask for it explicitly.

The short version: "distinct" means "not uniqued".
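
As a rough illustration of the two modes, here is a minimal Python sketch.
It is a toy model under stated assumptions, not LLVM's implementation: Node,
Context, get and get_distinct are hypothetical names, and uniquing is
modelled as a plain dictionary keyed on node content.

```python
# Toy model of uniqued vs. distinct nodes; all names are hypothetical.

class Node:
    def __init__(self, operands, distinct):
        self.operands = tuple(operands)
        self.distinct = distinct

class Context:
    def __init__(self):
        self._uniqued = {}  # content -> the single shared node

    def get(self, *operands):
        # Uniqued mode: look the content up first, allocate only on a miss.
        key = tuple(operands)
        if key not in self._uniqued:
            self._uniqued[key] = Node(key, distinct=False)
        return self._uniqued[key]

    def get_distinct(self, *operands):
        # Distinct mode: never consult the table, always allocate.
        return Node(operands, distinct=True)

ctx = Context()
a = ctx.get("line", 7)
b = ctx.get("line", 7)
c = ctx.get_distinct("line", 7)
d = ctx.get_distinct("line", 7)

same_when_uniqued = a is b                # equal content, one shared node
same_when_distinct = c is d               # equal content, two separate nodes
same_content = c.operands == d.operands   # content still compares equal
```

In this model a distinct node never stands in for another node with equal
content, which is the property the inlined-at call sites rely on.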

It's not really clear to me what would be better for LangRef. Patches
welcome!