Issues with clang-llvm debug info validity

Hi all, sorry to post to both lists, but I’m running into an issue where clang-generated debug info is deemed to be invalid by LLVM tools (throws an assertion error in both llc and mcjit), and I’m not sure what the proper resolution is.

Here’s a test case; I last tested it on revision r210953:

$ cat test1.cpp

#include “test.h”
test::Test foo1() {
return test::Test();
}

$ cat test2.cpp

namespace test {
}
#include “test.h”
test::Test foo2() {
return test::Test();
}

$ cat test.h

namespace test {
template
class Test {
};
}

$ clang++ -g -emit-llvm test1.cpp -S -o test1.ll
$ clang++ -g -emit-llvm test2.cpp -S -o test2.ll
$ llvm-link test1.ll test2.ll -S -o linked.ll
$ llc linked.ll

The last step raises an assertion error, assuming you have assertions enabled:
llc: lib/CodeGen/AsmPrinter/DwarfUnit.cpp:953: llvm::DIE* llvm::DwarfUnit::getOrCreateTypeDIE(const llvm::MDNode*): Assertion `Ty == resolve(Ty.getRef()) && “type was not uniqued, possible ODR violation.”’ failed

The initial introduction of the “test” namespace in test2.cpp seems to confuse clang – test1.cpp and test2.cpp have non-compatible debug information for the test::Test type. test1.cpp says that test::Test’s namespace is defined in test.h, and test2.cpp says that test::Test’s namespace is defined in test2.cpp (they both say that the class itself is defined in test.h). Regardless of whether LLVM tools should allow this, this seems like the wrong output for these files?

Another related example, is if we simply defined the test::Test template in both files (ie, manually executed the #include “test.h”) and got rid of the test2.cpp “namespace test {}” definition, we’d run into the same assertion, without hitting the weird clang behavior with the empty namespace pre-declaration.

It’s probably also worth noting that compiling directly with clang, ie adding a “int main() {}” function and then doing
$ clang++ -g test1.cpp test2.cpp -o test
works fine.

I’m not a C++ expert and I don’t know if these template issues really are violations of the ODR, but it feels like this should be allowed, and somewhere in the llvm toolchain (llvm-link? DwarfUnit.cpp?) this should be handled. What do people think?

kmod

Yep, looks like a legit bug to me. I'll try to reproduce it and reduce
a test case, etc.

(the ODR doesn't require that the two copies of the Test class
template appear on the same line in the same file in two distinct
translation units, it only requires that the two copies have the same
sequence of tokens - which they do in your example)

+Adrian & Manman,

Looks like this is a case of non-DIRef references ending up in the IR,
and thus the references not being deduplicated. This could lead to
some of the IR bloat that you guys implemented the DIRef stuff to
reduce/avoid.

The specific issue is that the type is slightly different in the two
TUs (since the namespace scope it's contained within is different in
the two TUs - note the namespace preceeding the header include in
test2.cpp) and the actual DIRef-usage failure is when building
subroutine types. The subroutine types are made with direct references
to types, not with DIRef references to types. This means they won't be
deduplicated/resolved to a single type during linking correctly.

It also leads to an assertion as per the original email. Figured you
guys might want to look into/fix this (I assume the right fix is to
use DIRef more pervasively in the frontend - I'm doing a little
experimenting with clang now to see how that looks).

- David

I will take this.

!5 = metadata !{metadata !"./test.h", metadata !"/Volumes/Data/radar/17052973"}
!6 = metadata !{i32 786489, metadata !5, null, metadata !"test", i32 1} ; [ DW_TAG_namespace ] [test] [line 1]
!4 = metadata !{i32 786434, metadata !5, metadata !6, metadata !"Test<int>", i32 3, i64 8, i64 8, i32 0, i32 0, null, metadata !2, i32 0, null, metadata !7, metadata !"_ZTSN4test4TestIiEE"} ; [ DW_TAG_class_type ] [Test<int>] [line 3, size 8, align 8, offset 0] [def] [from ]

!15 = metadata !{metadata !"test2.cpp", metadata !"/Volumes/Data/radar/17052973”}
!18 = metadata !{i32 786489, metadata !15, null, metadata !"test", i32 1} ; [ DW_TAG_namespace ] [test] [line 1]
!17 = metadata !{i32 786434, metadata !5, metadata !18, metadata !"Test<int>", i32 3, i64 8, i64 8, i32 0, i32 0, null, metadata !2, i32 0, null, metadata !7, metadata !”_ZTSN4test4TestIiEE”}

I think we will need to have some kind of first-come-first-serve policy when it comes to the decl_file/decl_line of namespaces. The way to implement this would be to use DIRefs for namespaces.

Question to the C++ experts: Is the name (“test” in this case) a sufficient appropriate unique ID for a C++ namespace?

-- adrian

I will take this.

!5 = metadata !{metadata !"./test.h", metadata !"/Volumes/Data/radar/17052973"}
!6 = metadata !{i32 786489, metadata !5, null, metadata !"test", i32 1} ; [ DW_TAG_namespace ] [test] [line 1]
!4 = metadata !{i32 786434, metadata !5, metadata !6, metadata !"Test<int>", i32 3, i64 8, i64 8, i32 0, i32 0, null, metadata !2, i32 0, null, metadata !7, metadata !"_ZTSN4test4TestIiEE"} ; [ DW_TAG_class_type ] [Test<int>] [line 3, size 8, align 8, offset 0] [def] [from ]

!15 = metadata !{metadata !"test2.cpp", metadata !"/Volumes/Data/radar/17052973”}
!18 = metadata !{i32 786489, metadata !15, null, metadata !"test", i32 1} ; [ DW_TAG_namespace ] [test] [line 1]
!17 = metadata !{i32 786434, metadata !5, metadata !18, metadata !"Test<int>", i32 3, i64 8, i64 8, i32 0, i32 0, null, metadata !2, i32 0, null, metadata !7, metadata !”_ZTSN4test4TestIiEE”}

I think we will need to have some kind of first-come-first-serve policy when it comes to the decl_file/decl_line of namespaces. The way to implement this would be to use DIRefs for namespaces.

I'm not sure what you're getting at/why this would be necessary. If
you break the reference to the type itself, it won't matter that
sometimes the type is defined in a different place, will it?

For example, the following patch (actually only the first part is
needed for this specific test case) causes the assertion not to fire:

diff --git lib/CodeGen/CGDebugInfo.cpp lib/CodeGen/CGDebugInfo.cpp
index 048c8f8..2bded23 100644
--- lib/CodeGen/CGDebugInfo.cpp
+++ lib/CodeGen/CGDebugInfo.cpp
@@ -771,7 +771,7 @@ llvm::DIType CGDebugInfo::CreateType(const FunctionType *Ty,
   SmallVector<llvm::Value *, 16> EltTys;

   // Add the result type at least.
- EltTys.push_back(getOrCreateType(Ty->getReturnType(), Unit));
+ EltTys.push_back(getOrCreateType(Ty->getReturnType(), Unit).getRef());

   // Set up remainder of arguments if there is a prototype.
   // otherwise emit it as a variadic function.
@@ -779,7 +779,7 @@ llvm::DIType CGDebugInfo::CreateType(const FunctionType *Ty,
     EltTys.push_back(DBuilder.createUnspecifiedParameter());
   else if (const FunctionProtoType *FPT = dyn_cast<FunctionProtoType>(Ty)) {
     for (unsigned i = 0, e = FPT->getNumParams(); i != e; ++i)
- EltTys.push_back(getOrCreateType(FPT->getParamType(i), Unit));
+ EltTys.push_back(getOrCreateType(FPT->getParamType(i), Unit).getRef());
     if (FPT->isVariadic())
       EltTys.push_back(DBuilder.createUnspecifiedParameter());
   }

Without needing to change how namespaces are handled.

Yes, using DIRefs for namespaces would cause the actual type nodes to
be deduplicated when they aren't even with the patch above, and that
would help reduce debug info metadata further if that's
useful/necessary.

Question to the C++ experts: Is the name (“test” in this case) a sufficient appropriate unique ID for a C++ namespace?

Potentially - assuming you don't do this for anything in an anonymous
namespace. I'd probably see about using the standard mangling
machinery as we do for other things, though, if possible. Might keep
it more consistent.

I will take this.

!5 = metadata !{metadata !"./test.h", metadata !"/Volumes/Data/radar/17052973"}
!6 = metadata !{i32 786489, metadata !5, null, metadata !"test", i32 1} ; [ DW_TAG_namespace ] [test] [line 1]
!4 = metadata !{i32 786434, metadata !5, metadata !6, metadata !"Test<int>", i32 3, i64 8, i64 8, i32 0, i32 0, null, metadata !2, i32 0, null, metadata !7, metadata !"_ZTSN4test4TestIiEE"} ; [ DW_TAG_class_type ] [Test<int>] [line 3, size 8, align 8, offset 0] [def] [from ]

!15 = metadata !{metadata !"test2.cpp", metadata !"/Volumes/Data/radar/17052973”}
!18 = metadata !{i32 786489, metadata !15, null, metadata !"test", i32 1} ; [ DW_TAG_namespace ] [test] [line 1]
!17 = metadata !{i32 786434, metadata !5, metadata !18, metadata !"Test<int>", i32 3, i64 8, i64 8, i32 0, i32 0, null, metadata !2, i32 0, null, metadata !7, metadata !”_ZTSN4test4TestIiEE”}

I think we will need to have some kind of first-come-first-serve policy when it comes to the decl_file/decl_line of namespaces. The way to implement this would be to use DIRefs for namespaces.

I'm not sure what you're getting at/why this would be necessary. If
you break the reference to the type itself, it won't matter that
sometimes the type is defined in a different place, will it?

For example, the following patch (actually only the first part is
needed for this specific test case) causes the assertion not to fire:

diff --git lib/CodeGen/CGDebugInfo.cpp lib/CodeGen/CGDebugInfo.cpp
index 048c8f8..2bded23 100644
--- lib/CodeGen/CGDebugInfo.cpp
+++ lib/CodeGen/CGDebugInfo.cpp
@@ -771,7 +771,7 @@ llvm::DIType CGDebugInfo::CreateType(const FunctionType *Ty,
  SmallVector<llvm::Value *, 16> EltTys;

  // Add the result type at least.
- EltTys.push_back(getOrCreateType(Ty->getReturnType(), Unit));
+ EltTys.push_back(getOrCreateType(Ty->getReturnType(), Unit).getRef());

  // Set up remainder of arguments if there is a prototype.
  // otherwise emit it as a variadic function.
@@ -779,7 +779,7 @@ llvm::DIType CGDebugInfo::CreateType(const FunctionType *Ty,
    EltTys.push_back(DBuilder.createUnspecifiedParameter());
  else if (const FunctionProtoType *FPT = dyn_cast<FunctionProtoType>(Ty)) {
    for (unsigned i = 0, e = FPT->getNumParams(); i != e; ++i)
- EltTys.push_back(getOrCreateType(FPT->getParamType(i), Unit));
+ EltTys.push_back(getOrCreateType(FPT->getParamType(i), Unit).getRef());
    if (FPT->isVariadic())
      EltTys.push_back(DBuilder.createUnspecifiedParameter());
  }

Without needing to change how namespaces are handled.

Yes, using DIRefs for namespaces would cause the actual type nodes to
be deduplicated when they aren't even with the patch above, and that
would help reduce debug info metadata further if that's
useful/necessary.

Exactly. While the above patch will cover up the problem, we will still have many duplicate type entries during LTO that will take up memory, even though only the last one to be entered into DIRefMap will actually be used for emitting DWARF. Since DIRefs are also more expensive in terms of lookup speed, I think I’d prefer to only use them only where necessary (=as far down the MDNode DAG as possible).

Question to the C++ experts: Is the name (“test” in this case) a sufficient appropriate unique ID for a C++ namespace?

Potentially - assuming you don't do this for anything in an anonymous
namespace. I'd probably see about using the standard mangling

Good point! We should avoid unifying the anonymous namespace :slight_smile:

machinery as we do for other things, though, if possible. Might keep
it more consistent.

I will do some experiments with nested namespaces, but it seems as if name mangling does not buy us much: this will merely turn into N4test.

-- adrian

Without needing to change how namespaces are handled.

Yes, using DIRefs for namespaces would cause the actual type nodes to
be deduplicated when they aren't even with the patch above, and that
would help reduce debug info metadata further if that's
useful/necessary.

Exactly. While the above patch will cover up the problem, we will still have many duplicate type entries during LTO that will take up memory, even though only the last one to be entered into DIRefMap will actually be used for emitting DWARF. Since DIRefs are also more expensive in terms of lookup speed, I think I’d prefer to only use them only where necessary (=as far down the MDNode DAG as possible).

Right, though, to be fair, some numbers here might be nice. The
tradeoff of "extra lookup for ref" versus "extra memory allocation for
larger debug info".

I also don't think it'll matter for speed :slight_smile:

-eric

Without needing to change how namespaces are handled.

Yes, using DIRefs for namespaces would cause the actual type nodes to
be deduplicated when they aren't even with the patch above, and that
would help reduce debug info metadata further if that's
useful/necessary.

Exactly. While the above patch will cover up the problem, we will still have many duplicate type entries during LTO that will take up memory, even though only the last one to be entered into DIRefMap will actually be used for emitting DWARF. Since DIRefs are also more expensive in terms of lookup speed, I think I’d prefer to only use them only where necessary (=as far down the MDNode DAG as possible).

Right, though, to be fair, some numbers here might be nice. The
tradeoff of "extra lookup for ref" versus "extra memory allocation for
larger debug info”.

Then again, it’s not like the extra lookups introduced by David’s proposed patch would get rid of those extra memory allocations. Granted, they do get rid of the assertion, which obviously provides some value :wink:

I also don't think it'll matter for speed :slight_smile:

You’re probably right about the lookup, but depending on the project you’re building the memory usage could matter.

-- adrian

I looked into this a little. My proposal is this:

- extend DINamespace with an extra UniqueID filed (analogous to DICompositeType)
- add an extra case to DIScope::getRef()
- Let CGDebugInfo create a mangled name for each namespace to use as UID
  -> the UID of the anonymous namespace is null.

For example:

namespace a { namespace b {} } -> “_ZN1a", “_ZN1a1b"

!0 = [compile unit]
!1 = metadata !{i32 786489, metadata !0, metadata !”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
!2 = metadata !{i32 786489, metadata !0, null, metadata !”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]

namespace a1b {} -> “_ZN3a1b"
namespace {} -> null (!= “”) // Anonymous namespaces will remain old-fashioned MDNode references to prevent uniquing.

With this change we will get the full type uniquing benefits. As a side effect, definitions that are in the same namespace will be children of the same DW_TAG_namespace in the debug info, which I’d argue is the expected behavior for LTO anyway.

-- adrian

I looked into this a little.

Thanks for taking a look!

My proposal is this:

- extend DINamespace with an extra UniqueID filed (analogous to DICompositeType)
- add an extra case to DIScope::getRef()
- Let CGDebugInfo create a mangled name for each namespace to use as UID
  -> the UID of the anonymous namespace is null.

For example:

namespace a { namespace b {} } -> “_ZN1a", “_ZN1a1b"

!0 = [compile unit]
!1 = metadata !{i32 786489, metadata !0, metadata !”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
!2 = metadata !{i32 786489, metadata !0, null, metadata !”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]

namespace a1b {} -> “_ZN3a1b"
namespace {} -> null (!= “”) // Anonymous namespaces will remain old-fashioned MDNode references to prevent uniquing.

Will this do the right thing if there's a named namespace inside or
outside the anonymous namespace?

With this change we will get the full type uniquing benefits. As a side effect, definitions that are in the same namespace will be children of the same DW_TAG_namespace in the debug info, which I’d argue is the expected behavior for LTO anyway.

Sort of - what happens to functions defined in that namespace? Will
they all end up in one CU with one instance of the namespace under
LTO? Could do weird things to address ranges? (would we end up with
the address range in the CU the function was originally in - but the
subprogram DIE ends up in some other CU that doesn't include that
address range)

- David

I looked into this a little.

Thanks for taking a look!

My proposal is this:

- extend DINamespace with an extra UniqueID filed (analogous to DICompositeType)
- add an extra case to DIScope::getRef()
- Let CGDebugInfo create a mangled name for each namespace to use as UID
-> the UID of the anonymous namespace is null.

For example:

namespace a { namespace b {} } -> “_ZN1a", “_ZN1a1b"

!0 = [compile unit]
!1 = metadata !{i32 786489, metadata !0, metadata !”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
!2 = metadata !{i32 786489, metadata !0, null, metadata !”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]

namespace a1b {} -> “_ZN3a1b"
namespace {} -> null (!= “”) // Anonymous namespaces will remain old-fashioned MDNode references to prevent uniquing.

Will this do the right thing if there's a named namespace inside or
outside the anonymous namespace?

This is actually really tricky. Normally the ValueSymbolTable takes care of adding a unique suffix to all conflicting symbols. i think the most reasonable thing to do here is to add a rule:
  -> the UID of any namespace that has an anonymous namespace in its parent chain is null.

With this change we will get the full type uniquing benefits. As a side effect, definitions that are in the same namespace will be children of the same DW_TAG_namespace in the debug info, which I’d argue is the expected behavior for LTO anyway.

Sort of - what happens to functions defined in that namespace? Will
they all end up in one CU with one instance of the namespace under
LTO? Could do weird things to address ranges? (would we end up with
the address range in the CU the function was originally in - but the
subprogram DIE ends up in some other CU that doesn't include that
address range)

I think you just convinced me to save this for later and clone a namespace for every compile unit instead. The decl_file/decl_line will have to identical for all CUs though (and come from the module linked in first).

thanks!
adrian

I looked into this a little.

Thanks for taking a look!

My proposal is this:

- extend DINamespace with an extra UniqueID filed (analogous to DICompositeType)
- add an extra case to DIScope::getRef()
- Let CGDebugInfo create a mangled name for each namespace to use as UID
-> the UID of the anonymous namespace is null.

For example:

namespace a { namespace b {} } -> “_ZN1a", “_ZN1a1b"

!0 = [compile unit]
!1 = metadata !{i32 786489, metadata !0, metadata !”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
!2 = metadata !{i32 786489, metadata !0, null, metadata !”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]

namespace a1b {} -> “_ZN3a1b"
namespace {} -> null (!= “”) // Anonymous namespaces will remain old-fashioned MDNode references to prevent uniquing.

Will this do the right thing if there's a named namespace inside or
outside the anonymous namespace?

This is actually really tricky. Normally the ValueSymbolTable takes care of adding a unique suffix to all conflicting symbols. i think the most reasonable thing to do here is to add a rule:
  -> the UID of any namespace that has an anonymous namespace in its parent chain is null.

Yep - I assume that's the right thing to do, just not sure if it
would/will require a little extra work & wanted to make sure it was on
the radar.

With this change we will get the full type uniquing benefits. As a side effect, definitions that are in the same namespace will be children of the same DW_TAG_namespace in the debug info, which I’d argue is the expected behavior for LTO anyway.

Sort of - what happens to functions defined in that namespace? Will
they all end up in one CU with one instance of the namespace under
LTO? Could do weird things to address ranges? (would we end up with
the address range in the CU the function was originally in - but the
subprogram DIE ends up in some other CU that doesn't include that
address range)

I think you just convinced me to save this for later and clone a namespace for every compile unit instead. The decl_file/decl_line will have to identical for all CUs though (and come from the module linked in first).

OK - so we'll put one namespace in the metadata and just use that for
each CU? Should be easy enough.

One issue: what keeps the namespace metadata nodes live? Where do we
find them? Are we going to add another list to the CUs?

I looked into this a little.

Thanks for taking a look!

My proposal is this:

- extend DINamespace with an extra UniqueID filed (analogous to DICompositeType)
- add an extra case to DIScope::getRef()
- Let CGDebugInfo create a mangled name for each namespace to use as UID
-> the UID of the anonymous namespace is null.

For example:

namespace a { namespace b {} } -> “_ZN1a", “_ZN1a1b"

!0 = [compile unit]
!1 = metadata !{i32 786489, metadata !0, metadata !”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
!2 = metadata !{i32 786489, metadata !0, null, metadata !”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]

namespace a1b {} -> “_ZN3a1b"
namespace {} -> null (!= “”) // Anonymous namespaces will remain old-fashioned MDNode references to prevent uniquing.

Will this do the right thing if there's a named namespace inside or
outside the anonymous namespace?

This is actually really tricky. Normally the ValueSymbolTable takes care of adding a unique suffix to all conflicting symbols. i think the most reasonable thing to do here is to add a rule:
  -> the UID of any namespace that has an anonymous namespace in its parent chain is null.

Yep - I assume that's the right thing to do, just not sure if it
would/will require a little extra work & wanted to make sure it was on
the radar.

With this change we will get the full type uniquing benefits. As a side effect, definitions that are in the same namespace will be children of the same DW_TAG_namespace in the debug info, which I’d argue is the expected behavior for LTO anyway.

Sort of - what happens to functions defined in that namespace? Will
they all end up in one CU with one instance of the namespace under
LTO? Could do weird things to address ranges? (would we end up with
the address range in the CU the function was originally in - but the
subprogram DIE ends up in some other CU that doesn't include that
address range)

I think you just convinced me to save this for later and clone a namespace for every compile unit instead. The decl_file/decl_line will have to identical for all CUs though (and come from the module linked in first).

OK - so we'll put one namespace in the metadata and just use that for
each CU? Should be easy enough.

One issue: what keeps the namespace metadata nodes live? Where do we
find them? Are we going to add another list to the CUs?

Alternatively... does anyone really care about the source location of
a namespace? (given that we have to pick one arbitrary location that a
namespace starts at?) Should we just omit it, then namespace nodes
would deduplicate simply.

- David

I looked into this a little.

Thanks for taking a look!

My proposal is this:

- extend DINamespace with an extra UniqueID filed (analogous to DICompositeType)
- add an extra case to DIScope::getRef()
- Let CGDebugInfo create a mangled name for each namespace to use as UID
-> the UID of the anonymous namespace is null.

For example:

namespace a { namespace b {} } -> “_ZN1a", “_ZN1a1b"

!0 = [compile unit]
!1 = metadata !{i32 786489, metadata !0, metadata !”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
!2 = metadata !{i32 786489, metadata !0, null, metadata !”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]

namespace a1b {} -> “_ZN3a1b"
namespace {} -> null (!= “”) // Anonymous namespaces will remain old-fashioned MDNode references to prevent uniquing.

Will this do the right thing if there's a named namespace inside or
outside the anonymous namespace?

This is actually really tricky. Normally the ValueSymbolTable takes care of adding a unique suffix to all conflicting symbols. i think the most reasonable thing to do here is to add a rule:
  -> the UID of any namespace that has an anonymous namespace in its parent chain is null.

Yep - I assume that's the right thing to do, just not sure if it
would/will require a little extra work & wanted to make sure it was on
the radar.

With this change we will get the full type uniquing benefits. As a side effect, definitions that are in the same namespace will be children of the same DW_TAG_namespace in the debug info, which I’d argue is the expected behavior for LTO anyway.

Sort of - what happens to functions defined in that namespace? Will
they all end up in one CU with one instance of the namespace under
LTO? Could do weird things to address ranges? (would we end up with
the address range in the CU the function was originally in - but the
subprogram DIE ends up in some other CU that doesn't include that
address range)

I think you just convinced me to save this for later and clone a namespace for every compile unit instead. The decl_file/decl_line will have to identical for all CUs though (and come from the module linked in first).

OK - so we'll put one namespace in the metadata and just use that for
each CU? Should be easy enough.

One issue: what keeps the namespace metadata nodes live? Where do we
find them? Are we going to add another list to the CUs?

Alternatively... does anyone really care about the source location of
a namespace? (given that we have to pick one arbitrary location that a
namespace starts at?) Should we just omit it, then namespace nodes
would deduplicate simply.

Can't come up with a use for it off the top of my head, no.

-eric

I looked into this a little.

Thanks for taking a look!

My proposal is this:

- extend DINamespace with an extra UniqueID filed (analogous to DICompositeType)
- add an extra case to DIScope::getRef()
- Let CGDebugInfo create a mangled name for each namespace to use as UID
-> the UID of the anonymous namespace is null.

For example:

namespace a { namespace b {} } -> “_ZN1a", “_ZN1a1b"

!0 = [compile unit]
!1 = metadata !{i32 786489, metadata !0, metadata !”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
!2 = metadata !{i32 786489, metadata !0, null, metadata !”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]

namespace a1b {} -> “_ZN3a1b"
namespace {} -> null (!= “”) // Anonymous namespaces will remain old-fashioned MDNode references to prevent uniquing.

Will this do the right thing if there's a named namespace inside or
outside the anonymous namespace?

This is actually really tricky. Normally the ValueSymbolTable takes care of adding a unique suffix to all conflicting symbols. i think the most reasonable thing to do here is to add a rule:
-> the UID of any namespace that has an anonymous namespace in its parent chain is null.

Yep - I assume that's the right thing to do, just not sure if it
would/will require a little extra work & wanted to make sure it was on
the radar.

With this change we will get the full type uniquing benefits. As a side effect, definitions that are in the same namespace will be children of the same DW_TAG_namespace in the debug info, which I’d argue is the expected behavior for LTO anyway.

Sort of - what happens to functions defined in that namespace? Will
they all end up in one CU with one instance of the namespace under
LTO? Could do weird things to address ranges? (would we end up with
the address range in the CU the function was originally in - but the
subprogram DIE ends up in some other CU that doesn't include that
address range)

I think you just convinced me to save this for later and clone a namespace for every compile unit instead. The decl_file/decl_line will have to identical for all CUs though (and come from the module linked in first).

OK - so we'll put one namespace in the metadata and just use that for
each CU? Should be easy enough.

One issue: what keeps the namespace metadata nodes live? Where do we
find them? Are we going to add another list to the CUs?

Alternatively... does anyone really care about the source location of
a namespace? (given that we have to pick one arbitrary location that a
namespace starts at?) Should we just omit it, then namespace nodes
would deduplicate simply.

At least for C++ the source location for a namespace is arbitrary and not very useful. We would have to do something with the context field though (the second one): for a top-level namespace it currently points to the compile unit, which would counter any uniquing attempts.

-- adrian

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu] On
Behalf Of Adrian Prantl

>>>
>>>>
>>>>> I looked into this a little.
>>>>
>>>> Thanks for taking a look!
>>>>
>>>>> My proposal is this:
>>>>>
>>>>> - extend DINamespace with an extra UniqueID filed (analogous to
DICompositeType)
>>>>> - add an extra case to DIScope::getRef()
>>>>> - Let CGDebugInfo create a mangled name for each namespace to use as
UID
>>>>> -> the UID of the anonymous namespace is null.
>>>>>
>>>>> For example:
>>>>>
>>>>> namespace a { namespace b {} } -> "_ZN1a", "_ZN1a1b"
>>>>>
>>>>> !0 = [compile unit]
>>>>> !1 = metadata !{i32 786489, metadata !0, metadata !"_ZN1a", metadata
!"b", i32 1, metadata !"_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
>>>>> !2 = metadata !{i32 786489, metadata !0, null, metadata !"a", i32 1,
metadata !"_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]
>>>>>
>>>>> namespace a1b {} -> "_ZN3a1b"
>>>>> namespace {} -> null (!= "") // Anonymous namespaces will remain
old-fashioned MDNode references to prevent uniquing.
>>>>
>>>> Will this do the right thing if there's a named namespace inside or
>>>> outside the anonymous namespace?
>>>
>>> This is actually really tricky. Normally the ValueSymbolTable takes
care of adding a unique suffix to all conflicting symbols. i think the
most reasonable thing to do here is to add a rule:
>>> -> the UID of any namespace that has an anonymous namespace in its
parent chain is null.
>>
>> Yep - I assume that's the right thing to do, just not sure if it
>> would/will require a little extra work & wanted to make sure it was on
>> the radar.
>>
>>>>> With this change we will get the full type uniquing benefits. As a
side effect, definitions that are in the same namespace will be children
of the same DW_TAG_namespace in the debug info, which I'd argue is the
expected behavior for LTO anyway.
>>>>
>>>> Sort of - what happens to functions defined in that namespace? Will
>>>> they all end up in one CU with one instance of the namespace under
>>>> LTO? Could do weird things to address ranges? (would we end up with
>>>> the address range in the CU the function was originally in - but the
>>>> subprogram DIE ends up in some other CU that doesn't include that
>>>> address range)
>>>
>>> I think you just convinced me to save this for later and clone a
namespace for every compile unit instead. The decl_file/decl_line will
have to identical for all CUs though (and come from the module linked in
first).
>>
>> OK - so we'll put one namespace in the metadata and just use that for
>> each CU? Should be easy enough.
>>
>> One issue: what keeps the namespace metadata nodes live? Where do we
>> find them? Are we going to add another list to the CUs?
>
> Alternatively... does anyone really care about the source location of
> a namespace? (given that we have to pick one arbitrary location that a
> namespace starts at?) Should we just omit it, then namespace nodes
> would deduplicate simply.

At least for C++ the source location for a namespace is arbitrary and not
very useful. We would have to do something with the context field though
(the second one): for a top-level namespace it currently points to the
compile unit, which would counter any uniquing attempts.

-- Adrian

Source location is useless on namespace entries, it would save a small amount
of space to nuke it. I'd say go ahead.
--paulr

I looked into this a little.

Thanks for taking a look!

My proposal is this:

- extend DINamespace with an extra UniqueID filed (analogous to DICompositeType)
- add an extra case to DIScope::getRef()
- Let CGDebugInfo create a mangled name for each namespace to use as UID
-> the UID of the anonymous namespace is null.

For example:

namespace a { namespace b {} } -> “_ZN1a", “_ZN1a1b"

!0 = [compile unit]
!1 = metadata !{i32 786489, metadata !0, metadata !”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [ DW_TAG_namespace ] [a]
!2 = metadata !{i32 786489, metadata !0, null, metadata !”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]

namespace a1b {} -> “_ZN3a1b"
namespace {} -> null (!= “”) // Anonymous namespaces will remain old-fashioned MDNode references to prevent uniquing.

Will this do the right thing if there's a named namespace inside or
outside the anonymous namespace?

This is actually really tricky. Normally the ValueSymbolTable takes care of adding a unique suffix to all conflicting symbols. i think the most reasonable thing to do here is to add a rule:
-> the UID of any namespace that has an anonymous namespace in its parent chain is null.

Yep - I assume that's the right thing to do, just not sure if it
would/will require a little extra work & wanted to make sure it was on
the radar.

With this change we will get the full type uniquing benefits. As a side effect, definitions that are in the same namespace will be children of the same DW_TAG_namespace in the debug info, which I’d argue is the expected behavior for LTO anyway.

Sort of - what happens to functions defined in that namespace? Will
they all end up in one CU with one instance of the namespace under
LTO? Could do weird things to address ranges? (would we end up with
the address range in the CU the function was originally in - but the
subprogram DIE ends up in some other CU that doesn't include that
address range)

I think you just convinced me to save this for later and clone a namespace for every compile unit instead. The decl_file/decl_line will have to identical for all CUs though (and come from the module linked in first).

OK - so we'll put one namespace in the metadata and just use that for
each CU? Should be easy enough.

One issue: what keeps the namespace metadata nodes live? Where do we
find them? Are we going to add another list to the CUs?

Alternatively... does anyone really care about the source location of
a namespace? (given that we have to pick one arbitrary location that a
namespace starts at?) Should we just omit it, then namespace nodes
would deduplicate simply.

At least for C++ the source location for a namespace is arbitrary and not very useful. We would have to do something with the context field though (the second one): for a top-level namespace it currently points to the compile unit, which would counter any uniquing attempts.

Actually I had the same thought & had tested it. It seems we don't
link top level namespaces to compile units... see
DIBuilder::createNameSpace and its use of getNonCompileUnitScope. If
you specify a scope with the DW_TAG_compile_unit, DIBuilder uses a
null scope instead. (not sure why we do this for namespaces and
perhaps not for types, functions, etc - we could probably just fix
that in the frontend (getContextDescriptor) and drop this
workaround/conditional in LLVM)

- David