RFC: metadata attachments for global variables

Hi all,

I’d like to add support for metadata attachments for global variables in the same way as we did for functions.

Syntax would be pretty simple:
@foo = global i32 0, !foo !0, !bar !1
(the extra commas are required to disambiguate from a named metadata on the next line)

Benefits:

  1. Lets us reverse the DIGlobalVariable → GlobalVariable edge, which should hopefully clear the way for removing the llvm.dbg.cu named metadata node.

  2. Allows for bitset metadata to be attached to globals rather than being represented as a named metadata node. This means that bitset metadata is naturally dropped when globals are dropped, and makes it easier to manipulate bitset metadata and build summaries of bitset definitions.

Comments appreciated.

Thanks,

Hi all,

I'd like to add support for metadata attachments for global variables in the same way as we did for functions.

Syntax would be pretty simple:
@foo = global i32 0, !foo !0, !bar !1
(the extra commas are required to disambiguate from a named metadata on the next line)

SGTM.

Benefits:
1) Lets us reverse the DIGlobalVariable -> GlobalVariable edge, which should hopefully clear the way for removing the llvm.dbg.cu named metadata node.

A little harder than it sounds (need to somehow support a GlobalVariable that is RAUW'ed with a ConstantInt), but I think this is important to do.

You have my full support! I actually started implementing this to fix the problem outlined in http://reviews.llvm.org/D18109, but then was side-tracked by some ThinLTO work. I’ve attached my early stage WIP patch to the D18109, in case you are interested in looking at it and/or picking up from there. There are some interesting challenges, such as the fact that we need to support more than one !dbg attachement per global and the need to replace constants in the global field with DIExpressions that I haven’t tackled yet.

-- adrian

Thanks Adrian.

The ASan issue in that code review actually seems very similar to some of the global variable manipulation that I’m doing as part of whole-program vtable optimization and CFI (see e.g. DevirtModule::rebuildGlobal). I’ll have to think about whether there’s maybe some solution to this problem that could be used by ASan, vtable opt and CFI.

How can a GlobalVariable be replaced with a ConstantInt? Wouldn't that just
be an absolute address?

Thanks,

>
> Hi all,
>
> I'd like to add support for metadata attachments for global variables
in the same way as we did for functions.
>
> Syntax would be pretty simple:
> @foo = global i32 0, !foo !0, !bar !1
> (the extra commas are required to disambiguate from a named metadata on
the next line)

SGTM.

> Benefits:
> 1) Lets us reverse the DIGlobalVariable -> GlobalVariable edge, which
should hopefully clear the way for removing the llvm.dbg.cu named
metadata node.

A little harder than it sounds (need to somehow support a GlobalVariable
that is RAUW'ed with a ConstantInt), but I think this is important to do.

How can a GlobalVariable be replaced with a ConstantInt? Wouldn't that
just be an absolute address?

If the global variable has a known constant value & the address is never
needed, then we might optimize away the storage and still want debug info
describing the constant.

>
> Hi all,
>
> I'd like to add support for metadata attachments for global variables
in the same way as we did for functions.
>
> Syntax would be pretty simple:
> @foo = global i32 0, !foo !0, !bar !1
> (the extra commas are required to disambiguate from a named metadata
on the next line)

SGTM.

> Benefits:
> 1) Lets us reverse the DIGlobalVariable -> GlobalVariable edge, which
should hopefully clear the way for removing the llvm.dbg.cu named
metadata node.

A little harder than it sounds (need to somehow support a GlobalVariable
that is RAUW'ed with a ConstantInt), but I think this is important to do.

How can a GlobalVariable be replaced with a ConstantInt? Wouldn't that
just be an absolute address?

If the global variable has a known constant value & the address is never
needed, then we might optimize away the storage and still want debug info
describing the constant.

I'm not sure under what conditions, if any, that currently works - so this
may be more of a "nice to have" or "be good to think about how any future
design wouldn't preclude fixing this gap" sort of thing.

- David

I see. This does seem a lot like something like macro expansions where the
entity doesn't have a direct correspondent in the IR. So perhaps any pass
that would want to preserve debug info in such a case could call some
utility function that would move the DIGlobalVariable to (I suppose it
would have to be) some named metadata node to which we'd attach things like
macro expansions.

Thanks,

+1. I’ve seen a couple of cases where metadata on a global would be useful as an optimization hint as well.

Philip

+1 or attaching metadata to globals. I’d be curious if we can attach !range to a global representing an enum for example. But…

Should we care about GlobalMerge for !dbg here? Currently if we merge globals ‘a’ and ‘b’ in to __merged_global, i guess the DIGlobalVariable’s for ‘a’ and ‘b’ would now both point to the new __merged_global.

If thats the case, then by having the !dbg on the global, we wouldn’t be able to handle this case, unless you allow the !dbg to (optionally) point to a list of DIGlobalVariable’s.

BTW, this isn’t something I think should block this bug, but just wanted to point out that it could be an issue. MergeFunction’s likely has a similar issue with !dbg on functions and I guess we don’t handle it there either.

Cheers
Pete

>
> Hi all,
>
> I'd like to add support for metadata attachments for global
variables in the same way as we did for functions.
>
> Syntax would be pretty simple:
> @foo = global i32 0, !foo !0, !bar !1
> (the extra commas are required to disambiguate from a named metadata
on the next line)

SGTM.

> Benefits:
> 1) Lets us reverse the DIGlobalVariable -> GlobalVariable edge,
which should hopefully clear the way for removing the llvm.dbg.cu
named metadata node.

A little harder than it sounds (need to somehow support a
GlobalVariable that is RAUW'ed with a ConstantInt), but I think this is
important to do.

How can a GlobalVariable be replaced with a ConstantInt? Wouldn't that
just be an absolute address?

If the global variable has a known constant value & the address is never
needed, then we might optimize away the storage and still want debug info
describing the constant.

I'm not sure under what conditions, if any, that currently works - so
this may be more of a "nice to have" or "be good to think about how any
future design wouldn't preclude fixing this gap" sort of thing.

I see. This does seem a lot like something like macro expansions where the
entity doesn't have a direct correspondent in the IR.

Ish - macros disappear in the frontend, so, yes, when we're preserving them
we just attach them to the CU.

So perhaps any pass that would want to preserve debug info in such a case
could call some utility function that would move the DIGlobalVariable to (I
suppose it would have to be) some named metadata node to which we'd attach
things like macro expansions.

Yep, something like that. But many passes might just blindly RAUW loads of
the pointer to the global with the constant & then the global just quietly
disappears due to lack of use (probably some cleanup path). But this seems
to already be happening today - Duncan, do you know any cases where we do
successfully describe a global via a constant value that would be the value
of the global, not its address? My simple cases just seemed to lose the
location entirely if the constant's storage was optimized away.

- David

Agree, I think at least in the long term we’ll want to have metadata attachments on globalobjects be represented with something equivalent to a multimap. The bitset metadata could also potentially benefit from a concept of multiple attachments per global, so I’ll see if it would be worth doing it now.

Peter

+1 or attaching metadata to globals. I’d be curious if we can attach !range to a global representing an enum for example. But…

Should we care about GlobalMerge for !dbg here? Currently if we merge globals ‘a’ and ‘b’ in to __merged_global, i guess the DIGlobalVariable’s for ‘a’ and ‘b’ would now both point to the new __merged_global.

If thats the case, then by having the !dbg on the global, we wouldn’t be able to handle this case, unless you allow the !dbg to (optionally) point to a list of DIGlobalVariable’s.

Users seem to care. I did receive several independent bug reports about LLVM not handling debug info for global merge even before this change, so I definitely want to at least have a story for supporting it before we proceed.

Thanks Adrian.

The ASan issue in that code review actually seems very similar to some of the global variable manipulation that I'm doing as part of whole-program vtable optimization and CFI (see e.g. DevirtModule::rebuildGlobal). I'll have to think about whether there's maybe some solution to this problem that could be used by ASan, vtable opt and CFI.

How can we end up with more than one !dbg per global? Because of global merging?

Basically, yes. I thought I remembered that there where other situations but I failed to construct one at the moment.

Not to spoil all the fun, but I’m not sure if this will bring us much closer to removing the llvm.dbg.cu node. The reason the llvm.dbg.cu node exits is so we can find all DICompileUnits, because the DICompileUnit holds debug info that is not referenced by any IR. This includes things like DIImportedEntity (think C++ “using”), enums, and macros.
We will also need a story for preserving DIGlobalVariables that are constants and have their GlobalObject optimized away. We can definitely remove all the global variables that are referenced from GlobalObjects’ !dbg attachments from the DICompileUnit’s list of globals, but we will need to retain all constants. Really, we should also retain all non-constant globals that had their storage optimized away, because they may shadow other variables. For example:

  int i;
  void f() {
    static int i; // may get optimized away
    // if I’m stopped here, in the debugger, “i” should always refer to the inner i;
  }

-- adrian

I understand the fact that DICompileUnit holds debug information that are useful and not referenced directly from the IR.

What I wonder is, after all edges are pointing from the IR toward the metadata graph, why do you need to keep a compile unit (and the associated retained informations) if nothing in the IR (transitively) points to this compile unit? (i.e. it is unreachable from the IR)

Yes, that does seem to make things more complicated than I thought. I
suppose we would have to at least for the moment preserve the global named
metadata node and the global variable list on DISubprogram. In the long
term we could perhaps consider preserving these variables on a global named
metadata node like I was mentioning on the other subthread with David. We'd
still need to have a global metadata node, so maybe it wouldn't be much
better.

Thanks,

> 1) Lets us reverse the DIGlobalVariable -> GlobalVariable edge, which
should hopefully clear the way for removing the llvm.dbg.cu named
metadata node.

Not to spoil all the fun, but I’m not sure if this will bring us much
closer to removing the llvm.dbg.cu node. The reason the llvm.dbg.cu node
exits is so we can find all DICompileUnits, because the DICompileUnit holds
debug info that is not referenced by any IR. This includes things like
DIImportedEntity (think C++ “using”), enums, and macros.

Sure - but there's probably a fair question whether any of that stuff is
relevant if /nothing/ was ever emitted for the CU. Most of those things
should only be used for lookup within the scope of the file they were
defined in, for example - so if there's no function from that CU, no line
table, etc, you can never be "in" that file, etc.

But I haven't thought about it hard - there may be good reasons for CUs
with no live code/data.

We will also need a story for preserving DIGlobalVariables that are
constants and have their GlobalObject optimized away. We can definitely
remove all the global variables that are referenced from GlobalObjects’
!dbg attachments from the DICompileUnit’s list of globals, but we will need
to retain all constants. Really, we should also retain all non-constant
globals that had their storage optimized away, because they may shadow
other variables. For example:

  int i;
  void f() {
    static int i; // may get optimized away
    // if I’m stopped here, in the debugger, “i” should always refer to
the inner i;

I've probably made this argument myself - but I wonder how strong it is.
Name lookup's never going to be perfect because some things can be
optimized out (or never code generated in the first place) - for example
functions:

namespace x {
void f();
namespace y {
inline void f() {
}
void g();
}
}

If you're in the definition of g, but no one ever called f (or it got
optimized away), 'f' is going to find x::f, not x::y::f...

But I agree, we could probably make the argument that within a function we
preserve all the names.

1) Lets us reverse the DIGlobalVariable -> GlobalVariable edge, which should hopefully clear the way for removing the llvm.dbg.cu named metadata node.

Not to spoil all the fun, but I’m not sure if this will bring us much closer to removing the llvm.dbg.cu node. The reason the llvm.dbg.cu node exits is so we can find all DICompileUnits, because the DICompileUnit holds debug info that is not referenced by any IR. This includes things like DIImportedEntity (think C++ “using”), enums, and macros.

I understand the fact that DICompileUnit holds debug information that are useful and not referenced directly from the IR.

What I wonder is, after all edges are pointing from the IR toward the metadata graph, why do you need to keep a compile unit (and the associated retained informations) if nothing in the IR (transitively) points to this compile unit? (i.e. it is unreachable from the IR)

Couple of examples:
- because the compile unit could contain definitions for an enum that is useful during expression evaluation
- because it contains constants that are useful (XNU for example has this version.c file: http://opensource.apple.com/source/xnu/xnu-1456.1.26/config/version.c).
- When clang compiles debug info for a module it compiles an otherwise empty CU with debug info for all types in the module.

-- adrian

1) Lets us reverse the DIGlobalVariable -> GlobalVariable edge, which should hopefully clear the way for removing the llvm.dbg.cu named metadata node.

Not to spoil all the fun, but I’m not sure if this will bring us much closer to removing the llvm.dbg.cu node. The reason the llvm.dbg.cu node exits is so we can find all DICompileUnits, because the DICompileUnit holds debug info that is not referenced by any IR. This includes things like DIImportedEntity (think C++ “using”), enums, and macros.

I understand the fact that DICompileUnit holds debug information that are useful and not referenced directly from the IR.

What I wonder is, after all edges are pointing from the IR toward the metadata graph, why do you need to keep a compile unit (and the associated retained informations) if nothing in the IR (transitively) points to this compile unit? (i.e. it is unreachable from the IR)

Couple of examples:
- because the compile unit could contain definitions for an enum that is useful during expression evaluation
- because it contains constants that are useful (XNU for example has this version.c file: http://opensource.apple.com/source/xnu/xnu-1456.1.26/config/version.c).
- When clang compiles debug info for a module it compiles an otherwise empty CU with debug info for all types in the module.

- (something we recently discussed on cfe-commits for better supporting dtrace/ctfconvert) because the frontend implemented a -gfull option that added all types to the CU’s list retained types.