[proposal] Extensible IR metadata

Chris_Lattner · September 11, 2009, 4:57pm

Devang's work on debug info prompted this, thoughts welcome:
http://nondot.org/sabre/LLVMNotes/ExtensibleMetadata.txt

-Chris

Kenneth_Uildriks · September 11, 2009, 6:08pm

So this particular metadata would be an extension of the type? And
get propagated through as you create instructions that depend on other
instructions which had the type metadata attached?

I found myself wishing for such a thing several times recently. I
ended up using a "type tag" of type [0 x opaque*] in my structs to
force the type system to differentiate them from each other and make
unique classes out of them. It works, but I need a separate hashtable
to get back to a "class description" from a Value of a given Type.

Chris_Lattner · September 11, 2009, 6:29pm

No, this would be a property of the operation. In a dynamically typed language like Python (and many others), a naive translation will turn all Python objects into "void*"s. However, with some static or dynamic analysis, many types can be guessed at or inferred. This is a property of various operations, not of "void*".

-Chris

Jeffrey_Yasskin1 · September 11, 2009, 8:20pm

I've got a suggestion for a refinement:

Right now, we have classes like DILocation that wrap an MDNode* to
provide more convenient access to it. It would be more convenient for
users if instead of

MDNode *DbgInfo = Inst->getMD(MDKind::DbgTag);
Inst2->setMD(MDKind::DbgTag, DbgInfo);

they could write:

DILocation DbgInfo = Inst->getMD<DILocation>();
Inst2->setMD(DbgInfo);

We'd use TheContext->RegisterMDKind<MyKindWrapper>() or
...Register<MyKindWrapper>("name"); to register a new kind. (I prefer
the first.)

These kind wrappers need a couple of methods to make them work:

const StringRef KindWrapper::name("...");
KindWrapper(MDNode*); // Except for special cases like LdStAlign.
KindWrapper::operator bool() {return mdnode!=NULL;} // ??
int StaticTypeId<KindWrapper>::value; // Used for the proposal's MDKind
KindWrapper::ValidOnValue(const Value*);
MDNode* KindWrapper::merge(MDNode*, MDNode*); // For the optimizers

StaticTypeId is a new class that maps each of its template arguments
to a small, unique integer, which may be different in different
executions.

Since the optimizers may want more methods over time, but we don't
really want to require users to extend their wrappers, we should say
that all wrappers must inherit from a particular type. I'd name this
type "MDKind" and rename the proposed MDKind to MDKindID. Then we can
add defaults to MDKind over time. Nothing needs to be virtual since
these types are all used as template arguments.

We could either use a global list of IDs for the MDKinds or have
separate lists for each Context. StaticTypeId can only provide a
global list, so giving each Context its own list would take an extra
lookup, and wouldn't provide any benefit I can see.

Chris mentioned that .bc files would store the mapping from name->ID,
so the fact that StaticTypeId changes its values between runs isn't a
problem.
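To illustrate why run-to-run variation is harmless once names are recorded: a reader can build a table translating the file's kind IDs to the current process's IDs by matching names. The (ID, name) record shape and the helper names are hypothetical; this is not the actual bitcode layout.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Process-local registry: kind name -> small integer, assigned on first use.
class KindRegistry {
  std::map<std::string, unsigned> IDs;

public:
  unsigned getOrRegister(const std::string &Name) {
    auto It = IDs.find(Name);
    if (It != IDs.end())
      return It->second;
    unsigned ID = static_cast<unsigned>(IDs.size());
    IDs.emplace(Name, ID);
    return ID;
  }
};

// What a file might record: (file-local kind ID, kind name) pairs.
using KindTable = std::vector<std::pair<unsigned, std::string>>;

// Translate file-local kind IDs into this process's IDs by matching names.
std::map<unsigned, unsigned> buildKindRemap(const KindTable &FileKinds,
                                            KindRegistry &Reg) {
  std::map<unsigned, unsigned> Remap;
  for (const auto &Entry : FileKinds)
    Remap[Entry.first] = Reg.getOrRegister(Entry.second);
  return Remap;
}
```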

Thoughts?

Dan_Gohman3 · September 11, 2009, 9:11pm

The document mentions "instructions" a lot. We'll want to be able to
apply metadata to ConstantExprs as well at least, if not also Arguments
(think noalias) and other stuff, so it seems best to just talk about
"values" instead, and DenseMap<Value *, ...> instead of
DenseMap<Instruction *, ...>.
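A minimal sketch of that side table, assuming metadata is owned by a separate store rather than by the values themselves. std::unordered_map stands in for DenseMap, and the Value hierarchy and MetadataStore class are hypothetical toys. Keying on the Value* base means instructions, arguments, and constants can all carry attachments.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

struct MDNode { int Tag; };

// Toy value hierarchy: any subclass can be a key in the side table.
struct Value { virtual ~Value() = default; };
struct Instruction : Value {};
struct Argument : Value {};

class MetadataStore {
  // Value* -> list of (kind ID, node) attachments.
  std::unordered_map<const Value *, std::vector<std::pair<unsigned, MDNode *>>>
      Table;

public:
  void set(const Value *V, unsigned Kind, MDNode *N) {
    auto &Attachments = Table[V];
    for (auto &A : Attachments)
      if (A.first == Kind) { A.second = N; return; }
    Attachments.emplace_back(Kind, N);
  }
  MDNode *get(const Value *V, unsigned Kind) const {
    auto It = Table.find(V);
    if (It == Table.end())
      return nullptr;
    for (const auto &A : It->second)
      if (A.first == Kind)
        return A.second;
    return nullptr;
  }
  // Must be called when a value is destroyed, or the table keeps a dangling key.
  void erase(const Value *V) { Table.erase(V); }
};
```

One cost of a side table shows up in erase(): whoever destroys a value must also drop its entries.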

Dan

David_A_Greene · September 11, 2009, 10:23pm

> I've got a suggestion for a refinement:
>
> Right now, we have classes like DILocation that wrap an MDNode* to
> provide more convenient access to it. It would be more convenient for
> users if instead of
>
>   MDNode *DbgInfo = Inst->getMD(MDKind::DbgTag);
>   Inst2->setMD(MDKind::DbgTag, DbgInfo);
>
> they could write:
>
>   DILocation DbgInfo = Inst->getMD<DILocation>();
>   Inst2->setMD(DbgInfo);
>
> We'd use TheContext->RegisterMDKind<MyKindWrapper>() or
> ...Register<MyKindWrapper>("name"); to register a new kind. (I prefer
> the first.)

Yes, this is very convenient. This, along with the rest of Chris' proposal, is
very similar to the way we handled metadata in a compiler I worked on years
ago. It was so useful we even used it to stash dataflow information away as
we did analysis. Of course, we also had metadata tagged on control structures.
I'd like to see the current proposal extended to other constructs, as
Chris notes.

> StaticTypeId is a new class that maps each of its template arguments
> to a small, unique integer, which may be different in different
> executions.

How does this work across compilation units? How about with shared LLVM
libraries? These kinds of globally unique IDs are notoriously difficult
to get right. I'd suggest using a third-party unique-ID library. Boost.UUID
is one possibility but not the only one.

I have a few questions and comments about Chris' initial proposal as well.

- I don't like the separation between "built-in" metadata and "extended"
  metadata. Why not make all metadata use the RegisterMDKind interface and
  just have the LLVM libraries do it automatically for the "built-in" stuff?
  Having a separate namespace of enums is going to get confusing. Practically
  every day I curse the fact that "int" is different than "MyInt" in C++. :-/

- Defaulting alignment to 1 when metadata is not present is going to be a
  huge performance hit on many architectures. I hope we can find a better
  solution. I'm not sure what it is yet because we have to maintain safety.
  I just fear a Pass inadvertently dropping metadata and really screwing
  things up.
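To make the trade-off concrete, here is a toy sketch assuming alignment is carried only as metadata: any query must fall back to 1 when the attachment is missing, so dropping metadata keeps the code correct but forfeits the faster aligned access. Load, getAlignment, and the vector-load check are hypothetical.

```cpp
#include <cassert>

// Toy load instruction; AlignMD == 0 models "no alignment metadata attached".
struct Load {
  unsigned AlignMD = 0;
};

// Conservative query: without metadata we may only assume byte alignment.
unsigned getAlignment(const Load &L) {
  return L.AlignMD != 0 ? L.AlignMD : 1;
}

// A pass that drops all metadata: still correct, only pessimistic.
void stripMetadata(Load &L) { L.AlignMD = 0; }

// Code generation may pick a wide aligned access only when it is proven.
bool canUseAlignedVectorLoad(const Load &L, unsigned VectorAlign) {
  return getAlignment(L) >= VectorAlign;
}
```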

This looks very promising!

-Dave

Chris_Lattner · September 11, 2009, 11:47pm

I wrote: "Note that this document talks about metadata for instructions, it might make sense to generalize this to being metadata for all non-uniqued values (global variables, functions, basic blocks, arguments), but I'm just keeping it simple for now."

However, constant exprs are uniqued. What would you find it useful for?

-Chris

Dan_Gohman3 · September 11, 2009, 11:55pm

> > > Devang's work on debug info prompted this, thoughts welcome:
> > > http://nondot.org/sabre/LLVMNotes/ExtensibleMetadata.txt
> >
> > The document mentions "instructions" a lot. We'll want to be able to
> > apply metadata to ConstantExprs as well at least, if not also Arguments
> > (think noalias) and other stuff, so it seems best to just talk about
> > "values" instead, and DenseMap<Value *, ...> instead of
> > DenseMap<Instruction *, ...>.
>
> I wrote: "Note that this document talks about metadata for instructions, it might make sense to generalize this to being metadata for all non-uniqued values (global variables, functions, basic blocks, arguments), but I'm just keeping it simple for now."

I missed that part.

> However, constant exprs are uniqued. What would you find it useful for?

We have inbounds on ConstantExprs today, for example.

Dan

Chris_Lattner · September 11, 2009, 11:57pm

... and it was an interesting source of problems. Do you think that inbounds on constantexprs is really a good idea? It means that we can get into a world where "gep p, 0, 1" and "gep inbounds p, 0, 1" are not uniqued to the same constant.

The impact of this is somewhat reduced by libanalysis and vmcore trying to infer inbounds etc. Instead of putting inbounds on the constantexpr, why not make that "inference" be a predicate that any client could ask of the constantexpr?
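A sketch of the "inference as a predicate" idea, in a deliberately simplified model: the indexed aggregate is described by a list of dimension sizes, and a constant GEP counts as in bounds when its first index is 0 and every later index is within the corresponding dimension. ConstantGEP and isInBoundsGEP are hypothetical and ignore struct layouts, one-past-the-end pointers, and the full GEP semantics; the point is only that clients ask a question of one uniqued constant instead of reading a flag that splits it in two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified constant GEP: dimension sizes of the pointee aggregate
// (e.g. {4, 8} for [4 x [8 x i32]]) and the constant indices.
struct ConstantGEP {
  std::vector<std::uint64_t> Dims;
  std::vector<std::int64_t> Indices; // first index steps over the base pointer
};

// Predicate any client could ask instead of reading a stored "inbounds" flag.
bool isInBoundsGEP(const ConstantGEP &G) {
  if (G.Indices.empty() || G.Indices.size() > G.Dims.size() + 1)
    return false;
  if (G.Indices[0] != 0) // stepping past the base object is not provable here
    return false;
  for (std::size_t I = 1; I < G.Indices.size(); ++I) {
    std::int64_t Idx = G.Indices[I];
    if (Idx < 0 || static_cast<std::uint64_t>(Idx) >= G.Dims[I - 1])
      return false;
  }
  return true;
}
```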

-Chris

Jeffrey_Yasskin1 · September 12, 2009, 12:15am

template<typename T>
class StaticTypeId {
public:
  static int id;
};
extern int NextStaticTypeId; // Initialized to 0. Possibly an atomic type instead.
template<typename T> int StaticTypeId<T>::id = NextStaticTypeId++;

This relies on the compiler uniquing static member variables across
translation units, and I've never tested that across shared library
boundaries. The initializer didn't work with gcc-2 (there was a
workaround), but I believe it works with gcc-4. I've never tested it
with MSVC. We can also use static local variables, which would have a
different set of bugs, but they're very slightly slower to access.

Since there's a registration step, we could also use Pass-style IDs,
and have the registration fill them in, which would avoid uniquing
problems.
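A sketch of the registration-based alternative, assuming each kind wrapper declares a static ID slot (in the spirit of Pass IDs) that the registration call fills in. Because one explicit registry hands out the numbers, no implicitly initialized counter has to be uniqued across translation units or shared libraries. RegisterMDKind is the name used in this thread; MDKindRegistry, the ID field, and the name() hook are hypothetical. This assumes a single global registry, matching the preference above for one global list.

```cpp
#include <cassert>
#include <string>
#include <vector>

class MDKindRegistry {
  std::vector<std::string> Names; // index == assigned kind ID

public:
  // Fills in Wrapper::ID the first time the kind is registered.
  template <typename Wrapper> unsigned RegisterMDKind() {
    if (Wrapper::ID < 0) {
      Wrapper::ID = static_cast<int>(Names.size());
      Names.push_back(Wrapper::name());
    }
    return static_cast<unsigned>(Wrapper::ID);
  }
  const std::string &nameOf(unsigned ID) const { return Names.at(ID); }
};

// Two example kinds; -1 (a constant initializer) marks "not registered yet".
struct DbgKind {
  static int ID;
  static const char *name() { return "dbg"; }
};
struct TBAAKind {
  static int ID;
  static const char *name() { return "tbaa"; }
};
int DbgKind::ID = -1;
int TBAAKind::ID = -1;
```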

Chris_Lattner · September 12, 2009, 12:22am

> I have a few questions and comments about Chris' initial proposal as well.
>
> - I don't like the separation between "built-in" metadata and "extended"
> metadata. Why not make all metadata use the RegisterMDKind interface and
> just have the LLVM libraries do it automatically for the "built-in" stuff?
> Having a separate namespace of enums is going to get confusing. Practically
> every day I curse the fact that "int" is different than "MyInt" in C++. :-/

"builtin" metadata would also be registered, the only magic would be that the encoding would be smaller in the IR.

> - Defaulting alignment to 1 when metadata is not present is going to be a
> huge performance hit on many architectures. I hope we can find a better
> solution. I'm not sure what it is yet because we have to maintain safety.
> I just fear a Pass inadvertently dropping metadata and really screwing
> things up.

I don't expect metadata to be commonly stripped. This could be just as bad a perf problem for other things like TBAA or high level type information for a dynamic language. I think it is important that the IR is possible to reason about even in uncommon cases though.

-Chris

David_A_Greene · September 12, 2009, 12:54am

> > - I don't like the separation between "built-in" metadata and "extended"
> > metadata. Why not make all metadata use the RegisterMDKind interface and
> > just have the LLVM libraries do it automatically for the "built-in" stuff?
> > Having a separate namespace of enums is going to get confusing. Practically
> > every day I curse the fact that "int" is different than "MyInt" in C++. :-/
>
> "builtin" metadata would also be registered, the only magic would be
> that the encoding would be smaller in the IR.

Except the API is different. Built-in types use a well-known enum
value not available to extended metadata. I have no problem with a
smaller IR encoding. It's the programming interface I'm concerned
about. I'd rather it be the same for everything.
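One way to give both kinds the same programming interface while keeping a smaller encoding is sketched below, assuming the context registers a fixed list of built-in names through the ordinary path before any client runs, so they receive the lowest IDs. The kind names, MDContext, and isBuiltin are hypothetical; a real encoder would decide for itself how to exploit the low IDs.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>

class MDContext {
  std::map<std::string, unsigned> IDs;
  unsigned NumBuiltins = 0;

public:
  MDContext() {
    // Hypothetical built-in kinds, registered through the same path as any other.
    for (const char *Name : {"dbg", "align", "tbaa"})
      getKindID(Name);
    NumBuiltins = static_cast<unsigned>(IDs.size());
  }
  // One interface for built-in and extended kinds alike.
  unsigned getKindID(const std::string &Name) {
    auto It = IDs.find(Name);
    if (It != IDs.end())
      return It->second;
    unsigned ID = static_cast<unsigned>(IDs.size());
    IDs.emplace(Name, ID);
    return ID;
  }
  // An encoder could special-case these low IDs, e.g. by omitting their names.
  bool isBuiltin(unsigned ID) const { return ID < NumBuiltins; }
};
```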

> I don't expect metadata to be commonly stripped. This could be just
> as bad a perf problem for other things like TBAA or high level type
> information for a dynamic language. I think it is important that the
> IR is possible to reason about even in uncommon cases though.

Sure. Just something we need to be aware of.

-Dave

David_A_Greene · September 12, 2009, 12:57am

> This relies on the compiler uniquing static member variables across
> translation units, and I've never tested that across shared library
> boundaries. The initializer didn't work with gcc-2 (there was a
> workaround), but I believe it works with gcc-4. I've never tested it
> with MSVC. We can also use static local variables, which would have a
> different set of bugs, but they're very slightly slower to access.

Shared libraries are the big problem. I know the Boost guys had endless
discussions about how to design a Singleton to work in the presence of shared
libraries and this is pretty close to the same problem.

> Since there's a registration step, we could also use Pass-style IDs,
> and have the registration fill them in, which would avoid uniquing
> problems.

Yes, I think that should work. Doing things with static initializer magic is
asking for trouble.

-Dave

Nick_Lewycky · September 12, 2009, 2:17am

Dan Gohman wrote:

Chris_Lattner · September 12, 2009, 3:33am

The pushback has been about adding lots of weird and special purpose extensions, not the encoding.

-Chris

Nick_Lewycky · September 12, 2009, 4:00am

Chris Lattner wrote:

> > Dan Gohman wrote:
> >
> > > > Devang's work on debug info prompted this, thoughts welcome:
> > > > http://nondot.org/sabre/LLVMNotes/ExtensibleMetadata.txt
> > >
> > > The document mentions "instructions" a lot. We'll want to be able to
> > > apply metadata to ConstantExprs as well at least, if not also Arguments
> > > (think noalias) and other stuff, so it seems best to just talk about
> > > "values" instead, and DenseMap<Value *, ...> instead of
> > > DenseMap<Instruction *, ...>.
> >
> > I'm wondering that too. Can we replace LLVM function attributes with metadata? There's been some pushback to adding new function attributes in the past and it would be nice to be able to prototype new ones without having to change all of the vm core.
>
> The pushback has been about adding lots of weird and special purpose extensions, not the encoding.

The bar is higher for getting something into the vm core, as it should be. It sounds like we're planning to permit special purpose metadata which is why I asked.

If nothing else, it would be more convenient to prototype new extensions to find out what they're really worth.

Nick

Renato_Golin3 · September 12, 2009, 8:39am

I just wonder how stable that would become over time.

It is true that enabling the addition of metadata as part of the
structure is good for specialized optimizations that are not
represented internally (or relevant) in the LLVM core, but there is a
practical limit on what you can do.

My (humble) opinion is that basic language structure, such as function
attributes, should still be part of the core. And if there aren't
always easy ways of getting them and passing them through, we should
make it easier... in the core.

Metadata is a completely different beast. It's good for things that
only your own optimization pass or machine code will understand. It's
additional rather than required information, whose absence would be
completely harmless.

I completely agree with the document's argument that the demand (and
necessity) for metadata is increasing, but that doesn't mean we should
transform everything into it.

The RDF [1] developments sent a clear message that metadata per se are
too loose to hold value. We need a fixed, basic structure on where to
stick metadata, otherwise it'd just be a big slimy blob of untreatable
data.

My two cents...

cheers,
--renato

[1] RDF - Semantic Web Standards


Chris_Lattner · September 12, 2009, 7:52pm

Yep, I completely agree!

-Chris

Devang_Patel · September 15, 2009, 12:55am

IMO, there is no need to add two llvm::Instruction methods,
getMD and setMD. The metadata associated with an instruction will be
stored separately anyway.

Devang_Patel · September 16, 2009, 5:24am

Right now, I am preparing a very simple implementation that allows us
to make progress on the debug info front. At the same time, it'd be
possible for someone to extend it for other uses.
