RFP: Metadata is being used poorly to paper over missing IR constructs

Rather than bending over backwards to keep all of this working, we should try to add the IR facilities needed to avoid these problems.

This is the result of a discussion between myself, Duncan, and Eric (all of us probably relaying ideas from still other discussions) that I’m trying to write down here because none of us are going to be able to prioritize working on this soon. If anyone else ends up needing facilities related to this or having time to make this part of LLVM better, that would be awesome. Hence, a request for patches more than a request for comments. =D

<braindump> (and apologies if this is poorly structured or rambly)

I’m aware of at least two quite strange uses of metadata at the moment (Duncan, Eric, jump in with more I missed):

  1. The need for an arbitrary (often target-specific) symbolic string in the IR that can be used with intrinsics and/or instructions
  2. The need to build up a nearly arbitrary record of the “flags” used to control the code generation of the module which need to be handled correctly at link time.

Neither of these fits the model of metadata. They aren’t really optional. They can’t be stripped while preserving correctness. They aren’t annotations at all. I’m suggesting they both need first class representation, and that this representation won’t really be complex or intrusive.

#1
We need the ability to put semantic information in the IR that can be used by targets without extending the IR itself. In some cases we can do this with target-specific intrinsics, but those don’t always fit the problem and have their own set of challenges.

I think it would be nice if we just had a top level IR construct for symbolic strings. These should be allowed both inline (much like immediate constants) and potentially out-of-line like attributes. It is possible that this feature could be useful to simplify attributes or #2, but it seems simple and useful enough that I’m OK with it living on its own.

These symbolic strings would definitionally have no impact on the generated code. We could make them opaque Constants if that’s a useful API. If we don’t need the Constant API, I would do something similar to Duncan’s separation for metadata. My suspicion is that making these Constants would be convenient so they can be used as Values, with the understanding that it would be invalid to use them in arbitrary places, as they only have defined semantics in specific scenarios. The canonical example would be:

  call i64 @llvm.read_register(<sigil>"sp")
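
For comparison, here is a minimal sketch of how the same read has to be spelled with metadata today, based on the LangRef entry for `llvm.read_register`. The function name `@get_stack_pointer` is made up for illustration, and whether `sp` is an accepted register name depends on the target.

```llvm
; The register name travels as a metadata string operand of the intrinsic.
declare i64 @llvm.read_register.i64(metadata)

; Hypothetical wrapper function, named only for this example.
define i64 @get_stack_pointer() {
  %sp = call i64 @llvm.read_register.i64(metadata !0)
  ret i64 %sp
}

; The call is meaningless without this node, so it is not optional
; in the way ordinary metadata is supposed to be.
!0 = !{!"sp\00"}
```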

#2
We need to make module flags a first class entity of the module, just like datalayout:

&flag = …

(syntax shamelessly stolen from Duncan’s suggestion in IRC)

We can then actually specify exactly what the requirements are on module flags and how they are linked. We might be able to sink datalayout into this, or it might be better to keep separate, unsure. But having these be top-level real entities would be great.
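
To make the contrast concrete, here is a sketch of how the two kinds of module-level information look in a `.ll` file today. The datalayout and triple strings are just typical x86-64 Linux values picked as an example, and `wchar_size` is one flag Clang commonly emits. None of this is the proposed syntax.

```llvm
; First-class module-level entities with dedicated syntax (example values only).
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Module flags today: a named metadata node listing
; {merge behavior, key, value} triples.
!llvm.module.flags = !{!0}

; Behavior 1 means "error if linked modules disagree on the value".
!0 = !{i32 1, !"wchar_size", i32 4}
```

The first two lines are real module-level entities; the flag is ordinary named metadata with a linking convention layered on top of it.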

Hope these thoughts are useful. Sadly I’m not going to be able to drive any of these any time soon. I get a similar impression from Duncan. But I think lots of us would be able to help if someone wanted to contribute work on this front.

If there is broad consensus about the design points above, I’ll paste them into two PRs as well for tracking.

-Chandler

Rather than bending over backwards to keep all of this working, we should try to add the IR facilities needed to avoid these problems.

This is the result of a discussion between myself Duncan and Eric (all of us probably relaying ideas from still other discussions) that I'm trying to write down here because none of us are going to be able to prioritize working on this soon. If anyone else ends up needing facilities related to this or having time to make this part of LLVM better, that would be awesome. Hence, a request for patches more than a request for comments. =D

Sorry, I don’t have any patches handy, but I’m full of comments :)

<braindump> (and apologies if this is poorly structured or rambly)

I'm aware of at least two quite strange uses of metadata at the moment (Duncan, Eric, jump in with more I missed):
1) The need for an arbitrary (often target-specific) symbolic string in the IR that can be used with intrinsics and/or instructions
2) The need to build up a nearly arbitrary record of the "flags" used to control the code generation of the module which need to be handled correctly at link time.

Neither of these fit the model of metadata. They aren't really optional. They can't be stripped while preserving correctness. They aren't annotations at all. I'm suggesting they both need first class representation, and that this representation won't really be complex or intrusive.

#1
We need the ability to put semantic information in the IR that can be used by targets without extending the IR itself. In some cases we can do this with target-specific intrinsics, but those don't always fit the problem and have their own set of challenges.

I think it would be nice if we just had a top level IR construct for symbolic strings. These should be allowed both inline (much like immediate constants) and potentially out-of-line like attributes. It is possible that this feature could be useful to simplify attributes or #2, but it seems simple and useful enough that I'm OK with it living on its own.

These symbolic strings would definitionally have no impact on the generated code. We could make them opaque Constants if that's a useful API. If we don't need the Constant API, I would do something similar to Duncan's separation for metadata. My suspicion is that making these Constants would be convenient so they can be used as Values, with the understanding that it would be invalid to use them in arbitrary places, as they only have defined semantics in specific scenarios. The canonical example would be:

  call i64 @llvm.read_register(<sigil>"sp")

#2
We need to make module flags a first class entity of the module, just like data layout:

For #2, I agree we need module flags. Compile an Objective-C program to IR and you’ll see stuff like this at the bottom, which should not be metadata:

!{i32 4, !"Objective-C Garbage Collection", i32 0}

Personally I would put an AttributeSet on the Module. It already has most of what you want here, already has parsing support, serialization, etc. You just need to define the set of allowable flags for a module. Targets can also use the string=string Attribute to add target-specific attributes to the Module. And you could sink DataLayout or even the triple in there if you wanted, although I haven’t thought through this part enough to know if it’s a good idea.

This would also play nice with LTO. Modules store code gen attributes on the module itself. When you LTO, at the point where you get a clash (say SSE vs non-SSE), you can sink the attribute from module level down to all of that module’s functions.
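
As a rough illustration of the sinking idea, assume two modules that disagreed on SSE have been merged: each function ends up carrying its original module's setting as an ordinary function attribute. The function names and feature strings here are invented for the example, and the module-level AttributeSet itself is not shown because no syntax for it exists.

```llvm
; Hypothetical function that came from the module built with SSE.
define void @from_sse_module() #0 {
  ret void
}

; Hypothetical function that came from the module built without SSE.
define void @from_generic_module() #1 {
  ret void
}

; After the clash, the setting lives on the functions, using the
; existing string key/value form of function attributes.
attributes #0 = { nounwind "target-features"="+sse,+sse2" }
attributes #1 = { nounwind "target-features"="-sse,-sse2" }
```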

Thanks,
Pete

Sure, I have no specific thoughts on implementation of this. It just
shouldn't be metadata.

Agreed.

#1
We need the ability to put semantic information in the IR that can be used by targets without extending the IR itself. In some cases we can do this with target-specific intrinsics, but those don't always fit the problem and have their own set of challenges.

+1, depending on the details :)

#2
We need to make module flags a first class entity of the module, just like datalayout:

  &flag = ....

(syntax shamelessly stolen from Duncan's suggestion in IRC)

I disagree. The concerns about “stability” of metadata don’t apply to module-level metadata that doesn’t refer to the other IR in the module. A nice thing about module-level metadata is that it eliminates the “need” to encode features like command line flags directly in the IR in a custom tailored way. There should be no need to design bitcode and .ll syntax for new things like command line flags. If we had module-level metadata back in day 1, targetdata would be using it...

-Chris

Calling whatever it is that encodes things in the module "metadata" makes
that term less useful. I don't really care about the syntax or encoding,
but I do very much care that we separate the terminology and APIs used for
entities that have very fundamentally different behavior constraints.

Metadata is discardable without changing correctness. These other things
are not. We need two different ways to describe and manipulate them so that
we don't continually get confused as to which case we are dealing with. And
"module-level" isn't even a good predicate because some things at the
module level are meeting the same constraints as the rest of metadata --
namely, debug information.

I can see what you’re saying, but I think you’ll end up with something that is “module metadata that is string-only” or something. I don’t know if the clarity of the model you’re looking for is worth adding another similar-but-different facility, but I suppose we can discuss that when someone actually has time to implement such a thing.

-Chris