[RFC] Module Flags Metadata

Hello,

This is a proposal for implementing "module flags". Please take a look at this and give any feedback you may have.

Thanks!
-bw

                         Module Flags Metadata

Information about the module as a whole is difficult to convey to LLVM's
subsystems. The LLVM IR isn't sufficient to transmit this information. One
should instead use the llvm.module.flags named metadata. These flags take the
form of key / value pairs -- much like a dictionary -- making it easy for any
subsystem that cares about a flag to look it up.

The llvm.module.flags metadata contains a list of metadata triplets. Each
triplet has the following form:

  - The first element is a "behavior" flag, which specifies the behavior when
    two (or more) modules are merged together and the linker encounters two (or
    more) metadata with the same ID. The supported behaviors are described below.

  - The second element is a metadata string that is a unique ID for the
    metadata. How each ID is interpreted is documented below.

  - The third element is the value of the metadata.

When two (or more) modules are merged together, the resulting llvm.module.flags
metadata is the union of the modules' llvm.module.flags metadata. The only
exception is a flag with the 'Override' behavior, which may override another
flag's value (see below).

The following behavior flags are supported:

   Value  Behavior
   -----  --------
     1    Error
                  Emits an error if two values disagree.

     2    Warning
                  Emits a warning if two values disagree.

     3    Require
                  Emits an error when the specified flag is not present or
                  doesn't have the specified value. It is an error for two (or
                  more) llvm.module.flags with the same ID to have the Require
                  behavior but different values. There may be multiple Require
                  flags per ID.

     4    Override
                  Uses the specified value if the two values disagree. It is an
                  error for two (or more) llvm.module.flags with the same ID to
                  have the Override behavior but different values.

An example of module flags:

  !0 = metadata !{ i32 1, metadata !"foo", i32 1 }
  !1 = metadata !{ i32 4, metadata !"bar", i32 37 }
  !2 = metadata !{ i32 2, metadata !"qux", i32 42 }
  !3 = metadata !{ i32 3, metadata !"qux",
    metadata !{
      metadata !"foo", i32 1
    }
  }
  !llvm.module.flags = !{ !0, !1, !2, !3 }

- Metadata !0 has the ID !"foo" and the value '1'. The behavior if two or more
  !"foo" flags are seen is to emit an error if their values are not equal.

- Metadata !1 has the ID !"bar" and the value '37'. The behavior if two or more
  !"bar" flags are seen is to use the value '37' if their values are not equal.

- Metadata !2 has the ID !"qux" and the value '42'. The behavior if two or more
  !"qux" flags are seen is to emit a warning if their values are not equal.

- Metadata !3 has the ID !"qux" and the value:

           metadata !{ metadata !"foo", i32 1 }

  The behavior is to emit an error if the llvm.module.flags does not contain a
  flag with the ID !"foo" that has the value '1'. If two or more !"qux" flags
  exist, then they must have the same value or an error will be issued.
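
The merge rules described above can be sketched as a small Python model. This is only an illustration, not LLVM's implementation: the names merge_module_flags and ModuleFlagError are hypothetical, flags are modelled as plain (behavior, id, value) tuples, and two points the proposal leaves open are filled in by assumption (an Override value wins over any disagreeing flag, and on a Warning the first value seen is kept).

```python
ERROR, WARNING, REQUIRE, OVERRIDE = 1, 2, 3, 4
_MISSING = object()  # sentinel for "flag not present"


class ModuleFlagError(Exception):
    """Raised wherever the proposal says an error is emitted."""


def merge_module_flags(*modules):
    """Merge lists of (behavior, flag_id, value) triplets.

    Returns (merged, warnings): a dict mapping flag_id to value, and a
    list of warning strings.
    """
    flags = [flag for module in modules for flag in module]

    # Pass 1: collect Override and Require flags. Within each kind, two
    # flags with the same ID must carry the same value.
    overrides, requires = {}, {}
    for behavior, flag_id, value in flags:
        table = {OVERRIDE: overrides, REQUIRE: requires}.get(behavior)
        if table is not None and table.setdefault(flag_id, value) != value:
            raise ModuleFlagError(f"conflicting values for {flag_id!r}")

    # Pass 2: union the Error/Warning flags.
    # Assumption: an Override value wins outright, and on a Warning the
    # first value seen is kept.
    merged, warnings, strict = dict(overrides), [], set()
    for behavior, flag_id, value in flags:
        if behavior not in (ERROR, WARNING) or flag_id in overrides:
            continue
        if behavior == ERROR:
            strict.add(flag_id)
        if merged.setdefault(flag_id, value) != value:
            if flag_id in strict:
                raise ModuleFlagError(f"flag {flag_id!r} disagrees")
            warnings.append(f"flag {flag_id!r} disagrees")

    # Pass 3: each Require value is a (required_id, required_value) pair
    # that must hold in the merged result.
    for required_id, required_value in requires.values():
        if merged.get(required_id, _MISSING) != required_value:
            raise ModuleFlagError(f"required flag {required_id!r} not satisfied")
    return merged, warnings


# The example module from above, merged with a second, made-up module.
module_a = [(ERROR, "foo", 1), (OVERRIDE, "bar", 37),
            (WARNING, "qux", 42), (REQUIRE, "qux", ("foo", 1))]
module_b = [(ERROR, "foo", 1), (WARNING, "bar", 5), (WARNING, "qux", 7)]
merged, warnings = merge_module_flags(module_a, module_b)
```

In this model the Require flags are kept apart from the ordinary flags, which is what lets !"qux" carry both a Warning value and a requirement on !"foo" in the example.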

Objective-C Metadata

To clarify, the "Objective-C Metadata" section is meant to act as an example implementation of the module flags metadata. (It was the impetus for me to start on this project.) In the LangRef.html document, it would be documented as a subsection of the Module Flags Metadata section. Similarly, additional uses (e.g., OpenCL) would be carefully documented in LangRef.html so that the semantics of the flags are clear.

-bw

Could you expand on likely uses other than ObjC? For example, should
float ABI (soft/softfp/hard) be conveyed via this mechanism?

deep

Hi Sandeep,

ObjC is the first place that I will use it, of course (we need it to fix LTO). Other uses will come along later. (I don't know enough about the float ABI issues to say whether they should be done via module flags or not.) The OpenCL people have a need for named metadata for their stuff. I would hope that module flags would be a good fit for that, but that remains to be seen. But in general, any information which affects the module as a whole could use this feature. ObjC metadata is just an obvious first step. :)

-bw

Hi Bill,

For the GNU runtimes, this metadata is stored in the module structure in each compilation unit, and uses a different set of flags. Do you have any plans for this to be generic, or is it intended just for Darwin?

David

Hi David,

This should be generic as is. :)

-bw

I guess more to the point, the implementation of module flags is generic. However, it can be specific to a submodule or runtime. So the flags I introduced here are for Mach-O. I don't know enough about the GNU runtime to say whether they can be used for that as well. If not, we can either expand the flags or create new ones.

-bw

Hi Bill,

> This is a proposal for implementing "module flags". Please take a look at this and give any feedback you may have.

What does this give you that you can't get with the existing scheme of using
global variables in a special section?

Ciao, Duncan.

I have only one real comment – this violates the contract and spirit of LLVM’s metadata design. You’re specifically encoding semantics in metadata, but the principle of metadata is that a program with all metadata stripped has the same behavior as one with the metadata still in place.

I think what you’re really talking about are Module-level attributes much like we have function attributes. These have inherently significant semantics, and must be handled explicitly, not simply dropped when unknown.

Anyways, that’s my only real comment about the proposal. I think you need something other than metadata to encode this.

In the case of "image info" flags, we would need to have special code in the linker which knew to look for that special GV name, and then be able to interpret and merge two or more GVs together. It's specialized specifically to ObjC+MachO and very hacky. Placing specialized code into a generic module isn't a good idea. Also, that method isn't extensible to other potential uses (I'm suggesting they use it for some OpenCL work). And finally, because this would be defined in the LangRef document, the flags and how they're interpreted / merged would be formalized. The one LLVM submodule which cares about these flags would interpret them according to the LangRef.

-bw

I had thought of that too (and having a module-level attribute scheme), but I was surprised when I found out that named metadata wasn't "strippable" from modules. (You can't strip them via the 'opt' command.) Chris assured me that they were meant to stick around...

-bw

> I had thought of that too (and having a module-level attribute scheme), but I was surprised when I found out that named metadata wasn’t “strippable” from modules. (You can’t strip them via the ‘opt’ command.)

I’m not claiming that we have a tool today that will strip named metadata for modules, I’m just claiming that the design of metadata, as Nick explained it to me originally and as he has re-explained it to me recently, operates under the assumption that metadata doesn’t carry required semantics, it carries optional information.

> Chris assured me that they were meant to stick around…

Meant to is different from can change behavior if removed. This would make module-level named metadata obey a different set of constraints from all of the other named metadata we have. Those most definitely are stripped, corrupted, inverted and made up at the whims of the optimizer in several cases under the supposition that the code always remains valid…

I’m really not opposed to something like named metadata (or named metadata itself) being persistent, and being required to be persistent. My only concern is with overloading a construct that wasn’t designed with that in mind, and currently isn’t consistently treated in that way even if it happens to work today at the module level.

> Meant to is different from can change behavior if removed. This would make module-level named metadata obey a different set of constraints from all of the other named metadata we have. Those most definitely are stripped, corrupted, inverted and made up at the whims of the optimizer in several cases under the supposition that the code always remains valid....

I don't know of any pass which modifies metadata unless it knows what it's doing. And as pointed out, named metadata isn't stripped via normal methods. So I don't see how this will be a problem.

> I'm really not opposed to something like named metadata (or named metadata itself) being persistent, and being required to be persistent. My only concern is with overloading a construct that wasn't designed with that in mind, and currently isn't consistently treated in that way even if it happens to work today at the module level.

I understand what you're saying, and I agree to a certain extent. However, there is already a case where metadata may affect the semantics of the program. The 'fpaccuracy' metadata appears to have an effect on floating point calculations. But that's a side issue. The problem with named metadata is that it's not very well defined in the documentation. All that it says is that it's a collection of metadata nodes. So it's hard to argue with what its semantics and behavior are (or were) supposed to be.

What I'm saying is that named metadata is a method to pass this information along which doesn't require adding a new IR feature to LLVM. I say this because it would be modified only by passes which know how to modify it, it wouldn't be stripped via conventional means of stripping a program, and we would document how passes should handle this metadata.

-bw

> I have only one real comment – this violates the contract and spirit of LLVM’s metadata design. You’re specifically encoding semantics in metadata, but the principle of metadata is that a program with all metadata stripped has the same behavior as one with the metadata still in place.

This is a simplified understanding of semantics. As I understand it, the expected metadata design behavior is that optimizer/transformations are not responsible for preserving any relationship between a User and an MDNode. For example, if an MDNode is “using” a User then the optimizer can remove the User without bothering about what happens to the MDNode. In the same way, if an MDNode is attached to an Instruction then the optimizer can mutate, delete or replace the Instruction while completely ignoring the attached MDNode.

NamedMDNode is a simple collection of metadata nodes at module level. By design NamedMDNode does not have any uses and it cannot directly hold any values (use any values) other than MDNode, so there is no reason for the optimizer to worry about it.

Thanks Devang, I think this reasoning helps me have a consistent model for what is and isn’t allowed when transforming metadata.

To touch on Bill’s mail, I think one thing that would help this discussion (and others I’ve had recently) is to get some of these semantics of metadata written down in the LangRef so that we at least have a documented set of rules to follow.

Chandler Carruth wrote:

> I'm really not opposed to something like named metadata (or named
> metadata itself) being persistent, and being required to be persistent.
> My only concern is with overloading a construct that wasn't designed
> with that in mind, and currently isn't consistently treated in that way
> even if it happens to work today at the module level.

Yeah, I can't think of any use for something that would pull out NamedMDNodes for no reason. That said, if you want this to work, please audit the module cloner at the very least (it should copy the NamedMDNodes).

But what would you do with llvm-extract? Should it keep a copy of every global metadata node that references a function? The same applies to bugpoint. What if the NamedMDNode is used in codegen, and removing it removes the crash? Simply put, I don't like this design, but my objections are weak and I lack an alternative plan.

On the other hand, there is a precedent for doing this. For example, RenderScript uses metadata to carry reflection information in the .bc files; their pipeline is set up so that nothing else will touch the .bc files from the time their SDK produces them to the time the phone consumes them, so they assume the metadata will still be there. RS would break if NamedMDNodes were stripped out.

It seems to make sense to treat NamedMDNodes not unlike GlobalVariables in most regards, but the MDNodes they contain may change as much as any MDNode.

Nick

>> I have only one real comment -- this violates the contract and spirit of LLVM's metadata design. You're specifically encoding semantics in metadata, but the principle of metadata is that a program with all metadata stripped has the same behavior as one with the metadata still in place.
>
> This is a simplified understanding of semantics. As I understand, the expected metadata design behavior is that optimizer/transformations are not responsible to preserve any _relationship_ between a User and a MDNode. For example, if a MDNode is "using" a User then optimizer can remove the User without bothering about what happens to the MDNode.

Right.

> Same way, If MDNode is attached to an Instruction then optimizer can mutate, delete or replace the Instruction while completely ignoring attached MDNode.

However, this isn't necessarily true. For example, it would seem to be
within the spirit of LLVM's metadata design to describe the range of values
that a given instruction might have. However, if the optimizer mutates the
instruction (and preserves program correctness by mutating its operand
instructions to compensate), then that metadata could easily become
incorrect. Right now, there aren't any rules about what metadata can do,
or what optimizers must do to preserve it. It's sort of the
"head in the sand" level of conceptual maturity.

Dan

The number one reason behind metadata is to have a mechanism to track values while being completely transparent to the optimizer. If you want a guarantee from the optimizer to preserve certain semantics about the way metadata is used (e.g. say to describe a range of values) then metadata is not an appropriate mechanism.

If the optimizer makes no guarantees whatsoever, then metadata is
not appropriate for anything.

For example, the metadata used by TBAA today is not safe. Imagine an
optimization pass which takes two allocas that are used in
non-overlapping regions and rewrites all uses of one to use the other,
to reduce the stack size. By LLVM IR rules alone, this would seem to
be a valid semantics-preserving transformation. But if the loads
and stores for the two allocas have different TBAA type tags, the
tags will say NoAlias for memory references that do in fact alias.

The only reason why TBAA doesn't have a problem with this today is
that LLVM doesn't happen to implement optimizations which break it
yet. But there are no guarantees.

Dan
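
The alloca-merging hazard described above can be illustrated with a toy Python model. Everything here is hypothetical and heavily simplified: MemAccess, tbaa_no_alias and merge_slots are made-up names, and the alias rule treats any two distinct tags as NoAlias, which ignores the type hierarchy that real TBAA uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    slot: str  # which alloca the load/store touches
    tbaa: str  # toy stand-in for the TBAA type tag attached as metadata


def tbaa_no_alias(a, b):
    # Toy rule: accesses with different type tags are assumed not to alias.
    return a.tbaa != b.tbaa


def really_alias(a, b):
    # Ground truth in this model: same stack slot means same memory.
    return a.slot == b.slot


def merge_slots(accesses, old, new):
    # Rewrite every use of `old` to use `new`, leaving metadata untouched,
    # as a stack-size-reducing pass might do by IR rules alone.
    return [MemAccess(new if a.slot == old else a.slot, a.tbaa)
            for a in accesses]


before = [MemAccess("a", "int"), MemAccess("b", "float")]
after = merge_slots(before, old="b", new="a")

# Before the rewrite, the metadata and the ground truth agree.
consistent_before = tbaa_no_alias(*before) and not really_alias(*before)
# After the rewrite, the tags still claim NoAlias for accesses that alias.
unsound_after = tbaa_no_alias(*after) and really_alias(*after)
```

In this model both consistent_before and unsound_after come out true: the pass preserved the program but silently invalidated what the tags assert.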

On that thought, is there any way that my autovectorization pass could
invalidate the TBAA metadata (in a harmful way) when it fuses two
memory-adjacent loads or stores? Currently, it performs this fusion by
first cloning the first instruction (which I think will pick up its
metadata), then changing the instruction's type and operands as
necessary. This fusion will only take place if the two instructions have
the same LLVM type, but currently there is no check of the associated
metadata.

-Hal
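
For illustration only, one conservative policy for the fusion described above is to keep a tag on the fused access only when both original accesses carry the same tag, and to drop it otherwise. The thread does not say whether this is needed or what the pass actually does; the sketch below is a toy Python model with hypothetical names (Load, fuse_adjacent), and it relies only on the premise stated earlier in the thread that a program with metadata removed must behave the same.

```python
from typing import NamedTuple, Optional


class Load(NamedTuple):
    address: int         # byte offset of the access
    width: int           # size of the access in bytes
    tbaa: Optional[str]  # toy stand-in for a TBAA tag; None means no tag


def fuse_adjacent(first: Load, second: Load) -> Optional[Load]:
    """Fuse two memory-adjacent loads, or return None if they are not adjacent."""
    if first.address + first.width != second.address:
        return None
    # Keep the tag only when both loads agree on it; otherwise drop it,
    # since a fused access with no tag makes no aliasing claim at all.
    tag = first.tbaa if first.tbaa == second.tbaa else None
    return Load(first.address, first.width + second.width, tag)


same = fuse_adjacent(Load(0, 4, "int"), Load(4, 4, "int"))
mixed = fuse_adjacent(Load(0, 4, "int"), Load(4, 4, "float"))
```

Here same keeps the "int" tag while mixed ends up untagged. Cloning the first instruction without such a check would correspond to always keeping first.tbaa.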