[RFC] Encoding Compile Flags into the IR

Hi,

Link-Time Optimization has a problem. We need to preserve some of the flags with which the modules were compiled so that the semantics of the resulting program are correct. For example, a module compiled with `-msoft-float' should use library calls for floating point. And that's only the tip of the proverbial iceberg.

Goals

My goals for whichever solution we come up with are for it to be:

1) Flexible enough to differentiate between flags which affect a module as a whole and those which affect individual functions.
2) Quick to query for specific flags.
3) Easily extensible, preferably without changing the IR for each new flag.

Proposed Solution

My solution to this is to use a mixture of module-level flags and named metadata. This gives us the flexibility asked for in (1), both are relatively quick to query (after being read in, the module flags could be placed into an efficient data structure), and new flags can be added by updating the LangRef.html doc.

- Module-level flags would be used for those options which affect the whole module and which prevent two modules that do not have that flag set from being merged together. For example, `-msoft-float' changes the calling convention in the output file. Therefore, it's only useful if the whole program is compiled with it. It would be a module-level IR flag:

   !0 = metadata !{ i32 1, metadata !"-msoft-float", i1 true }
   !llvm.module.flags = !{ !0 }
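
To make the merge behaviour concrete, here is a minimal Python sketch (a model, not LLVM code) of how a linker might combine module-level flags like the one above. It assumes the leading `i32 1` means "report an error when two modules disagree", and that a module lacking the flag counts as disagreeing. The names `ModuleFlag`, `FlagMismatch` and `merge_module_flags` are hypothetical.

```python
from dataclasses import dataclass

ERROR_ON_MISMATCH = 1  # assumed meaning of the leading `i32 1`


class FlagMismatch(Exception):
    """Raised when two modules cannot be merged because a flag differs."""


@dataclass(frozen=True)
class ModuleFlag:
    behavior: int
    key: str
    value: bool


def merge_module_flags(dst, src):
    """Merge two lists of ModuleFlag, raising FlagMismatch on a conflict."""
    dst_by_key = {f.key: f for f in dst}
    src_by_key = {f.key: f for f in src}
    merged = []
    for key in sorted(dst_by_key.keys() | src_by_key.keys()):
        a, b = dst_by_key.get(key), src_by_key.get(key)
        flag = a or b
        # A flag that is missing on one side counts as a mismatch here.
        if flag.behavior == ERROR_ON_MISMATCH and a != b:
            raise FlagMismatch(f"modules disagree on {key!r}")
        merged.append(flag)
    return merged
```

With this model, merging two modules that both carry the `-msoft-float` flag succeeds, while merging one that has it with one that does not raises `FlagMismatch`.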

So the objective here is to diagnose cases where the program would
already be broken even without LTO, correct? If so, I think this is going
in the right direction; I am just not sure if a 1:1 mapping with
command line options is the best solution. These are basic "ABI
options".

- Named metadata would be used for those options which affect code generation for the functions, but which don't prevent two modules from being merged together. For example, `-fomit-frame-pointer' applies to individual functions, but it doesn't prevent a module compiled with `-fno-omit-frame-pointer' from being merged with one compiled with `-fomit-frame-pointer'. We would use a named metadata flag:

   define void @foo () { ... }
   define double @bar(i32 %a) { ... }

   ; A list of the functions affected by `-fno-omit-frame-pointer' within the Module.
   !0 = metadata !{ void ()* @foo, double (i32)* @bar }
   !fno.omit.frame.pointer = !{ !0 }

And so on.
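
Goal (2) asks for quick queries. As a rough illustration only, the Python sketch below models reading the named metadata once and inverting it into a per-function index, so that asking "is @foo affected by this flag?" becomes a dictionary lookup. The function names and the dictionary layout are assumptions made for this example, not part of the proposal.

```python
def build_function_flag_index(named_metadata):
    """Invert {flag name: [functions]} into {function: {flag names}}."""
    index = {}
    for flag_name, functions in named_metadata.items():
        for fn in functions:
            index.setdefault(fn, set()).add(flag_name)
    return index


def function_has_flag(index, fn, flag_name):
    """Return True if `fn` is listed under `flag_name`."""
    return flag_name in index.get(fn, set())


# Mirrors the IR sample above: @foo and @bar are listed under the flag.
named_metadata = {"fno.omit.frame.pointer": ["foo", "bar"]}
index = build_function_flag_index(named_metadata)
```

A pass would then call `function_has_flag(index, "foo", "fno.omit.frame.pointer")` instead of walking the metadata list for every function.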

This part I am not so sure about. I fixed a similar problem for unwind
tables by adding an attribute. It could be safely done with metadata
with the opposite meaning (nouwtable). The things I am uncomfortable
with in this part of the proposal are:

* Why not use metadata attached directly to the functions? It is
the closest thing to an easy-to-add attribute.

* The recent discussion about metadata points out that it is not
really safe to add information that only part of the compiler reasons
about. Metadata adds the nice property that it is safe to drop, but
that is it. Passes have to know about it, it should be documented in
the language ref and the verifier should check it. I am afraid that
this part of the proposal would again create a feeling that we have a
magic bullet for passing semantic info from the FE to some passes.

* As you mention, this is probably the tip of the iceberg. Maybe we
should explore it a bit more with the tools we have before declaring
them insufficient. Duncan is working on fp precision, you can probably
add a no_frame_pointer metadata to functions and from there we will
have a better idea of how things are going.

Cheers,
Rafael

Hi,

Link-Time Optimization has a problem. We need to preserve some of the flags with which the modules were compiled so that the semantics of the resulting program are correct. For example, a module compiled with `-msoft-float' should use library calls for floating point. And that's only the tip of the proverbial iceberg.

Goals

My goals for whichever solution we come up with are for it to be:

1) Flexible enough to differentiate between flags which affect a module as a whole and those which affect individual functions.
2) Quick to query for specific flags.
3) Easily extensible, preferably without changing the IR for each new flag.

Proposed Solution

My solution to this is to use a mixture of module-level flags and named metadata. This gives us the flexibility asked for in (1), both are relatively quick to query (after being read in, the module flags could be placed into an efficient data structure), and new flags can be added by updating the LangRef.html doc.

- Module-level flags would be used for those options which affect the whole module and which prevent two modules that do not have that flag set from being merged together. For example, `-msoft-float' changes the calling convention in the output file. Therefore, it's only useful if the whole program is compiled with it. It would be a module-level IR flag:

       !0 = metadata !{ i32 1, metadata !"-msoft-float", i1 true }
       !llvm.module.flags = !{ !0 }

So the objective here is to diagnose cases where the program would
already be broken even without LTO, correct? If so, I think this is going
in the right direction; I am just not sure if a 1:1 mapping with
command line options is the best solution. These are basic "ABI
options".

Diagnosis is only one use of this proposal. The other, more important, use is to generate the correct code.

- Named metadata would be used for those options which affect code generation for the functions, but which don't prevent two modules from being merged together. For example, `-fomit-frame-pointer' applies to individual functions, but it doesn't prevent a module compiled with `-fno-omit-frame-pointer' from being merged with one compiled with `-fomit-frame-pointer'. We would use a named metadata flag:

       define void @foo () { ... }
       define double @bar(i32 %a) { ... }

       ; A list of the functions affected by `-fno-omit-frame-pointer' within the Module.
       !0 = metadata !{ void ()* @foo, double (i32)* @bar }
       !fno.omit.frame.pointer = !{ !0 }

And so on.

This part I am not so sure about. I fixed a similar problem for unwind
tables by adding an attribute. It could be safely done with metadata
with the opposite meaning (nouwtable). The things I am uncomfortable
with in this part of the proposal are:

* Why not use metadata attached directly to the functions? It is
the closest thing to an easy-to-add attribute.

Possible, but the problem with metadata is that it should be possible to remove them from the object and not affect the semantics of the program.

* The recent discussion about metadata points out that it is not
really safe to add information that only part of the compiler reasons
about. Metadata adds the nice property that it is safe to drop, but
that is it. Passes have to know about it, it should be documented in
the language ref and the verifier should check it. I am afraid that
this part of the proposal would again create a feeling that we have a
magic bullet for passing semantic info from the FE to some passes.

I'm not sure I understand your meaning here. The point of making this named metadata is that it cannot be stripped from the module via normal methods. And it's inevitable that passes will need to know about the metadata and modify their behavior accordingly. That's the whole point, of course. :-) (They will, at least, be able to query the Module object for the information they care about. The Module is the one which knows about the metadata.)

* As you mention, this is probably the tip of the iceberg. Maybe we
should explore it a bit more with the tools we have before declaring
them insufficient. Duncan is working on fp precision, you can probably
add a no_frame_pointer metadata to functions and from there we will
have a better idea of how things are going.

I'd rather not start coding before we can agree on a concrete implementation.

-bw

* The recent discussion about metadata points out that it is not
really safe to add information that only part of the compiler reasons
about. Metadata adds the nice property that it is safe to drop, but
that is it. Passes have to know about it, it should be documented in
the language ref and the verifier should check it. I am afraid that
this part of the proposal would again create a feeling that we have a
magic bullet for passing semantic info from the FE to some passes.

I'm not sure I understand your meaning here. The point of making this named metadata is that it cannot be stripped from the module via normal methods. And it's inevitable that passes will need to know about the metadata and modify their behavior accordingly. That's the whole point, of course. :-) (They will, at least, be able to query the Module object for the information they care about. The Module is the one which knows about the metadata.)

My point is that many passes might need to know about it, not just the
pass doing the modification. Any change to the IR has that property,
so it is better that it stays a somewhat formal process, involving a
discussion of each change and documentation in the language
reference.

A simple example of a problem it would be nice to handle in LTO: A
single file in a project is compiled with -mavx and the project uses
cpuid to decide if it should use that function or not. With LTO
currently we would miss the information that functions from one file
could use AVX.

Faced with this problem and with the above scheme implemented, it is
very likely I would jump to recording -mavx in the IL. A more
conventional review of an instruction_set metadata or attribute would
be way more likely to find and document issues like: can the inliner
inline a function into one having a more restrictive instruction set?
What about the other way? What should the resulting instruction set
be? It is unlikely the answer would be to just ignore the attribute.
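
The thread does not settle these questions. Purely as an illustration of what one conservative answer could look like, the Python sketch below allows inlining only when the callee's instruction-set features are a subset of the caller's, so the caller's feature set never has to grow. The function `can_inline` and the feature names are hypothetical.

```python
def can_inline(caller_features, callee_features):
    """Allow inlining only if the callee needs nothing the caller lacks."""
    return set(callee_features) <= set(caller_features)


# A function from the file built with -mavx, and a generic caller.
avx_function = {"sse2", "avx"}
generic_function = {"sse2"}
```

Under this rule the generic function can be inlined into the AVX one, but not the other way around, which keeps AVX code behind the cpuid check from the example.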

In summary, I like the current review and discussion process that goes
with proposed changes to the IL and I am afraid the second part of the
proposal would cause us to lose it.

Cheers,
Rafael

Another reason to have a definite API for metadata in the IR. So
passes shouldn't have to know about this particular metadata, but
metadata in general.

Hi Bill,

While it's true that knowing compiler flags will help you with linking
problems (including optimisations), I don't think they're 1:1 with
link issues, nor do I think storing all compilation options on all
modules every time is a fair price to pay for something that specific.

You have a goal to correct link-time optimisations, or as we discussed
earlier in the fp-math thread, even code generation could be broken
without the knowledge of the user's intent. That can be accomplished
now by putting "-msoft-float" as a global metadata, yes, but does that
fit a general solution for the general problem? I literally don't
know.

What you need to do, if your intent is to create a long-lasting framework
and not just a quick fix for LTO, is to analyse the biggest problems
and the information you need. If you have problems in multiple domains
(I'm guessing fp is not the only one), and could get information from
multiple sources (again, guessing compile options is not the only
one), then your solution is lacking.

I'm guessing linker scripts could have a lot to say about link-time
issues, as well as environment, ABI, chipset, ISA and so on. If you
put all compiler flags in metadata now, we'll end up putting all
options of all sources in global metadata, and well, that's far from
desirable.

I propose a more general scheme of global metadata, similar to yours
(one global for each big problem, multiple options inside for each
user intent), but generated from cherry-picked sources and put into
specific global metadata baskets (duplication could occur, if the
semantics is different). So each further step reads its own basket
(LTO reads @llvm.metadata.lto {...}) and so on. Of course LTO could
read other baskets, but it'll have to be for a precise reason, with a
precise meaning.

While merging modules (inlining included) with different metadata, you
have to have a specific well defined merge rule, with warnings and
errors in case they mismatch. We were discussing the merge semantics
for fp models earlier, that kind of analysis should happen for every
new flag you put in.
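
A rough Python sketch of the basket idea with an explicit merge rule per flag follows. Everything in it is an assumption made for illustration: the basket names `lto` and `cg`, the policy table, and the choice that an unknown flag defaults to an error.

```python
# Hypothetical per-flag merge policy: "error" aborts the merge, "warn" keeps
# the destination's value and records a diagnostic.
MERGE_POLICY = {"soft-float": "error", "VFPv2": "warn"}


def merge_baskets(dst, src, policy=MERGE_POLICY):
    """Merge per-consumer baskets, e.g. {"lto": {...}, "cg": {...}}."""
    merged = {name: dict(flags) for name, flags in dst.items()}
    warnings = []
    for name, flags in src.items():
        basket = merged.setdefault(name, {})
        for key, value in flags.items():
            if key not in basket:
                basket[key] = value
            elif basket[key] != value:
                # Flags without an explicit policy are treated as errors.
                if policy.get(key, "error") == "error":
                    raise ValueError(f"{name}: modules disagree on {key!r}")
                warnings.append(f"{name}: keeping {key!r}={basket[key]!r}")
    return merged, warnings
```

Each consumer would then read only its own entry, for example `merged["lto"]`, and the caller decides how to surface the collected warnings.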

Though you have to take my proposal with a pinch of salt, because
that's remarkably similar to ARM's build attributes, and I'm not sure
that's the best idea either. There is probably a smarter way of doing
this, I just didn't think hard enough to find it... ;-)

But either way, you will need some sort of guidelines on how passes
should treat metadata with stronger guarantees than today, or your LTO
will still not see the info it needs...

It seems that the correct solution for this would be to make the softfloat and hardfloat calling conventions into... calling conventions. This would require sinking some logic for defining calling conventions down into LLVM, rather than requiring every single front end to duplicate the same logic, but reduced code duplication might just be a price worth paying for making writing front ends and optimisation passes easier...

David

There is already a function attribute for soft/hard float (on ARM it is
AAPCS_VFP), which is generally (or should be) produced when the
compiler specifies hard-float.

Bill Wendling <wendling@apple.com> writes:

Link-Time Optimization has a problem. We need to preserve some of the
flags with which the modules were compiled so that the semantics of
the resulting program are correct. For example, a module compiled with
`-msoft-float' should use library calls for floating point. And that's
only the tip of the proverbial iceberg.

This is an important missing feature.

- Named metadata would be used for those options which affect code
generation for the functions, but which don't prevent two modules
from being merged together. For example, `-fomit-frame-pointer'
applies to individual functions, but it doesn't prevent a module
compiled with `-fno-omit-frame-pointer' from being merged with one
compiled with `-fomit-frame-pointer'. We would use a named metadata
flag:

Doesn't this violate the "no semantics" requirement of metadata? What
happens if the metadata gets dropped?

                           -Dave

Hi,

Link-Time Optimization has a problem. We need to preserve some of the flags with which the modules were compiled so that the semantics of the resulting program are correct. For example, a module compiled with `-msoft-float' should use library calls for floating point. And that's only the tip of the proverbial iceberg.

Hi Bill,

While it's true that knowing compiler flags will help you with linking
problems (including optimisations), I don't think they're 1:1 with
link issues, nor do I think storing all compilation options on all
modules every time is a fair price to pay for something that specific.

You have a goal to correct link-time optimisations, or as we discussed
earlier in the fp-math thread, even code generation could be broken
without the knowledge of the user's intent. That can be accomplished
now by putting "-msoft-float" as a global metadata, yes, but does that
fit a general solution for the general problem? I literally don't
know.

I'm not familiar with the fp-math thread. Could you summarize it for me?

What you need to do, if your intent is to create a long-lasting framework
and not just a quick fix for LTO, is to analyse the biggest problems
and the information you need. If you have problems in multiple domains
(I'm guessing fp is not the only one), and could get information from
multiple sources (again, guessing compile options is not the only
one), then your solution is lacking.

Nothing about the proposal is meant to be a quick fix. The process of adding new flags that we care about would be a formal process, just not one that modifies the IR every time. (Yes, I'm fixated on that one aspect of it. I find it too heavyweight for the problem at hand.)

I'm guessing linker scripts could have a lot to say about link-time
issues, as well as environment, ABI, chipset, ISA and so on. If you
put all compiler flags in metadata now, we'll end up putting all
options of all sources in global metadata, and well, that's far from
desirable.

The information is important for correct code generation and linking. I need alternatives to putting it in metadata. :-)

I propose a more general scheme of global metadata, similar to yours
(one global for each big problem, multiple options inside for each
user intent), but generated from cherry-picked sources and put into
specific global metadata baskets (duplication could occur, if the
semantics is different). So each further step reads its own basket
(LTO reads @llvm.metadata.lto {...}) and so on. Of course LTO could
read other baskets, but it'll have to be for a precise reason, with a
precise meaning.

While merging modules (inlining included) with different metadata, you
have to have a specific well defined merge rule, with warnings and
errors in case they mismatch. We were discussing the merge semantics
for fp models earlier, that kind of analysis should happen for every
new flag you put in.

Yup! The module-level flags have these abilities. :-)

Though you have to take my proposal with a pinch of salt, because
that's remarkably similar to ARM's build attributes, and I'm not sure
that's the best idea either. There is probably a smarter way of doing
this, I just didn't think hard enough to find it... ;-)

But either way, you will need some sort of guidelines on how passes
should treat metadata with stronger guarantees than today, or your LTO
will still not see the info it needs...

Could you give an example of what yours would look like in a sample Module?

-bw

Named metadata cannot be stripped by normal methods.

-bw

Bill Wendling <wendling@apple.com> writes:

Link-Time Optimization has a problem. We need to preserve some of the
flags with which the modules were compiled so that the semantics of
the resulting program are correct. For example, a module compiled with
`-msoft-float' should use library calls for floating point. And that's
only the tip of the proverbial iceberg.

This is an important missing feature.

- Named metadata would be used for those options which affect code
generation for the functions, but which don't prevent two modules
from being merged together. For example, `-fomit-frame-pointer'
applies to individual functions, but it doesn't prevent a module
compiled with `-fno-omit-frame-pointer' from being merged with one
compiled with `-fomit-frame-pointer'. We would use a named metadata
flag:

Doesn't this violate the "no semantics" requirement of metadata? What
happens if the metadata gets dropped?

Named metadata cannot be stripped by normal methods.

Do we elsewhere use named metadata for any information that is required for correctness?

AFAIK, only debug info uses it.

IIRC, the ARC frontend uses it to tell the optimizer what assembly
(if any) to add after calls to objc_retainAutoreleasedReturnValue.

John.

The "Module-level Flags" feature is implemented using named metadata. Removing those flags would result in incorrect semantics for the Objective-C runtime. So there is a precedent.

-bw

I'm not familiar with the fp-math thread. Could you summarize it for me?

Too many issues to summarize:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-April/048951.html

The important bit to this discussion is that metadata should be used
for correctness and we shouldn't disregard it the way we do today.

The information is important for correct code generation and linking. I need alternatives to putting it in metadata. :-)

What you need is to present the information to the code generation /
linking phases in a concise, de-duplicated and semantically correct
way.

Just dropping compilation flags on the IR will give you the
information, yes, but their meaning could be inaccurate. Compiling for
generic platforms and then linking for specific CPUs could lead to
confusion, or slight changes in compilation options semantics could
make a very subtle corner case blow up (silently) in the linker.

Could you give an example of what yours would look like in a sample Module?

Something similar to yours:

!llvm.module.flags.lto = !{ !0, !1 }
!llvm.module.flags.cg = !{ !0 }

; Forcing soft float all the way through.
!0 = metadata !{ i32 1, metadata !"soft-float", i1 true }
; The user does not want VFPv2 at all.
!1 = metadata !{ i32 1, metadata !"VFPv2", i1 false }

My proposal is more than just what it looks like in the IR, I don't
really care that much about that.

My points:
1. Dropping general info into metadata is not enough. You need to
make sure it'll be semantically correct for the majority of cases,
2. There will be more than one producer (compilation options, linker
options, ABIs, ISAs) and more than one consumer (lto, cg, etc), you
need to common up as much as possible, and split when the semantics
differ,
3. You need rules for merging metadata, and that's not just a
requirement of this topic, but also raised in the FP discussion,
4. Metadata must have its status raised in IR, and passes should be
aware of it and manipulate it correctly.

Points 3 and 4 go against the original design of Metadata, I know. But
it's being used by debug information and correctness (ARC, Obj-C).
We're about to take the leap towards FP correctness with metadata, I
don't think you can still disregard Metadata as quickly as the current
back-end does.

Metadata semantics is much simpler than IR semantics. Most of the
problems arise from merging code, so if there is a clear
identification of the types of metadata (debug, asm, fp) and there is
a simple module with simple, clear rules of merging, you can just
apply those rules, via the same module. So passes don't need to learn
too much, just blindly applying the general rules will get you far
enough.

Merging metadata is (mostly) a matter of dominance and equivalence.
When inlining a function into a different module, for instance, you'll
check for the debug metadata in the former for equivalent (you define
what's equivalent for each case) metadata and move the pointers. So,
for "int a", you will still point "a" to the type "int" in the new
module. In the FP case, you merge according to dominance: merging a
less strict model into a more strict can either increase the strength
of the former, or leave it be in a per-instruction model. That also
depends on the architecture, and possibly, compiler flags.
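
The dominance idea for FP models can be sketched in a few lines of Python. The model names and their ordering below are hypothetical, and the sketch covers only the first option mentioned above (the merged code takes the stricter model), not the per-instruction alternative.

```python
# Hypothetical FP models, ordered from least to most strict.
FP_STRICTNESS = {"fast": 0, "precise": 1, "strict": 2}


def merge_fp_model(a, b):
    """Return whichever of the two models dominates (is stricter)."""
    return a if FP_STRICTNESS[a] >= FP_STRICTNESS[b] else b
```

For instance, inlining a "fast" function into a "strict" one would leave the combined code "strict" under this rule; an architecture-specific rule could replace the table without changing the callers.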

My final point is simple: if you don't think about those issues now,
before you start filling the IR with unchecked metadata, it'll be harder to
enforce the required merge rules later. And they will be necessary.

Could you give an example of what yours would look like in a sample Module?

Something similar to yours:

!llvm.module.flags.lto = !{ !0, !1 }
!llvm.module.flags.cg = !{ !0 }

; Forcing soft float all the way through.
!0 = metadata !{ i32 1, metadata !"soft-float", i1 true }
; The user does not want VFPv2 at all.
!1 = metadata !{ i32 1, metadata !"VFPv2", i1 false }

My proposal is more than just what it looks like in the IR, I don't
really care that much about that.

My points:
1. Dropping general info into metadata is not enough. You need to
make sure it'll be semantically correct for the majority of cases,

I don't know why you think I'm suggesting otherwise or ignoring this.

2. There will be more than one producer (compilation options, linker
options, ABIs, ISAs) and more than one consumer (lto, cg, etc), you
need to common up as much as possible, and split when the semantics
differ,
3. You need rules for merging metadata, and that's not just a
requirement of this topic, but also raised in the FP discussion,
4. Metadata must have its status raised in IR, and passes should be
aware of it and manipulate it correctly.

Points 3 and 4 go against the original design of Metadata, I know. But
it's being used by debug information and correctness (ARC, Obj-C).
We're about to take the leap towards FP correctness with metadata, I
don't think you can still disregard Metadata as quickly as the current
back-end does.

Metadata semantics is much simpler than IR semantics. Most of the
problems arise from merging code, so if there is a clear
identification of the types of metadata (debug, asm, fp) and there is
a simple module with simple, clear rules of merging, you can just
apply those rules, via the same module. So passes don't need to learn
too much, just blindly applying the general rules will get you far
enough.

Merging metadata is (mostly) a matter of dominance and equivalence.
When inlining a function into a different module, for instance, you'll
check for the debug metadata in the former for equivalent (you define
what's equivalent for each case) metadata and move the pointers. So,
for "int a", you will still point "a" to the type "int" in the new
module. In the FP case, you merge according to dominance: merging a
less strict model into a more strict can either increase the strength
of the former, or leave it be in a per-instruction model. That also
depends on the architecture, and possibly, compiler flags.

My final point is simple: if you don't think about those issues now,
before you start filling the IR with unchecked metadata, it'll be harder to
enforce the required merge rules later. And they will be necessary.

Okay. For some reason you think I haven't thought of these issues. Please don't make that assumption. I proposed a simple framework for conveying the information that needs to be conveyed. I omitted the types of information that are to be conveyed (except for a couple of examples) because that's a very different discussion from this one, and requires much more detail (as you mentioned above). I'm concerned with figuring out how to do it at this point in time. Each flag that's passed down will need to be treated in its own special manner.

-bw

Each flag that's passed down will need to be treated in its own special manner.

I think this is the perfect summary of the thread. Since each flag
needs a special treatment, each flag deserves a discussion on how it
should be represented and the decision should be documented in the
language reference and its use checked by the verifier.

Cheers,
Rafael