LTO, Code Generation Options, etc

From PR18808 I said a few things and that I was going to redirect to the mailing list for further discussion. So here we are, go.

  1. Whether or not to allow changing of target-cpu/target-feature/triple at link time code generation.
  • Not convinced here of the facility to do so. Could just recompile the individual bitcode files to get what you want, but there are some users that are trying to ship bitcode (as crazy as that sounds).
  2. How to pass other sorts of options to the backend for code generation
  • -ffoo options -fno-foo options. I.e. -fno-inline, etc. I think this is really pretty important from the user POV. It affects things at a more global level.
  3. The llvm developer debugging story
  • It’s useful for llvm developers to be able to more accurately debug a set of IR using bisection or being able to turn off code generation options. Should this be done at the command level (i.e. infrastructure that clang and llc etc could even share), or should it be done at an llvm IR rewriting level? Don’t know. I kind of want a rewriter, but I’m not wedded to any particular answer.
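
For what it's worth, the bisection part can be sketched independently of whether the knob ends up being a command line option or an IR rewriter. This is only an illustration: `first_bad_limit` and `fake_build_and_test` are made-up names, and the "limit" stands for a hypothetical control that enables only the first N optimization steps. It assumes the failure, once introduced, persists for every larger limit.

```python
from typing import Callable


def first_bad_limit(num_steps: int, is_bad: Callable[[int], bool]) -> int:
    """Return the smallest limit in 1..num_steps for which is_bad(limit) is true.

    Assumes is_bad(0) is False, is_bad(num_steps) is True, and that the
    failure persists for every limit above the one that introduces it.
    """
    lo, hi = 0, num_steps  # invariant: is_bad(lo) is False, is_bad(hi) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical stand-in for "rebuild with only the first `limit` steps
# enabled, then run the failing test". Here step 37 introduces the failure.
def fake_build_and_test(limit: int) -> bool:
    return limit >= 37


culprit = first_bad_limit(100, fake_build_and_test)
print(culprit)
```

With 100 steps this needs about seven rebuilds rather than a hundred, which is why having the knob available at LTO time matters for debugging.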

-eric

That said I was actually envisioning something like:

clang -emit-llvm foo.c -o foo.bc

clang -O3 -flto all.bc -arch x86_64h -o haswell_slice
clang -O3 -flto all.bc -arch x86_64 -o x86_64_slice

for the same set of bitcode files. But given the front end language restrictions on doing anything actually interesting there it’s not too much of a constraint.

Another usage is the (admittedly one I don’t think we want to support) halide one that I discovered this week:

clang foo.c -emit-llvm -o foo.bc
clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64

I’ve since convinced them to use the pnacl sort of thing for more target independent code generation at the moment. It’s a use case that could be thought about more though, especially as pnacl does the exact same sort of thing, just with a different triple for the actual link time code generation. It looks more like:

clang -target le64-unknown-unknown -emit-llvm foo.c -o foo.bc
clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64

Just to add some more actual use cases in the discussion.

-eric

From PR18808 I said a few things and that I was going to redirect to the
mailing list for further discussion. So here we are, go.

1) Whether or not to allow changing of target-cpu/target-feature/triple at
link time code generation.

- Not convinced here of the facility to do so. Could just recompile the
individual bitcode files to get what you want, but there are some users
that are trying to ship bitcode (as crazy as that sounds).

2) How to pass other sorts of options to the backend for code generation

- -ffoo options -fno-foo options. I.e. -fno-inline, etc. I think this is
really pretty important from the user POV. It affects things at a more
global level.

I assume many of these still could/should be function-specific attributes,
no? (it seems like there's a logical interpretation of compiling one object
file with -fno-inline and another without it - the same as having
__attribute__((noinline)) on the functions of the first object and not on
those of the second - this may not apply to other options, but perhaps
there's a better example of options we should think about here that don't
have a per-function interpretation?)

3) The llvm developer debugging story

Was this the thing Duncan was talking about/proposing on llvmdev a few
weeks ago? Something about command line overridable options, etc? (I can
try to find the thread if that doesn't sound familiar)

Some, yes, all? Dunno. I was thinking mostly of CGSCC or Module passes in general. Ideally it’ll just be options that turn on/off things like that though.

Yep. Just widening it and giving the whole set of things its own thread.

-eric

From PR18808 I said a few things and that I was going to redirect to the mailing list for further discussion. So here we are, go.

1) Whether or not to allow changing of target-cpu/target-feature/triple at link time code generation.

- Not convinced here of the facility to do so. Could just recompile the individual bitcode files to get what you want, but there are some users that are trying to ship bitcode (as crazy as that sounds).

IMO, it's cleanest if the target-cpu/target-feature/etc. are set at
compile time. That's where users are accustomed to specifying codegen
options already, and besides: the frontend needs to know the backend in
order to conform to the ABI, set macros, emit calls to target-specific
intrinsics, etc.

I'll send a review of r233227 in a moment to that effect ;).

2) How to pass other sorts of options to the backend for code generation

- -ffoo options -fno-foo options. I.e. -fno-inline, etc. I think this is really pretty important from the user POV. It affects things at a more global level.

This is easy to solve for -fno-inline in particular: we should just
add a function attribute (`noinline`?) that the inliner should treat
as a synonym for `optnone`. Any functions that come from translation
units compiled with `-fno-inline` get ignored by the inliner; functions
from other translation units participate fully.
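
A toy model of that per-function behaviour, to make the intended semantics concrete. This is a sketch, not how the LLVM inliner is implemented: `Function`, `compile_tu` and `inlinable_call_sites` are invented for illustration, and it assumes one particular reading, namely that a call site is skipped when either the caller or the callee came from a `-fno-inline` translation unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    attrs: frozenset = frozenset()


def compile_tu(names, no_inline=False):
    """Model the frontend: stamp the TU-wide flag onto every function it emits."""
    attrs = frozenset({"noinline"}) if no_inline else frozenset()
    return [Function(n, attrs) for n in names]


def inlinable_call_sites(functions, calls):
    """Return the (caller, callee) pairs an LTO inliner would still consider.

    Assumption: a call site is skipped if either side carries "noinline".
    """
    by_name = {f.name: f for f in functions}
    return [
        (caller, callee)
        for caller, callee in calls
        if "noinline" not in by_name[caller].attrs
        and "noinline" not in by_name[callee].attrs
    ]


# One TU built with -fno-inline (a, b), one built without (c, d), then merged.
merged = compile_tu(["a", "b"], no_inline=True) + compile_tu(["c", "d"])
calls = [("a", "b"), ("c", "d"), ("c", "a"), ("d", "c")]
candidates = inlinable_call_sites(merged, calls)
print(candidates)
```

Only the call sites entirely within the second translation unit remain candidates; nothing about the decision needs a link-time flag because it travels with each function.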

But in terms of setting up the LTO pass pipeline, some level of user
customization makes sense. I'm not really sure how much is useful. We
have a start at that with Peter's recent commits to add -O0/-O1/-O2
(not that anyone thought too carefully about what's happening at those
optimization levels).

3) The llvm developer debugging story

- It's useful for llvm developers to be able to more accurately debug a set of IR using bisection or being able to turn off code generation options. Should this be done at the command level (i.e. infrastructure that clang and llc etc could even share), or should it be done at an llvm IR rewriting level? Don't know. I kind of want a rewriter, but I'm not wedded to any particular answer.

I think some sort of rewriter makes sense.

Long-term I'd still like to encode whether an option is overridable in
a sane way (via default attribute sets or something), but I haven't
had time yet to go back to my original proposal and refine it :(.

That said I was actually envisioning something like:

clang -emit-llvm foo.c -o foo.bc
...

clang -O3 -flto all.bc -arch x86_64h -o haswell_slice
clang -O3 -flto all.bc -arch x86_64 -o x86_64_slice

for the same set of bitcode files. But given the front end language restrictions on doing anything actually interesting there it's not too much of a constraint.

Many of the differences between architectures/CPUs affect preprocessor
definitions, right? Link-time is too late for the frontend to emit
Haswell-specific intrinsics, for example.

That said, it would be cool if this worked.
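
To illustrate why link time is too late for source-level decisions, here is a deliberately tiny model. Everything in it is hypothetical: `frontend` stands for preprocessing plus IR generation, `lto_codegen` for link-time code generation, and the feature sets and intrinsic names (`avx2.add`, `generic.add`) are invented placeholders rather than real LLVM names.

```python
HASWELL = frozenset({"sse2", "avx2"})
GENERIC_X86_64 = frozenset({"sse2"})

# Which (made-up) operation needs which (made-up) target feature.
REQUIRES = {"avx2.add": "avx2", "generic.add": None}


def frontend(target_features):
    """Model the preprocessor: the feature-dependent branch is chosen here, once."""
    if "avx2" in target_features:
        return ["avx2.add"]   # target-specific intrinsic baked into the IR
    return ["generic.add"]    # portable fallback path


def lto_codegen(ir, target_features):
    """Model link-time codegen: it can only lower what the IR already contains."""
    for op in ir:
        needed = REQUIRES[op]
        if needed is not None and needed not in target_features:
            raise ValueError(f"{op} requires {needed}")
    return list(ir)


# Compile once for generic x86_64, then ask for a Haswell slice at link time.
generic_ir = frontend(GENERIC_X86_64)
haswell_slice = lto_codegen(generic_ir, HASWELL)
print(haswell_slice)
```

In this model the Haswell slice is valid but still uses the generic path chosen by the frontend. Going the other way, feeding IR compiled for Haswell to generic x86_64 codegen, raises an error because the baked-in intrinsic has no lowering.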

Another usage is the (admittedly one I don't think we want to support) halide one that I discovered this week:

clang foo.c -emit-llvm -o foo.bc
clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64
...

Whereas this is just insane :0.

> From PR18808 I said a few things and that I was going to redirect to the
> mailing list for further discussion. So here we are, go.
>
> 1) Whether or not to allow changing of target-cpu/target-feature/triple
> at link time code generation.
>
> - Not convinced here of the facility to do so. Could just recompile the
> individual bitcode files to get what you want, but there are some users
> that are trying to ship bitcode (as crazy as that sounds).

IMO, it's cleanest if the target-cpu/target-feature/etc. are set at
compile time. That's where users are accustomed to specifying codegen
options already, and besides: the frontend needs to know the backend in
order to conform to the ABI, set macros, emit calls to target-specific
intrinsics, etc.

Cleanest yes, most familiar yes, but doesn't fit the usecase of
PNaCl/Emscripten/Renderscript/Halide/... as Eric was mentioning. These
indeed need to figure out proper ABI, macros, intrinsics, but the existence
of these is a pretty good proof that something can be done (I'm not saying
it's clean or pretty!).

> That said I was actually envisioning something like:
>
> clang -emit-llvm foo.c -o foo.bc
> ...
>
> clang -O3 -flto all.bc -arch x86_64h -o haswell_slice
> clang -O3 -flto all.bc -arch x86_64 -o x86_64_slice
>
> for the same set of bitcode files. But given the front end language
> restrictions on doing anything actually interesting there it's not too much
> of a constraint.

Many of the differences between architectures/CPUs affect preprocessor
definitions, right? Link-time is too late for the frontend to emit
Haswell-specific intrinsics, for example.

That said, it would be cool if this worked.

Agreed.

> Another usage is the (admittedly one I don't think we want to support)
> halide one that I discovered this week:
>
> clang foo.c -emit-llvm -o foo.bc
> clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
> clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64
> ...

Whereas this is just insane :0.

Disagreed: different usecase from above :)

Whoever maintains "portable" things has to figure out how this works, and I
think right now it's still the wild west (hence "insane" may not be too far
off as a description), but I don't think it's an invalid usecase.

> From PR18808 I said a few things and that I was going to redirect to the mailing list for further discussion. So here we are, go.
>
> 1) Whether or not to allow changing of target-cpu/target-feature/triple at link time code generation.
>
> - Not convinced here of the facility to do so. Could just recompile the individual bitcode files to get what you want, but there are some users that are trying to ship bitcode (as crazy as that sounds).

IMO, it's cleanest if the target-cpu/target-feature/etc. are set at
compile time. That's where users are accustomed to specifying codegen
options already, and besides: the frontend needs to know the backend in
order to conform to the ABI, set macros, emit calls to target-specific
intrinsics, etc.

Cleanest yes, most familiar yes, but doesn't fit the usecase of PNaCl/Emscripten/Renderscript/Halide/... as Eric was mentioning. These indeed need to figure out proper ABI, macros, intrinsics, but the existence of these is a pretty good proof that something can be done (I'm not saying it's clean or pretty!).

But the typical clang user shouldn't suffer just because there are
interesting use cases out there. Cleanest and familiar are important.

I'd be happy enough with a command-line option to specify "don't encode
the target" to support this kind of thing. Although Eric's idea from
elsewhere in the thread seems better than adding a driver option:

    $ clang -target le64-unknown-unknown -emit-llvm foo.c -o foo.bc
    $ clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
    $ clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64

In this scenario, I figure the Frontend would recognize `le64` as a
special architecture whose target shouldn't get encoded in the IR... or
the backend would recognize that it should be overwritten.

> That said I was actually envisioning something like:
>
> clang -emit-llvm foo.c -o foo.bc
> ...
>
> clang -O3 -flto all.bc -arch x86_64h -o haswell_slice
> clang -O3 -flto all.bc -arch x86_64 -o x86_64_slice
>
> for the same set of bitcode files. But given the front end language restrictions on doing anything actually interesting there it's not too much of a constraint.

Many of the differences between architectures/CPUs affect preprocessor
definitions, right? Link-time is too late for the frontend to emit
Haswell-specific intrinsics, for example.

That said, it would be cool if this worked.

Agreed.

> Another usage is the (admittedly one I don't think we want to support) halide one that I discovered this week:
>
> clang foo.c -emit-llvm -o foo.bc
> clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
> clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64
> ...

Whereas this is just insane :0.

Disagreed: different usecase from above :)

Whoever maintains "portable" things has to figure out how this works, and I think right now it's still the wild west (hence "insane" may not be too far off as a description), but I don't think it's an invalid usecase.

I didn't say invalid :).

From PR18808 I said a few things and that I was going to redirect to the mailing list for further discussion. So here we are, go.

  1. Whether or not to allow changing of target-cpu/target-feature/triple at link time code generation.
  • Not convinced here of the facility to do so. Could just recompile the individual bitcode files to get what you want, but there are some users that are trying to ship bitcode (as crazy as that sounds).

IMO, it’s cleanest if the target-cpu/target-feature/etc. are set at
compile time. That’s where users are accustomed to specifying codegen
options already, and besides: the frontend needs to know the backend in
order to conform to the ABI, set macros, emit calls to target-specific
intrinsics, etc.

I’ll send a review of r233227 in a moment to that effect ;).

  2. How to pass other sorts of options to the backend for code generation
  • -ffoo options -fno-foo options. I.e. -fno-inline, etc. I think this is really pretty important from the user POV. It affects things at a more global level.

This is easy to solve for -fno-inline in particular: we should just
add a function attribute (noinline?) that the inliner should treat
as a synonym for optnone. Any functions that come from translation
units compiled with -fno-inline get ignored by the inliner; functions
from other translation units participate fully.

This is pretty terrible. As you allude to here, it’s a hack for -fno-inline, and it also doesn’t help with “I’d like to inline at the individual translation unit compile time, but not at the LTO time.”

But in terms of setting up the LTO pass pipeline, some level of user
customization makes sense. I’m not really sure how much is useful. We
have a start at that with Peter’s recent commits to add -O0/-O1/-O2
(not that anyone thought too carefully about what’s happening at those
optimization levels).

Yeah, I commented pretty heavily on that thread if you’ll remember. It’s a hackish workaround for the moment, but will work in the short term.

  3. The llvm developer debugging story
  • It’s useful for llvm developers to be able to more accurately debug a set of IR using bisection or being able to turn off code generation options. Should this be done at the command level (i.e. infrastructure that clang and llc etc could even share), or should it be done at an llvm IR rewriting level? Don’t know. I kind of want a rewriter, but I’m not wedded to any particular answer.

I think some sort of rewriter makes sense.

Long-term I’d still like to encode whether an option is overridable in
a sane way (via default attribute sets or something), but I haven’t
had time yet to go back to my original proposal and refine it :(.

If I had any bright ideas I’d have said something. :)

That said I was actually envisioning something like:

clang -emit-llvm foo.c -o foo.bc

clang -O3 -flto all.bc -arch x86_64h -o haswell_slice
clang -O3 -flto all.bc -arch x86_64 -o x86_64_slice

for the same set of bitcode files. But given the front end language restrictions on doing anything actually interesting there it’s not too much of a constraint.

Many of the differences between architectures/CPUs affect preprocessor
definitions, right? Link-time is too late for the frontend to emit
Haswell-specific intrinsics, for example.

*nod* But it’s still useful for making code generation decisions (vectorization, etc.).

That said, it would be cool if this worked.

Yep, which leads us to:

Another usage is the (admittedly one I don’t think we want to support) halide one that I discovered this week:

clang foo.c -emit-llvm -o foo.bc
clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64

Whereas this is just insane :0.

Sorta…

I’ve since convinced them to use the pnacl sort of thing for more target independent code generation at the moment. It’s a use case that could be thought about more though, especially as pnacl does the exact same sort of thing, just with a different triple for the actual link time code generation. It looks more like:

clang -target le64-unknown-unknown -emit-llvm foo.c -o foo.bc
clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64

This is probably a bit more sane, i.e. a generic situation. PNaCl has been using this exact use case for quite a while now and, IIUC, it’s also the basis of the new Khronos proposal. It’d be nice to support this sort of thing in some fashion (i.e. make restrictions), but I think at this point telling them their behavior isn’t allowed would be a little mean :)

-eric

From PR18808 I said a few things and that I was going to redirect to the mailing list for further discussion. So here we are, go.

  1. Whether or not to allow changing of target-cpu/target-feature/triple at link time code generation.
  • Not convinced here of the facility to do so. Could just recompile the individual bitcode files to get what you want, but there are some users that are trying to ship bitcode (as crazy as that sounds).

IMO, it’s cleanest if the target-cpu/target-feature/etc. are set at
compile time. That’s where users are accustomed to specifying codegen
options already, and besides: the frontend needs to know the backend in
order to conform to the ABI, set macros, emit calls to target-specific
intrinsics, etc.

Cleanest yes, most familiar yes, but doesn’t fit the usecase of PNaCl/Emscripten/Renderscript/Halide/… as Eric was mentioning. These indeed need to figure out proper ABI, macros, intrinsics, but the existence of these is a pretty good proof that something can be done (I’m not saying it’s clean or pretty!).

But the typical clang user shouldn’t suffer just because there are
interesting use cases out there. Cleanest and familiar are important.

I’d be happy enough with a command-line option to specify “don’t encode
the target” to support this kind of thing. Although Eric’s idea from
elsewhere in the thread seems better than adding a driver option:

$ clang -target le64-unknown-unknown -emit-llvm foo.c -o foo.bc
$ clang -target aarch64-linux-gnu foo.bc -O3 -o foo.aarch64
$ clang -target x86_64-linux-gnu foo.bc -O3 -o foo.x86_64

In this scenario, I figure the Frontend would recognize le64 as a
special architecture whose target shouldn’t get encoded in the IR… or
the backend would recognize that it should be overwritten.

I think, in this sort of case, le64-unknown-unknown (or some such) is actually going to be a decent sort of triple for the generic case. The question is what to do at lto/codegen time as far as:

a) making them rewrite, or
b) allowing it to change from the outside as we were imagining for the llc type of use case or
c) at least programmatically, i.e. if you explicitly construct a target machine it’ll override at least the triple

(a) seems the friendliest to us; (b) seems a little weird, but piggybacks on our existing plans; (c) seems kinda fun and opens up some decent APIs, but I’m not sure about some of the longer-term consequences.
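
As a rough way to compare the three, here is a small model of where the final triple could come from. It is a sketch only: `Module`, `rewrite_triple` and `codegen_triple` are hypothetical names, not LLVM APIs, and the precedence between (b) and (c) when both are given is purely an assumption made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional

GENERIC_TRIPLE = "le64-unknown-unknown"


@dataclass(frozen=True)
class Module:
    """Stand-in for a bitcode module; only the recorded triple is modeled."""
    triple: str


def rewrite_triple(module: Module, new_triple: str) -> Module:
    """Option (a): an explicit rewriting step the user runs before codegen."""
    return replace(module, triple=new_triple)


def codegen_triple(
    module: Module,
    cli_triple: Optional[str] = None,
    target_machine_triple: Optional[str] = None,
) -> str:
    """Pick the triple codegen would use.

    Option (c): an explicitly constructed target machine wins (assumed).
    Option (b): otherwise an override passed from the outside is used.
    Fallback:   the triple recorded in the module.
    """
    if target_machine_triple is not None:
        return target_machine_triple
    if cli_triple is not None:
        return cli_triple
    return module.triple


portable = Module(GENERIC_TRIPLE)
via_rewrite = codegen_triple(rewrite_triple(portable, "aarch64-linux-gnu"))
via_cli = codegen_triple(portable, cli_triple="x86_64-linux-gnu")
via_tm = codegen_triple(portable, target_machine_triple="aarch64-linux-gnu")
print(via_rewrite, via_cli, via_tm)
```

In this model (a) keeps codegen itself oblivious to overrides, while (b) and (c) both need a defined precedence against the triple stored in the module, which is where the longer-term consequences would show up.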

Just some food for thought as we go through this.

-eric