RFC: module flag for hosted mode

In PR30403 we’ve been discussing how to encode -ffreestanding when using LTO. This bit is currently dropped during LTO because its only representation is in the TargetLibraryInfo created by clang (http://llvm-cs.pcc.me.uk/tools/clang/lib/CodeGen/BackendUtil.cpp#258).

The proposal is to introduce a module flag that we set in any translation unit compiled in hosted (i.e. -fno-freestanding) mode. At LTO time, if the combined module has this flag (i.e. if any of the inputs have this flag), we compile in hosted mode. This means that if we combine freestanding and hosted modules, the entire resulting module will be compiled in hosted mode.

The justification for this behaviour (per Duncan) is that hosted/freestanding is a property of the linkage environment, and if the standard library is claimed to be available for any one translation unit in the linkage unit, it should be available for every other translation unit in the linkage unit.

One question that arises is how to handle old modules which were compiled in hosted mode and lack the hosted module flag. With the above scheme, LTO would run in freestanding mode if there are no contemporaneous modules. I think this is probably fine, since (1) I’d normally expect there to be at least one contemporaneous module (i.e. the main program, as opposed to old modules belonging to a prebuilt library) and (2) the loop idiom recognizer has already been run over these modules at compile time, so even if all modules are old there’s unlikely to be a huge perf regression.
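To make the merge rule and the old-module case concrete, here is a minimal Python sketch. It is only a model: each module is represented as a dict of module flags, and the flag name `hosted` and the helper `combined_is_hosted` are made up for illustration, not existing LLVM names.

```python
from typing import Dict, Iterable

HOSTED_FLAG = "hosted"  # hypothetical flag name, not an actual LLVM module flag


def combined_is_hosted(modules: Iterable[Dict[str, int]]) -> bool:
    """Proposed rule: the combined module is hosted if any input carries the flag."""
    return any(m.get(HOSTED_FLAG, 0) == 1 for m in modules)


new_hosted = {HOSTED_FLAG: 1}  # new compiler, -fno-freestanding: flag is set
new_freestanding = {}          # new compiler, -ffreestanding: flag is absent
old_hosted = {}                # old bitcode: hosted, but predates the flag

mixed = combined_is_hosted([new_hosted, new_freestanding])  # hosted wins
only_old = combined_is_hosted([old_hosted, old_hosted])     # falls back to freestanding
old_plus_main = combined_is_hosted([old_hosted, new_hosted])  # one new module is enough
```

Note that in this model an old hosted module looks exactly like a new freestanding one, which is why a link consisting only of old modules would run in freestanding mode.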

Thanks,

As an alternative view: since we didn’t support -ffreestanding in LTO mode before, we should be able to just auto-upgrade this old bitcode to the -fno-freestanding version.

That said, as you mentioned it is probably not the common case and I don’t expect us to hit this in practice.

+Eric and Akira (for thoughts on module flags)

(Tangent, but related: I'm also wondering if this logic could/should be used to encode other TLI flags, such as -fveclib=<lib>. I.e., if one translation unit has -fveclib=Accelerate, does that imply that we can use that for the entire linkage unit? If there's a conflict, such as -fveclib=SVML in one and -fveclib=Accelerate in another, can we safely pick the first one arbitrarily?)

I completely agree with Mehdi here. We could auto-upgrade to -fno-freestanding, but it's also not important.

Two thoughts on this:

  1. My personal preference is that certain command line flags that affect the final generated code get passed at LTO time. This is a fairly good example of that.
  2. I know that I’m not going to get #1, so figuring out a good way to encode additional module level flags is fine. For -ffreestanding I’m inclined to think that we should perhaps error on merge? With inlining we’re not going to be able to tell the difference. Alternately the TLI function model can also be encoded on a function level and affect cross module inlining.

-eric

(summarising IRC)

Rethinking a little, I would be inclined to agree that combined hosted and freestanding modules should not be compiled in hosted mode. Here’s one scenario where we may break: suppose I LTO-link an implementation of memset compiled with -ffreestanding with a program compiled with -fhosted. With the proposed rule, the loop idiom recognizer may transform the body of the memset function into a self-call.
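A toy Python model of that failure mode, purely for illustration. The function names are hypothetical, and the second function is hand-written to mimic what the result would effectively look like if the store loop inside memset were itself turned into a call to memset.

```python
def memset_freestanding(buf: bytearray, value: int, n: int) -> bytearray:
    """The library implementation as written: a plain store loop."""
    for i in range(n):
        buf[i] = value
    return buf


def memset_after_hosted_lto(buf: bytearray, value: int, n: int) -> bytearray:
    """Hand-written stand-in for the same function after the store loop
    has been replaced by a call to memset, i.e. a call to itself."""
    return memset_after_hosted_lto(buf, value, n)


# The original loop terminates and fills the buffer.
ok = memset_freestanding(bytearray(4), 0xFF, 4)

# The "transformed" version never reaches a base case.
try:
    memset_after_hosted_lto(bytearray(4), 0xFF, 4)
    recursed_forever = False
except RecursionError:
    recursed_forever = True
```

In Python the runaway recursion surfaces as a RecursionError; in compiled code the analogous outcome would be unbounded recursion.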

So that leaves either compile in freestanding or error out. Freestanding would produce a conservatively correct result, but it may lead to unintentional pessimisations, so unless we error out we’d likely want to warn on mixed. In principle erroring out could break existing builds, but I suppose these builds are already wrong in LTO mode, so it may not matter.
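The two remaining options can be sketched as follows. This is a rough Python model under assumptions, not a proposed implementation: `Mode`, `MixedModeError` and both merge functions are hypothetical names, and each input module is reduced to a single hosted/freestanding value.

```python
import warnings
from enum import Enum
from typing import List


class Mode(Enum):
    HOSTED = "hosted"
    FREESTANDING = "freestanding"


class MixedModeError(Exception):
    """Raised when hosted and freestanding inputs are combined."""


def merge_freestanding_wins(inputs: List[Mode]) -> Mode:
    """Conservative option: any freestanding input makes the result
    freestanding, with a warning so the pessimisation is visible."""
    if len(set(inputs)) > 1:
        warnings.warn("mixing hosted and freestanding modules; "
                      "compiling the combined module as freestanding")
    return Mode.FREESTANDING if Mode.FREESTANDING in inputs else Mode.HOSTED


def merge_error_on_mixed(inputs: List[Mode]) -> Mode:
    """Strict option: refuse to combine modules that disagree.
    Assumes at least one input module."""
    if len(set(inputs)) > 1:
        raise MixedModeError("cannot LTO-link hosted and freestanding modules")
    return inputs[0]
```

For example, `merge_freestanding_wins([Mode.HOSTED, Mode.FREESTANDING])` returns `Mode.FREESTANDING` and emits a warning, while `merge_error_on_mixed` raises `MixedModeError` on the same inputs.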

In my view the higher order bit is resolved: we should not support mixed hosted/freestanding “well”. Users would be expected to compile the freestanding parts of their program in non-LTO mode. Vendors would not be able to compile their low-level runtime libraries with LTO. I would also agree with Eric that we should error out on mixed instead of trying to half-support it. But I’d like to get the opinions of others as well.

Peter

> (summarising IRC)
>
> Rethinking a little, I would be inclined to agree that combined hosted and freestanding modules should not be compiled in hosted mode. Here's one scenario where we may break: suppose I LTO-link an implementation of memset compiled with -ffreestanding with a program compiled with -fhosted. With the proposed rule, the loop idiom recognizer may transform the body of the memset function into a self-call.

Agreed, that would be weird (and wrong).

> So that leaves either compile in freestanding or error out. Freestanding would produce a conservatively correct result, but it may lead to unintentional pessimisations, so unless we error out we'd likely want to warn on mixed. In principle erroring out could break existing builds, but I suppose these builds are already wrong in LTO mode, so it may not matter.
>
> In my view the higher order bit is resolved: we should not support mixed hosted/freestanding "well". Users would be expected to compile the freestanding parts of their program in non-LTO mode. Vendors would not be able to compile their low-level runtime libraries with LTO. I would also agree with Eric that we should error out on mixed instead of trying to half-support it. But I'd like to get the opinions of others as well.

If we're not going to support it well (sadness), then I could be convinced of either: error, or freestanding-wins. I'd like to hear what others think too.

> Peter
>
>> Two thoughts on this:
>>
>> 1) My personal preference is that certain command line flags that affect the final generated code get passed at LTO time. This is a fairly good example of that.
>> 2) I know that I'm not going to get #1, so figuring out a good way to encode additional module level flags is fine. For -ffreestanding I'm inclined to think that we should perhaps error on merge? With inlining we're not going to be able to tell the difference. Alternately the TLI function model can also be encoded on a function level and affect cross module inlining.

Doing TLI on the function model seems best to me, but maybe impractical. Peter pointed out up-thread (or maybe in the PR?) that -globalopt uses TLI to find __cxa_atexit. Maybe it doesn't really need to? Or -globalopt could get smarter somehow?

I’d rather not default to freestanding: this would lead to confusing performance issues whose origin would be hard to figure out.

Right, that's why I proposed that if we do default to freestanding we
should warn on mixed.


> Doing TLI on the function model seems best to me, but maybe impractical. Peter pointed out up-thread (or maybe in the PR?) that -globalopt uses TLI to find __cxa_atexit. Maybe it doesn't really need to? Or -globalopt could get smarter somehow?

Here's my earlier comment on function context:
https://llvm.org/bugs/show_bug.cgi?id=30403#c6

I suspect that it may be possible to encode on the function. I think I was
mistaken about __cxa_atexit -- it looks like there is function context
available in the only user, OptimizeEmptyGlobalCXXDtors.

I'll give it a shot and report back to the list if any other issues come up.

Peter