[RFC] Stop giving a default CPU to the LTO plugin?

Hello everyone, this is most likely Arm specific, but could affect
other targets where there is a somewhat complex relationship between
the triple and mcpu option.

At present when clang is used as a linker driver for the gold-plugin
and when using and an explicit -mcpu is not given to clang, then clang
will always generate a -Wl,-plugin-opt=mcpu=<default CPU> where the
default CPU is based on the triple.

I think that this is causing more problems than it is worth and I'd
like clang to not pass -Wl,-plugin-opt=mcpu=<CPU> unless the
-mcpu=<CPU> is explicitly on the clang command line.

I'd like to know if I'm missing something important, and if there
would be any support for a patch to not pass an implicit mcpu via
plugin? There may also be a way of handling an inappropriate
cpu/triple combination more gracefully.

The full story of a problem with the Arm target is available at
36542 – LTO ARM thumb relocation overflow in R_ARM_THM_JUMP11 . A short description of
the problem is:
- Source files are compiled with -mthumb
--target=arm-linux-androideabi -mcpu=cortex-a7,
this records a triple of thumbv7a-linux-androideabi in the bitcode file.
- The command line for the link step only passes --target=
arm-linux-androideabi.
- The default cpu passed to LTO for a triple arm-linux-androideabi is
the very old arm7tdmi.
- When the ARMAsmBackend is created from the combination of triple
thumbv7a-linux-androideabi and cpu arm7tdmi the resultant target
features only include Thumb1. Amongst other things this prevents MC
from widening branch instructions to the Thumb2 wide branch.
- Code generation prior to MC seems to prefer the attributes in the
bitcode file (triple thumbv7a-linux-androideabi -mcpu=generic), this
selects a narrow branch instruction for a tail call with the
expectation it will be widened by MC. End result is a relocation out
of range error.

In summary the combination of -mcpu=arm7tdmi and a triple from the
bitcode file of thumbv7a-linux-androideabi does not make sense, but
when passed to LTO seems to:
- Escape all the existing warnings about nonsensical triples [*].
- On Arm targets at least; having a final link step without a cpu is
likely to be quite common so I don't think that this will be a corner
case.
- The Build Attributes of the output ELF file get the triple ARMv4t
from -mcpu=arm7tdmi which is lying to the linker as the file contains
ARMv7a instructions.

My personal preference is that we shouldn't pass an implicit CPU
derived from the triple, at least for Arm targets. If that isn't
possible I think that we should make sure we detect inappropriate
cpu/triple combinations and either warn or error. Does anyone have any
strong opinions on which way to go? I'm happy to work on a patch to
improve this.

I know that this can be worked around by explicitly passing a cpu or
more specific triple to the link line, however this is often not done
in existing builds.

Thanks in advance for any suggestions

Peter

[*] If the bitcode files don't have a .o suffix a CodeGenAction is
performed on the bitcode files giving the warning: warning: overriding
the module target triple with armv4t--linux-androideabi
[-Woverride-module]

Hello everyone, this is most likely Arm specific, but could affect
other targets where there is a somewhat complex relationship between
the triple and mcpu option.

At present when clang is used as a linker driver for the gold-plugin
and when using and an explicit -mcpu is not given to clang, then clang
will always generate a -Wl,-plugin-opt=mcpu=<default CPU> where the
default CPU is based on the triple.

I think that this is causing more problems than it is worth and I'd
like clang to not pass -Wl,-plugin-opt=mcpu=<CPU> unless the
-mcpu=<CPU> is explicitly on the clang command line.

I'd like to know if I'm missing something important, and if there
would be any support for a patch to not pass an implicit mcpu via
plugin? There may also be a way of handling an inappropriate
cpu/triple combination more gracefully.

The full story of a problem with the Arm target is available at
36542 – LTO ARM thumb relocation overflow in R_ARM_THM_JUMP11 . A short description of
the problem is:
- Source files are compiled with -mthumb
--target=arm-linux-androideabi -mcpu=cortex-a7,
this records a triple of thumbv7a-linux-androideabi in the bitcode file.
- The command line for the link step only passes --target=
arm-linux-androideabi.
- The default cpu passed to LTO for a triple arm-linux-androideabi is
the very old arm7tdmi.
- When the ARMAsmBackend is created from the combination of triple
thumbv7a-linux-androideabi and cpu arm7tdmi the resultant target
features only include Thumb1. Amongst other things this prevents MC
from widening branch instructions to the Thumb2 wide branch.
- Code generation prior to MC seems to prefer the attributes in the
bitcode file (triple thumbv7a-linux-androideabi -mcpu=generic), this
selects a narrow branch instruction for a tail call with the
expectation it will be widened by MC. End result is a relocation out
of range error.

In summary the combination of -mcpu=arm7tdmi and a triple from the
bitcode file of thumbv7a-linux-androideabi does not make sense, but
when passed to LTO seems to:
- Escape all the existing warnings about nonsensical triples [*].
- On Arm targets at least; having a final link step without a cpu is
likely to be quite common so I don't think that this will be a corner
case.
- The Build Attributes of the output ELF file get the triple ARMv4t
from -mcpu=arm7tdmi which is lying to the linker as the file contains
ARMv7a instructions.

Having ARMv7a instructions in an ARMv4t file shouldn't be a problem: a function should be allowed to override the CPU attributes to generate code for a newer target. This is generally done using the "target" function attribute. If this doesn't work correctly, we should fix it. It looks like it's currently broken; testcase:

void g();
__attribute__((target("thumb,arch=cortex-a53")))
void f() { g(); }

My personal preference is that we shouldn't pass an implicit CPU
derived from the triple, at least for Arm targets. If that isn't
possible I think that we should make sure we detect inappropriate
cpu/triple combinations and either warn or error. Does anyone have any
strong opinions on which way to go? I'm happy to work on a patch to
improve this.

I know that this can be worked around by explicitly passing a cpu or
more specific triple to the link line, however this is often not done
in existing builds.

Thanks in advance for any suggestions

It doesn't really make sense to override the target specified in the bitcode; I agree we shouldn't do that by default. But I don't think changing that actually fixes the underlying bug.

-Eli

Thanks for the example, that is very useful in working out the overall
scope of the problem, which is now wider than I thought it was. I've
put some comments inline.

Hello everyone, this is most likely Arm specific, but could affect
other targets where there is a somewhat complex relationship between
the triple and mcpu option.

At present when clang is used as a linker driver for the gold-plugin
and when using and an explicit -mcpu is not given to clang, then clang
will always generate a -Wl,-plugin-opt=mcpu=<default CPU> where the
default CPU is based on the triple.

I think that this is causing more problems than it is worth and I'd
like clang to not pass -Wl,-plugin-opt=mcpu=<CPU> unless the
-mcpu=<CPU> is explicitly on the clang command line.

I'd like to know if I'm missing something important, and if there
would be any support for a patch to not pass an implicit mcpu via
plugin? There may also be a way of handling an inappropriate
cpu/triple combination more gracefully.

The full story of a problem with the Arm target is available at
36542 – LTO ARM thumb relocation overflow in R_ARM_THM_JUMP11 . A short description of
the problem is:
- Source files are compiled with -mthumb
--target=arm-linux-androideabi -mcpu=cortex-a7,
this records a triple of thumbv7a-linux-androideabi in the bitcode file.
- The command line for the link step only passes --target=
arm-linux-androideabi.
- The default cpu passed to LTO for a triple arm-linux-androideabi is
the very old arm7tdmi.
- When the ARMAsmBackend is created from the combination of triple
thumbv7a-linux-androideabi and cpu arm7tdmi the resultant target
features only include Thumb1. Amongst other things this prevents MC
from widening branch instructions to the Thumb2 wide branch.
- Code generation prior to MC seems to prefer the attributes in the
bitcode file (triple thumbv7a-linux-androideabi -mcpu=generic), this
selects a narrow branch instruction for a tail call with the
expectation it will be widened by MC. End result is a relocation out
of range error.

In summary the combination of -mcpu=arm7tdmi and a triple from the
bitcode file of thumbv7a-linux-androideabi does not make sense, but
when passed to LTO seems to:
- Escape all the existing warnings about nonsensical triples [*].
- On Arm targets at least; having a final link step without a cpu is
likely to be quite common so I don't think that this will be a corner
case.
- The Build Attributes of the output ELF file get the triple ARMv4t
from -mcpu=arm7tdmi which is lying to the linker as the file contains
ARMv7a instructions.

Having ARMv7a instructions in an ARMv4t file shouldn't be a problem: a
function should be allowed to override the CPU attributes to generate code
for a newer target. This is generally done using the "target" function
attribute. If this doesn't work correctly, we should fix it. It looks like
it's currently broken; testcase:

void g();
__attribute__((target("thumb,arch=cortex-a53")))
void f() { g(); }

Hmmm, allowing that makes life much more complicated. For example I
can also write:
void g();
__attribute__((target("thumb,arch=cortex-m0")))
void f() { g(); }

void i();
__attribute__((target("arm,arch=cortex-a53")))
void h() { i(); }

With -mcpu=cortex-m0 and get ARM code within an object claiming to be
Thumb only with no errors or warnings, with no chance of a linker
detecting a mismatch either.

I think that part of this is the same problem that is observed in
PR36542 the ARMAsmBackend that is responsible for widening the tail
call to a Thumb2 branch is created with ARMv4T which doesn't support
Thumb1. There has been a recent change that threads through the
existing SubtargetInfo instead of recreating it from the triple alone.
It is worth mentioning that the object level BuildAttributes do not
include Thumbv7a which is misleading to a linker as it will be
expecting no ARMv7A in the object.

Has there already been a discussion about what per function
code-generation with BuildAttributes higher than the base object
should mean in the context of capabilities of the ARMAsmBackend and
BuildAttributes? My thoughts right now are that if ARMAsmBackend is to
operate at an object level, rather than a per-function level then it
has to use the capabilities of the highest architecture in the file.
This also means giving the object BuildAttributes of the highest
architecture in the file, and giving an error if they contradict, for
example mixing Thumb Cortex-M0 and ARM Cortex-A53. If the
ARMAsmBackend could be made to work on a per-function level then there
is a chance that we could only widen the tail call to g() in f(), but
not elsewhere. To honestly describe this in the BuildAttributes we
would need to use per Section or per Function attributes though.

My suggestion to move forward here is:
- Recreate the SubtargetInfo based on the merge of all the Targets and
CPU information that we have seen, or warn/error if they are
incompatible.
- Ouput the Tag_CPU_arch BuildAttributes based on the merge of all the
Targets and CPU information that we have seen.

It is probably worth moving any discussion of this particular part to
PR36542 since it is somewhat Arm specific. I'll add this comment to
there.

My personal preference is that we shouldn't pass an implicit CPU
derived from the triple, at least for Arm targets. If that isn't
possible I think that we should make sure we detect inappropriate
cpu/triple combinations and either warn or error. Does anyone have any
strong opinions on which way to go? I'm happy to work on a patch to
improve this.

I know that this can be worked around by explicitly passing a cpu or
more specific triple to the link line, however this is often not done
in existing builds.

Thanks in advance for any suggestions

It doesn't really make sense to override the target specified in the
bitcode; I agree we shouldn't do that by default. But I don't think
changing that actually fixes the underlying bug.

It won't fix the underlying bug, but depending on we deal with
nonsensical combinations of triple and CPU we may end up with a
warning or an error for a common use case.

Having ARMv7a instructions in an ARMv4t file shouldn't be a problem: a
function should be allowed to override the CPU attributes to generate code
for a newer target. This is generally done using the "target" function
attribute. If this doesn't work correctly, we should fix it. It looks like
it's currently broken; testcase:

void g();
__attribute__((target("thumb,arch=cortex-a53")))
void f() { g(); }

Hmmm, allowing that makes life much more complicated. For example I
can also write:
void g();
__attribute__((target("thumb,arch=cortex-m0")))
void f() { g(); }

void i();
__attribute__((target("arm,arch=cortex-a53")))
void h() { i(); }

With -mcpu=cortex-m0 and get ARM code within an object claiming to be
Thumb only with no errors or warnings, with no chance of a linker
detecting a mismatch either.

I think we can all agree that there should be no real problem with
instruction selection when adding these sorts of target attributes. As
you point out below, the problems start occurring when it gets to the
MC layer and object emission.

As the author of the well-intentioned cleanup patch that unmasked this
issue, I'd like to thank you for putting in the time to delve into
things. The patches in question were:
* rG46db78b74353
* rGb22f751fa7d1

I think that part of this is the same problem that is observed in
PR36542 the ARMAsmBackend that is responsible for widening the tail
call to a Thumb2 branch is created with ARMv4T which doesn't support
Thumb1. There has been a recent change that threads through the
existing SubtargetInfo instead of recreating it from the triple alone.
It is worth mentioning that the object level BuildAttributes do not
include Thumbv7a which is misleading to a linker as it will be
expecting no ARMv7A in the object.

Has there already been a discussion about what per function
code-generation with BuildAttributes higher than the base object
should mean in the context of capabilities of the ARMAsmBackend and
BuildAttributes? My thoughts right now are that if ARMAsmBackend is to
operate at an object level, rather than a per-function level then it
has to use the capabilities of the highest architecture in the file.
This also means giving the object BuildAttributes of the highest
architecture in the file, and giving an error if they contradict, for
example mixing Thumb Cortex-M0 and ARM Cortex-A53. If the
ARMAsmBackend could be made to work on a per-function level then there
is a chance that we could only widen the tail call to g() in f(), but
not elsewhere. To honestly describe this in the BuildAttributes we
would need to use per Section or per Function attributes though.

My suggestion to move forward here is:
- Recreate the SubtargetInfo based on the merge of all the Targets and
CPU information that we have seen, or warn/error if they are
incompatible.
- Ouput the Tag_CPU_arch BuildAttributes based on the merge of all the
Targets and CPU information that we have seen.

It is probably worth moving any discussion of this particular part to
PR36542 since it is somewhat Arm specific. I'll add this comment to
there.

I'm not so sure this is ARM specific, as other targets might well
encounter similar issues (even if there is no direct equivalent to
build attributes, there are cases where information is encoded into
ELF flags on a per-object basis).

Best,

Alex

FWIW I’ve followed up in the bug with both a high level description of how these things work (and should work) as well as what to do for things that are encoded on a per-object basis.

-eric