[RFC][ARM] -Oz implies -mthumb

Hello,

I would like to address an issue/inconsistency related to command line options and compiling for minimum code size, and wanted to check if there would be any problems or objections to my change.

The problem is that compiling for minimum code size like this:

-Oz --target=arm-arm-eabi -mcpu=cortex-xyz

does not really give minimum code size because -mthumb is not enabled. This looks like a sub-optimal user experience to me, and also, it is inconsistent with GCC’s behaviour.

In other words: for AArch32, optimisation level -Oz targets A32, but I would like to change that to T32, and so I would like to propose that -Oz implies -mthumb.

Cheers,

Sjoerd.

Hi Sjoerd,

I’ve never tried -mcpu=cortex-xyz, but I know -march=armv7 defaults to Thumb.

OK, I just checked, and -mcpu=cortex-{m3,m4,m7,a7,a9,a15,a53} gives Thumb at -O1, -O1, -Os on the following gcc:

arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0

cortex-m0 fails because it doesn’t do hard float. I don’t have an eabi compiler around.

If anything I'd be inclined to just default to Thumb always. I haven't
checked myself, but rumour has it the icache benefits make it faster
than ARM code as well as smaller in most cases. My one worry there is
with reset vectors, which I believe must be implemented in ARM in some
cases; but since GCC itself appears to be inconsistent here, hopefully
those people are already explicit about their needs.

Cheers.

Tim.

Well, yes. Thumb1 was not clear cut, but with Thumb2 I think there are only two things that can make Thumb very slightly slower than ARM:

  1. needing an extra IT instruction to cast predication over following instructions
  2. on some microarchitectures there might be a penalty for branching to an address that isn’t 4-byte aligned (probably not on recent ones)

My understanding is that whether a gcc toolchain defaults to ARM or
Thumb is a configuration time decision by whoever builds the
toolchain. The linaro arm-linux-gnueabihf toolchain I have defaults to
-mthumb and that doesn't vary for -mcpu or any other command line
option. I haven't got a gcc to hand that defaults to -marm so that I
can test whether -mcpu=cortex-m3 will change that to Thumb. If I try
-marm -mcpu=cortex-m3 I get "error: target CPU does not support ARM
mode".

Can you give us a more concrete example of where GCC is inconsistent?

For clang I'm not particularly fond of -Oz implying a change of
instruction set state. I think that it would be difficult to document
properly, especially how to tell clang that I really did mean -Oz on
ARM. It would also be a bit messy to implement.

I think that most users of clang would prefer Thumb(2) to
ARM, although how to make that change globally and give people enough
warning could be challenging, and we'd need to get a consensus from the
community. I'd also not want to be the person updating all the tests
with -marm.

The reset vectors on some old architectures did need to be ARM state,
although I think they had to be written in assembly. It is possible
though that someone is using clang as the assembler driver and we'd
pass through -mthumb when they weren't expecting it.

Peter

Sure, none of the cortex-m cores support ARM mode. Try cortex-a{5,7,8,9,15,53} etc and you’ll see it works.

Yes, exactly this:

Sure, none of the cortex-m cores support ARM mode. Try cortex-a{5,7,8,9,15,53} etc and you’ll see it works.

Sorry for being a bit vague and unclear here: yes, I should have said cortex-a{5,7,8,9,15,53}.

I was just having a play with this native compiler:

gcc-5 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609

when I noticed that -Os gives me Thumb on Cortex-A{8,9,17}, which is what I would expect, but with Clang -Oz I get A32.

I haven’t thought about the implementation yet, but I hope passing -mthumb in the driver is not that difficult when we i) target an A-core in AArch32 state, and ii) optimise for minimum size. And if I have to update a lot of tests, then so be it, provided we agree this is a sensible change.

Cheers,
Sjoerd.

Ahhh, typo in my previous mail:

when I noticed that -Os gives me Thumb on Cortex-A{8,9,17}

I wanted to say:

when I noticed that “GCC -Os” gives me Thumb on Cortex-A{8,9,17}

Yes. Just to clarify my response. That particular linaro toolchain
(I've got 5.3) defaults to -mthumb. It will also give you Thumb code
if you compile with -O3 though. It is a toolchain default and not an
implication of -Os or -Oz.

My vote is to not imply ARM/Thumb state changes with optimization
options. We've already got two ways to do it:
--target=thumb-none-eabi/--target=arm-none-eabi and -mthumb/-marm.
I think the potential confusion outweighs the potential benefit.
I'm just one voice though.

Peter

I agree with that.

Tim.

I also agree. Something that nobody else has mentioned is that forcing Thumb mode onto code that uses inline assembly and/or intrinsics can cause compile errors.

It's theoretically possible to implement a mode which dynamically switches between ARM and Thumb mode; that would avoid compatibility issues, but it's probably not worthwhile given that Thumb2 is widely supported.

-Eli

Hi Tim and Peter,

Thanks for your comments! Yes, I now see that with GCC, the ARM/Thumb state does not depend on optimisation levels, but is a toolchain default (I’ve learned something today! :-)).

But I guess it doesn’t change much about my observations:

  • we have an inconsistency/discrepancy between GCC’s and Clang/LLVM’s behaviour,

  • and most likely Clang’s default behaviour, for a native toolchain user, which I am now, gives surprising and perhaps undesired results. I.e., the equivalent of GCC’s “gcc -Os”, which is “clang -Oz”, doesn’t really do what I want/expect. And being a native toolchain user is quite important here, because it means I don’t expect to have to provide --target with some triple to get Thumb when I’ve already specified -Oz (but perhaps I am totally wrong here).

I agree now that changing Thumb/ARM state depending on optimisation levels might be confusing and not worth the effort, but that leaves me wondering what our options are (not a rhetorical question). I guess they are:

  1. keep it as it is (I haven’t looked at the docs yet, but perhaps document this better if necessary), or

  2. adopt GCC’s behaviour and flip the default?

Cheers,

Sjoerd.

My current thought is that 2 has too many problems associated with it.
In particular I can't think of a good solution to the inline
assembly/intrinsics problem that Eli mentioned.

For documentation it looks like there is a CPU Architecture Features
and Limitations section
(http://clang.llvm.org/docs/UsersManual.html#id79) that looks like it
is in need of an update. Perhaps that would be a good place to mention
advice on whether to use Arm or Thumb state and how to do it?
Alternatively maybe a separate LLVM documentation page on how to
choose Arm command line options. I suspect most people will arrive
there via their search engine of choice so the location probably
doesn't matter too much.

Peter