Recent Commits by Tim Northover

Today I updated the toolchain I use for Cortex-M4F development to trunk. I was super excited to see three commits by Tim Northover that actually attempt to improve machine code generation for my target, or for any ARM target for that matter (as opposed to other important work on compiler correctness, architectural elegance, or formatting comment whitespace, I mean). Is he alone, or are there others working toward such improvements?

The subject of two of his commits dealt with substituting MOVW/MOVT pairs for an LDR and a lit-pool. Isn't this what MachineConstantPool and ARMConstantIslandPass were all about? I vaguely recall a while back that it was disabled by some Darwin snob who thought no useful target benefited from it. What about enabling it again? Perhaps you've noticed in the last two months that someone's been porting it to the MIPS target, which suggests to me that it's still a good starting point. Finally, I would really like to see this optimization promoted from -Oz to -Os. Doesn't it satisfy the criteria for -Os over -Oz?

Tim's other commit was about stack adjustment folding. So, Tim, did you see the threads with Andrea Mucignat back in October? She asked for some help so that she could provide a patch to improve machine code generation for Thumb entry/exit points. No one with knowledge about the matter responded. This commit of yours suggests to me that you do have some knowledge about it. She seems to have given up (and judging by the way she was treated, I don't blame her -- sad). But would you review what she was attempting, please?

http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/
066451.html
066461.html, 066466.html, 066470.html, 066475.html
066641.html, 066650.html

Thanks,
Gary Fuehrer

> The subject of two of his commits dealt with substituting MOVW/MOVT pairs for an LDR and a lit-pool. Isn't this what MachineConstantPool and ARMConstantIslandPass were all about? I vaguely recall a while back that it was disabled by some Darwin snob who thought no useful target benefited from it.

You recall incorrectly.

Ooops. I apologize.

Islands are not being placed within range of a near LDR. They only appear between functions. (It seemed to me like ARMConstantIslandPass was not being used to make them.)

Does anyone know, is this expected?

Hi Gary,

> The subject of two of his commits dealt with substituting MOVW/MOVT pairs
> for an LDR and a lit-pool. Isn't this what MachineConstantPool and
> ARMConstantIslandPass were all about?

Both are essential components to using lit-pools: the
MachineConstantPool is just LLVM's underlying machinery and
ARMConstantIslands is for fixing up out of range loads and so on so
they can actually be used.

My recent changes have been to fix Darwin CodeGen so that they're
actually useful (previously we combined movw/movt pairs referring to
the same global but not litpool ones, which meant that litpools
actually took up more room), and to enable them in "-Oz" mode.
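
To make the size trade-off concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes Thumb-2 encodings (4 bytes each for movw and movt, 2 or 4 bytes for a PC-relative ldr, 4 bytes per 32-bit pool entry) and ignores alignment padding. The function names are made up for illustration and are not part of LLVM.

```python
# Rough code-size model for materializing a 32-bit constant or address
# in Thumb-2. Assumptions (not taken from LLVM): movw and movt are 4 bytes
# each, a literal-pool entry is 4 bytes, and a PC-relative ldr is 2 bytes
# (narrow) or 4 bytes (wide). Alignment padding is ignored.

MOVW_MOVT_BYTES = 8
POOL_ENTRY_BYTES = 4


def movw_movt_size(uses: int) -> int:
    """Bytes used when every use rematerializes the value with movw/movt."""
    return uses * MOVW_MOVT_BYTES


def litpool_size(uses: int, ldr_bytes: int = 2, shared: bool = True) -> int:
    """Bytes used when every use loads the value from a literal pool.

    shared=True  -> all uses reference a single pool entry
    shared=False -> each use gets its own pool entry
    """
    if uses == 0:
        return 0
    entries = 1 if shared else uses
    return uses * ldr_bytes + entries * POOL_ENTRY_BYTES
```

Under these assumptions, three uses of one value cost `movw_movt_size(3)` = 24 bytes as movw/movt pairs and `litpool_size(3)` = 10 bytes with narrow loads sharing one pool entry. With wide loads and no sharing, `litpool_size(3, ldr_bytes=4, shared=False)` is 24 bytes, so the pool saves nothing.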

It sounds like you're on an ELF platform, in which case (fingers
crossed) you already get the combining unless you compile with
"-fPIC". The "-Oz" change *should* just apply directly and be useful.
Please let me know if it doesn't, I'd like to get the same benefits on
ELF if at all possible.

> I vaguely recall a while back that it was disabled [...] What about enabling it again?

The only matching situation I can think of there is that Jim suggested
I use a different approach for constants on 64-bit AArch64. I don't
know the performance numbers, but frankly I was glad to kill it. The
ConstantIslands pass is complicated enough that it *really* needs to
justify itself and I don't think code size is a priority on AArch64.

It's still present, supported and enabled in 32-bit ARM, though until
today only used for targets that didn't have movw/movt available (and
perhaps some odd corner cases like floating constants).

> Islands are not being placed within range of a near LDR. They only appear
> between functions. (It seemed to me like ARMConstantIslandPass was not
> being used to make them.)

That's a very worrying bug, and shouldn't be happening at all. Do you
have a .ll test-case you can show us?
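
For readers wondering what "within range" means here, the following Python sketch checks whether a literal-pool entry is reachable from a Thumb PC-relative load. It assumes the usual Thumb rules as I understand them: the base is the instruction address plus 4, rounded down to a multiple of 4; the narrow 16-bit ldr reaches 0 to 1020 bytes forward in steps of 4; the wide 32-bit ldr reaches 4095 bytes in either direction. The function names are illustrative only and do not correspond to anything in ARMConstantIslands.

```python
# Sketch of the reach of a Thumb PC-relative "ldr rX, [pc, #imm]".
# Assumptions: base = Align(instr_addr + 4, 4); the narrow encoding has an
# 8-bit immediate scaled by 4 (forward only), the wide encoding has a
# 12-bit immediate that can be added or subtracted.

def thumb_pc_base(instr_addr: int) -> int:
    """Value of Align(PC, 4) as seen by a Thumb instruction at instr_addr."""
    return (instr_addr + 4) & ~3


def pool_entry_reachable(instr_addr: int, pool_addr: int, wide: bool = False) -> bool:
    """True if a load at instr_addr can address the pool entry at pool_addr."""
    offset = pool_addr - thumb_pc_base(instr_addr)
    if wide:
        return -4095 <= offset <= 4095
    return 0 <= offset <= 1020 and offset % 4 == 0
```

A pool placed only after the end of a large function can easily fall outside the narrow range, which is the situation a constant-island pass is meant to repair by splitting or moving pools closer to their users.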

> Finally, I would really like to see
> this optimization be promoted from -Oz to -Os. Doesn't it satisfy the
> criteria for -Os over -Oz?

Not generally, since in LLVM -Os means roughly "don't bloat code
speculatively while looking for performance". On A-class cores,
litpools are almost always slower so they don't qualify. We *could*
enable it at -Os on M-class CPUs separately, but not without benchmark
evidence (and I suspect it would have a bad effect even there).

> Tim's other commit was about stack adjustment folding. So, Tim, did you see
> the threads with Andrea Mucignat back in October?

It rings some bells, but I wasn't paying much attention. I think she
did get knowledgeable help; nothing in the thread jumps out as wrong.
It looks like a reasonable goal, but as Renato said, should be
considered carefully. It's a nasty area of the compiler and has
knock-on effects.

Cheers.

Tim.

Hi Tim, Gary,

I think this is an interesting proposition...

I don't like checks on CPU name/arch/class to guide low-level
optimization decisions, and adding yet-another space level would
complicate matters. But adding special flags to control fine-grained
behaviour would be possible, and even letting it on by default on
Clang if the arch is M-class.

Not without benchmark evidence, of course.

cheers,
--renato

> It's still present, supported and enabled in 32-bit ARM, though until
> today only used for targets that didn't have movw/movt available (and
> perhaps some odd corner cases like floating constants).

It’s still enabled and used for all 32-bit targets, actually. Just not as aggressively. Consider 64- and 128-bit floating point and vector constants, for example.

Hi Gary,

>
>> Islands are not being placed within range of a near LDR. They only appear
>> between functions. (It seemed to me like ARMConstantIslandPass was not
>> being used to make them.)
>
> That's a very worrying bug, and shouldn't be happening at all. Do you
> have a .ll test-case you can show us?

I have a large number of instances in my firmware, so I'll work on producing a .ll test-case. But first I'll get smart about what you said concerning ELF -- I had been using -target arm-none-eabi. I just tried arm-elf-eabi and that had almost no effect (but an 'interesting' one nonetheless). But I don't know yet if that triple (or quad) supplies the necessary ELF-ness.

>> Tim's other commit was about stack adjustment folding. So, Tim, did you see
>> the threads with Andrea Mucignat back in October?
>
> It rings some bells, but I wasn't paying much attention. I think she
> did get knowledgeable help; nothing in the thread jumps out as wrong.
> It looks like a reasonable goal, but as Renato said, should be
> considered carefully. It's a nasty area of the compiler and has
> knock-on effects.

Thank you for double checking. And for the caution -- I was inclined to look into this one but now I know that it's contraindicated against noobieness.

- Gary