[RFC] [ARM] Execute only support

Hi,

I’m planning to implement “execute only” support in the ARM code generator. This means that the compiler will not generate data accesses to the generated code sections (i.e. data and code are strictly separated into different sections). Outline:

  • Add the subtarget feature/attribute “execute-only” to the ARM code generator to enable the feature.

  • Add a clang option “-mexecute-only” that passes said attribute to LLVM.

If execute only is enabled:

  • Instead of using integer literal pools, use movw/movt to construct the literals. This means the feature is only available for subtargets that support these instructions.

  • For floating point literals, use movw/movt/vmov instead of a literal pool.

  • Move jump tables to data sections.
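
To make the literal handling above concrete, here is a minimal sketch of how a 32-bit integer, and the bit pattern of a single-precision float, could be split into movw/movt immediates. It only prints assembly-like text and is not the proposed LLVM implementation; the helper names `movw_movt` and `float_literal` and the register choices are hypothetical.

```python
import struct

def movw_movt(reg, value):
    """Return a movw/movt pair that builds a 32-bit value in `reg`.

    movw writes the low 16 bits (and zeroes the upper half);
    movt then writes the high 16 bits.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    low = value & 0xFFFF
    high = value >> 16
    return [f"movw {reg}, #0x{low:04x}", f"movt {reg}, #0x{high:04x}"]

def float_literal(sreg, scratch, value):
    """Build a single-precision constant without a literal pool.

    The IEEE-754 bit pattern is assembled in a core register
    (`scratch`, a hypothetical choice) and then moved to the
    floating-point register with vmov.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return movw_movt(scratch, bits) + [f"vmov {sreg}, {scratch}"]

if __name__ == "__main__":
    for line in movw_movt("r0", 0x12345678) + float_literal("s0", "r1", 1.5):
        print(line)
```

A real code generator would likely skip the movt when the upper half is zero; the sketch always emits both instructions to keep the split visible.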

This is basically a re-implementation of a feature that is found in the ARM Compiler (http://infocenter.arm.com/help/topic/com.arm.doc.dui0471l/chr1368698593511.html).

Would such a feature be accepted upstream?

Thanks,

Christof

> Would such a feature be accepted upstream?

I think so. It sounds like a very useful feature with the way things
are moving these days.

The outline sounds pretty sane too. What are your plans for -fPIC,
particularly GOT accesses? I couldn't see any obviously useful
relocations for materialising GOT entries with movw/movt in the ELF
ABI.

> For floating point literals, use movw/movt/vmov instead of a literal pool.

Another option is moving litpools to one of the data sections (as they
are in AArch64, for example). I'm not sure exactly where the crossover
is, but I'd be a little surprised if movw/movt/vmov was more efficient
for 128-bit constants.
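
One rough way to frame the crossover question is to count instructions for each approach. The sketch below is a back-of-the-envelope model, not a measurement: it assumes every 32-bit word needs a full movw/movt pair, that one vmov transfers either one core register to an S register or two core registers to a D register, and that the data-section alternative costs a movw/movt pair for an absolute address plus a single load. It ignores vmov immediate encodings, PIC addressing, and memory latency. The function names are hypothetical.

```python
def movw_movt_vmov_count(bits):
    """Instructions to build a `bits`-wide FP/SIMD constant from immediates."""
    words = bits // 32
    core_moves = 2 * words                          # movw + movt per 32-bit word
    transfers = 1 if bits == 32 else bits // 64     # vmov to S, or one vmov per D register
    return core_moves + transfers

def data_section_load_count(bits):
    """Instructions to load the constant from a data section (assumed model)."""
    address = 2                                     # movw + movt of the absolute address
    load = 1                                        # a single load for 32, 64 or 128 bits
    return address + load

if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(bits, movw_movt_vmov_count(bits), data_section_load_count(bits))
```

Under these assumptions the two approaches tie at 32 bits, and the 128-bit case comes to ten instructions versus three, which is consistent with the suspicion above but says nothing about actual cycle counts.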

Cheers.

Tim.

> Would such a feature be accepted upstream?

I would strongly support this.

> For floating point literals, use movw/movt/vmov instead of a literal pool.
>
> Another option is moving litpools to one of the data sections (as they
> are in AArch64, for example). I'm not sure exactly where the crossover
> is, but I'd be a little surprised if movw/movt/vmov was more efficient
> for 128-bit constants.

I was wondering the same. For comparison, the Dart VM does something
similar but burns a register to point to each function's "data" section.

Another option, with a little bit of linker support, would be to add an invariant that the literal pools are in a page that will be mapped at a fixed offset from the code. ARMv8 is designed to support large immediate offsets to allow execute-only mappings, but materialising a constant of a fixed power of two is cheap on ARMv7 and loading literals from pc - 64KB (for example) would allow binaries to be mapped in alternating superpages of execute-only and read-only chunks.
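
The address arithmetic behind the fixed-offset idea can be sketched as follows. This is only an illustration under an assumed layout: 64KB chunks alternate between read-only (even-numbered) and execute-only (odd-numbered), and each literal lives exactly one chunk below the code that uses it. The names `CHUNK`, `literal_address` and `chunk_kind` are hypothetical.

```python
CHUNK = 64 * 1024  # assumed superpage size, matching the pc - 64KB example

def literal_address(code_address, offset=CHUNK):
    """Address of the literal that sits at a fixed offset below the code."""
    return code_address - offset

def chunk_kind(address, chunk=CHUNK):
    """Classify an address under the assumed alternating layout."""
    return "execute-only" if (address // chunk) % 2 else "read-only"

if __name__ == "__main__":
    code = 3 * CHUNK + 0x120          # some instruction in an execute-only chunk
    lit = literal_address(code)
    print(hex(code), chunk_kind(code))
    print(hex(lit), chunk_kind(lit))
```

With this layout, any address in an execute-only chunk maps to the same offset within the read-only chunk directly below it, so the code never needs to read from its own pages.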

David

Hi Tim.

From: Tim Northover [mailto:t.p.northover@gmail.com]
Sent: 04 December 2015 17:12

[snip]

> The outline sounds pretty sane too. What are your plans for -fPIC, particularly
> GOT accesses? I couldn't see any obviously useful relocations for
> materialising GOT entries with movw/movt in the ELF ABI.

I had no concrete plans on adding PIC+execute-only support. But I'll take a closer look at it.

Regards,
Christof

Thanks for the pointers on floating point handling. I'll look into it.

Thanks,
Christof

From: Dr D. Chisnall [mailto:dc552@hermes.cam.ac.uk] On Behalf Of David
Sent: 05 December 2015 10:17

>
> Another option is moving litpools to one of the data sections (as they
> are in AArch64, for example). I'm not sure exactly where the crossover
> is, but I'd be a little surprised if movw/movt/vmov was more efficient
> for 128-bit constants.

> Another option, with a little bit of linker support, would be to add an
> invariant that the literal pools are in a page that will be mapped at a
> fixed offset from the code. ARMv8 is designed to support large immediate
> offsets