I’ve prepared a preliminary patch with the intention of implementing PPC-EABI subtarget features for applications that run in a standalone embedded environment.
The most significant difference compared with the SVR4 ABI is the use of SDA (small data area). This allows full-word constants and data to be grouped into small-data sections and accessed using relocated addresses, which are calculated relative to the non-volatile values loaded into base registers r13 and r2 by the runtime init (similar to gp_rel on MIPS). A single load/store instruction is then enough to encode the relocated address.
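To make the addressing concrete, here is a minimal C++ sketch of the range check behind a base-relative small-data access. It assumes a signed 16-bit displacement field, as in PowerPC D-form loads and stores; the helper name `sdaDisplacement` and the sample addresses are hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM or lld API): computes the displacement
// that reaches `addr` from the value held in a small-data base register.
// Returns false when the target is outside the signed 16-bit range, in which
// case a single base-relative load/store cannot encode the access.
bool sdaDisplacement(uint32_t addr, uint32_t base, int16_t &disp) {
  // Widen before subtracting so the difference cannot wrap around.
  int64_t delta = static_cast<int64_t>(addr) - static_cast<int64_t>(base);
  if (delta < INT16_MIN || delta > INT16_MAX)
    return false;
  disp = static_cast<int16_t>(delta);
  return true;
}
```

Under that assumption, each base register reaches 64 KiB in total (32 KiB on either side of the base value), so everything placed in one small-data area has to fit inside that window.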
The MIPS target already has a solid approach for handling small global variables in its TargetLoweringObjectFile subclass. Also, the clang driver responds to the
-G <bytes> flag so the user can define a cutoff size other than 8 (or 0 to disable small data altogether).
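As a rough illustration of the cutoff, the size test could look like the sketch below. The function name `fitsSmallDataThreshold` is hypothetical; the sketch assumes that zero-sized objects are never treated as small and that a threshold of 0 therefore rejects everything.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate modeled on the -G <bytes> cutoff.
// A threshold of 0 disables small data, because no positive size can be
// less than or equal to 0.
bool fitsSmallDataThreshold(uint64_t sizeInBytes, uint64_t threshold) {
  return sizeInBytes > 0 && sizeInBytes <= threshold;
}
```

With the default cutoff of 8, a 4-byte or 8-byte global qualifies and a 9-byte global does not.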
It seems the Hexagon and Lanai targets duplicate much of the IsGlobalInSmallSection handling from the MIPS target; all three perform the same basic classification tasks:
1. Pass GlobalObject to target-declared IsGlobalInSmallSection.
   If it’s a declaration, skip to step 4.
2. Pass GlobalObject to target-independent
3. Ensure the returned SectionKind is Data, BSS or Common.
4. Pass GlobalObject to target-specific IsGlobalInSmallSectionImpl,
   which scrutinizes the object’s type for the specific architecture.
I believe this redundant implementation between targets can be reduced by giving SectionKind an ‘isSmallKind’ bit (OR’d with 0x80). This provides a much clearer (and cached) predicate that ISel lowering can take advantage of when a small data load/store must be generated. The existing predicates in SectionKind may be modified to use the underlying Kind (AND-ing with 0x7f), so existing ISel behaviors are mostly unchanged.
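A minimal sketch of the bit layout, assuming a simplified stand-in for SectionKind: the class name `SketchSectionKind` and the enumerator values are illustrative only and do not match the real LLVM enum. Only the 0x80 flag and the 0x7f mask come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for llvm::SectionKind (hypothetical, for illustration).
class SketchSectionKind {
public:
  // Illustrative subset of kinds; values are not LLVM's.
  enum Kind : uint8_t { Text = 0, ReadOnly = 1, BSS = 2, Common = 3, Data = 4 };
  enum : uint8_t { SmallBit = 0x80, KindMask = 0x7f };

  explicit SketchSectionKind(Kind K) : Bits(K) {}

  // Returns a copy with the small-data flag OR'd in.
  SketchSectionKind withSmall() const {
    SketchSectionKind K = *this;
    K.Bits = static_cast<uint8_t>(K.Bits | SmallBit);
    return K;
  }

  // Existing-style predicates mask off the flag, so they keep answering
  // the same way whether or not the small bit is set.
  Kind baseKind() const { return static_cast<Kind>(Bits & KindMask); }
  bool isBSS() const { return baseKind() == BSS; }
  bool isData() const { return baseKind() == Data; }

  // The new predicate reads only the flag.
  bool isSmallKind() const { return (Bits & SmallBit) != 0; }

private:
  uint8_t Bits;
};
```

The point of masking inside `baseKind()` is that callers which never ask about small data see no difference, while ISel can query `isSmallKind()` on the same value.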
The proposed target-independent small data classification can be used in two ways, depending on the context:
For all GlobalObjects:
- Pass GlobalObject to target-independent
TargetLoweringObjectFile::isGlobalInSmallSection. If it’s a
declaration, make a virtual call to a new method named
target-specific scrutiny) and return the result early.
- Pass to TargetLoweringObjectFile::getKindForGlobal.
If the Kind is Data, BSS or Common, make a virtual call to
TargetLoweringObjectFile::isGlobalInSmallSectionKind just before
returning the SectionKind. If true is returned, set the ‘isSmallKind’
bit in the returned SectionKind.
- Return the result of the isSmallKind() predicate from
If the GlobalObject is known to be a definition, the process is even simpler:
- Pass GlobalObject to TargetLoweringObjectFile::getKindForGlobal
(which calls isGlobalInSmallSectionKind in turn).
- Act on isSmallKind() predicate (and conveniently get the
SectionKind at the same time).
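The two paths above can be sketched roughly as follows. This is a toy model, not the patch: `SketchGlobal`, `SketchKind`, `SketchLowering` and `EightByteTarget` are hypothetical stand-ins, and the nominal kind is passed in as a parameter here purely to keep the example short, whereas the real getKindForGlobal derives it from the global.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for GlobalObject and SectionKind.
struct SketchGlobal { bool IsDeclaration; uint64_t Size; };
enum BaseKind : uint8_t { KText, KReadOnly, KBSS, KCommon, KData };
struct SketchKind {
  BaseKind Base;
  bool Small;
  bool isSmallKind() const { return Small; }
};

class SketchLowering {
public:
  virtual ~SketchLowering() = default;

  // Target hook: the only part each target would need to implement.
  virtual bool isGlobalInSmallSectionKind(const SketchGlobal &G) const = 0;

  // Definitions: classify, then set the small flag for Data, BSS or Common.
  SketchKind getKindForGlobal(const SketchGlobal &G, BaseKind Nominal) const {
    SketchKind K{Nominal, false};
    if (Nominal == KData || Nominal == KBSS || Nominal == KCommon)
      K.Small = isGlobalInSmallSectionKind(G);
    return K;
  }

  // Any global: declarations go straight to the target hook, definitions
  // go through getKindForGlobal and report the cached flag.
  bool isGlobalInSmallSection(const SketchGlobal &G, BaseKind Nominal) const {
    if (G.IsDeclaration)
      return isGlobalInSmallSectionKind(G);
    return getKindForGlobal(G, Nominal).isSmallKind();
  }
};

// Example target using an assumed 8-byte cutoff.
class EightByteTarget : public SketchLowering {
public:
  bool isGlobalInSmallSectionKind(const SketchGlobal &G) const override {
    return G.Size > 0 && G.Size <= 8;
  }
};
```

A caller that already knows it has a definition would use `getKindForGlobal` directly and read `isSmallKind()` from the result, which also hands back the section kind in the same call.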
I feel that the SectionKind modification is the best route, since it’s already used to uniquely classify constant merge sections. Even though small data sections are linked in the same manner as their ‘regular’ counterparts, there must be a clear distinction when producing memory access code (and obviously selecting the target section to allocate in).
I’d like some input from PowerPC, MIPS, Hexagon and Lanai maintainers to ensure this approach accommodates their targets appropriately.
Thanks for working on this, I think you have the right approach by making this more general.
I've been looking at your posted patch and I'll post some comments there.
Oh, one thing I forgot to mention:
ReadOnly objects are also counted as small data globals on PPC (on top of BSS, Data, Common). That's what the r2 base is for (.sdata2, defined to be constant data). 32-bit immediate loads take 2 ops minimum on PPC, so even constant loading benefits from small data.
It'd be handy to add a third argument containing what kind would normally be returned:
isGlobalInSmallSectionKind(GO, TM, <nominal-kind-expr>)
If a ReadOnly global is better emitted as instruction immediates, then the target can return `false` right then and there.
On second thought, a third arg will add a burdensome caller constraint in several areas.
The GlobalObject itself has isConstant(), so that should be sufficient for rejecting ReadOnly cases (Hexagon does it already).
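A small sketch of how a target hook might use the constant flag, under stated assumptions: the struct, enum and function names below are hypothetical, the size cutoff is passed in explicitly, and the two policies are simplified readings of the PPC (.sdata2 via r2) and Hexagon (reject constants) behaviors mentioned in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a global variable with an isConstant() flag.
struct SketchGlobalVar { bool IsConstant; uint64_t Size; };

enum class SmallSection { None, SData, SData2 };

// Policy for a target with two small-data bases: constants may go to a
// read-only small section, everything else to the writable one.
SmallSection pickWithReadOnlyBase(const SketchGlobalVar &G, uint64_t Threshold) {
  if (G.Size == 0 || G.Size > Threshold)
    return SmallSection::None;
  return G.IsConstant ? SmallSection::SData2 : SmallSection::SData;
}

// Policy for a target with a single, writable small-data base: constants
// are rejected up front so they never land in a writable section.
SmallSection pickSingleBase(const SketchGlobalVar &G, uint64_t Threshold) {
  if (G.IsConstant || G.Size == 0 || G.Size > Threshold)
    return SmallSection::None;
  return SmallSection::SData;
}
```

Both policies need only the global itself, which is why a third "nominal kind" argument is not required for this decision.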
The idea on Hexagon is that const objects should not be placed in a writable section, and there is no such thing as read-only small-data on Hexagon (we only have one global pointer register).
At the same time, there are instructions that load large integer immediates into registers (CONST32/CONST64), and the way they do it is by loading them from small-data (where the assembler/linker would place them), so this is not something carved in silicon...
One issue we had was that small-data filled up quite quickly for large applications, so we needed some heuristics to limit the amount of stuff that ended up there.
Lanai side looks fine to me.
Just pinging this patch for review, particularly from PPC maintainers:
It's now rebased onto the latest master commits, and `check-all` test results match those of the upstream base.
There is also a clang driver patch, extending PPC target support for the `-G` flag:
And an lld patch that implements the _SDA_BASE_ symbols and includes an end-to-end test for small data relocations:
I don't have commit access either, so I could use some help with that when ready.
From: "Jack Andersen via llvm-dev" <email@example.com>
To: "llvm-dev" <firstname.lastname@example.org>
Sent: Wednesday, November 16, 2016 10:39:53 PM
Subject: Re: [llvm-dev] [MC] Target-Independent Small Data Section Handling
Just pinging this patch for review, particularly from PPC
Hi Jack, thanks for working on this. I'll look at your patches next week.
Something I think needs to be determined is how much driver compatibility with GCC is desired. This patch shares the command-line flags that were added for the MIPS backend (`-G`, `-G=`, `-msmall-data-threshold=`). I feel this simplified usage is enough for most cases, but GCC offers several `-msdata*` flags to tune the ABI particulars (as well as a `-meabi` flag, which LLVM represents as the fourth TargetTriple component).
By default, PPC-EABI targets on GCC will allocate small data entries in the small sections, but _won't_ actually use SDA21 relocations to access them out of memory. This patch assumes the opposite as long as EABI is in the Triple. Of course, this behavior can be worked around by issuing `-G 0` to disable small data relocation.
Really, it's just a question of whether a 64K small-section limit is acceptable to an EABI application by default.
Bumping this again.
Rebased against upstream master with some minor consistency improvements.