Big endian ARM?

Richard_Pennington1 · June 2, 2012, 3:20pm

Hi,

I've been trying to set up clang/LLVM to compile for big endian ARM and I need
a little help. The code generation works for the most part and most of my
regression tests pass, but I noticed that code like this

extern void g(void);
int *p;

int main()
{
    if (*p & 0x01000000) g();
}

generates

        ldr r0, [r0]
        ldrb r0, [r0, #3]
        tst r0, #1

i.e. the test of the value is optimized to use a byte load, but the ldrb is
done assuming a little endian address space.
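
To illustrate why the offset matters, here is a minimal sketch in C (not LLVM code; the function name byte_offset_for_mask is made up for this example). It computes which byte of a 32-bit word in memory holds the bits of a mask under each byte order. For the mask 0x01000000 it gives offset 3 on a little endian target, which matches the ldrb above, and offset 0 on a big endian target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Returns the byte offset, within a 32-bit word stored in memory, of the
 * single byte that contains every set bit of `mask`.
 * Returns -1 if the set bits span more than one byte. */
int byte_offset_for_mask(uint32_t mask, int big_endian)
{
    assert(mask != 0);
    for (int i = 0; i < 4; i++) {
        /* i counts bytes starting from the least significant one. */
        uint32_t byte_bits = (uint32_t)0xFF << (8 * i);
        if ((mask & ~byte_bits) == 0) {
            /* Little endian stores the least significant byte first,
             * big endian stores it last. */
            return big_endian ? 3 - i : i;
        }
    }
    return -1;
}
```

A narrowed byte load is only correct if the optimizer picks the offset for the byte order the target actually uses.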

I've been snooping around, but can't seem to find where the conversion to a
byte operation is done. Could someone point me in the right direction?

-Rich

Richard_Pennington1 · June 3, 2012, 3:35am

I've figured out my problem. I didn't adjust the data layout description string
in ARMTargetMachine.cpp for big endian targets.
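
For reference, the byte order is carried in the data layout string as a standalone "E" (big endian) or "e" (little endian) specification, with specifications separated by "-". The sketch below is plain C rather than LLVM's own parser, the function name layout_endianness is hypothetical, and the strings in the usage note are illustrative, not the actual ARM layout strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Scans a '-' separated data layout string for an endianness spec.
 * Returns 1 for "E" (big endian), 0 for "e" (little endian),
 * and -1 if the string contains neither. */
int layout_endianness(const char *layout)
{
    assert(layout != NULL);
    const char *p = layout;
    while (*p != '\0') {
        size_t len = strcspn(p, "-");   /* length of the current spec */
        if (len == 1 && *p == 'E')
            return 1;
        if (len == 1 && *p == 'e')
            return 0;
        p += len;
        if (*p == '-')
            p++;                        /* skip the separator */
    }
    return -1;
}
```

With this, layout_endianness("E-p:32:32-i64:64") reports big endian and layout_endianness("e-p:32:32-i64:64") reports little endian.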

This brings up another question. clang has its own set of description strings
for varying ABIs, etc. Should those strings somehow override in the code
generators?

-Rich

Duncan_Sands · June 3, 2012, 7:47am

Hi Rich,

> I've figured out my problem. I didn't adjust the data layout description string
> in ARMTargetMachine.cpp for big endian targets.
>
> This brings up another question. clang has its own set of description strings
> for varying ABIs, etc. Should those strings somehow override in the code
> generators?

no, they shouldn't override it. These strings exist AFAIK so that clang
doesn't have to pull in all of LLVM's codegen just to know data layout,
i.e. it gives better decoupling. What would make sense is to have LLVM
codegen check that the data layout string in the module matches the string
that codegen is going to use and error out if not.
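
A check along those lines could look roughly like the following sketch. It is plain C, not existing LLVM code, and the name check_data_layout is hypothetical. It also assumes that an exact string comparison is good enough, which may be too strict if two differently spelled strings describe the same layout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical consistency check, for illustration only.
 * Returns 0 if the module's data layout string is absent or matches the
 * code generator's string, and -1 (after printing a diagnostic) otherwise. */
int check_data_layout(const char *module_layout, const char *codegen_layout)
{
    /* A frontend is not obliged to attach a string; nothing to check then. */
    if (module_layout == NULL || module_layout[0] == '\0')
        return 0;

    /* Assumption: the two strings must be textually identical. */
    if (strcmp(module_layout, codegen_layout) == 0)
        return 0;

    fprintf(stderr,
            "error: module data layout '%s' does not match "
            "code generator data layout '%s'\n",
            module_layout, codegen_layout);
    return -1;
}
```

A module that carries a little endian string while the code generator uses a big endian one would be rejected up front instead of being silently miscompiled.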

Ciao, Duncan.

Chris_Lattner · June 3, 2012, 6:39pm

The current design is that the frontend (if it attaches a TD string) is *required* to match the code generator:
http://llvm.org/docs/LangRef.html#datalayout

It is intended to allow the mid-level optimizers to know about data layout without having the code generator linked in (e.g. "opt").

-Chris

Finkel_Hal_J · June 3, 2012, 11:08pm

>> i.e. the test of the value is optimized to use a byte load, but
>> the ldrb is done assuming a little endian address space.
>>
>> I've been snooping around, but can't seem to find where the
>> conversion to a byte operation is done. Could someone point me in
>> the right direction?
>>
>
> I've figured out my problem. I didn't adjust the data layout
> description string in ARMTargetMachine.cpp for big endian targets.
>
> This brings up another question. clang has its own set of
> description strings for varying ABIs, etc. Should those strings
> somehow override in the code generators?

> The current design is that the frontend (if it attaches a TD string)
> is *required* to match the code generator:
> http://llvm.org/docs/LangRef.html#datalayout

Chris,

Do we actually verify this anywhere?

-Hal

Duncan_Sands · June 4, 2012, 7:23am

>> The current design is that the frontend (if it attaches a TD string)
>> is *required* to match the code generator:
>> http://llvm.org/docs/LangRef.html#datalayout
>
> Chris,
>
> Do we actually verify this anywhere?

No. This has been on my todo list for a long time - people regularly get bitten
by it.

Ciao, Duncan.

Richard_Pennington1 · June 5, 2012, 3:10am

As did I. It would be nice if, rather than just checking consistency, the
compiler could override the code generator's default.

-Rich

Duncan_Sands · June 5, 2012, 8:20am

Hi Richard,
