Big endian ARM?

Hi,

I've been trying to set up clang/LLVM to compile for big endian ARM and I need
a little help. The code generation works for the most part and most of my
regression tests pass, but I noticed that code like this

extern void g(void);
int *p;
        
int main()
{
    if (*p & 0x01000000) g();
}

generates

        ldr r0, [r0]
        ldrb r0, [r0, #3]
        tst r0, #1

i.e. the test of the value is optimized to use a byte load, but the ldrb is
done assuming a little endian address space.
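To make the offset concrete: the mask 0x01000000 selects a bit in the most significant byte of the 32-bit word. That byte sits at offset 3 in little endian memory and at offset 0 in big endian memory, so the narrowed load should have been `ldrb r0, [r0]` here. The sketch below only illustrates that arithmetic; the helper name `byte_offset_of_bit` is made up for this example and is not taken from LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper (hypothetical, not LLVM code): byte offset, from the
   start of a 32-bit word in memory, of the byte holding the single set bit
   in `mask`. */
static unsigned byte_offset_of_bit(uint32_t mask, int big_endian)
{
    /* Precondition: exactly one bit is set. */
    assert(mask != 0 && (mask & (mask - 1)) == 0);

    unsigned significance = 0;   /* 0 = least significant byte */
    while ((mask & 0xffu) == 0) {
        mask >>= 8;
        significance++;
    }
    /* Little endian stores the least significant byte first;
       big endian stores the most significant byte first. */
    return big_endian ? 3u - significance : significance;
}
```

With mask 0x01000000 this gives offset 3 for little endian (the `#3` in the generated `ldrb`) and offset 0 for big endian.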

I've been snooping around, but can't seem to find where the conversion to a
byte operation is done. Could someone point me in the right direction?

-Rich

I've figured out my problem. I didn't adjust the data layout description string
in ARMTargetMachine.cpp for big endian targets.

This brings up another question. clang has its own set of description strings
for varying ABIs, etc. Should those strings somehow override in the code
generators?

-Rich

Hi Rich,

> I've figured out my problem. I didn't adjust the data layout description string
> in ARMTargetMachine.cpp for big endian targets.
>
> This brings up another question. clang has its own set of description strings
> for varying ABIs, etc. Should those strings somehow override in the code
> generators?

no, they shouldn't override it. These strings exist AFAIK so that clang
doesn't have to pull in all of LLVM's codegen just to know data layout,
i.e. it gives better decoupling. What would make sense is to have LLVM
codegen check that the data layout string in the module matches the string
that codegen is going to use and error out if not.
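A rough sketch of what such a check could look like, written as standalone C rather than actual LLVM code. The function names are hypothetical and the layout strings in the usage note are shortened, made-up examples. The only fact it relies on is that a data layout string is a list of `-`-separated specifications, where `E` means big endian and `e` means little endian.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 'E' (big endian) or 'e' (little endian) if the
   layout string carries an endianness specification, or 0 if it has none. */
static char layout_endianness(const char *layout)
{
    assert(layout != NULL);
    const char *spec = layout;
    while (*spec) {
        size_t len = strcspn(spec, "-");   /* length of the current spec */
        if (len == 1 && (spec[0] == 'e' || spec[0] == 'E'))
            return spec[0];
        spec += len;
        if (*spec == '-')
            spec++;                        /* skip the separator */
    }
    return 0;
}

/* Hypothetical check: returns 1 if the module's string is acceptable,
   0 if it disagrees with the string the code generator will use. */
static int layouts_match(const char *module_layout, const char *codegen_layout)
{
    /* An empty string means the frontend attached no layout,
       so there is nothing to compare. */
    if (module_layout[0] == '\0')
        return 1;
    return strcmp(module_layout, codegen_layout) == 0;
}
```

For instance, `layouts_match("e-p:32:32", "E-p:32:32")` returns 0, which is the kind of mismatch described at the start of this thread, and `layout_endianness` could be used to report which side was little endian.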

Ciao, Duncan.

The current design is that the frontend (if it attaches a TD string) is *required* to match the code generator:
http://llvm.org/docs/LangRef.html#datalayout

It is intended to allow the mid-level optimizers to know about data layout without having the code generator linked in (e.g. "opt").

-Chris

>> i.e. the test of the value is optimized to use a byte load, but
>> the ldrb is done assuming a little endian address space.
>>
>> I've been snooping around, but can't seem to find where the
>> conversion to a byte operation is done. Could someone point me in
>> the right direction?
>>
>
> I've figured out my problem. I didn't adjust the data layout
> description string in ARMTargetMachine.cpp for big endian targets.
>
> This brings up another question. clang has its own set of
> description strings for varying ABIs, etc. Should those strings
> somehow override in the code generators?

> The current design is that the frontend (if it attaches a TD string)
> is *required* to match the code generator:
> http://llvm.org/docs/LangRef.html#datalayout

Chris,

Do we actually verify this anywhere?

-Hal

>> The current design is that the frontend (if it attaches a TD string)
>> is *required* to match the code generator:
>> http://llvm.org/docs/LangRef.html#datalayout
>
> Chris,
>
> Do we actually verify this anywhere?

No. This has been on my todo list for a long time - people regularly get bitten
by it.

Ciao, Duncan.

As did I. It would be nice if, rather than just checking consistency, the
compiler could override the code generator's default.

-Rich

Hi Richard,