Unaligned vector memory access for ARM/NEON.

Hello all,

I am a first time writer here, but am a happy LLVM tinkerer. It is a pleasure to use :).
We have come across some sub-optimal behavior when LLVM lowers loads of vectors with small integer elements, e.g. load <4 x i16>* %a, align 2,
using a sequence of scalar loads rather than a single vld1 on armv7 Linux with NEON.
Looking at the code in svn, it appears the ARM backend is capable of lowering these loads as desired, and will if we use an appropriate darwin triple.
It appears this was actually enabled relatively recently.
Seemingly, the case where the Subtarget has NEON available should be handled the same on Darwin and Linux.
Is this true, or am I missing something?
Do the regulars have an opinion on the best way to handle this?
Thanks!

Pete

VLD1 expects a 64-bit aligned address unless the target explicitly says that unaligned loads are OK.

For your situation, either the subtarget should set AllowsUnalignedMem to true (if that's accurate), or the load address should be made 64-bit aligned.

-Jim

Hello Jim,

Thank you for the response. I may be confused about the alignment rules here.
I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table
(Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
Prior to the table, it does mention the accesses need to be "element" aligned, where I took "element" in this case to mean i16.

Anyhow, to make this a little more concrete:

void extend(short* a, int* b) {
   for(int i = 0; i < 8; i++)
     b[i] = (int)a[i];
}

When I compile this program with clang -O3 -ccc-host-triple armv7-none-linux-gnueabi -mfpu=neon -mllvm -vectorize, the intermediate LLVM assembly
looks OK (and it has an align 2 vector load), but the generated ARM assembly has the scalar loads.
When I compile with gcc 4.6 (-std=c99 -ftree-vectorize -marm -mfpu=neon -O3), it uses vld1.16 and vst1.32 regardless of the parameter alignment. This is on armv7a.

The gcc version (and the clang version with our modified backend) runs fine, even on 2-byte aligned data. Is this not a guarantee across armv7/armv7a generally?

Pete

extend.c (92 Bytes)

Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)

I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
extend: @ @extend
@ BB#0:
  vldr d16, [r0]
  vmovl.s16 q8, d16
  vstmia r1, {d16, d17}
  vldr d16, [r0, #8]
  add r0, r1, #16
  vmovl.s16 q8, d16
  vstmia r0, {d16, d17}
  bx lr

Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction, and similarly for the stores. According to the ARM ARM (DDI 0406C), you're correct about the element-size alignment requirement for VLD1; but our isel isn't attempting to use that instruction, rather VLDR, which requires word alignment, so it falls over.

Given that, it seems that the answer to your original question is that to improve codegen for this case, the proper place to look is in instruction selection for loads and stores to the VFP/NEON registers. That code can be made smarter to better use the NEON instructions. I know Jakob has done some work related to better utilization of those for other constructs.

-Jim

I don't think isel ever uses vld1.16, but I don't see anything wrong with it for 2-byte aligned vectors.

There is an issue with big-endian semantics, but I don't think we're seriously trying to support big-endian ARM?

/jakob

I've never seen anything that would indicate otherwise. I'm pretty sure big-endian would fall over pretty horribly in the object emitter if nothing else.

-Jim

Hello,

Thanks again. We did try overestimating the alignment, and saw the vldr you reference here.
It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux.
I'm not sure whether lowering load <4 x i16>* align 2 to vld1.16 was an intentional effect of this change or not.
If so, my question is: what is the preferred way to inform the Subtarget that it may use unaligned vector loads/stores when NEON is available,
but cannot use unaligned accesses generally?
A new field in ARMSubtarget?
Should the -arm-strict-align flag force expansion even on unaligned vector loads/stores?
We got this working by adding a field to ARMSubtarget and changing logic in ARMTargetLowering::allowsUnalignedMemoryAccesses, but
I am admittedly not entirely sure of the downstream consequences of this, as we don't allow unaligned access generally.

Pete

Hi Pete,

We ran into the same issue with generating vector loads/stores for vectors
with less than word alignment. It seems we took a similar approach to
solving the problem by modifying the logic in allowsUnalignedMemoryAccesses.

As you and Jim mentioned, it looks like the vld1/vst1 instructions should
support element aligned access for any armv7 implementation (I'm looking at
Table A3-1 ARM Architecture Reference Manual - ARM DDI 0406C).

Right now I do not think we have the correct code setup in ARMSubtarget to
accurately represent this table. I would propose that we keep the existing
field for unaligned access and add a new field for element-aligned access.

The AllowsUnAlignedMem field remains as is, and it could be used to represent
the SCTLR.A column in Table A3-1. The AllowsElementAlignedNEON field would
be used to allow targets to generate vld1/vst1 instructions for element-aligned
accesses. By default it would be set to true for armv7 targets with NEON.

The -arm-strict-align flag would set both of the fields to false. This would
retain the behavior that seems to be desired by the
test/CodeGen/ARM/unaligned_load_store.ll test case.

A bit of a grey area is if we have an unaligned f64 store and
AllowsElementAlignedNEON is true. We can actually generate a vst1.8 to
support this store directly instead of using the target-independent method,
and I think it would be good to do so.

I have some code to do this that I will likely be able to upstream.

-Dave

-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
by The Linux Foundation

> The AllowsUnAlignedMem field remains as is and it could be used to represent
> the SCTLR.A column in Table A3-1. The AllowsElementAlignedNEON field would
> be used to allow targets to generate vld1/vst1 instructions for element-aligned
> accesses. By default it would be set to true for armv7 targets with NEON.

That doesn't make sense to me. Element-aligned Neon load/stores are always valid. The AllowsUnalignedMem setting is supposed to model the SCTLR.A setting, which also applies to Neon load/stores with less-than-element alignment. Why would we need a new field?

> A bit of a grey area is if we have an unaligned f64 store and
> AllowsElementAlignedNEON is true. We can actually generate a vst1.8 to
> support this store directly instead of using the target-independent method
> and I think it would be good to do so.

Don't we already do this as of svn r161962?

> That doesn't make sense to me. Element-aligned Neon load/stores are always
> valid. The AllowsUnalignedMem setting is supposed to model the SCTLR.A
> setting, which also applies to Neon load/stores with less-than-element
> alignment. Why would we need a new field?

I was proposing it as a way to keep the current behavior of the
AllowsUnAlignedMem field. Originally I just guarded the use of vld1/vst1 by
testing whether the target has NEON, but then I was getting a failure in
test/CodeGen/ARM/unaligned_load_store.ll because it did not want to see a
vld1 when using -arm-strict-align, even if the target has NEON.

The difference in the flags is between allowing unaligned access for
half-word and word load/stores (e.g. ldr/str) and getting around an
unaligned access for D registers by using vld1.8/vst1.8 to load/store.

It looks to me that a target that supports NEON could support unaligned
access for D registers by using vld1.8/vst1.8 regardless of whether it
supports unaligned access for other types. Currently these two are tied
together into a single flag. The new flag was proposed as a way to make a
distinction between these.

I guess one way to keep the current behavior would be to look back through
the store instruction to find the actual vector type (everything gets
legalized to f64 or v2f64) and allow it for real vector types but disallow
it for a real f64 type. It seems that it would be good to support the
unaligned f64 access if we have NEON though.

> Don't we already do this as of svn r161962?

No, the vst1 will only be generated if the target supports unaligned
access. Setting AllowsUnAlignedMem to true would then also allow unaligned
access using smaller types (e.g. word), which may not be desirable.

-Dave

> I was proposing it as a way to keep the current behavior of the
> AllowsUnAlignedMem field. Originally I just guarded the use of vld1/vst1 by
> testing whether the target has NEON, but then I was getting a failure in
> test/CodeGen/ARM/unaligned_load_store.ll because it did not want to see a
> vld1 when using -arm-strict-align even if the target has NEON.

I think we could just change the test case. The point of -arm-strict-align is just to avoid relying on hardware or OS support for unaligned accesses. Using vld1.8/vst1.8 instructions would not require either of those.

> It looks to me that a target that supports NEON could support unaligned
> access for D registers by using vld1.8/vst1.8 regardless of whether it
> supports unaligned access for other types. Currently these two are tied
> together into a single flag.

If it works on all targets with Neon, regardless of the SCTLR.A setting, then I still don't see the need for a separate flag.

Note that this won't work on big-endian targets where changing the vld1/vst1 element size actually changes the behavior of the instructions.

> No, the vst1 will only be generated if the target supports unaligned
> access. Setting AllowsUnAlignedMem to true will then also allow unaligned
> access using smaller types (e.g. word) which may not be desirable.

If AllowsUnAlignedMem is set, we should take advantage of it; otherwise, for little-endian targets, we can fall back to your suggestion of using vld1.8/vst1.8.

> I think we could just change the test case. The point of -arm-strict-align is
> just to avoid relying on hardware or OS support for unaligned accesses.
> Using vld1.8/vst1.8 instructions would not require either of those.

Ok, that sounds good to me. I will update the test case as part of my patch.

> If it works on all targets with Neon, regardless of the SCTLR.A setting, then I
> still don't see the need for a separate flag.

I agree. It wasn't clear to me from the code and test case that the
existing flag (AllowsUnAlignedMem) is actually meant to reflect the SCTLR.A
setting. Given that info, I don't see the need for another flag.

> Note that this won't work on big-endian targets where changing the
> vld1/vst1 element size actually changes the behavior of the instructions.

Ok, so would it be best to put an explicit test for endianness in the code?
The td files already contain some restrictions for endianness when selecting
unaligned loads/stores. Probably the safest thing would be an explicit test,
but it sounds like there is not a real effort to support big-endian ARM
targets anyway.

I should be able to get the patch together and submit it for review next
week.

I don't know if anyone actually uses arm processors in big-endian mode, but it shouldn't be too hard to conditionalize it. If it does turn out to be difficult for some reason, we should at least have comments to indicate where the endian assumptions are being made.

AFAICR, people used to use big-endian on non-NEON ARM cores in the
7TDMI/920 days, and if they still do so, they'll buy those chips.

IMO, throwing in a few comments would be enough for now. It's better to
assume little-endian now, with a test case, than to assume big-endian
is possible, without one.

Ok, I'll add an explicit test for endianness before allowing the unaligned
loads. It's easy to do and makes it very clear it will only work on
little-endian systems.

-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
by The Linux Foundation
