Hi Pete,
We ran into the same issue with generating vector loads/stores for
vectors with less than word alignment. It seems we took a similar
approach to solving the problem by modifying the logic in
allowsUnalignedMemoryAccesses.
As you and Jim mentioned, it looks like the vld1/vst1 instructions
should support element aligned access for any armv7 implementation
(I'm looking at Table A3-1 ARM Architecture Reference Manual - ARM DDI
0406C).
Right now I do not think we have the correct code setup in
ARMSubtarget to accurately represent this table. I would propose that
we keep the existing field for unaligned access and add a new field for
element-aligned access.
The AllowsUnAlignedMem field remains as is and it could be used to
represent the SCTLR.A column in Table A3-1. The
AllowsElementAlignedNEON field would be used allow targets to generate
vld1/vst1 instructions for element-aligned accesses. By default it would
be set to true for armv7 targets with NEON.
That doesn't make sense to me. Element-aligned Neon load/stores are always
valid. The AllowsUnalignedMem setting is supposed to model the SCTLR.A
setting, which also applies to Neon load/stores with less-than-element
alignment. Why would we need a new field?
I was proposing it as a way to keep the current behavior of the
AllowsUnAlignedMem field. Originally I just guarded the use of vld1/vst1 by
testing whether the target has NEON, but then I was getting a failure in
test/CodeGen/ARM/unaligned_load_store.ll because it did not want to see a
vld1 when using the -arm-strict-align even if the target has NEON.
The difference in the flags is between allowing unaligned access for
half-word and word load/stores (e.g. ldr/str) and getting around an
unaligned access for D registers by using vld1.8/vst1.8 to load/store.
It looks to me that a target that supports NEON could support unaligned
access for D registers by using vld1.8/vst1.8 regardless of whether it
supports unaligned access for other types. Currently these two are tied
together into a single flag. The new flag was proposed as a way to make a
distinction between these.
I guess one way to keep the current behavior would be to look back through
the store instruction to find the actual vector type (everything gets
legalized to f64 or v2f64) and allow it for real vector types but disallow
it for a real f64 type. It seems that it would be good to support the
unaligned f64 access if we have NEON though.
The -arm-strict-align would set both of the fields to false. This
would retain the behavior that seems to be desired from the
test/CodeGen/ARM/unaligned_load_store.ll test case.
A bit of a grey area is if we have an unaligned f64 store and
AllowsElementAlignedNEON is true. We can actually generate a vstr1.8
to support this store directly instead of using the target-independent
method and I think it would be good to do so.
Don't we already do this as of svn r161962?
No, the vstr1 will only be generated if the target supports unaligned
access. Setting AllowsUnAlignedMem to true will then also allow unaligned
access using smaller types (e.g. word) which may not be desirable.
-Dave
I have some code to do this that I will likely be able to upstream.
-Dave
-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Peter Couperus
Sent: Thursday, September 06, 2012 8:14 AM
To: Jim Grosbach
Cc: Jakob Olesen; llvmdev@cs.uiuc.edu (LLVMdev@cs.uiuc.edu)
Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello,
Thanks again. We did try overestimating the alignment, and saw the
vldr you reference here.
It looks like a recent change (r161962?) did enable vld1 generation
for this case (great!) on darwin, but not linux.
I'm not sure if the effect of lowering load <4 x i16>* align 2 to
vld1.16 this was intentional in this change or not.
If so, my question is what is the preferable way to inform the
Subtarget that it is allowed to use unaligned vector loads/stores when
NEON is available, but can't use unaligned accesses generally speaking?
A new field in ARMSubtarget?
Should the -arm-strict-align flag force expansion even on unaligned
vector loads/stores?
We got this working by adding a field to ARMSubtarget and changing
logic in ARMTargetLowering::allowsUnalignedMemoryAccesses, but I am
admittedly not entirely sure of the downstream consequences of this,
as we don't allow unaligned access generally.
Pete
Hmmm. Well, it's entirely possible that it's LLVM that's confused
about the alignment requirements here. 
I think I see, in general, where. I twiddled the IR to give it higher
alignment (16 bytes) and get:
extend: @ @extend
@ BB#0:
vldr d16, [r0]
vmovl.s16 q8, d16
vstmia r1, {d16, d17}
vldr d16, [r0, #8]
add r0, r1, #16
vmovl.s16 q8, d16
vstmia r0, {d16, d17}
bx lr
Note that we're using a plain vldr instruction here to load the d
register, not a vld1 instruction. Similarly for the stores. According
to the ARM ARM (DDI 0406C), you're correct about the element size
alignment requirement for VLD1, but our isel isn't attempting to use
that instruction, but rather VLDR, which has word alignment required, so
it falls over.
Given that, it seems that the answer to your original question is
that to
improve codegen for this case, the proper place to look is in
instruction selection for loads and stores to the VFP/NEON registers.
That code can be made smarter to better use the NEON instructions. I
know Jakob has done some work related to better utilization of those for
other constructs.