My group is working on a lifter from AArch64 to LLVM IR as part of a testing loop, and we’re now adding vector support. We’re confused by what we’re seeing and would appreciate advice on where to find the reasoning behind the calling convention for vectors that don’t exceed 64 bits.

A few examples from llc 17:

a <1 x i1> is passed in w0

a <2 x i1> is passed in v0

a <3 x i1> is passed in w0,w1,w2

a <4 x i1> is passed in v0

a <5 x i1> is passed in w0,w1,w2,w3,w4

I’ve looked at AArch64CallingConvention.td and also at VFABI, and so far I’m none the wiser, though I’m not good with TableGen. Here’s the full collection of vectors <=64 bits. Alternatively, if codegen for these weird vectors isn’t a priority and we should just support powers of two, that’s easier and we can do that.

Hi Thorsten, thanks!
But I’m not having good luck with this command. For example, this IR file makes llc 17.0.3 hang, apparently indefinitely:

Johns-MacBook-Pro:tmp regehr$ cat test_2_3.aarch64.ll
define <2 x i3> @vector_add_2_3(<2 x i3> %a, <2 x i3> %b) {
  %c = add <2 x i3> %a, %b
  ret <2 x i3> %c
}
Johns-MacBook-Pro:tmp regehr$ timeout 300 llc --march=aarch64 -O0 --global-isel -stop-after=legalizer -o - test_2_3.aarch64.ll
Johns-MacBook-Pro:tmp regehr$ echo $?
124
Johns-MacBook-Pro:tmp regehr$
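For a testing loop that has to survive hangs like this one, the `timeout 300 llc ...` pattern above can be sketched in Python. This is only an illustration: the helper name `run_with_timeout` and the constant are made up for this example, and returning 124 simply mirrors what the `timeout` command reported above.

```python
import subprocess

# Hypothetical constant: the same value the `timeout` command reported above.
TIMEOUT_EXIT_CODE = 124


def run_with_timeout(cmd, seconds):
    """Run cmd (a list of arguments); return its exit code, or 124 on timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return TIMEOUT_EXIT_CODE
    return result.returncode
```

Assuming llc is on the PATH, the command above would be passed as `run_with_timeout(["llc", "--march=aarch64", "-O0", "--global-isel", "-stop-after=legalizer", "test_2_3.aarch64.ll", "-o", "-"], 300)`.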

Also, if these types are illegal then I’m pretty sure that we should unconditionally error out of codegen instead of producing plausible output.
Thanks,
John

You put the test_2_3.aarch64.ll in the wrong position. It must be before -o -.

This is called legalisation. SelectionDAG and GlobalISel turn illegal types and operations into legal types and operations. It is based on heuristics and is an ongoing effort. It is not standardised by Arm.

It hangs either way. gdb says there’s a loop in the legalizer.

I’m using the llc from our GitHub: clang+llvm-17.0.3-arm64-apple-darwin22.0.tar.xz, but I see the same behavior from 17.0.4 on Ubuntu 22.04. With an LLVM I built from upstream the other day, this command on this input file gives me an assertion violation.

Thanks.
I’d still appreciate some sort of semi-official statement about what vector types are priorities for the AArch64 backend. In the meantime we’ll focus on vectors of i8, i16, i32, i64.

Hmm… even considering vectors of i{8, 16, 32, 64} there’s still an awkward case, <3 x i8>, where the vector arrives in a triple of general-purpose registers instead of a vector register. Maybe we’ll work on other things for a while.

Johns-MacBook-Pro:tmp regehr$ cat test_3_8.aarch64.ll
define <3 x i8> @vector_add_3_8(<3 x i8> %a, <3 x i8> %b) {
  %c = add <3 x i8> %a, %b
  ret <3 x i8> %c
}
Johns-MacBook-Pro:tmp regehr$ llc test_3_8.aarch64.ll -o -
    .section __TEXT,__text,regular,pure_instructions
    .build_version macos, 14, 0
    .globl _vector_add_3_8 ; -- Begin function vector_add_3_8
    .p2align 2
_vector_add_3_8: ; @vector_add_3_8
    .cfi_startproc
; %bb.0:
    fmov s0, w3
    fmov s1, w0
    mov.h v0[1], w4
    mov.h v1[1], w1
    mov.h v0[2], w5
    mov.h v1[2], w2
    add.4h v0, v1, v0
    umov.h w0, v0[0]
    umov.h w1, v0[1]
    umov.h w2, v0[2]
    ret
    .cfi_endproc
; -- End function
.subsections_via_symbols

Targets do not typically have a stable ABI for illegal vector types. You’ll basically get whatever happens to be the default legalization. For vectors with a non-power-of-two number of elements, I believe scalarization is the default.

For AArch64 specifically, I think the Arm Procedure Call Standard only defines rules for passing arguments for “short vectors”, which are 8 or 16 bytes; see https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#short-vectors. That ABI of course defines a mapping from C and C++ to AArch64, not from LLVM IR to AArch64.

I’m not sure whether only these 8- or 16-byte vectors should be considered the stable ABI, or whether implementations rely on the stability of other-sized (LLVM IR) vectors.
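As a rough illustration of the size rule, here is a minimal Python sketch that checks whether an LLVM IR fixed vector type has a total size of 8 or 16 bytes, matching the AAPCS64 short-vector sizes. The function name, the element-width table, and the set of accepted element types are assumptions made for this example; it is only a size check and says nothing about what llc actually does with a given type.

```python
import re

# Element types this sketch accepts, with their bit widths. This table is an
# assumption for illustration; odd widths such as i1 or i3 are rejected.
ELEMENT_BITS = {
    "i8": 8, "i16": 16, "i32": 32, "i64": 64,
    "half": 16, "bfloat": 16, "float": 32, "double": 64,
}

# Matches a fixed-length vector type such as "<4 x i32>".
VECTOR_RE = re.compile(r"<\s*(\d+)\s+x\s+(\w+)\s*>")


def has_short_vector_size(ty: str) -> bool:
    """Return True if ty is a fixed vector whose total size is 8 or 16 bytes."""
    m = VECTOR_RE.fullmatch(ty.strip())
    if m is None:
        return False
    count, elem = int(m.group(1)), m.group(2)
    bits = ELEMENT_BITS.get(elem)
    if bits is None:
        return False
    return count * bits in (64, 128)
```

Under these assumptions, <4 x i32> and <8 x i8> pass the check, while <3 x i8> (24 bits) and <2 x i1> do not.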

Thanks Kristof. I had been trying a different approach to figuring out which vector types are important to the community, which is to just look at the number of occurrences of different vector types in .ll files in our repo. Below are the types that occur at least 1000 times. Of course the fact that these occur doesn’t mean we expect a stable ABI.

77977 <4 x i32>
58328 <4 x float>
45458 <2 x i64>
41843 <8 x i16>
32164 <8 x i32>
29831 <2 x i32>
26779 <2 x double>
24366 <4 x i64>
20242 <2 x float>
18767 <8 x i64>
15527 <8 x float>
15488 <4 x i16>
15060 <8 x i1>
14881 <8 x i8>
14445 <4 x i1>
14126 <4 x double>
13712 <2 x i1>
13413 <2 x i8>
11668 <4 x i8>
11015 <2 x i16>
9770 <8 x double>
9103 <8 x half>
7448 <2 x half>
5960 <4 x half>
3775 <3 x float>
3654 <1 x i64>
2860 <3 x i32>
2774 <1 x double>
2060 <3 x i8>
1980 <3 x double>
1897 <1 x i32>
1828 <2 x s16>
1778 <1 x i1>
1733 <4 x s32>
1581 <1 x float>
1354 <2 x i4>
1292 <3 x i16>
1213 <2 x ptr>
1051 <3 x i1>
1050 <4 x ptr>
1042 <1 x i16>
1018 <3 x half>
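
One way a tally like this could be produced is sketched below in Python. This is not necessarily how the counts above were obtained; the function names, the regular expression, and the choice to skip scalable vectors are all assumptions made for this example.

```python
import re
from collections import Counter
from pathlib import Path

# Matches fixed-length vector types such as "<4 x i32>". Scalable vectors
# ("<vscale x 4 x i32>") are not matched, by choice in this sketch.
VECTOR_TYPE_RE = re.compile(r"<\d+ x \w+>")


def count_vector_types(text: str) -> Counter:
    """Count every occurrence of a fixed-length vector type in text."""
    return Counter(VECTOR_TYPE_RE.findall(text))


def count_in_tree(root) -> Counter:
    """Sum the counts over all .ll files below the directory root."""
    total = Counter()
    for path in Path(root).rglob("*.ll"):
        total.update(count_vector_types(path.read_text(errors="replace")))
    return total


def frequent_types(counts: Counter, threshold: int = 1000):
    """Return (type, count) pairs at or above threshold, most frequent first."""
    return [(t, n) for t, n in counts.most_common() if n >= threshold]
```

Calling `frequent_types(count_in_tree("llvm-project"))` on a checkout (the directory name here is hypothetical) would give a list in the same shape as the one above.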