AArch64 calling convention for small vectors?

My group is working on a lifter from AArch64 to LLVM IR as part of a testing loop. We’re now adding vector support and are confused by what we’re seeing. We could use some advice about where to find the reasoning behind the calling convention for vectors that don’t exceed 64 bits.

A few examples from llc 17:

  • a <1 x i1> is passed in w0
  • a <2 x i1> is passed in v0
  • a <3 x i1> is passed in w0,w1,w2
  • a <4 x i1> is passed in v0
  • a <5 x i1> is passed in w0,w1,w2,w3,w4

I’ve looked at AArch64CallingConvention.td and also at VFABI and so far am none the wiser, but I’m not good with TableGen. Here’s the full collection of vectors <=64 bits. If codegen for these unusual vectors isn’t a priority and we should only support power-of-two sizes, that’s easier and we can just do that.


They are illegal types for AArch64. There are no 1- or 2-bit registers. You can try:

bin/llc --march=aarch64  -O0 --global-isel -stop-after=legalizer foo.ll -o -

Hi Thorsten, thanks!
But I’m not having good luck with this command, for example this IR file makes llc 17.0.3 hang apparently infinitely:

Johns-MacBook-Pro:tmp regehr$ cat test_2_3.aarch64.ll 
define <2 x i3> @vector_add_2_3(<2 x i3> %a, <2 x i3> %b) {
    %c = add <2 x i3> %a, %b
    ret <2 x i3> %c
}
Johns-MacBook-Pro:tmp regehr$ timeout 300 llc --march=aarch64  -O0 --global-isel -stop-after=legalizer -o -  test_2_3.aarch64.ll 
Johns-MacBook-Pro:tmp regehr$ echo $?
Johns-MacBook-Pro:tmp regehr$ 

Also, if these types are illegal then I’m pretty sure that we should unconditionally error out of codegen instead of producing plausible output.

You put the test_2_3.aarch64.ll in the wrong position. It must be before -o -.

This is called legalisation. SelectionDAG and GlobalISel turn illegal types and operations into legal types and operations. It is based on heuristics and is an ongoing effort. It is not standardised by Arm.

See the GlobalISel legalizer for AArch64. You will find all kinds of widen and clamp and … The file changes over time.

It hangs either way. gdb says there’s a loop in the legalizer.

I’m using the llc from our github: clang+llvm-17.0.3-arm64-apple-darwin22.0.tar.xz, but I see the same behavior from 17.0.4 on Ubuntu 22.04. On an LLVM I built from upstream the other day this command, on this input file, gives me an assertion violation.

I opened an issue for further investigation:

I’d still appreciate some sort of semi-official statement about what vector types are priorities for the AArch64 backend. In the meantime we’ll focus on vectors of i8, i16, i32, i64.

Hmm… even considering vectors of i{8, 16, 32, 64} there’s still an awkward case, <3 x i8>, where the vector arrives in a triple of general-purpose registers instead of a vector register. Maybe we’ll work on other things for a while.

Johns-MacBook-Pro:tmp regehr$ cat test_3_8.aarch64.ll 
define <3 x i8> @vector_add_3_8(<3 x i8> %a, <3 x i8> %b) {
    %c = add <3 x i8> %a, %b
    ret <3 x i8> %c
}
Johns-MacBook-Pro:tmp regehr$ llc test_3_8.aarch64.ll -o -
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 14, 0
	.globl	_vector_add_3_8                 ; -- Begin function vector_add_3_8
	.p2align	2
_vector_add_3_8:                        ; @vector_add_3_8
; %bb.0:
	fmov	s0, w3
	fmov	s1, w0
	mov.h	v0[1], w4
	mov.h	v1[1], w1
	mov.h	v0[2], w5
	mov.h	v1[2], w2
	add.4h	v0, v1, v0
	umov.h	w0, v0[0]
	umov.h	w1, v0[1]
	umov.h	w2, v0[2]
	ret
                                        ; -- End function

Targets do not typically have a stable ABI for illegal vector types. You’ll basically get whatever happens to be the default legalization. For vectors with a non-power-of-two number of elements, I believe scalarization is the default.

Thanks, so we can move forward with power-of-2-sized vectors of {i8, i16, i32, i64}? That’s the maximal set with a stable ABI?

For AArch64 specifically, I think the Arm Procedure Call Standard only defines rules for passing arguments for “short vectors”, which are 8 or 16 bytes; see https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#short-vectors. That ABI of course defines the mapping from C and C++ to AArch64, not from LLVM IR to AArch64.

I’m not sure if only these 8- or 16-byte vectors should be considered the stable ABI, or if implementations rely on the stability of other-sized (LLVM IR) vectors.

https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#appendix-support-for-advanced-simd-extensions suggests that a stable ABI is expected not only for 8- or 16-byte vectors with element types {i8, i16, i32, i64}, but also for 8- or 16-byte vectors with element types {half, float, double, bfloat}.
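For a lifter that wants to filter on this, here is a rough sketch of the rule as I read it from the two AAPCS64 sections linked above: a vector qualifies if its total size is 8 or 16 bytes and its element type is one of the eight listed. The function name `is_short_vector` and the table `ELEMENT_BITS` are made up for illustration, and applying the rule to LLVM IR type strings is an assumption, since AAPCS64 only speaks about C and C++ types.

```python
import re

# Element sizes in bits for the element types named in this thread.
ELEMENT_BITS = {
    "i8": 8, "i16": 16, "i32": 32, "i64": 64,
    "half": 16, "bfloat": 16, "float": 32, "double": 64,
}

# Matches fixed-width LLVM IR vector types such as "<4 x i32>".
VECTOR_RE = re.compile(r"^<\s*(\d+)\s+x\s+(\w+)\s*>$")


def is_short_vector(ty: str) -> bool:
    """Return True if `ty` is a fixed vector of a listed element type
    whose total size is exactly 8 or 16 bytes (64 or 128 bits)."""
    m = VECTOR_RE.match(ty.strip())
    if m is None:
        return False  # not a fixed-width vector type string
    count, elem = int(m.group(1)), m.group(2)
    bits = ELEMENT_BITS.get(elem)
    if bits is None:
        return False  # element type such as i1, i3 or ptr is not listed
    return count * bits in (64, 128)
```

Under this reading `<4 x i32>` and `<8 x i8>` qualify, while `<3 x i8>`, `<2 x i1>` and `<8 x i32>` do not. It says nothing about what llc actually does with the types it rejects.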

Thanks Kristof. I had been trying a different approach to figuring out which vector types are important to the community, which is to just look at the number of occurrences of different vector types in .ll files in our repo. Below are the types that occur at least 1000 times. Of course the fact that these occur doesn’t mean we expect a stable ABI.

77977 <4 x i32>
58328 <4 x float>
45458 <2 x i64>
41843 <8 x i16>
32164 <8 x i32>
29831 <2 x i32>
26779 <2 x double>
24366 <4 x i64>
20242 <2 x float>
18767 <8 x i64>
15527 <8 x float>
15488 <4 x i16>
15060 <8 x i1>
14881 <8 x i8>
14445 <4 x i1>
14126 <4 x double>
13712 <2 x i1>
13413 <2 x i8>
11668 <4 x i8>
11015 <2 x i16>
9770 <8 x double>
9103 <8 x half>
7448 <2 x half>
5960 <4 x half>
3775 <3 x float>
3654 <1 x i64>
2860 <3 x i32>
2774 <1 x double>
2060 <3 x i8>
1980 <3 x double>
1897 <1 x i32>
1828 <2 x s16>
1778 <1 x i1>
1733 <4 x s32>
1581 <1 x float>
1354 <2 x i4>
1292 <3 x i16>
1213 <2 x ptr>
1051 <3 x i1>
1050 <4 x ptr>
1042 <1 x i16>
1018 <3 x half>
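
In case anyone wants to reproduce a count like this on their own tree, here is a minimal sketch of one way to do it. It is not the script used for the numbers above. It assumes a plain regex over the file text is good enough, so it only sees simple fixed-width types of the form `<N x T>` and will miss scalable vectors and element types that contain spaces or punctuation. The names `count_vector_types` and `count_in_tree` are made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Simple fixed-width vector types only, e.g. "<4 x i32>" or "<2 x ptr>".
VECTOR_TYPE_RE = re.compile(r"<\d+ x [A-Za-z_][A-Za-z0-9_]*>")


def count_vector_types(text: str) -> Counter:
    """Count every textual occurrence of a vector type in `text`."""
    return Counter(VECTOR_TYPE_RE.findall(text))


def count_in_tree(root: str) -> Counter:
    """Sum the counts over all .ll files found under `root`."""
    total = Counter()
    for path in Path(root).rglob("*.ll"):
        total.update(count_vector_types(path.read_text(errors="replace")))
    return total
```

Calling `count_in_tree(some_directory).most_common()` gives (type, count) pairs sorted by frequency, which can then be cut off at whatever threshold you like.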