AArch64 calling convention for small vectors?

regehr · November 13, 2023, 5:16pm

My group is working on a lifter from AArch64 to LLVM IR as part of a testing loop, and we’re now adding vector support. We’re confused by what we’re seeing and could use some advice about where to find the reasoning behind the calling convention for vectors that don’t exceed 64 bits.

A few examples from llc 17:

a <1 x i1> is passed in w0
a <2 x i1> is passed in v0
a <3 x i1> is passed in w0,w1,w2
a <4 x i1> is passed in v0
a <5 x i1> is passed in w0,w1,w2,w3,w4

I’ve looked at AArch64CallingConvention.td and also at VFABI and so far am none the wiser, though I’m not good with TableGen. Here’s the full collection of vectors <=64 bits. If codegen for these weird vectors isn’t a priority and we should just support powers of two, that’s easier and we can do that instead.

Thanks,
John

tschuett · November 13, 2023, 6:22pm

They are illegal types for AArch64. There are no 1- or 2-bit registers. You can try:

bin/llc --march=aarch64  -O0 --global-isel -stop-after=legalizer foo.ll -o -

regehr · November 13, 2023, 7:12pm

Hi Thorsten, thanks!
But I’m not having much luck with this command. For example, this IR file makes llc 17.0.3 hang, apparently indefinitely:

Johns-MacBook-Pro:tmp regehr$ cat test_2_3.aarch64.ll 
define <2 x i3> @vector_add_2_3(<2 x i3> %a, <2 x i3> %b) {
    %c = add <2 x i3> %a, %b
    ret <2 x i3> %c
}
Johns-MacBook-Pro:tmp regehr$ timeout 300 llc --march=aarch64  -O0 --global-isel -stop-after=legalizer -o -  test_2_3.aarch64.ll 
Johns-MacBook-Pro:tmp regehr$ echo $?
124
Johns-MacBook-Pro:tmp regehr$

Also, if these types are illegal then I’m pretty sure that we should unconditionally error out of codegen instead of producing plausible output.
Thanks,
John

tschuett · November 13, 2023, 7:23pm

You put the test_2_3.aarch64.ll in the wrong position. It must be before -o -.

This is called legalisation. Both SelectionDAG and GlobalISel turn illegal types and operations into legal types and operations. It is based on heuristics and is an ongoing effort. It is not standardised by Arm.

tschuett · November 13, 2023, 7:27pm

The GIsel Legalizer for AArch64. You find all kinds of widen and clamp and … The file changes over time.

regehr · November 13, 2023, 7:52pm

It hangs either way. gdb says there’s a loop in the legalizer.

I’m using the llc from our GitHub: clang+llvm-17.0.3-arm64-apple-darwin22.0.tar.xz, but I see the same behavior from 17.0.4 on Ubuntu 22.04. With an LLVM I built from upstream the other day, this command on this input file gives me an assertion violation.

tschuett · November 13, 2023, 8:00pm

I opened an issue for further investigation:

regehr · November 13, 2023, 8:43pm

Thanks.
I’d still appreciate some sort of semi-official statement about what vector types are priorities for the AArch64 backend. In the meantime we’ll focus on vectors of i8, i16, i32, i64.

regehr · November 13, 2023, 9:07pm

Hmm… even considering vectors of i{8, 16, 32, 64} there’s still an awkward case, <3 x i8>, where the vector arrives in a triple of general-purpose registers instead of a vector register. Maybe we’ll work on other things for a while.

Johns-MacBook-Pro:tmp regehr$ cat test_3_8.aarch64.ll 
define <3 x i8> @vector_add_3_8(<3 x i8> %a, <3 x i8> %b) {
    %c = add <3 x i8> %a, %b
    ret <3 x i8> %c
}
Johns-MacBook-Pro:tmp regehr$ llc test_3_8.aarch64.ll -o -
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 14, 0
	.globl	_vector_add_3_8                 ; -- Begin function vector_add_3_8
	.p2align	2
_vector_add_3_8:                        ; @vector_add_3_8
	.cfi_startproc
; %bb.0:
	fmov	s0, w3
	fmov	s1, w0
	mov.h	v0[1], w4
	mov.h	v1[1], w1
	mov.h	v0[2], w5
	mov.h	v1[2], w2
	add.4h	v0, v1, v0
	umov.h	w0, v0[0]
	umov.h	w1, v0[1]
	umov.h	w2, v0[2]
	ret
	.cfi_endproc
                                        ; -- End function
.subsections_via_symbols

nikic · November 13, 2023, 9:17pm

Targets do not typically have stable ABI for illegal vector types. You’ll basically get whatever happens to be the default legalization. For vectors with non-power-of-two number of elements, I believe scalarization is the default.

regehr · November 13, 2023, 9:33pm

Thanks, so we can move forward with power-of-2-sized vectors of {i8, i16, i32, i64}? That’s the maximal set with a stable ABI?

kbeyls · November 14, 2023, 2:48pm

For AArch64 specifically, I think the Arm Procedure Call Standard only defines rules for passing arguments for “short vectors” which are 8 or 16 bytes, see https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#short-vectors. That ABI of course defines mapping from C and C++ to AArch64; not from LLVM-IR to AArch64.

I’m not sure if only these 8- or 16-byte vectors should be considered the stable ABI, or if implementations rely on stability of other-sized (LLVM-IR) vectors.

https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#appendix-support-for-advanced-simd-extensions suggests that not only 8- or 16-byte vectors with element types {i8, i16, i32, i64} are expected to have a stable ABI, but also 8- or 16-byte vectors with element types {half, float, double, bfloat}.
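
To make that rule concrete, here is a minimal Python sketch of a filter a lifter might use. It accepts an LLVM IR fixed-vector type only if its total size is 8 or 16 bytes and its element type is one of those listed above. The names `is_short_vector` and `ELEMENT_BITS` are made up for illustration, and mapping IR types onto the AAPCS64 definition this way is an assumption, since the ABI is specified for C and C++ types rather than for LLVM IR.

```python
import re

# Element widths in bits for the element types named above.
ELEMENT_BITS = {
    "i8": 8, "i16": 16, "i32": 32, "i64": 64,
    "half": 16, "bfloat": 16, "float": 32, "double": 64,
}

# Matches a fixed-width LLVM IR vector type such as "<4 x i32>".
VECTOR_RE = re.compile(r"^<\s*(\d+)\s+x\s+(\w+)\s*>$")


def is_short_vector(ty: str) -> bool:
    """Return True if `ty` is 8 or 16 bytes wide with a listed element type.

    Hypothetical helper: this is a reading of the AAPCS64 short-vector
    rule applied to IR types, not something LLVM or the ABI provides.
    """
    m = VECTOR_RE.match(ty.strip())
    if m is None:
        return False
    count, elem = int(m.group(1)), m.group(2)
    bits = ELEMENT_BITS.get(elem)
    if bits is None:
        return False  # e.g. i1, i3, ptr: not covered by this sketch
    return count * bits in (64, 128)
```

Under this reading, `<4 x i32>` and `<8 x i8>` pass, while `<3 x i8>` (24 bits), `<2 x i1>` and `<8 x i32>` (256 bits) do not.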

regehr · November 14, 2023, 4:33pm

Thanks Kristof. I had been trying a different approach to figuring out which vector types are important to the community, which is to just look at the number of occurrences of different vector types in .ll files in our repo. Below are the types that occur at least 1000 times. Of course the fact that these occur doesn’t mean we expect a stable ABI.

77977 <4 x i32>
58328 <4 x float>
45458 <2 x i64>
41843 <8 x i16>
32164 <8 x i32>
29831 <2 x i32>
26779 <2 x double>
24366 <4 x i64>
20242 <2 x float>
18767 <8 x i64>
15527 <8 x float>
15488 <4 x i16>
15060 <8 x i1>
14881 <8 x i8>
14445 <4 x i1>
14126 <4 x double>
13712 <2 x i1>
13413 <2 x i8>
11668 <4 x i8>
11015 <2 x i16>
9770 <8 x double>
9103 <8 x half>
7448 <2 x half>
5960 <4 x half>
3775 <3 x float>
3654 <1 x i64>
2860 <3 x i32>
2774 <1 x double>
2060 <3 x i8>
1980 <3 x double>
1897 <1 x i32>
1828 <2 x s16>
1778 <1 x i1>
1733 <4 x s32>
1581 <1 x float>
1354 <2 x i4>
1292 <3 x i16>
1213 <2 x ptr>
1051 <3 x i1>
1050 <4 x ptr>
1042 <1 x i16>
1018 <3 x half>
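
For reference, a tally like the one above can be produced with a few lines of Python. This is only a sketch and not necessarily the script that was used: the function names and the regular expression are assumptions, and the pattern only matches fixed-width vectors with a simple element type, so `<vscale x 4 x i32>` and `<2 x i32*>` are not counted.

```python
import re
from collections import Counter
from pathlib import Path

# Fixed-width vector types with a simple element name, e.g. "<4 x i32>".
VECTOR_TYPE_RE = re.compile(r"<\d+ x [A-Za-z_]\w*>")


def count_vector_types(text: str) -> Counter:
    """Count every textual occurrence of a vector type in `text`."""
    return Counter(VECTOR_TYPE_RE.findall(text))


def count_in_tree(root: str) -> Counter:
    """Sum the counts over all .ll files found under `root`."""
    total = Counter()
    for path in Path(root).rglob("*.ll"):
        total.update(count_vector_types(path.read_text(errors="replace")))
    return total


def report(counts: Counter, threshold: int = 1000) -> list[str]:
    """Format types seen at least `threshold` times, most frequent first."""
    return [f"{n} {ty}" for ty, n in counts.most_common() if n >= threshold]
```

Calling `report(count_in_tree(path_to_checkout))` on a source tree gives lines in the same `count type` shape as the list above; `path_to_checkout` is a placeholder for wherever the repository lives.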
