AArch64 Round table

sjoerdmeijer · September 26, 2023, 1:11pm

I would like to organise an AArch64 round table and wanted to check whether people are interested and would like to propose any topics. For example, we could discuss anything related to AArch64 code generation, optimisations, enablement, things people would like to work on and share, blockers, etc.

Please reply or leave a message if you’re interested, as that would help to see whether it is worthwhile organising a round table. I missed the deadline to get the round table included in the online agenda, but Tanya wrote:

At the event you can write your round table title on the agenda sign outside the room. It will be visible to attendees who walk by the sign, but it won’t be on the online agenda.

Tagging some folks that might be attending and interested:
@kbeyls, @smithp35, @fhahn, @sscalpone, @ramana.radhakrishnan

kbeyls · September 26, 2023, 1:40pm

Thanks for organizing this, @sjoerdmeijer. I would definitely attend such a round table.

smithp35 · September 26, 2023, 2:37pm

Assuming it doesn’t clash with the PAuth and Embedded Toolchains round tables, I would also attend.

ramana.radhakrishnan · September 26, 2023, 7:32pm

Thanks @sjoerdmeijer for kick-starting this.

Perhaps it is worth starting to crowdsource some topics. @kbeyls and @smithp35, do you have any that complement the PAuth ABI and Embedded Toolchains round tables?

kbeyls · September 27, 2023, 7:21am

Yeah, I think it would be a good idea to crowdsource some topics.
I’m happy to contribute to any topic related to better support for AArch64 in LLVM technology.

A non-exhaustive list of things that are on my mind personally:

Deployment of pointer authentication and other security features in the AArch64 architecture.
Support for AArch64 in BOLT.
Full GlobalISel support for AArch64.

I’m sure I’m forgetting a lot more topics that I’m actually interested in.

banach-space · September 27, 2023, 7:35pm

Thanks for proposing this!

I will be around and can discuss SVE and SME enablement in MLIR, if that’s of interest to anyone.

-Andrzej

smeenai · October 2, 2023, 6:52pm

I work on Meta’s mobile apps, and I’d be interested in attending and discussing issues around binary size. In particular, I was wondering if there have been any discussions around a smaller unwind info format for ELF AArch64 (similar to exidx/extab on armv7, compact unwind on Darwin, and pdata/xdata on Windows). We find that unwind information is actually a large contributor to binary size for us, and it also impedes other size optimizations like outlining (the added unwind information for outlined functions adds a lot of size overhead).

sjoerdmeijer · October 3, 2023, 7:46am

Thanks for all the replies! It looks like there’s enough interest, so let’s go ahead with this.
I will register the round table at the conference and try to avoid the PAuth and Embedded Toolchains round tables if different time slots are available.

I would like to bring some performance-related topics to the table. Perhaps something related to auto-vectorisation and cost-modelling, but I will see if I can make that more concrete.

smeenai · October 11, 2023, 5:26pm

Did you end up deciding on a time for this?

MaskRay · October 11, 2023, 6:54pm

I’d like to attend once the time and the venue are decided.

There are 4 round tables on Wednesday and ~7 on Thursday.

sjoerdmeijer · October 11, 2023, 10:08pm

Let’s go for 11:00am tomorrow (Thursday).

The schedule for tomorrow is not up yet, but I will add it in the morning.

sjoerdmeijer · October 12, 2023, 3:57pm

Sorry for the reschedule, but on request we have moved this to 16:15 so that more folks can attend.

madhur13490 · October 17, 2023, 7:40am

Can someone please share MoM/notes from this round table?

Thanks
Madhur.

smithp35 · October 17, 2023, 11:04am

Notes from my memory. I expect that we’ll need more people’s notes to get a full picture:

Code Size including exception table size

Some mobile applications have a large code base optimised with -Oz, with many outlined functions. Parts of the code base use exceptions, so unwind tables are required. The number of outlined functions leads to large .eh_frame sizes. Is there any scope for a compact unwinding table format, given that asynchronous exceptions are not required? No-one has plans to implement such a format, as it is a considerable amount of work to produce and then maintain over time. Could the exception tables be compressed? Yes, but at a large up-front cost when the first exception is thrown, and with open questions such as whether there is enough memory to decompress.

Most at the table are working on performance optimisations for AArch64 rather than code size. Outside of outlining, code-size optimisations are largely a long tail of small optimisations that accumulate over time.

BOLT on AArch64

Several people have tried it. It had some very good results on some programs but not much on others. Some programs, such as various language VMs, don’t work at all, for example if they encode pointers as offsets, which breaks some assumptions that BOLT makes.
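
To make the offset point a little more concrete, here is a toy Python sketch. It is not how BOLT or any particular VM works; the function names, addresses and layouts are all made up for illustration. It only shows the general hazard: an offset computed against one code layout and stored as plain data can resolve to a different function once the code has been reordered.

```python
def resolve(layout, base_name, offset):
    """Return the name of the function located at layout[base_name] + offset.

    Returns None if no function starts at that address.
    """
    target = layout[base_name] + offset
    for name, address in layout.items():
        if address == target:
            return name
    return None


# Hypothetical layout before any post-link reordering.
original = {"interp_loop": 0x1000, "op_add": 0x1040, "op_mul": 0x1080}

# The toy VM stores a handler as an offset from interp_loop. The offset is
# ordinary data, so nothing tells a rewriter that it refers to code.
stored_offset = original["op_add"] - original["interp_loop"]

# Hypothetical layout after a rewriter swaps op_add and op_mul.
reordered = {"interp_loop": 0x1000, "op_mul": 0x1040, "op_add": 0x1080}

print(resolve(original, "interp_loop", stored_offset))   # op_add
print(resolve(reordered, "interp_loop", stored_offset))  # op_mul
```

In the reordered layout the stale offset silently lands on the wrong handler, which is the kind of breakage that is hard to diagnose without knowing how the program encodes its pointers.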

The biggest limitation for BOLT was seen to be the weakness of sample-based profiling for AArch64. BOLT can support instrumentation-based profiles, which should improve the situation.

Due to its nature, it is likely to be an expert-level tool requiring some understanding of the program and the hardware to get the best out of it.

There are regular BOLT office hours. Please try it out, ask questions, report bugs, and submit patches to improve it.

SME

Will Clang generate code for SME other than via intrinsics? Not via Clang, but it is possible via MLIR.

Performance

Only a small amount of time was spent on this. There was general agreement that a regular AArch64 call would be useful to coordinate on benchmarks, different optimisations, testing, etc. It just needs someone to organise!

sjoerdmeijer · October 17, 2023, 12:34pm

Thanks @smithp35, I think that’s a good summary. I don’t have much to add.

There was general agreement that a regular AArch64 call would be useful to coordinate on benchmarks, different optimisations, testing, etc. It just needs someone to organise!

I agree that it would be really useful to have this, and I would like to volunteer to organise it. I will set something up and schedule a first call soon.

banach-space · October 17, 2023, 1:59pm

Thank you for volunteering. I think that it will be super helpful!

Related to this, last week I chatted with a few people about SME in the context of MLIR/Clang. Do you think that it would be OK to direct them to this call for any updates and/or questions that they may have?

-Andrzej

sjoerdmeijer · October 17, 2023, 2:05pm

Sure, why not. We can add SME to the agenda; let’s see how it goes before thinking about a separate SME sync.

fhahn · October 17, 2023, 2:42pm

FWIW Clang has a matrix type (fixed dimensions only at the moment, but may be extended to allow variable dimensions). I think SME code could be generated for those.

banach-space · October 17, 2023, 7:12pm

A bit of advertising on your behalf

AArch64 Sync-up

Yes, but even then the user would be responsible for e.g.:

1. virtual tile allocation, and
2. enabling/disabling the streaming mode and the ZA storage array.

Basically, there’s relatively little “hand-holding” that end-users can expect from the backend ATM. But given how specialised SME is, perhaps that’s OK?

On a related note, SME brings two things:

ZA array storage (for accelerating e.g. outer-products),
Streaming SVE (in addition to host CPU SVE).

The latter will only require Step 2 from above. So it’s not like Clang users won’t be able to leverage their lovely “matrix extension”.
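
For readers wondering why outer products come up in this context, here is a minimal Python sketch of a matrix multiply written as a sum of outer products accumulated into a tile. It is only an analogy: the function name and the `acc` tile are illustrative assumptions, and nothing here models SME instructions, ZA itself, or what Clang or MLIR actually emit.

```python
def matmul_by_outer_products(a, b):
    """Compute the product of a (M x K) and b (K x N) as a sum of outer products.

    Both matrices are given as lists of rows.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    # Accumulator tile. This stands in for the kind of 2D storage that the
    # discussion attributes to ZA; it is an analogy, not a hardware model.
    acc = [[0] * n for _ in range(m)]
    for k in range(k_dim):
        column = [a[i][k] for i in range(m)]  # k-th column of a
        row = b[k]                            # k-th row of b
        # One outer-product-and-accumulate step: acc += column (x) row.
        for i in range(m):
            for j in range(n):
                acc[i][j] += column[i] * row[j]
    return acc


print(matmul_by_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Each iteration of the outer loop updates the whole accumulator at once, which is the shape of work an outer-product accelerator is aimed at.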

fhahn · October 17, 2023, 7:52pm

I’ve not looked at SME in detail, but couldn’t the compiler take care of that?
