[RFC] VScale-aware LoopStrengthReduce

Hi,

In order to better support SVE I’ve extended the LSR pass to be aware of vscale-relative ‘immediates’, so that different (hopefully better) decisions can be made when offsets are a multiple of the in-memory vector size.

The branch for this work can be found here: huntergr-arm/llvm-project on GitHub, branch vscale-aware-lsr

This did require extending the isLegalAddressingMode and isLegalAddImmediate TTI methods to accept a new FixedOrScalableQuantity type (using a signed Quantity instead of unsigned, as ElementCount does). As such, there are some no-op changes in non-AArch64 targets to use getFixedValue() in their overridden versions. I wasn’t sure whether to introduce a new interface instead, to avoid changing the other backends, but I suspect that would just open the door to future bugs if someone assumed LSR would never deal with scalable offsets.
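To make the idea of a signed, possibly-scalable offset concrete, here is a minimal standalone sketch. It is not the code from the branch: SignedOffset, isLegalAddImmediateFixedOnly and fitsAddVL are hypothetical names, the fixed immediate range is made up, and rejecting scalable offsets before calling getFixedValue() is an assumption about how a fixed-only target might behave. The addvl check relies on the instruction adding a multiple of the vector length in bytes (16 * vscale), with a multiplier in [-32, 31].

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a signed fixed-or-scalable quantity.
// This is NOT LLVM's actual class, just an illustration of the concept.
struct SignedOffset {
  int64_t MinValue; // byte offset; multiplied by vscale at run time if Scalable
  bool Scalable;

  static SignedOffset fixed(int64_t V) { return {V, false}; }
  static SignedOffset scalable(int64_t V) { return {V, true}; }

  // Only meaningful for fixed offsets, mirroring the getFixedValue() idea.
  int64_t getFixedValue() const {
    assert(!Scalable && "scalable offset has no fixed value");
    return MinValue;
  }
};

// Hypothetical hook for a target with no scalable vectors: scalable offsets
// are rejected up front (an assumption), fixed ones are range-checked.
bool isLegalAddImmediateFixedOnly(SignedOffset Imm) {
  if (Imm.Scalable)
    return false;
  int64_t V = Imm.getFixedValue();
  return V >= -4096 && V <= 4095; // made-up range, for illustration only
}

// Sketch of an SVE-style check: a scalable byte offset can be materialised
// by a single addvl if it is a whole number of vectors (16 * vscale bytes
// each) and that number fits the instruction's [-32, 31] multiplier.
bool fitsAddVL(SignedOffset Imm) {
  if (!Imm.Scalable || Imm.MinValue % 16 != 0)
    return false;
  int64_t Vectors = Imm.MinValue / 16;
  return Vectors >= -32 && Vectors <= 31;
}
```

For example, fitsAddVL(SignedOffset::scalable(32)) is true (two vectors), while an offset of vscale x 24 bytes is not a whole number of vectors and is rejected.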

I would like some feedback on the approach before carving it up into properly reviewable patches. In particular, I’d like comments on:

  • The changes to isLegalAddressingMode and isLegalAddImmediate mentioned above
  • The change in isAlwaysFoldable in LSR where I drop the default ‘scaled’ register if a scalable offset is present. It seems like AArch64 might benefit from dropping that in more cases, but I may have missed something and it’s actually a good idea to keep it.
  • The new override for isLSRCostLess in AArch64TargetTransformInfo – the ordering of the terms for comparison is just based on whatever sequence gave me the addressing modes I wanted for SVE without introducing noticeable regressions in our existing unit tests. I’m not sure why we didn’t override this to take instruction count into consideration when many other targets did, but again I may have missed something.
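To illustrate why the ordering of terms in an isLSRCostLess-style comparison matters, here is a small standalone sketch. The Cost struct and both comparison functions are hypothetical. The field names are loosely modelled on the kind of terms LSR tracks, and the orderings are illustrative only, not the one used in the branch.

```cpp
#include <tuple>

// Hypothetical, cut-down cost record; real LSR costs carry more terms.
struct Cost {
  unsigned Insns;   // estimated instruction count in the loop
  unsigned NumRegs; // registers the solution needs
  unsigned ImmCost; // penalty for immediates that cannot be folded
};

// Lexicographic comparison that ignores instruction count entirely:
// fewer registers wins, then cheaper immediates.
bool lessRegsFirst(const Cost &A, const Cost &B) {
  return std::tie(A.NumRegs, A.ImmCost) < std::tie(B.NumRegs, B.ImmCost);
}

// Same idea, but instruction count is compared first, so a solution that
// uses one more register can still win if it saves instructions.
bool lessInsnsFirst(const Cost &A, const Cost &B) {
  return std::tie(A.Insns, A.NumRegs, A.ImmCost) <
         std::tie(B.Insns, B.NumRegs, B.ImmCost);
}
```

With A = {4, 3, 0} and B = {6, 2, 0}, lessInsnsFirst prefers A (fewer instructions) while lessRegsFirst prefers B (fewer registers), so the choice of the leading term directly changes which formula LSR would pick.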

Comments welcome.

Thanks,
-Graham

Very belated follow-up - I’d missed this post originally, and was just catching up on LLVM Weekly posts on a trip last week.

Glancing through your patches, they’re not unreasonable, but I want to take a step back and ask a question.

From your code, I found the existence of the addvl instruction, but does SVE actually have an addressing mode which incorporates VLEN scaling in the LD/ST instruction? Your change seems to presume there is, but I couldn’t find such with some quick googling.

If such a thing does exist, then your approach would seem like the right one. If it doesn’t, then some deeper discussion about why you approached it this way might be reasonable.

Hi; thanks for taking a look.

SVE does have addressing modes related to the size of a vector in-memory; see the ‘scalar plus immediate’ forms of e.g. ld1b. This is signified in asm by an immediate accompanied by the text ‘mul vl’ in the address.

A simple asm example:

	ld1b	{ z0.b }, p0/z, [x0]
	ld1b	{ z1.b }, p0/z, [x0, #1, mul vl]
	ld1b	{ z2.b }, p0/z, [x0, #2, mul vl]
	ld1b	{ z3.b }, p0/z, [x0, #3, mul vl]

That will load 4 z registers from contiguous locations in memory, no matter what VL/vscale is.
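As a rough model of what the ‘mul vl’ immediate means for the full-vector ld1b forms above, the sketch below computes the byte offset for a given vscale. The helper names are hypothetical. It assumes the vector length in bytes is 16 * vscale and that the immediate is a 4-bit signed value in [-8, 7], as documented for the contiguous scalar-plus-immediate forms; other instructions and element sizes should be checked against the architecture reference.

```cpp
#include <cstdint>

// SVE vector length in bytes for a given vscale (128-bit granule).
constexpr int64_t vectorBytes(int64_t VScale) { return 16 * VScale; }

// Byte offset implied by `[xN, #Imm, mul vl]` when the in-memory size of
// the access is one full vector, as with ld1b { zN.b }.
constexpr int64_t mulVLOffsetBytes(int64_t Imm, int64_t VScale) {
  return Imm * vectorBytes(VScale);
}

// Assumed encodable range for the immediate: 4-bit signed.
constexpr bool fitsLd1MulVL(int64_t Imm) { return Imm >= -8 && Imm <= 7; }

// The fourth load in the example reads from x0 + 3 vectors:
static_assert(mulVLOffsetBytes(3, 1) == 48, "128-bit vectors: 3 * 16 bytes");
static_assert(mulVLOffsetBytes(3, 4) == 192, "512-bit vectors: 3 * 64 bytes");
```

The same instruction encoding therefore addresses x0 + 48 on a 128-bit implementation and x0 + 192 on a 512-bit one, which is the property a vscale-aware LSR can exploit when folding offsets into the load or store.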