[SVE] Different tail block processing policies

In some recent experiments I found that LLVM supports several tail block processing strategies, including folding the tail block, a scalar epilogue, and forcing the tail block to be vectorized with VF=x.
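To make the difference between the first two strategies concrete, here is a minimal sketch in portable C. It only models the control flow: the inner `lane` loop stands in for one vector instruction, the fixed `VF` of 4 is an assumption for illustration (SVE vector lengths are not fixed at compile time), and the function names are hypothetical. It is not the code that clang or GCC actually emits.

```c
#include <stddef.h>

#define VF 4 /* illustrative vectorization factor; an assumption, not an SVE property */

/* Scalar epilogue: the vector body handles full blocks of VF elements,
 * and a separate scalar loop handles the remaining n % VF elements. */
void add_scalar_epilogue(int *dst, const int *a, const int *b, size_t n) {
    size_t i = 0;
    for (; i + VF <= n; i += VF)                 /* vector body, all lanes active */
        for (size_t lane = 0; lane < VF; ++lane) /* models one vector add */
            dst[i + lane] = a[i + lane] + b[i + lane];
    for (; i < n; ++i)                           /* scalar epilogue (the tail) */
        dst[i] = a[i] + b[i];
}

/* Tail folding: a single loop in which every iteration is predicated.
 * Lanes at or beyond n are masked off, so no separate tail loop is needed. */
void add_tail_folded(int *dst, const int *a, const int *b, size_t n) {
    for (size_t i = 0; i < n; i += VF)
        for (size_t lane = 0; lane < VF; ++lane) {
            int active = (i + lane) < n;         /* per-lane predicate */
            if (active)
                dst[i + lane] = a[i + lane] + b[i + lane];
        }
}
```

Both functions compute the same result. The trade-off is that the folded version pays for predicate generation on every iteration, while the epilogue version keeps the main loop unpredicated but needs extra code for the tail.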


For SVE, I found that clang's default policy is a scalar epilogue, whereas GCC 12 folds tails by default when no memory check is required.

I used a simulator to collect some data and found that folding the tail block is the better choice in most scenarios. Why does clang not fold tail blocks by default?

This puzzles me. Can someone help?

Thank you very much.

D130618, "[AArch64][LoopVectorize] Enable tail-folding of simple loops on neoverse-v1", contains a discussion about this. That patch was accepted, so future versions of LLVM will fold the tail of some loops by default (though, as you can read in the comments on that patch, not in every case).

Okay, thank you.