While looking into this recently, I found that LLVM supports five different strategies for handling the tail block of a vectorized loop, including folding the tail block, using a scalar epilogue, and forcing the tail block to be vectorized with VF=x.
For SVE, I found that clang uses a scalar epilogue by default, whereas GCC 12 folds the tail by default as long as no runtime memory check is required.
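To make sure we are talking about the same thing, here is a plain C sketch of how I understand the two main strategies for a simple `a[i] = b[i] + c[i]` loop. It is only an illustration, not what either compiler actually emits: the fixed `VF` of 4, the function names, and the inner "lane" loops standing in for vector instructions are all my own assumptions.

```c
#include <stddef.h>

/* Hypothetical fixed vectorization factor, for illustration only.
   Real SVE code is vector-length agnostic. */
#define VF 4

/* Strategy 1: vector body plus scalar epilogue.
   The inner lane loop stands in for one unpredicated vector operation. */
void add_scalar_epilogue(int *a, const int *b, const int *c, size_t n) {
    size_t i = 0;
    /* Vector body: only full groups of VF elements. */
    for (; i + VF <= n; i += VF)
        for (size_t lane = 0; lane < VF; ++lane)
            a[i + lane] = b[i + lane] + c[i + lane];
    /* Scalar epilogue: the remaining n % VF elements, one at a time. */
    for (; i < n; ++i)
        a[i] = b[i] + c[i];
}

/* Strategy 2: tail folding.
   Every iteration is a predicated vector operation; in the last one,
   lanes at or beyond n are simply inactive, so no epilogue is needed. */
void add_tail_folded(int *a, const int *b, const int *c, size_t n) {
    for (size_t i = 0; i < n; i += VF)
        for (size_t lane = 0; lane < VF; ++lane) {
            /* Per-lane predicate, similar in spirit to an SVE whilelo mask. */
            int active = (i + lane) < n;
            if (active)
                a[i + lane] = b[i + lane] + c[i + lane];
        }
}
```

Both functions compute the same result; the difference is that the first needs a separate scalar loop for the leftover elements, while the second handles them inside the predicated vector loop.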
I measured some data on a simulator and found that folding the tail block comes out ahead in most scenarios. Why does clang not fold tail blocks by default?
I'm puzzled by this. Could someone help explain it?
Thank you very much.