Consistent assembly code is not obtained compared to

void testWhileWR(int *data1, int *data2, int size) {
  for (int i = 0; i < size; i++) {
    data2[i] = data1[i];
  }
}
  • On Compiler Explorer
    I get cntw x10 after ISel

  • With a local build of the upstream branch: ~/llvm-project-upstream/build/bin/clang -march=armv9-a+sve2 -O3 -w ./test.c -mllvm -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -Xclang -target-feature -Xclang +use-scalar-inc-vl -S
    I get rdvl x10, #1 + lsr x10, x10, #4 + lsl x10, x10, #2, based on the last commit 6370f75ad70
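
For reference, the two outputs compute the same value, so the difference is in instruction selection, not in the result. The sketch below is a plain C model of the arithmetic, not compiler output: the function names are made up for illustration, and it assumes an SVE vector length that is a multiple of 128 bits between 128 and 2048.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `cntw x10`:
   the number of 32-bit elements in one SVE vector. */
static uint64_t model_cntw(uint64_t vl_bits) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    return vl_bits / 32;
}

/* Hypothetical model of the three-instruction sequence:
   rdvl x10, #1 ; lsr x10, x10, #4 ; lsl x10, x10, #2 */
static uint64_t model_rdvl_sequence(uint64_t vl_bits) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    uint64_t x10 = vl_bits / 8; /* rdvl x10, #1: vector length in bytes */
    x10 >>= 4;                  /* lsr #4: bytes / 16 = vscale */
    x10 <<= 2;                  /* lsl #2: vscale * 4 = 32-bit elements */
    return x10;
}
```

For a 256-bit vector, both functions return 8.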

It looks like Compiler Explorer doesn’t generate cntw x10 any longer. The compilers used on godbolt are a few hours/days behind latest trunk, I think, which could explain the earlier difference.

Thanks. Starting with 45a379ce (Revert “[Libomptarget] Stop testing CPU offloading with LTO”), it doesn’t generate cntw x10. That commit was committed on Jul 22 12:04:33 2022, so it is a few months behind latest trunk.

A drive-by comment (after answer accepted)

To find out the commit number of clang on godbolt, --version helps (Compiler Explorer)

Following your suggestion, I rebuilt my local compiler at commit 61e5c14fa8d (Nov 16 00:00:33 2022), which is the same commit Compiler Explorer uses, but I still don’t get cntw. It is strange.

The difference is probably a result of -O2 vs -O3.

Compiler Explorer shows cntw is generated with O2, and not generated with O3 (other clang options remain the same; this is the result of armv8-clang at trunk, with all architectural features)

Oh, yes, I missed this. The related code was committed a long time ago.

(Sander de Smalen      2020-01-21 10:20:27 +0000 2346)     def : Pat<(vscale (i64 1)), (UBFMXri (RDVLI_XI 1), 4, 63)>;
(Sander de Smalen      2020-01-21 10:20:27 +0000 2347)     def : Pat<(vscale (i64 -1)), (SBFMXri (RDVLI_XI -1), 4, 63)>;
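
To read these two patterns: UBFM and SBFM with immediates 4 and 63 are the encodings behind the `lsr #4` and `asr #4` aliases on 64-bit registers, so each pattern reads the vector length in bytes with RDVL (positive or negated) and divides by 16 to obtain vscale or -vscale. The following is a small C model of that arithmetic only; the function names are hypothetical and the vector length is passed in as an assumption (a multiple of 128 bits, up to 2048).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (UBFMXri (RDVLI_XI 1), 4, 63):
   rdvl #1 followed by a logical shift right by 4. */
static int64_t model_vscale_pos(uint64_t vl_bits) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    uint64_t rdvl = vl_bits / 8;   /* rdvl #1: vector length in bytes */
    return (int64_t)(rdvl >> 4);   /* lsr #4: bytes / 16 = vscale */
}

/* Hypothetical model of (SBFMXri (RDVLI_XI -1), 4, 63):
   rdvl #-1 followed by an arithmetic shift right by 4. */
static int64_t model_vscale_neg(uint64_t vl_bits) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    int64_t rdvl = -(int64_t)(vl_bits / 8); /* rdvl #-1: negated byte count */
    /* asr #4 rounds toward negative infinity; written as a division on the
       magnitude to avoid implementation-defined signed shifts in C. */
    return -((-rdvl + 15) / 16);
}
```

With a 128-bit vector this gives 1 and -1; with a 512-bit vector, 4 and -4.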