[AArch64/Cyclone] ZCZeroing Feature

Hi,

I am trying to port FeatureZCZeroing from Cyclone to Kryo. Using immediate #0 to zero W and X registers works well on Kryo. However, using #0 to zero floating-point registers sometimes causes extra register spills or move instructions on either Cyclone or Kryo.
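For reference, here is a minimal sketch of the four kinds of zeroing involved (W, X, S and D registers). The function names are made up for illustration and are not part of the PR27454 test case. Compiling it with something like `clang -O2 -S --target=aarch64-linux-gnu -mcpu=cyclone`, then again with the feature turned off, should show which instruction is picked to materialize each zero. I have not listed the expected instructions because they may depend on the LLVM revision.

```c
/* Hypothetical helpers: each one only returns a zero, so the generated
 * assembly is essentially the zeroing instruction plus a ret. */
#include <stdint.h>

/* 32-bit integer zero, returned in a W register. */
int32_t zero_w(void) { return 0; }

/* 64-bit integer zero, returned in an X register. */
int64_t zero_x(void) { return 0; }

/* Single-precision zero, returned in an S register. */
float zero_s(void) { return 0.0f; }

/* Double-precision zero, returned in a D register. */
double zero_d(void) { return 0.0; }
```

The integer cases are the ones that already behave well on Kryo; the two floating-point cases are where the extra spills or moves can show up once the zero is live across a call.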

Take the following C function as an example:

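    #include <math.h>

    double foo(int n) {
      double r = -10000;
      for (int i = 0; i < n; i++) {
        double x = sin(i);  /* x was undeclared in the original snippet */
        r = fmax(r, x);     /* fmax from <math.h>; matches the fmaxnm below */
      }
      return r;
    }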

When compiled for Cyclone, the loop body has one spill and two reloads, as shown below:

    .LBB0_1:                        // %for.body
                                    // =>This Inner Loop Header: Depth=1
        str     q0, [sp]            // 16-byte Folded Spill
        ldr     q0, [sp]            // 16-byte Folded Reload
        bl      sin
        fmaxnm  d8, d8, d0
        ldr     q0, [sp]            // 16-byte Folded Reload
        fadd    d0, d0, d9
        add     w20, w20, #1        // =1
        cmp     w20, w19
        b.lt    .LBB0_1

If FeatureZCZeroing is disabled (together with FeatureZCRegMove) on Cyclone, the generated assembly does not contain these load/store instructions:

    .LBB0_1:                        // %for.body
                                    // =>This Inner Loop Header: Depth=1
        mov     v0.16b, v8.16b
        bl      sin
        fmaxnm  d9, d9, d0
        fadd    d8, d8, d10
        add     w20, w20, #1        // =1
        cmp     w20, w19
        b.lt    .LBB0_1

PR27454 has an attached .ll test case. It would be nice if this problem could be solved so that Kryo and Cyclone could use the same method to zero floating-point registers.

Best,

Haicheng