[ARM] Peephole optimization (instructions TST + AND)


I noticed that in some cases clang generates a sequence of AND + TST instructions. For example:

AND x3, x2, x1

TST x2, x1

I think these instructions should be merged into one:

ANDS x3, x2, x1

(because TST <Xn>, <Xm> is an alias for ANDS XZR, <Xn>, <Xm>; see https://static.docs.arm.com/ddi0596/a/DDI_0596_ARM_a64_instruction_set_architecture.pdf)
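Here is a small C++ model of why the merge preserves behavior. It assumes the flag semantics of the 64-bit logical ANDS (shifted register) from the Arm A64 documentation linked above: N is bit 63 of the result, Z is set when the result is zero, and C and V are cleared. The names `AndsResult`, `ands` and `checkMerge` are illustrative only. Both forms compute the same value and the same NZCV. The only difference is that TST throws the value away into XZR.

```cpp
#include <cassert>
#include <cstdint>

// Result of a 64-bit ANDS: the AND value plus the NZCV flags it sets.
// (Illustrative model only, not an LLVM or Arm API.)
struct AndsResult {
  std::uint64_t value;
  bool n, z, c, v;
};

// ANDS Xd, Xn, Xm: Xd = Xn & Xm; N = bit 63 of the result,
// Z = (result == 0); C and V are cleared for this logical instruction.
AndsResult ands(std::uint64_t xn, std::uint64_t xm) {
  std::uint64_t r = xn & xm;
  return {r, (r >> 63) != 0, r == 0, false, false};
}

// Original sequence:
//   AND x3, x2, x1   -> writes x3 only
//   TST x2, x1       -> ANDS XZR, x2, x1: writes NZCV only
// Merged form:
//   ANDS x3, x2, x1  -> writes x3 and NZCV
// Checks that both leave the same x3 and the same flags.
void checkMerge(std::uint64_t x2, std::uint64_t x1) {
  std::uint64_t x3 = x2 & x1;        // AND x3, x2, x1
  AndsResult tst = ands(x2, x1);     // TST x2, x1 (value discarded)

  AndsResult merged = ands(x2, x1);  // ANDS x3, x2, x1
  assert(merged.value == x3);
  assert(merged.n == tst.n && merged.z == tst.z);
  assert(merged.c == tst.c && merged.v == tst.v);
}
```

Because TST and ANDS set all four flags identically, flag users need no per-condition-code restrictions here. The only requirement is that nothing between the AND and the TST reads or writes NZCV.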

Is this a missed optimization, or could such a merge have some negative effect?

Best regards


PS: Code sample (though it could probably be reduced significantly):

(clang -target aarch64 sample.c -S -O2 -o sample.S)

That transform is legal; it’s a missed optimization.


OK, thank you; I will implement it, then.

As far as I can see, this optimization should be done in AArch64LoadStoreOptimizer; is that right?

You probably want to do this some time before register allocation, so you don’t have to worry about physical register definitions.

Maybe take a look at what ARM does in ARMBaseInstrInfo::optimizeCompareInstr?


Thank you!

I took a look at this method (ARMBaseInstrInfo::optimizeCompareInstr) and how it is used.

So, if I understood correctly, I need to add a new method to TargetInstrInfo (similar to optimizeCompareInstr, e.g. optimizeAddInstr) and implement it in AArch64InstrInfo.

This method should be able to transform code like this:

%47:gpr64 = ANDXrr %46:gpr64, %32:gpr64

%48:gpr64common = ORRXrr killed %47:gpr64, %28:gpr64common

%49:gpr64 = ANDSXrr %46:gpr64, %32:gpr64, implicit-def $nzcv

to this form:

%47:gpr64 = ANDSXrr %46:gpr64, %32:gpr64, implicit-def $nzcv

%48:gpr64common = ORRXrr killed %47:gpr64, %28:gpr64common

Is everything correct?
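The transform described above can be sketched as a small peephole over a toy, SSA-style instruction list. This is a minimal C++ sketch under stated assumptions, not LLVM code. `Inst`, `isRead`, `foldAndIntoTst` and `demo` are hypothetical names. A real version would operate on `llvm::MachineInstr`, for example inside an AArch64InstrInfo hook modeled on ARMBaseInstrInfo::optimizeCompareInstr.

The sketch folds only when three conditions hold:

- the TST's own value result is never read;
- an earlier ANDXrr in the same block has the same operands;
- nothing in between reads or writes NZCV.

Because this runs on virtual registers before register allocation, SSA form guarantees that the source operands are not redefined in between.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy, MIR-like instruction over virtual registers (hypothetical; not
// llvm::MachineInstr). Empty strings mean "no such operand".
struct Inst {
  std::string opcode;  // e.g. "ANDXrr", "ANDSXrr", "ORRXrr"
  std::string def;     // defined virtual register, e.g. "%47"
  std::string lhs;     // first source register
  std::string rhs;     // second source register
  bool defsFlags;      // implicit-def $nzcv
  bool usesFlags;      // reads $nzcv (e.g. a conditional branch)
};

// True if any instruction in the block reads virtual register `reg`.
bool isRead(const std::vector<Inst> &block, const std::string &reg) {
  if (reg.empty()) return false;
  for (const Inst &i : block)
    if (i.lhs == reg || i.rhs == reg) return true;
  return false;
}

// Folds "AND a, b ... TST a, b" into "ANDS a, b ..." within one block.
// Returns true if a fold was performed.
bool foldAndIntoTst(std::vector<Inst> &block) {
  for (std::size_t t = 0; t < block.size(); ++t) {
    const Inst &tst = block[t];
    // Treat an ANDSXrr whose value result is never read as a TST.
    if (tst.opcode != "ANDSXrr" || isRead(block, tst.def)) continue;
    // Walk backwards from the TST looking for a matching ANDXrr.
    for (std::size_t a = t; a-- > 0;) {
      Inst &cand = block[a];
      bool sameOps = (cand.lhs == tst.lhs && cand.rhs == tst.rhs) ||
                     (cand.lhs == tst.rhs && cand.rhs == tst.lhs);  // AND commutes
      if (cand.opcode == "ANDXrr" && sameOps) {
        cand.opcode = "ANDSXrr";  // the AND now also sets NZCV
        cand.defsFlags = true;
        block.erase(block.begin() + static_cast<std::ptrdiff_t>(t));  // drop the TST
        return true;
      }
      // Hoisting the flag definition above an instruction that reads or
      // writes NZCV would change the flags seen there or afterwards.
      if (cand.defsFlags || cand.usesFlags) break;
    }
  }
  return false;
}

// Usage: the MIR from the question, plus a hypothetical NZCV reader.
void demo() {
  std::vector<Inst> block = {
      {"ANDXrr", "%47", "%46", "%32", false, false},
      {"ORRXrr", "%48", "%47", "%28", false, false},
      {"ANDSXrr", "%49", "%46", "%32", true, false},  // the TST
      {"Bcc", "", "", "", false, true},               // hypothetical flag user
  };
  bool changed = foldAndIntoTst(block);
  assert(changed);
  assert(block.size() == 3);
  assert(block[0].opcode == "ANDSXrr" && block[0].defsFlags);
  assert(block[1].opcode == "ORRXrr");
}
```

In `demo()`, the fold produces the ANDSXrr / ORRXrr sequence from the question, and the (hypothetical) branch still sees the same flags. A real pass would also need to cover the 32-bit (ANDWrr/ANDSWrr) and immediate forms, and cases where the AND and the TST sit in different basic blocks.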