[ARM] Peephole optimization (instructions tst + and)

kpdev42 · November 21, 2019, 10:00am

Hello!

I noticed that in some cases clang generates a sequence of AND + TST instructions:

For example:

AND x3, x2, x1
TST x2, x1

I think these instructions should be merged into one:

ANDS x3, x2, x1

(because TST <Xn>, <Xm> is an alias for ANDS XZR, <Xn>, <Xm>; see https://static.docs.arm.com/ddi0596/a/DDI_0596_ARM_a64_instruction_set_architecture.pdf)
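To make the equivalence concrete, here is a minimal C++ sketch that models the 64-bit result and the NZCV flags of the three instructions. The `Flags`/`AndsResult` structs and the helper names are hypothetical illustrations, not LLVM or ARM APIs; the flag rule (N from bit 63, Z when the result is zero, C and V cleared) follows the A64 description of ANDS.

```cpp
#include <cstdint>

// Hypothetical model of the AArch64 NZCV condition flags.
struct Flags {
  bool n, z, c, v;
};

// Hypothetical model of what ANDS produces: a value and a set of flags.
struct AndsResult {
  uint64_t value;
  Flags flags;
};

// ANDS Xd, Xn, Xm: Xd = Xn & Xm; N = bit 63 of the result,
// Z = (result == 0), C and V are cleared.
AndsResult ands(uint64_t xn, uint64_t xm) {
  uint64_t r = xn & xm;
  return {r, Flags{(r >> 63) != 0, r == 0, false, false}};
}

// AND Xd, Xn, Xm: same value as ANDS, flags are left untouched.
uint64_t and_(uint64_t xn, uint64_t xm) { return xn & xm; }

// TST Xn, Xm is ANDS XZR, Xn, Xm: the value is discarded, only flags remain.
Flags tst(uint64_t xn, uint64_t xm) { return ands(xn, xm).flags; }

// Checks that one ANDS yields both the AND value and the TST flags.
bool mergeIsEquivalent(uint64_t xn, uint64_t xm) {
  AndsResult merged = ands(xn, xm);
  Flags f = tst(xn, xm);
  return merged.value == and_(xn, xm) && merged.flags.n == f.n &&
         merged.flags.z == f.z && merged.flags.c == f.c &&
         merged.flags.v == f.v;
}
```

Because TST differs from ANDS only in discarding the result, a single ANDS that writes x3 provides both the value the AND produced and the flags the TST produced, as long as nothing between the two original instructions reads or writes the flags.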

Is this a missed optimization, or could such a merge have some negative effect?

Best regards

Pavel

PS: Code sample (it could probably be reduced significantly):

(clang -target aarch64 sample.c -S -O2 -o sample.S)

efriedma-quic · November 21, 2019, 8:55pm

That transform is legal; it’s a missed optimization.

-Eli

kpdev42 · November 22, 2019, 11:08am

Ok, thank you, I will implement it then.

As far as I can see, this optimization should be done in AArch64LoadStoreOptimizer; is that right?

efriedma-quic · November 22, 2019, 6:53pm

You probably want to do this some time before register allocation, so you don’t have to worry about physical register definitions.

Maybe take a look at what ARM does in ARMBaseInstrInfo::optimizeCompareInstr?

-Eli

kpdev42 · November 26, 2019, 6:51am

Thank you!

I took a look at this method (ARMBaseInstrInfo::optimizeCompareInstr) and how it is used.

So, if I understood correctly, I need to add a new method to TargetInstrInfo (similar to optimizeCompareInstr, e.g. optimizeAddInstr) and implement it in AArch64InstrInfo.

This method should be able to transform code like this:

%47:gpr64 = ANDXrr %46:gpr64, %32:gpr64
%48:gpr64common = ORRXrr killed %47:gpr64, %28:gpr64common
%49:gpr64 = ANDSXrr %46:gpr64, %32:gpr64, implicit-def $nzcv

to this form:

%47:gpr64 = ANDSXrr %46:gpr64, %32:gpr64, implicit-def $nzcv
%48:gpr64common = ORRXrr killed %47:gpr64, %28:gpr64common
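The transform above can be sketched roughly as standalone C++ over a toy SSA-like instruction list, rather than against LLVM's real MachineInstr API. The `Inst` struct, its fields (including `defIsDead`, which stands in for "the ANDS result is unused", i.e. it acts as a TST), and `mergeAndWithTst` are all hypothetical. The sketch assumes, as suggested above, that it runs before register allocation, so each virtual register has a single definition and identical operand numbers mean identical values. It also refuses to hoist the flag definition past any instruction that reads or writes NZCV.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a MachineInstr in SSA form.
struct Inst {
  std::string opcode;  // e.g. "ANDXrr", "ANDSXrr", "ORRXrr"
  int def;             // virtual register defined by this instruction
  int src0, src1;      // virtual register operands
  bool defsNZCV;       // instruction has an implicit-def of $nzcv
  bool usesNZCV;       // instruction reads $nzcv
  bool defIsDead;      // nothing reads `def` (e.g. an ANDS used only as TST)
};

// Looks for "ANDSXrr %dead, a, b" (a TST) preceded by "ANDXrr %x, a, b" and,
// if the flags can safely be produced earlier, turns the AND into ANDS and
// deletes the redundant ANDS. Returns true if it changed the list.
bool mergeAndWithTst(std::vector<Inst>& mi) {
  for (std::size_t j = 0; j < mi.size(); ++j) {
    const Inst& tst = mi[j];
    if (tst.opcode != "ANDSXrr" || !tst.defIsDead) continue;
    // Scan backwards from the TST towards a matching AND.
    for (std::size_t i = j; i-- > 0;) {
      Inst& cand = mi[i];
      if (cand.opcode == "ANDXrr" && cand.src0 == tst.src0 &&
          cand.src1 == tst.src1) {
        cand.opcode = "ANDSXrr";  // the AND now also sets NZCV
        cand.defsNZCV = true;
        mi.erase(mi.begin() + static_cast<std::ptrdiff_t>(j));
        return true;
      }
      // Hoisting the flag definition past a reader or writer of NZCV
      // would change which flags those instructions (or later users) see.
      if (cand.defsNZCV || cand.usesNZCV) break;
    }
  }
  return false;
}
```

Fed the three instructions from the first MIR listing (with %49 marked dead), this toy rewrites %47 to ANDSXrr and drops %49; with a flag-setting instruction such as ADDSXrr in between, it leaves the code unchanged. A real implementation would also need to handle commuted operands (AND is commutative) and the 32-bit ANDWrr/ANDSWrr forms, which this sketch ignores.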

Is everything correct?
