Redundant copies

Hi all,

We have encountered a case of redundant copies left in the final code, and we would like to at least mitigate it. The original motivating case comes from a context with large vector registers, where copies are expensive and we would like to avoid them as much as possible.

This small C testcase, similar to the original vector case, exposes the same issue using scalars.

long a, b;
long fn1();
long fn2() {
  long c = a, d = c;
  for (; b;) {
    long e = fn1();
    d = d + e;
  }
  long f = d - c;
  return f;
}

For instance, on RISC-V we emit something like the following, and other backends such as ARM or X86 show the same behaviour.

    add   s0, zero, s2    # ← copy
    beqz  a0, .LBB0_3
# %bb.1:                  # %for.body.preheader
    add   s0, zero, s2    # ← not needed
.LBB0_2:                  # %for.body

Has anyone encountered a similar issue in the past?

We are looking into removing these copies with a post-RA pass that addresses the most obvious case: if a copy has the same destination and source physregs as an earlier copy, the only definition of the destination register reaching it is that earlier copy, and the same definition of the source register reaches both copies, then the later copy is redundant.
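A minimal sketch of that rule, written in C to match the testcase. It assumes a hypothetical toy instruction format (not LLVM's MachineInstr) and a single straight-line path, for example the entry block falling through into the preheader, so that the reaching definition of a register is simply its last writer on that path. A real pass would get this information from a reaching-definitions analysis instead.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 32
#define MAX_INSTRS 64
#define NO_DEF (-1)

/* Hypothetical toy IR (not LLVM's MachineInstr): every instruction writes
 * exactly one register `dst`; a COPY also reads `src`. */
typedef enum { OP_COPY, OP_OTHER } Opcode;

typedef struct {
  Opcode op;
  int dst;
  int src;     /* only meaningful for OP_COPY */
  bool erased; /* set when the copy is found to be redundant */
} Instr;

/* Erases every copy whose dst/src pair matches an earlier copy that still
 * reaches it, provided the same definition of src reaches both copies.
 * Returns the number of erased copies. */
static int remove_redundant_copies(Instr *code, int n) {
  int last_def[NUM_REGS];  /* reaching definition of each register */
  int src_def[MAX_INSTRS]; /* reaching def of a copy's src at that copy */
  int removed = 0;

  assert(n <= MAX_INSTRS);
  for (int r = 0; r < NUM_REGS; ++r)
    last_def[r] = NO_DEF;

  for (int i = 0; i < n; ++i) {
    Instr *mi = &code[i];
    if (mi->op == OP_COPY) {
      int prev = last_def[mi->dst];
      if (prev != NO_DEF && code[prev].op == OP_COPY &&
          code[prev].src == mi->src && src_def[prev] == last_def[mi->src]) {
        /* dst still holds the value copied from the same def of src. */
        mi->erased = true;
        ++removed;
        continue; /* the reaching def of dst remains `prev` */
      }
      src_def[i] = last_def[mi->src];
    }
    last_def[mi->dst] = i;
  }
  return removed;
}
```

With s0 and s2 encoded as registers 8 and 18 (x8/x18), the sequence `s2 = ...; s0 = copy s2; s0 = copy s2` has its second copy erased, whereas redefining s2 between the two copies keeps both.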

This might be too specific though, so perhaps there are better approaches?

Thanks!

Sam

Hi Roger,

FWIW: we have also observed redundant copies/moves; they have been annoying us for some time now, but we haven’t got round to looking at them. I’m not sure if we are looking at exactly the same problem, but I guess so.

Treating the symptoms with post-RA dead code elimination might be very effective, but it might also be worth looking at the source of the problem (regalloc?) to check that we are not missing something obvious.

Regarding a post-RA pass: you may want to have a look at the ARM hardware-loop pass. To make that beneficial, we have to do quite a lot of dead code elimination post-RA, both inside loops and in preheaders; see e.g. ARMLowOverheadLoops::IterationCountDCE. This uses ReachingDefAnalysis (RDA), which Sam extended and made more generic to support this, and which was also going to be the subject of his EuroLLVM talk: http://llvm.org/devmtg/2020-04/talks.html#LightningTalk_26. End of advertisement. ;-) Basically, what I want to say is that this should provide most of what you’ll need.
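To give a flavour of this kind of post-RA clean-up, here is a minimal sketch of dead code elimination over a single basic block. Note that it swaps in plain backward liveness over a 32-bit register set rather than RDA, and that the instruction format, the `live_out` parameter and the `has_side_effects` flag are hypothetical simplifications; this is not how ARMLowOverheadLoops::IterationCountDCE is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_REG (-1)

/* Hypothetical toy instruction: writes register `dst` (0..31, or NO_REG)
 * and reads the registers whose bits are set in `uses`. */
typedef struct {
  int dst;
  uint32_t uses;
  bool has_side_effects; /* e.g. stores, calls, branches: never erased */
  bool erased;
} Instr;

/* Walks the block backwards, tracking the set of live registers. An
 * instruction is dead when it has no side effects and its result is not
 * live afterwards. Returns the number of erased instructions. */
static int dce_block(Instr *code, int n, uint32_t live_out) {
  uint32_t live = live_out;
  int removed = 0;
  for (int i = n - 1; i >= 0; --i) {
    Instr *mi = &code[i];
    assert(mi->dst == NO_REG || (mi->dst >= 0 && mi->dst < 32));
    uint32_t def = mi->dst == NO_REG ? 0 : UINT32_C(1) << mi->dst;
    if (!mi->has_side_effects && !(live & def)) {
      mi->erased = true; /* its operands do not become live */
      ++removed;
      continue;
    }
    live &= ~def;     /* the def kills the register... */
    live |= mi->uses; /* ...and its operands become live */
  }
  return removed;
}
```

Because erased instructions do not make their operands live, chains of dead instructions disappear in a single backward walk.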

Cheers,
Sjoerd.

Hi Sjoerd,

I’m already using RDA in the pass I mentioned and it works great. Thanks Sam!

Regarding the root cause, I didn’t see anything obviously suboptimal in either the copy coalescing or the register allocation, at least in my previous example. Alternatively, we might want to improve what we pass on to RA, i.e. remove the redundant copy earlier. At this point, however, it doesn’t (obviously) look like one (it is still using different vregs), which suggests it might require a bit more work to discover what ultimately leads to a redundant copy. I will investigate this option as well.

I’ll take a look at the hardware-loop pass DCE code. Thanks for the pointer!

Kind regards,

On Thu, 12 March 2020 at 20:50, Sjoerd Meijer <Sjoerd.Meijer@arm.com> wrote:

> At this point, however, it doesn’t (obviously) look like one (it is still using different vregs), which suggests it might require a bit more work to discover what ultimately leads to a redundant copy. I will investigate this option as well.

Let me correct myself: in the MIR dumps for this example, right after copy coalescing, the copy is redundant even at the vreg level:

  %14:gpr = COPY %0
  BEQ %6, $x0, %bb.3
  PseudoBR %bb.1

bb.1.for.body.preheader:
  %14:gpr = COPY %0
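Because the two copies sit in different blocks, checking this at the vreg level needs reaching definitions over the CFG. Below is a minimal sketch of the classic iterative reaching-definitions dataflow applied to a hypothetical encoding of this example: block 0 is the entry (defining %0 and the first copy), block 1 the preheader, block 2 the loop body (assumed to redefine %14 for `d = d + e`) and block 3 the exit. Register numbers 0 and 14 stand for %0 and %14; the `Cfg` and `Def` types and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 8
#define MAX_DEFS 32 /* one bit per definition */

/* Hypothetical toy CFG: definitions are numbered in program order (so defs
 * within a block are in increasing index order), each tagged with its block
 * and the (virtual) register it writes. */
typedef struct { int block; int reg; } Def;

typedef struct {
  int num_blocks;
  int num_preds[MAX_BLOCKS];
  int preds[MAX_BLOCKS][MAX_BLOCKS];
  int num_defs;
  Def defs[MAX_DEFS];
  uint32_t in[MAX_BLOCKS]; /* defs reaching the start of each block */
} Cfg;

static uint32_t defs_of_reg(const Cfg *g, int reg) {
  uint32_t s = 0;
  for (int d = 0; d < g->num_defs; ++d)
    if (g->defs[d].reg == reg) s |= UINT32_C(1) << d;
  return s;
}

/* Applies the defs of `block` to `set`, stopping before def `stop`
 * (pass -1 to apply the whole block). */
static uint32_t transfer(const Cfg *g, int block, uint32_t set, int stop) {
  for (int d = 0; d < g->num_defs; ++d) {
    if (d == stop) break;
    if (g->defs[d].block == block)
      set = (set & ~defs_of_reg(g, g->defs[d].reg)) | (UINT32_C(1) << d);
  }
  return set;
}

/* IN[b] = union of OUT[p] over predecessors p; OUT[b] = transfer(IN[b]).
 * Iterates until nothing changes. */
static void solve(Cfg *g) {
  bool changed = true;
  for (int b = 0; b < g->num_blocks; ++b) g->in[b] = 0;
  while (changed) {
    changed = false;
    for (int b = 0; b < g->num_blocks; ++b) {
      uint32_t in = 0;
      for (int i = 0; i < g->num_preds[b]; ++i) {
        int p = g->preds[b][i];
        in |= transfer(g, p, g->in[p], -1);
      }
      if (in != g->in[b]) { g->in[b] = in; changed = true; }
    }
  }
}

/* Definitions of `reg` that reach the point just before def `at`. */
static uint32_t reaching(const Cfg *g, int reg, int at) {
  int b = g->defs[at].block;
  return transfer(g, b, g->in[b], at) & defs_of_reg(g, reg);
}

/* The later copy (def `copy2`, dst = COPY src) is redundant if the only def
 * of dst reaching it is `copy1` and the same defs of src reach both. */
static bool copy_is_redundant(const Cfg *g, int copy1, int copy2,
                              int dst, int src) {
  return reaching(g, dst, copy2) == (UINT32_C(1) << copy1) &&
         reaching(g, src, copy2) == reaching(g, src, copy1);
}
```

For the encoding above, the preheader copy is reported as redundant, while a further `%14 = COPY %0` placed in the exit block would not be, since the loop's redefinition of %14 also reaches it.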

Kind regards,

Yep, exactly that. We see quite a lot of them; most get cleaned up, but not always…

Cheers.

Hi Sam,

Just a short question: I was looking at the RDA pass, and it doesn’t currently handle RegMasks. Is adding support for them in your backlog?

I am asking because I found myself analyzing across call instructions, and not taking the regmasks into account leads to inexact results that need extra care. So I wondered whether this was already in your Phabricator backlog. I have made a first implementation downstream, but before continuing with it I wanted to make sure I’m not stepping on your toes :-)
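To illustrate what RegMask support amounts to, here is a minimal straight-line sketch in which a call counts as a definition of every register its mask does not preserve. It follows LLVM's convention that a set bit in a regmask marks a preserved register (the test MachineOperand::clobbersPhysReg performs). The `Instr` type and the function names are hypothetical; this is neither the downstream implementation mentioned above nor RDA's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 64
#define NO_DEF (-1)

/* Hypothetical toy instruction: either a plain def of `dst`, or a call
 * carrying a register mask (a set bit means "preserved", as in LLVM). */
typedef struct {
  bool is_call;
  int dst;                 /* used when !is_call */
  const uint32_t *regmask; /* used when is_call; NUM_REGS / 32 words */
} Instr;

static bool clobbers(const uint32_t *regmask, int reg) {
  return !(regmask[reg / 32] & (UINT32_C(1) << (reg % 32)));
}

/* Fills reaching[r] with the index of the instruction that last defined
 * register r before position `at` (NO_DEF if none). Calls count as a
 * definition of every register their mask does not preserve. */
static void reaching_defs_at(const Instr *code, int at,
                             int reaching[NUM_REGS]) {
  for (int r = 0; r < NUM_REGS; ++r)
    reaching[r] = NO_DEF;
  for (int i = 0; i < at; ++i) {
    if (code[i].is_call) {
      for (int r = 0; r < NUM_REGS; ++r)
        if (clobbers(code[i].regmask, r))
          reaching[r] = i;
    } else {
      assert(code[i].dst >= 0 && code[i].dst < NUM_REGS);
      reaching[code[i].dst] = i;
    }
  }
}
```

Ignoring the mask would instead report the pre-call definitions of clobbered registers as still reaching, which is the inexact result described above.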

Thank you very much,

On Mon, 16 March 2020 at 13:28, Sjoerd Meijer <Sjoerd.Meijer@arm.com> wrote:

Hi Roger,

I don’t have anything on my TODO list for the analysis, so any patches welcome! And, of course, I’ll happily help out with reviews.

cheers,