Regarding codegenprepare transformations

Hello everyone,

For context, I have a small loop written in a custom front end whose output can be expressed fairly accurately by the following C program:

struct Array {
    double * data;
    long n;
};

#define X 0
#define Y 1
#define Z 2

/* d->data is indexed as consecutive (X, Y, Z) triples, one per point. */
void f(struct Array * restrict d, struct Array * restrict out, const long n) {
    for (long i = 0; i < n; ++ i) {
        for (long j = i + 1; j < n; ++ j) {
            out->data[X] = d->data[i * 3 + X] * d->data[j * 3 + X];
            out->data[Y] = d->data[i * 3 + Y] * d->data[j * 3 + Y];
            out->data[Z] = d->data[i * 3 + Z] * d->data[j * 3 + Z];
        }
    }
}

I’m looking at the IR transformations performed by the passes that LLVMTargetMachine::addPassesToEmitFile adds, and I’m seeing something I could use some help explaining. The point of interest is between the ‘unreachableblockelim’ and ‘codegenprepare’ passes. Here is the paste of the IR after each pass.

I’ve annotated 3 spots in the code with stars. In (1), after unreachableblockelim, addr89 is computed once outside the loop and then used by the store in (2). However, in (3), after codegenprepare, a bunch of arithmetic is now done on every loop iteration to compute the address for that same store. It looks like the same thing is happening to several of the addresses above as well.
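A rough source-level analogy of the two shapes being described, written by hand rather than derived from the pasted IR: the function names are hypothetical, the code only covers the Z component, and the byte-offset arithmetic in the second version merely mimics the "sunk" address computation that codegenprepare places next to each memory access.

```c
#include <stddef.h>

/* Hypothetical: shape after unreachableblockelim. The store address
   (compare addr89) is computed once, before the loop. */
void inner_hoisted(double *restrict out, const double *restrict d,
                   long i, long n) {
    double *addr_z = &out[2];       /* loop-invariant, lives in a register */
    const double *di = &d[i * 3];   /* base of point i, also hoisted */
    for (long j = i + 1; j < n; ++j)
        *addr_z = di[2] * d[j * 3 + 2];
}

/* Hypothetical: shape after codegenprepare. The address arithmetic is
   repeated next to each access, expressed as base + byte offset, so that
   instruction selection can fold it into the memory operand. */
void inner_sunk(double *restrict out, const double *restrict d,
                long i, long n) {
    for (long j = i + 1; j < n; ++j)
        *(double *)((char *)out + 2 * sizeof(double)) =
            *(const double *)((const char *)d + (i * 3 + 2) * sizeof(double)) *
            *(const double *)((const char *)d + (j * 3 + 2) * sizeof(double));
}
```

Both functions store the same values; only where the address arithmetic appears differs.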

Does this look right? Why would those calculations be moved back into the loop?



Basically, codegenprepare thinks that the loads (and stores) within the loop use a free(ish) addressing mode that your target can fold into the memory instruction itself. So it sinks the address computations into the loop, on the assumption that addressing-mode matching during instruction selection will absorb them, giving the same net performance with less register pressure.
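To illustrate what "foldable" means, here is a deliberately simplified model, assuming an x86-64 target: a memory operand there can encode base + index * scale + disp with scale 1, 2, 4 or 8 and a signed 32-bit displacement. The struct and function below are hypothetical; they only mirror the kind of question LLVM's TargetLowering::isLegalAddressingMode hook answers, not its actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decomposition of an address as base + index * scale + disp. */
struct AddrMode {
    bool has_base;   /* a base register is used */
    bool has_index;  /* an index register is used */
    long scale;      /* multiplier applied to the index register */
    int64_t disp;    /* constant displacement in bytes */
};

/* Simplified x86-64 rule: disp must fit in a signed 32-bit field, and
   an index register may only be scaled by 1, 2, 4 or 8. */
bool fits_x86_64_addr_mode(struct AddrMode am) {
    if (am.disp < INT32_MIN || am.disp > INT32_MAX)
        return false;
    if (!am.has_index)
        return true;
    return am.scale == 1 || am.scale == 2 || am.scale == 4 || am.scale == 8;
}
```

Applied to the loop above: the store to out->data[Z] is data + 16, which fits as base + disp. The load of d->data[j * 3 + Z] is data + j * 24 + 16; a scale of 24 is not encodable, so under this model j * 3 has to be computed separately and then used as the index with scale 8 and displacement 16.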