Loop localize global variables

Hello all,

I am writing to get some feedback on an optimization that I would like to
upstream. The basic idea is to localize global variables inside loops so
that they can be allocated to registers. For example, transform the
following sequence

static int gbl_var;
void foo() {

  for () {
     ...access gbl_var...
  }

}

into something like

static int gbl_var;
void foo() {
  int lcl_var;

  lcl_var = gbl_var;
  for () {
     ...access clc_var...
  }
  gbl_var = lcl_var;

}
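
For readers who want something compilable, here is a minimal sketch of the
same before/after pair. The function names, the data parameter, and the
accumulation in the loop body are illustrative assumptions, not part of the
proposed patch; the sketch only shows the shape of the rewrite at the source
level.

```c
#include <stddef.h>

static int gbl_var;

/* Before: each iteration reads and writes gbl_var in memory
   (unless the compiler already promotes it). */
void accumulate_global(const int *data, size_t n) {
    for (size_t i = 0; i < n; ++i)
        gbl_var += data[i];
}

/* After: gbl_var is copied into a local before the loop, the loop works
   only on the local (a register candidate), and the result is written
   back once after the loop. */
void accumulate_localized(const int *data, size_t n) {
    int lcl_var = gbl_var;
    for (size_t i = 0; i < n; ++i)
        lcl_var += data[i];
    gbl_var = lcl_var;
}
```

Both functions leave gbl_var with the same final value as long as nothing
else reads or writes gbl_var while the loop runs.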

This transformation helps a couple of EEMBC benchmarks on both the AArch64
and Hexagon backends. I was wondering if there is interest in getting this
optimization upstreamed or if there is a better way of doing this.

Thanks,
Sundeep

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Typo corrected (the loop body should access lcl_var, not clc_var):

  lcl_var = gbl_var;
  for () {
     ...access lcl_var...
  }
  gbl_var = lcl_var;

Hi Sundeep,

I am also interested in the load-store lifting transformation.

For static globals as in your example, the transformation in general
would rely on better static global aliasing information, which is
currently in review: http://reviews.llvm.org/D10059

For non-static globals, one problem with loop-based analysis alone is
that in a popular embedded benchmark suite, you get serious gains if
you can localize the globals in code like

int *G1;
int *G2;

foo () {
  G1 = malloc(...);
  for (...) {
       // Lots of stuff with G1 and G2 worth localizing
  }
}

That malloc is very important to consider when doing an alias query
because now the aliasing infrastructure knows G1, G2 don't alias, and
you won't see it from a loop pass. If you were to try and modify LICM
to localize the globals for example, it would have to assume G1 and G2
MayAlias. I believe this implies we must use a FunctionPass, and I
have a prototype that catches cases like the above, as well as the
simpler ones. I can't commit myself however to when that will be
ready, so I'm just sharing what I've found out, maybe it's helpful.
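
To make the malloc case concrete, here is a rough source-level sketch of
what localizing G1 and G2 could look like. The loop body, the size
parameter, the return code, and the function name foo_localized are
assumptions made for illustration; they do not come from the benchmark or
from the prototype.

```c
#include <stdlib.h>

int *G1;
int *G2;

/* Hypothetical body: fill the fresh G1 buffer from the buffer G2 points to.
   Assumes G2 already points to at least n ints.
   Returns 0 on success, -1 if the allocation fails. */
int foo_localized(size_t n) {
    G1 = malloc(n * sizeof *G1);
    if (G1 == NULL)
        return -1;

    /* Local copies of the pointer globals. The stores below go to freshly
       allocated memory, which cannot overlap the storage of G1 or G2
       themselves, nor the older buffer that G2 points to. */
    int *g1 = G1;
    int *g2 = G2;
    for (size_t i = 0; i < n; ++i)
        g1[i] = g2[i] + 1;

    /* G1 and G2 are not modified in the loop, so no store-back is needed. */
    return 0;
}
```

Without the knowledge that G1 points to fresh memory, a store through G1
would have to be treated as possibly overwriting G2 (or G1 itself), and the
two pointers would have to be reloaded on every iteration.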

I'd be interested to hear your thoughts / approach in greater detail.

Thanks!
--Charlie.

From: "Charlie Turner" <charlesturner7c5@gmail.com>
To: sundeepk@codeaurora.org
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, July 21, 2015 9:22:04 AM
Subject: Re: [LLVMdev] Loop localize global variables

Hi Sundeep,

I am also interested in the load-store lifting transformation.

I am as well. LICM should certainly be taught to speculatively load from and store to conditionally accessed dereferenceable addresses when that is likely to be profitable.
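
As an illustration of the conditionally accessed case, here is a small
source-level sketch. The global hits, the condition, and both function names
are assumptions chosen for the example; this is not how LICM itself is
written.

```c
#include <stddef.h>

static int hits;

/* Before: hits is loaded and stored only on iterations where the
   condition holds. */
void count_negative(const int *data, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (data[i] < 0)
            hits++;
}

/* After: the load is hoisted and the store is sunk, so both now execute
   unconditionally, even if the condition never held. */
void count_negative_localized(const int *data, size_t n) {
    int local_hits = hits;      /* speculative load before the loop */
    for (size_t i = 0; i < n; ++i)
        if (data[i] < 0)
            local_hits++;
    hits = local_hits;          /* store-back after the loop */
}
```

The load is harmless because an ordinary global is always dereferenceable.
The store-back is the delicate part: it writes hits even when the original
loop would not have, which is fine in single-threaded code but can introduce
a data race if another thread accesses hits concurrently.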

For static globals as in your example, the transformation in general
would rely on better static global aliasing information, which is
currently in review: http://reviews.llvm.org/D10059

This will help, yes, but the transformation is quite useful regardless.

For non-static globals, one problem with loop-based analysis alone is
that in a popular embedded benchmark suite, you get serious gains if
you can localize the globals in code like

int *G1;
int *G2;

foo () {
  G1 = malloc(...);
  for (...) {
       // Lots of stuff with G1 and G2 worth localizing
  }
}

That malloc is very important to consider when doing an alias query
because now the aliasing infrastructure knows G1, G2 don't alias, and
you won't see it from a loop pass. If you were to try and modify LICM
to localize the globals for example, it would have to assume G1 and G2
MayAlias.

Why?

-Hal

Hi Charlie,

My prototype only handles the static case. It's a very simple implementation.

It relies on ProcessInternalGlobal in GlobalOpt.cpp to check whether a
static variable is safe to localize (address not taken, not volatile,
etc.). Then I run a LoopPass that goes through each BB of the loop and
collects all static variables. It also checks some further safety
conditions (no calls, no inline asm blocks, etc.). If the static variable
is safe to localize in the loop, the pass creates an alloca, loads from
the GV and stores into the alloca in the pre-header, loads from the alloca
and stores into the GV in the exit block, and replaces all uses of the GV
in the loop with the alloca.
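
The "no calls" condition above can be illustrated with a small sketch. The
helper scaled and both loop functions are hypothetical names made up for
this example; they show, at the source level, why a call inside the loop
makes a naive localization incorrect.

```c
#include <stddef.h>

static int gbl_var;

/* Hypothetical helper that reads the global directly. */
static int scaled(int x) {
    return x * gbl_var;
}

/* Original loop: the helper observes every update to gbl_var. */
int run_original(size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        gbl_var += 1;
        sum += scaled(1);
    }
    return sum;
}

/* Naively localized loop: updates go to lcl_var, but the helper still
   reads the stale gbl_var, so the result changes. */
int run_naively_localized(size_t n) {
    int lcl_var = gbl_var;
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        lcl_var += 1;
        sum += scaled(1);
    }
    gbl_var = lcl_var;
    return sum;
}
```

The two functions return different sums whenever the loop runs at least
once from the same starting value, which is the kind of miscompile that
rejecting loops containing calls is meant to rule out.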

The malloc case you mentioned below is very interesting but I don't follow
why you need a function pass to handle this case.

Thanks,
Sundeep

Sorry, I made a mistake about this. I thought because of the way LICM
adds instructions to its tracker from inner-loops outwards, and
doesn't look at instructions outside the loop, it wouldn't be able to
make use of the malloc information. But the alias analysis should have
spotted the surrounding context for us already -- I misunderstood the
alias analysis infrastructure. Apologies if I caused unnecessary
confusion by suggesting LICM can't be modified to do this.