Whole program alias analysis in backend

Hello everyone,

   we are planning to implement a stronger alias analysis
for the backend, because for VLIW architectures in particular this is our main performance limitation.
I have two questions regarding this.

   I know that the backend processes one function at a time;
is it somehow possible to do a whole-program analysis there,
or could you give me some guidelines?

   Which alias analysis algorithm would you recommend?
A Steensgaard algorithm was implemented before, but no one
was maintaining it, so it was removed. Do you think this could still be a suitable
algorithm, or should we choose something newer and stronger?
Regarding scalability, it is OK for us if the algorithm takes e.g. one hour for a
100 KLOC application.

Thank you
   Adam Husar

Hi,

       I know that the backend processes one function at a time;
    is it somehow possible to do a whole-program analysis there,
    or could you give me some guidelines?

The backend introduces MachineFunctionPasses; from that point on it is only possible to run FunctionPasses, otherwise the machine functions get corrupted.

For our project, we also want to do whole-program analyses. I managed to patch the PassManager, MachineFunctionPass and others to work with a new MachineModulePass I introduced in our copy of LLVM. Now we can run module passes in the backend and have proper pass dependencies between any combination of function passes and module passes (I haven't tested my changes with any kind of basic-block or loop passes).

However, this was not an easy task: it required some bug fixing deep down in the PassManager infrastructure and some refactoring of the backend passes. Basically, the MachineFunctions need to be stored in and fetched from an ImmutablePass instead of being kept (only) in the MachineFunctionPass, the PassManager must be able to create new MachineFunctionPasses itself, and the ImmutablePasses need to be passed around in the PassManager properly. The downside is that memory consumption goes up, since the backend must now keep all MachineFunctions in memory instead of processing them one by one (I avoid this overhead as long as no MachineModulePass is used, though). I also have not checked what happens if you remove or add functions in a module pass.

I hacked this into our backend, which is based on LLVM 3.2. There have been some changes to the way passes are initialized and finalized in 3.3, but I think most of my patches will apply to upstream largely unchanged. I am thinking about posting the patches to the LLVM mailing lists, but for that I first need to find the relevant changes, clean them up, port them to 3.3, and write some proper test cases. I won't have time to do this properly in the near future, but if anybody is interested, I can try to find the main commits and post them as they are. And if you are really curious, you can find the Git repository here:

https://github.com/t-crest/patmos-llvm

Kind regards,
  Stefan

Hello,
   thank you very much, Stefan; we will take a look at the whole-module-pass
modifications. It seems quite complicated, but it is probably the only option.

   Regarding the alias analysis, maybe it could be done earlier in opt,
with some way to keep track of what happens to the LLVM IR loads and stores
in the backend machine code; I do not know yet, we will have to think about that.

   Regarding the memory consumption, it is not much of an issue for us, I think.
At least for the opt passes, a 20 MB (generated) source file occupies
about 4 GB of memory, which is still OK I think; I hope it will be similar
for the backend.

Thank you
   Adam

  Which alias analysis algorithm would you recommend?
A Steensgaard algorithm was implemented before, but no one
was maintaining it, so it was removed

There were other concerns that should not be gone into on this mailing list.

Do you think this could still be a suitable algorithm,
or should we choose something newer and stronger?
Regarding scalability, it is OK for us if the algorithm takes e.g. one hour
for a 100 KLOC application.

Newer implementations of Andersen's will give you a more accurate analysis,
and run in about 20 seconds on 1.5 million LOC on a modern machine.
You could actually do better with a little work, but that is an actual
number from a real field-sensitive solver implemented on top of LLVM,
running on Wine (that is about 2.5 million lines of code).
Scalability is roughly linear in practice: The GIMP, which is about 700
KLOC, takes 4 seconds.

There are always edge cases, of course.
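To make the precision difference between the two families concrete, here is a minimal sketch of both on toy pointer assignments. This is plain illustrative Python, not LLVM code; all variable names are hypothetical, and the load/store constraints (*p = q, p = *q) and field sensitivity of a real solver are omitted. Steensgaard-style analysis unifies points-to classes with union-find (near-linear, less precise); Andersen-style analysis only propagates points-to sets along copy edges to a fixed point (more precise, as discussed above).

```python
from collections import defaultdict

# --- Steensgaard (unification-based): merges points-to classes ---
class Steensgaard:
    def __init__(self):
        self.parent = {}   # union-find over abstract locations
        self.pointee = {}  # class rep -> rep of the class it points to

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def unify(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        pa, pb = self.pointee.get(ra), self.pointee.get(rb)
        self.parent[ra] = rb
        if pa is not None and pb is not None:
            self.unify(pa, pb)        # merging classes merges their pointees
        if pa is not None or pb is not None:
            self.pointee[rb] = self.find(pa if pa is not None else pb)

    def points_to(self, p):
        r = self.find(p)
        if r not in self.pointee:
            self.pointee[r] = self.find(('loc', r))  # fresh pointee class
        return self.find(self.pointee[r])

    def address_of(self, p, x):   # p = &x
        self.unify(self.points_to(p), x)

    def assign(self, p, q):       # p = q: unifies BOTH pointee classes
        self.unify(self.points_to(p), self.points_to(q))

    def may_alias(self, p, q):
        return self.points_to(p) == self.points_to(q)

# --- Andersen (inclusion-based): propagates points-to sets ---
def andersen(address_of, copies):
    """address_of: list of (p, x) for p = &x; copies: list of (p, q) for p = q."""
    pts = defaultdict(set)
    for p, x in address_of:
        pts[p].add(x)
    succ = defaultdict(set)           # q -> pointers copied from q
    for p, q in copies:
        succ[q].add(p)
    changed = True
    while changed:                    # naive fixed-point iteration
        changed = False
        for q in list(succ):
            for p in succ[q]:
                new = pts[q] - pts[p]
                if new:
                    pts[p] |= new     # pts(p) includes pts(q), one-directional
                    changed = True
    return pts

def andersen_may_alias(pts, p, q):
    return bool(pts[p] & pts[q])
```

On the program `p = &x; q = &y; r = &y; q = p`, Steensgaard unifies the classes of x and y when it sees `q = p`, so it spuriously reports that r and p may alias; Andersen keeps pts(p) = {x} and pts(r) = {y} separate and rules that pair out.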