AESOP autoparallelizing compiler

Hi,
We would like to inform the community that we're releasing a version of our research compiler, "AESOP", developed at UMD using LLVM. AESOP is a distance-vector-based autoparallelizing compiler for shared-memory machines. The source code and some further information is available at

http://aesop.ece.umd.edu

The main components of the released implementation are loop memory dependence analysis and parallel code generation using calls to POSIX threads. Since we currently have only a 2-person development team, we are still on LLVM 3.0, and some of the code could use some cleanup. Still, we hope that the work will be of interest to some.

We would welcome any feedback, comments or questions!

Thanks,
Tim Creech

Interesting ! I happen to finish the initial TileGX backend support, which is a many core processor. I am looking forward to testing AESOP on TileGX silicon.

Hi,

Hi Jiong,
  I actually work day-to-day with Tilera processors and I was very pleased to see your recent mail about the TileGx patch! I have access to a Tile-Gx 8036 myself and am certainly planning to add native TileGx support to AESOP in the near future. (Shouldn't be hard: mostly it will require us to finally upgrade from LLVM 3.0 and compile our runtime dependencies for it.) I expect that we will use Tilera's own barrier implementations (in libtmc) directly in our codegen.

-Tim

Hi Sebastian,
  Sure! The bulk of LMDA was written by Aparna Kotha (CCd). It computes dependences between all instructions, computes the resulting direction vectors in the function, then associates them all with loops.

At a high level, the dependence analysis consults with AliasAnalysis, and ScalarEvolution before resorting to attempting to understand the effective affine expressions and performing dependence tests (e.g., Banerjee). If it cannot rule out a dependence, then it will additionally consult with an ArrayPrivatization analysis to see if an involved memory object can be made thread private. It is probably also worth mentioning that the LMDA has been written not only to function well with IR from source code, but also with low level IR from a binary to IR translator in a separate project. This has required new techniques specific to this problem. Aparna can provide more information on techniques used in our LMDA.

-Tim

Hi Tim,

From: "Timothy Mattausch Creech" <tcreech@umd.edu>
To: "Sebastian Dreßler" <dressler@zib.de>
Cc: "Aparna Kotha" <akotha@umd.edu>, llvmdev@cs.uiuc.edu
Sent: Sunday, March 3, 2013 11:32:49 AM
Subject: Re: [LLVMdev] AESOP autoparallelizing compiler

Hi Sebastian,
  Sure! The bulk of LMDA was written by Aparna Kotha (CCd). It
  computes dependences between all instructions, computes the
  resulting direction vectors in the function, then associates them
  all with loops.

At a high level, the dependence analysis consults with AliasAnalysis,
and ScalarEvolution before resorting to attempting to understand the
effective affine expressions and performing dependence tests (e.g.,
Banerjee). If it cannot rule out a dependence, then it will
additionally consult with an ArrayPrivatization analysis to see if
an involved memory object can be made thread private. It is probably
also worth mentioning that the LMDA has been written not only to
function well with IR from source code, but also with low level IR
from a binary to IR translator in a separate project. This has
required new techniques specific to this problem. Aparna can provide
more information on techniques used in our LMDA.

This sounds very interesting; thanks for sharing this with the community. I'd also like to know more about this.

Also, out of curiosity, two quick questions:

1. Why are you using the old induction-variable simplification?

2. Are you generating OpenMP runtime calls (I see some omp references in the CodeGenerationPass); if so, for what runtime?

Sincerely,
Hal

> From: "Timothy Mattausch Creech" <tcreech@umd.edu>
> To: "Sebastian Dreßler" <dressler@zib.de>
> Cc: "Aparna Kotha" <akotha@umd.edu>, llvmdev@cs.uiuc.edu
> Sent: Sunday, March 3, 2013 11:32:49 AM
> Subject: Re: [LLVMdev] AESOP autoparallelizing compiler
>
> Hi Sebastian,
> Sure! The bulk of LMDA was written by Aparna Kotha (CCd). It
> computes dependences between all instructions, computes the
> resulting direction vectors in the function, then associates them
> all with loops.
>
> At a high level, the dependence analysis consults with AliasAnalysis,
> and ScalarEvolution before resorting to attempting to understand the
> effective affine expressions and performing dependence tests (e.g.,
> Banerjee). If it cannot rule out a dependence, then it will
> additionally consult with an ArrayPrivatization analysis to see if
> an involved memory object can be made thread private. It is probably
> also worth mentioning that the LMDA has been written not only to
> function well with IR from source code, but also with low level IR
> from a binary to IR translator in a separate project. This has
> required new techniques specific to this problem. Aparna can provide
> more information on techniques used in our LMDA.

This sounds very interesting; thanks for sharing this with the community. I'd also like to know more about this.

Also, out of curiosity, two quick questions:

1. Why are you using the old induction-variable simplification?

2. Are you generating OpenMP runtime calls (I see some omp references in the CodeGenerationPass); if so, for what runtime?

Sincerely,
Hal

Hi Hal,
  1. We are still using the old indvars simplifcation because we were depending on "-enable-iv-rewrite" and canonical induction variables. I think "-enable-iv-rewrite" is still there in 3.0, but was behaving differently from in 2.9, so we just kept using the 2.9 implementation.
  2. We only link to OpenMP (any implementation should be fine) to get the number of threads to use (omp_get_max_threads()). This makes our binaries obey the OMP_NUM_THREADS environment variable, if present. OpenMP is _not_ used for parallelization: AESOP generates pthreads calls and does very lightweight static scheduling of loop iterations at the IR level.

One thing to note is that we do not use pthreads' barriers. We still link to a (spinning) barrier implementation which was part of a now-defunct sister project. We should eventually do away with this dependence, but haven't had time yet.

Thanks,
Tim

Hi Timothy,

We would like to inform the community that we're releasing a version of our research compiler, "AESOP", developed at UMD using LLVM. AESOP is a distance-vector-based autoparallelizing compiler for shared-memory machines. The source code and some further information is available at

http://aesop.ece.umd.edu

The main components of the released implementation are loop memory dependence analysis and parallel code generation using calls to POSIX threads. Since we currently have only a 2-person development team, we are still on LLVM 3.0, and some of the code could use some cleanup. Still, we hope that the work will be of interest to some.

  Do you have data show us that how much parallelization the AESOP can
extract from those benchmarks? :slight_smile:

Regards,
chenwj

Hi Wei-Ren,
  Sorry for the slow response. We're working on a short tech report which will be up on the website in April. This will contain a "results" section, including results from the SPEC benchmarks which we can't include in the source distribution.

Briefly, I can say that we get good speedups on some of the NAS and SPEC benchmarks, such as a 3.6x+ speedup on 4 cores on the serial version of NAS "CG" (Fortran), and "lbm" (C) from CPU2006. (These are of course among our best results.)

-Tim

Hi Timothy,

Today I happened to download the code and do some experiments.
I actually wanted to see how you handle inter-procedure alias analysis.
So, I set inline threshold to zero and tried out following example

Hi Rahul,
  Thanks for your interest!

  Our work does not attempt to make any significant contributions to alias analysis, and acts as a client to existing LLVM AA. Furthermore, the options passed to the AESOP frontend scripts are obeyed at compile time, but at link time certain transformations occur unconditionally.

Here, AESOP has actually thwarted your experiment by performing inlining just before link-time. You can disable this in ~/aesop/blank/Makefile. It does this unconditionally as a standard set of pre passes to help the analysis as much as possible. As a result, AESOP knows that A and B do not alias and is correctly parallelizing here.

Best,
Tim

Hi Timothy,

Thanks for quick reply :slight_smile:

I disabled inlining in ~/aesop/blank/Makefile. Now function “main” calls function “func1” with two arguments which alias, aesop still goes ahead and parallelize the loop in function “func”.