[Polly] Reduced code analyzability moving from LLVM 3.9.0 to 5.0.1

Hi,

Recently I was looking at the potential of optimizing through Polly. The
code that I am trying to optimize [1] adjusts a picture's colors to get
an Instagram-like effect.

To improve code analyzability on LLVM 3.9.0, I made the following changes:
- Improve SCoP detection through -polly-process-unprofitable
- Enable outer loop vectorization through -polly-vectorizer=stripmine,
disabling timeouts with -polly-dependences-computeout=0
- Avoid sign extensions by replacing all 32-bit ints with longs, as
Polly seems to model using 64-bit loop counters
- Avoid interrupting control flow through -ffast-math and moving mallocs
to the top of the code

So to compile, we have:
    clang -I. -O3 -g3 -Wall -Wextra -std=c99 -D_POSIX_C_SOURCE=200000L
-ffast-math -mllvm -polly -mllvm -polly-dot -mllvm
-polly-process-unprofitable -mllvm -polly-vectorizer=stripmine -mllvm
-polly-dependences-computeout=0 -c -o localcolorcorrection.o
localcolorcorrection.c

Unfortunately, LLVM 5.0.1 generates different results in analyzing the
CFG compared to LLVM 3.9.0. The latter version analyzes most of the CFG
[2], but 5.0.1 leaves large parts of the hot paths untouched due to "non
affine access functions" [3].

What I have tried:
    - Moving Polly to different positions in the LLVM pass pipeline
(-polly-position=early vs. -polly-position=before-vectorizer). The
latter option adds one large basic block, but otherwise doesn't seem to
analyze the hot paths.
    - Setting -polly-delicm-compute-known=true and
polly-delicm-overapproximate-writes=true. This doesn't seem to have
effect on the hot paths.

Can anyone give me some pointers on how to fix this? Or could this be a
regression in Polly?

Thanks!

[1] https://nautilus.bjornweb.nl/files/localcolorcorrection.c
[2] https://nautilus.bjornweb.nl/files/polly390-cfg.pdf
[3] https://nautilus.bjornweb.nl/files/polly501-cfg.pdf

Hi,

Polly can only analyze (multidimensional) affine memory access. Polynomial memory access don’t do well, and I see your code has some linearized arrays (that leads to polynomials).
Luckily Polly has a delinearizer that tries to recover multidimensional access from linearized ones, but the problem is that it does not always work (especially if earlier transformations “optimize” it).

That might be the problem here, you could look at the SCEV of the memory access if they look “nice”.
I don’t know how good is the delinearization in general. That is, does it survive most of LLVM transformations?

Hi Johannes,

Perfect, thanks! The CFG now looks very similar to what I got on LLVM
3.9.0 ([1] vs [2]).

Any idea why setting -simplifycfg-sink-common=false is necessary?
Similar to LLVM 5.0.1, the default for 3.9.0 is true [3], and setting it
to false wasn't necessary in the latter version.

[1]

[2] https://nautilus.bjornweb.nl/files/polly390-cfg.pdf
[3]
https://github.com/llvm-mirror/llvm/blob/release_39/lib/Transforms/Utils/SimplifyCFG.cpp#L71

Hi Björn,

check block %for.inv147 in all your versions. In the 5.0.1 output (first
email) it says that %add118.sink is involved in some non-affine accesses
and that is basically why the large region wasn't detected as a SCoP
anymore. Now the access is not "non-affine" but actually piece-wise
affine (which is indistinguishable for ScalarEvolution and thereby for
Polly [see [0]]). This is now a problem because, either:
  1) LLVM got smarter and figured that is can sink and coalesce the
  memory loads in the predecessors of %for.inv147. This "simplifies" the
  code and reduces the code size. Or,
  2) Polly runs later in the pipeline (which it does) and that means the
  code was always transformed this way but commonly after Polly was
  done.

Since I saw this problem a lot recently I would guess it is because
Polly was moved to the later position in the pipeline. Though, I did not
check if this is true.

Cheers,
  Johannes

[0] 2017 LLVM Developers’ Meeting: J. Doerfert “Polyhedral Value & Memory Analysis ” - YouTube