Identifying contained loops

One of the things I’m trying to do is perform high-level optimizations on loops, which requires first identifying them. For a simple case, suppose you have something like

for (size_t i = 0; i != n; ++i)
++a[i];

If a is a simple array, that will compile to a single basic block, which is easy enough to identify.

But the body doesn’t need to be a single basic block. It could contain complex conditions, nested loops, whatever, and for some purposes, it’s still valid to think of it as a single loop that does something to every element of a.

However, if there are branches into and out of the loop, that’s no longer valid; it might do something to half the elements of the array and leave the other half untouched, for example.

I don’t know what the concept I’m looking for - a loop that is guaranteed to proceed from beginning to end, with no branches in or out - is usually called; for the title of this post, I’ve called it a ‘contained’ loop.

I can figure out how to identify such, but I’d like to check whether this has already been done. Are there any existing passes or whatever that use this concept or identify such loops?

Hi Russel,

you might want to have a look at polly.llvm.org. We identify loop nests that can be transformed in lib/Analysis/ScopDetection.cpp. If you have specific questions I am glad to help.

Best,
Tobias

I see, this is interesting, thanks! Yeah, it seems you’re doing something similar with a specific focus on optimizing array operations; this comment seems to be the explanation of the core of it:

// Detect the maximal Scops of a function.
//
// A static control part (Scop) is a subgraph of the control flow graph (CFG)
// that only has statically known control flow and can therefore be described
// within the polyhedral model.
//
// Every Scop fullfills these restrictions:
//
// * It is a single entry single exit region
//
// * Only affine linear bounds in the loops
//
// Every natural loop in a Scop must have a number of loop iterations that can
// be described as an affine linear function in surrounding loop iterators or
// parameters. (A parameter is a scalar that does not change its value during
// execution of the Scop).
//
// * Only comparisons of affine linear expressions in conditions
//
// * All loops and conditions perfectly nested
//
// The control flow needs to be structured such that it could be written using
// just ‘for’ and ‘if’ statements, without the need for any ‘goto’, ‘break’ or
// ‘continue’.
//
// * Side effect free functions call
//
// Only function calls and intrinsics that do not have side effects are allowed
// (readnone).
//
// The Scop detection finds the largest Scops by checking if the largest
// region is a Scop. If this is not the case, its canonical subregions are
// checked until a region is a Scop. It is now tried to extend this Scop by
// creating a larger non canonical region.

And it seems to make use of llvm’s notion of SESE regions as described in http://llvm.org/docs/doxygen/html/RegionInfo_8h_source.html

I see, this is interesting, thanks! Yeah, it seems you're doing
something similar with a specific focus on optimizing array operations;
this comment seems to be the explanation of the core of it:

It describes what we optimize, even though it is outdated.

// Detect the maximal Scops of a function.
//
// A static control part (Scop) is a subgraph of the control flow graph
(CFG)
// that only has statically known control flow and can therefore be
described
// within the polyhedral model.
//
// Every Scop fullfills these restrictions:
//
// * It is a single entry single exit region
//
// * Only affine linear bounds in the loops
//
// Every natural loop in a Scop must have a number of loop iterations
that can
// be described as an affine linear function in surrounding loop
iterators or
// parameters. (A parameter is a scalar that does not change its value
during
// execution of the Scop).

Now look bounds can be arbitrary Presburger Formula, meaning we also support boolean comparisions (||, &&), the ? operator and min/max.

// * Only comparisons of affine linear expressions in conditions

Same here. We now support also boolean comparisions (||, &&), the ? operator and min/max.

// * All loops and conditions perfectly nested
//
// The control flow needs to be structured such that it could be written
using
// just 'for' and 'if' statements, without the need for any 'goto',
'break' or
// 'continue'.

We now allow almost arbitrary control flow (except irreducible loops).

// * Side effect free functions call
//
// Only function calls and intrinsics that do not have side effects are
allowed
// (readnone).

Also not true any more. In cases of exceptional function calls (bounds-checks, conditional debugging, ...) we optimistically ignore them and
dynamically verify that dropping them was valid (falling back to the
original code otherwise).

// The Scop detection finds the largest Scops by checking if the largest
// region is a Scop. If this is not the case, its canonical subregions are
// checked until a region is a Scop. It is now tried to extend this Scop by
// creating a larger non canonical region.

And it seems to make use of llvm's notion of SESE regions as described
in LLVM: include/llvm/Analysis/RegionInfo.h Source File

Yes, we do.

Best,
Tobias