[PATCH] Split LoopUnroll pass into mechanism and policy

Hi,

the attached patch splits the loop unroll pass into a LoopUnroll superclass
that implements the unrolling mechanism, and a SimpleLoopUnroll subclass
implementing the current policy. This split is modeled after the split between
Inliner and SimpleInliner.

The superclass currently still finds out the TripCount and TripMultiple, and
passes those, together with the Loop in question, to a policy method.
Currently, TripMultiple is not yet used in the SimpleLoopUnroll, but I can
imagine that this might change, so I included it already.

Currently, there is only the SimpleLoopUnroll subclass, so there is no direct
gain. However, this opens up the way for other policies and in particular I
need to implement a platform specific policy so I require this change.

Comments are welcome, I hope this change can be committed.

Gr.

Matthijs

split-unroll.diff (14.7 KB)

Hi,

please find an updated patch attached that incorporates the (trivial) changes
introduced by r50696 and which makes this patch apply cleanly again.

Gr.

Matthijs

split-unroll.diff (14.7 KB)

Hello Matthijs,

Separating mechanism from policy is a good thing for the LoopUnroll
pass. Instead of moving the policy to a subclass though, I think it'd
be better to move the mechanism, the unrollLoop function, out to be a
standalone utility function, with the LoopInfo object passed in
explicitly. FoldBlockIntoPredecessor would also be good to make into
a standalone utility function, since it isn't specific to unrolling.

This should still make it easy to write new unrolling passes with
custom heuristics, but it would also be more flexible for passes to
do unrolling in combination with other transformations.

What do you think?

Dan

Dan,

Dunno what he intends, but doing unrolling during other
transformations is useful for things like SLP style vectorization,
etc.

Hi Dan,

This should still make it easy to write new unrolling passes with
custom heuristics, but it would also be more flexible for passes to
do unrolling in combination with other transformations.

Agreed.

To shed some light on my specific requirements, I am working with an
architecture that has only limited support for loops and control flow. This
means that loop unrolling is not a matter of performance, but any loop that
can't be mapped on the architecture must be unrolled to be able to work. So,
I don't really need the added flexibility, though I agree that it is a good
idea nonetheless.

Splitting off the second part of unrollLoop (the actual unrolling, after
deciding to unroll the loop) shouldn't be a problem. I am a bit concerned
about the first part (before the decision), which finds out the trip count and
trip multiple. Since both policy and mechanism need these values, that code
must be shared somehow as well. I see two options for that:
1. Put two methods in the utility class, each of which return one of those
    values.
2. Add two methods to the Loop class. Currently, there is a getTripCount()
    method that returns the trip count as a Value*. Perhaps we could add a
    getTripCountConst() (or getTripCountInt() ?) and getTripCountMultiple() to
    Loop?

It seems to me the second option is nicer, though it shouldn't make too much
of a difference.

Furthermore, where to put this utility class? lib/Transforms/UnrollLoop.cpp
seems reasonable?

Gr.

Matthijs

Hi Dan,

This should still make it easy to write new unrolling passes with
custom heuristics, but it would also be more flexible for passes to
do unrolling in combination with other transformations.

Agreed.

To shed some light on my specific requirements, I am working with an
architecture that has only limited support for loops and control flow. This
means that loop unrolling is not a matter of performance, but any loop that
can't be mapped on the architecture must be unrolled to be able to work. So,
I don't really need the added flexibility, though I agree that it is a good
idea nonetheless.

Ok.

Splitting off the second part of unrollLoop (the actual unrolling, after
deciding to unroll the loop) shouldn't be a problem. I am a bit concerned
about the first part (before the decision), which finds out the trip count and
trip multiple. Since both policy and mechanism need these values, that code
must be shared somehow as well. I see two options for that:
1. Put two methods in the utility class, each of which return one of those
   values.
2. Add two methods to the Loop class. Currently, there is a getTripCount()
   method that returns the trip count as a Value*. Perhaps we could add a
    getTripCountConst() (or getTripCountInt() ?) and getTripCountMultiple() to
    Loop?

It seems to me the second option is nicer, though it shouldn't make too much
of a difference.

The second option sounds nicer to me too. Is the name
getSmallConstantTripCount too paranoid?

Furthermore, where to put this utility class? lib/Transforms/UnrollLoop.cpp
seems reasonable?

Transform utilities can go under lib/Transforms/Utils. :slight_smile:

Dan

Hi Dan,

> 2. Add two methods to the Loop class. Currently, there is a
> getTripCount() method that returns the trip count as a Value*. Perhaps we
> could add a getTripCountConst() (or getTripCountInt() ?) and
> getTripCountMultiple() to Loop?

The second option sounds nicer to me too. Is the name
getSmallConstantTripCount too paranoid?

Paranoid in what sense? I generally like long names, so this sounds fine by me
:slight_smile:

I assume the Small part is because it limits to 32 bits?

Transform utilities can go under lib/Transforms/Utils. :slight_smile:

Hmm, that's exactly what I _meant_ to type :slight_smile:

I'll implement this in the morning (T minus 10).

Gr.

Matthijs

Another example is the combined unroll and prefetch technique
described in the AMD 5.5

Dan

Hi Dan,

in the AMD 5.5

What's that? Got a link? Google only finds some kind of golf cart powered by
the "Advanced Motors & Drivers 5.5" engine :slight_smile:

Gr.

Matthijs

Oops, I meant AMD's Software Optimization Guide, section 5.5.

Dan

Hi All,

the attached patch performs the splitting in the proposed manner.
before applying the patch, please execute
  svn cp lib/Transforms/Scalar/LoopUnroll.cpp lib/Transforms/Utils/UnrollLoop.cpp
to make the patch apply and preserve proper history.

Transforms/Utils/UnrollLoop.cpp contains the unrollLoop function, which is now
used by the LoopUnroll pass. I've also moved the RemapInstruction and
FoldBlockIntoPredecessor support functions there.

Additionally, I've also moved the loop unrolling statistics into the Utils file. Is
that a good idea?

I've added two new methods to Loop (or LoopBase<> really):
getSmallConstantTripCount() and getSmallConstantTripMultiple(), which return
the trip count and multiple as a normal unsigned value.

There are a number of remaining issues, though. Firstly, I observe some code
duplication. Every pass that performs loop unrolling, should depend on the
LoopSimplify, LCSSA and LoopInfo passes. Also, the UnrollLoop code always
preservers the latter two passes. This means a large part of the
getAnalysisUsage for every unroll pass will be copy-pasted. This could be
solved by adding an extra "defaultAnalysisUsage()" function to
Utils/UnrollLoop.cpp or something similar, but I'm not sure that's very pretty.
Any suggestions?

Additionally, after loop unrolling, the following check is made:
  // Update the loop information for this loop.
  // If we completely unrolled the loop, remove it from the parent.
  if (L->getNumBackEdges() == 0)
    LPM.deleteLoopFromQueue(L);

This check probably needs to happen in every unroll pass that use loopUnroll.
We could move this check into the loopUnroll function, but that requires
passing in the LoopPassManager (LPM) to loopUnroll as well. Is that wanted?

Overall, I've really only moved code, so behaviour should be unchanged. The
only functional change I think I made, was that the old code checked to see if
the terminator instruction of the latch block was a conditional branch first,
before looking at the trip count and the treshold. Since that check moved
together with unrollLoop, it is done a bit later. However, AFAICS, none of the
code (getTripCount, code size estimation) actually depends on this check, so it
shouldn't change anything in the end.

Lastly, the (original comments) say "This pass will multi-block loops only if they
contain no non-unrolled subloops". I can't really find what this is supposed to
mean, in particular I can't find any explicit check for subloops anywhere.
Anyone know what this means?

I've also tried to minimize the includes required in both cpp files, seems like
there were quite some unused ones present.

The new code seems to work on an adhoc test case, but there seems to be a
problem with one of the standard testcases. I'll solve that later today. For
now, please review the patch :slight_smile:

Gr.

Matthijs

unroll-split.diff (31.8 KB)

Hi All,

The new code seems to work on an adhoc test case, but there seems to be a
problem with one of the standard testcases. I'll solve that later today. For
now, please review the patch :slight_smile:

new patch is attached. There was a small logic error, which is now fixed by
making getSmallConstantTripMultiple return 1 when it doesn't have any useful
result (instead of 0). Makes more sense, since the trip count is always a
multiple of 1 :slight_smile:

This only leaves one failing test case (test/CodeGen/X86/vec_set-5.ll), but I
don't think it's related to this patch.

Gr.

Matthijs

unroll-split.diff (31.8 KB)

Hello Matthijs,

Hi All,

the attached patch performs the splitting in the proposed manner.
before applying the patch, please execute
  svn cp lib/Transforms/Scalar/LoopUnroll.cpp lib/Transforms/Utils/UnrollLoop.cpp
to make the patch apply and preserve proper history.

Transforms/Utils/UnrollLoop.cpp contains the unrollLoop function, which is now
used by the LoopUnroll pass. I've also moved the RemapInstruction and
FoldBlockIntoPredecessor support functions there.

Additionally, I've also moved the loop unrolling statistics into the Utils file. Is
that a good idea?

I've added two new methods to Loop (or LoopBase<> really):
getSmallConstantTripCount() and getSmallConstantTripMultiple(), which return
the trip count and multiple as a normal unsigned value.

There are a number of remaining issues, though. Firstly, I observe some code
duplication. Every pass that performs loop unrolling, should depend on the
LoopSimplify, LCSSA and LoopInfo passes. Also, the UnrollLoop code always
preservers the latter two passes. This means a large part of the
getAnalysisUsage for every unroll pass will be copy-pasted. This could be
solved by adding an extra "defaultAnalysisUsage()" function to
Utils/UnrollLoop.cpp or something similar, but I'm not sure that's very pretty.
Any suggestions?

I'm not sure that's very pretty either :-}. My sense is that it's
better to just duplicate those few lines. The unrollLoop function
verifies isLCSSAForm with an assert, and LoopSimplify and LoopInfo
are required by pretty much every loop pass. For the addPreserved
lines, I guess unrollLoop should just have documented the passes
it preserves. An assert at the end that checks that if the loop
wasn't completely unrolled that it's still isLCSSAForm would be a
good addition in any case.

Additionally, after loop unrolling, the following check is made:
// Update the loop information for this loop.
// If we completely unrolled the loop, remove it from the parent.
if (L->getNumBackEdges() == 0)
   LPM.deleteLoopFromQueue(L);

This check probably needs to happen in every unroll pass that use loopUnroll.
We could move this check into the loopUnroll function, but that requires
passing in the LoopPassManager (LPM) to loopUnroll as well. Is that wanted?

I don't think it's desirable to require a LoopPassManager object,
but I think it would be fine to have an optional LoopPassManager
parameter which can be null for this purpose. If a LoopPassManager
object is provided and the loop is completely unrolled, the
loop is removed from the LoopPassManager.

Overall, I've really only moved code, so behaviour should be unchanged. The
only functional change I think I made, was that the old code checked to see if
the terminator instruction of the latch block was a conditional branch first,
before looking at the trip count and the treshold. Since that check moved
together with unrollLoop, it is done a bit later. However, AFAICS, none of the
code (getTripCount, code size estimation) actually depends on this check, so it
shouldn't change anything in the end.

Ok.

Lastly, the (original comments) say "This pass will multi-block loops only if they
contain no non-unrolled subloops". I can't really find what this is supposed to
mean, in particular I can't find any explicit check for subloops anywhere.
Anyone know what this means?

I don't.

I've also tried to minimize the includes required in both cpp files, seems like
there were quite some unused ones present.

Yep, it's not uncommon for unused includes to accumulate
as the code evolves. Thanks for cleaning that up.

The new code seems to work on an adhoc test case, but there seems to be a
problem with one of the standard testcases. I'll solve that later today. For
now, please review the patch :slight_smile:

+ /// getSmallConstantTripCount - Returns the trip count of this loop as a
+ /// normal unsigned value, if possible. Returns 0 if the trip count is unknown
+ /// of not constant. Will also return 0 if the trip count is very large
+ /// (> 2^32)

This should say (>= 2^32).

+ inline unsigned getSmallConstantTripCount() const {
+ Value* TripCount = this->getTripCount();
+ if (TripCount) {
+ if (ConstantInt *TripCountC = dyn_cast<ConstantInt>(TripCount)) {
+ // Guard against huge trip counts. This also guards against assertions in
+ // APInt from the use of getZExtValue, below.

This comment is a little stale now. Just say it guards against huge
trip counts.

+ if (TripCountC->getValue().getActiveBits() <= 32) {
+ return (unsigned)TripCountC->getZExtValue();
+ }
+ }
+ }
+ return 0;
+ }

+ /// getSmallConstantTripMultiple - Returns the largest constant divisor of the
+ /// trip count of this loop as a normal unsigned value, if possible. This
+ /// means that the actual trip count is always a multiple of the returned
+ /// value (don't forget it could very well be zero as well!).

I know you corrected this elsewhere, but this comment still says the
multiple could be zero.

+ /// Returns 1 if the trip count is unknown or not guaranteed to be the
+ /// multiple of a constant (which is also the case if the trip count is simply
+ /// constant, use getSmallConstantTripCount for that case), Will also return 1
+ /// if the trip count is very large (> 2^32).

I know you're just refactoring existing code, but as a standalone utility
getSmallConstantTripMultiple should really handle the case where the trip
count is constant. Eventually, it should be taught a few more operators
than just Mul too, such as Shl and And, but you don't need to worry
about that now.

And again >= instead of >.

--- include/llvm/Transforms/Utils/UnrollLoop.h (revision 0)
+++ include/llvm/Transforms/Utils/UnrollLoop.h (revision 0)

This file misses the comments that all LLVM headers have.

We may someday add more loop utilities, in which case we may want
to rename this header and add other stuff to it, but we can worry
about that when the need arrises. This is fine for now.

+bool unrollLoop(Loop *L, unsigned Count, LoopInfo* LI);

Please capitalize this to UnrollLoop. LLVM is a little inconsistent
about capitalization in some ways, but standalone transforming
utility functions are pretty consistently capitalized.

Thanks!

Dan

Hi Dan,

I'm not sure that's very pretty either :-}. My sense is that it's
better to just duplicate those few lines.

Then I'll just leave it at that.

An assert at the end that checks that if the loop
wasn't completely unrolled that it's still isLCSSAForm would be a
good addition in any case.

Good idea.

I don't think it's desirable to require a LoopPassManager object,
but I think it would be fine to have an optional LoopPassManager
parameter which can be null for this purpose. If a LoopPassManager
object is provided and the loop is completely unrolled, the
loop is removed from the LoopPassManager.

Sounds fine.

> Lastly, the (original comments) say "This pass will multi-block loops
> only if they contain no non-unrolled subloops". i Anyone know what this
> means?
I don't.

Then I'll just remove the comment, it's only confusing :slight_smile:

This should say (>= 2^32).

Obviously.

> + // Guard against huge trip counts. This also guards against
> assertions in
> + // APInt from the use of getZExtValue, below.
>
This comment is a little stale now. Just say it guards against huge
trip counts.

I didn't fully understand that comment either, though it seemed to make some
sense :slight_smile: I'll remove it.

> /// This means that the actual trip count is always a multiple of the
> /// returned value (don't forget it could very well be zero as well!).
I know you corrected this elsewhere, but this comment still says the
multiple could be zero.

Actually, I meant that the trip count could very well be zero, not the
multiple. I'll clarify the comment..

+ /// Returns 1 if the trip count is unknown or not guaranteed to be
the
+ /// multiple of a constant (which is also the case if the trip
count is simply
+ /// constant, use getSmallConstantTripCount for that case), Will
also return 1
+ /// if the trip count is very large (> 2^32).

I know you're just refactoring existing code, but as a standalone utility
getSmallConstantTripMultiple should really handle the case where the trip
count is constant.

Handle it how? By returning that constant? I had thought about this and
decided for returning 0 instead, but now it seems returning the trip count
instead of 1 is a better idea (since that's the largest constant that the trip
count will always be a multiple of). I'll add that.

Eventually, it should be taught a few more operators than just Mul too, such
as Shl and And, but you don't need to worry about that now.

Yeah, that would be useful.

And again >= instead of >.

Willfix.

> --- include/llvm/Transforms/Utils/UnrollLoop.h (revision 0)
> +++ include/llvm/Transforms/Utils/UnrollLoop.h (revision 0)

This file misses the comments that all LLVM headers have.

Hmm, those slipped through :slight_smile: I'll add them.

+bool unrollLoop(Loop *L, unsigned Count, LoopInfo* LI);

Please capitalize this to UnrollLoop. LLVM is a little inconsistent
about capitalization in some ways, but standalone transforming
utility functions are pretty consistently capitalized.

Okay.

Thanks for reviewing the patch. I'll incorporate the proposed changes when I'm
back in office next tuesday.

Gr.

Matthijs

Hi Dan,

please find an updated patch attached (don't forget to run the svn cp command
I posted earlier, before applying this patch).

I've incorporated all the changes we've discussed.

Gr.

Matthijs

Right, this time with a patch attached...

split-unroll.diff (33.6 KB)

Looks good. I just committed it as r51083.

Thanks,

Dan

> Hi Dan,
>
> please find an updated patch attached (don't forget to run the svn
> cp command
> I posted earlier, before applying this patch).
>
> I've incorporated all the changes we've discussed.

Looks good. I just committed it as r51083.

include/llvm/Transforms/Utils/UnrollLoop.h is missing