loop metdata instruction

Hi,

I've been looking through past threads looking for an answer to why the loop metadata is attached to the loop latch branch. What is the reason for putting the metadata inside the loop rather than outside (for example on the branch into the loop header.) Note that I'm asking about llvm.loop.parallel not llvm.mem.parallel_loop_access which obviously must be inside the loop.

It seems that if the metadata is attached outside of the loop it is less likely that the metadata will be lost during loop transformations.

paul

Latch is a branch to the header. What branch in particular do you have in mind?

Loop latch identifies the loop. Loops with multiple latches are not very amenable to optimizations.

-Krzysztof

Hi,

I've been looking through past threads looking for an answer to why the loop metadata is attached to the loop latch branch. What is the reason for putting the metadata inside the loop rather than outside (for example on the branch into the loop header.)

Latch is a branch to the header. What branch in particular do you have in mind?

On the branch into the loop:

define void @foo(i32 %n, float* %a, float* %b) #0 {
entry:
  [...]
  br label %for.cond, !llvm.loop.parallel !0
for.cond: ; preds = %for.inc, %entry
  [...]
  br i1 %cmp, label %for.body, label %for.end
for.body: ; preds = %for.cond
  [...]
  br label %for.inc
for.inc: ; preds = %for.body
  [...]
  br label %for.cond
for.end: ; preds = %for.cond
  ret void
}

Which optimizes to the following (with a small change to LoopRotation.cpp)

define void @foo(i32 %n, float* nocapture %a, float* nocapture %b) #0 {
entry:
  %cmp6 = icmp sgt i32 %n, 0
  br i1 %cmp6, label %for.body, label %for.end, !llvm.loop.parallel !0

for.body: ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
  %arrayidx = getelementptr inbounds float* %b, i64 %indvars.iv
  %0 = load float* %arrayidx, align 4, !tbaa !1
  %arrayidx2 = getelementptr inbounds float* %a, i64 %indvars.iv
  store float %0, float* %arrayidx2, align 4, !tbaa !1
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, %n
  br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body, %entry
  ret void
}

paul

Well... In this case the metadata is not on the latch. I guess this is because the initial loop structure is this questionable jump-to-cond-then-back scheme. I agree---this doesn't look right.

-Krzysztof

Hi,

On the branch into the loop:

Well... In this case the metadata is not on the latch. I guess this is
because the initial loop structure is this questionable
jump-to-cond-then-back scheme. I agree---this doesn't look right.

This loop structure is not questionable--it's what clang generates for a
for loop..

I'm suggesting this as a possible alternative to the loop latch approach.
Please re-read my original email.

paul

Your reply seemed like you're quoting code that is currently generated, not your proposal.

I'm not sure why you would want the loop metadata to be attached to some other branch. Loop latch will always exist, while, at least at this time, there is no guarantee that the "other" branch (the loop guard) will exist. Besides, there is a member function in the Loop class that gives you the latch block, from which you can easily get to the terminator and then the metadata.

As for the questionable part---I know that this is what clang generates, and there is probably little chance of changing it in near future, but this loop structure is just bad for optimization. The loop, as it is generated, has an early exit and an unconditional latch. What comes out of loop rotation in your example is what ideally it would be from the beginning. Apart from all that, SimplifyCFG would have it a lot easier not to damage loops that at point may not even appear to be loops.

-Krzysztof

Hi,

I'm suggesting this as a possible alternative to the loop latch
approach.
Please re-read my original email.

Your reply seemed like you're quoting code that is currently generated,
not your proposal.

Oops, sorry for not being clear. The code I presented is what I was
experimenting with.

I'm not sure why you would want the loop metadata to be attached to some
other branch. Loop latch will always exist, while, at least at this
time, there is no guarantee that the "other" branch (the loop guard)
will exist. Besides, there is a member function in the Loop class that
gives you the latch block, from which you can easily get to the
terminator and then the metadata.

Is this not true for the loop pre-header as well? (Putting the metadata on
the branch in the loop pre-header is basically what I proposed)

I don't know if one way is better than the other to be honest--which is
why I'm asking. One case where having the metadata outside the loop is
more convenient is loop unrolling. With the metadata inside the loop you
have to update multiple branches whereas metadata outside the loop you
should not have to do anything. It's possible there are cases where
metadata inside the loop is more convenient.

The point is if the two representations are equivalent (which is what I'd
like to determine) and one is more convenient/less intrusive on existing
passes than that should be determined sooner rather than later :stuck_out_tongue:

paul

Hi Paul,

Hi Pekka,

Hi Dmitry,

Sorry, first reply had bad formatting..

Hi Pekka,

Isn't it possible that multiple nested loops share the header and
the pre-header in normalized loops? Thus, then adding metadata to the
preheader's branch would make the MD ambiguous for nested loops.

The header can't be shared, otherwise it's the same loop. Though strictly speaking it depends on definition of the loop, but in my experience, the most practical definition is "loop == header".

I think Pekka is correct and in general you can't assume the header belongs to a single loop. However, for reducible CFGs I believe this is true.