How to preserve loop's metadata after LoopSimplify transformation

Hi, folks. I work on a llvm issue and try to submit a patch.
It seems this discussion forum is more active, I want to repost the issue and receive more feedbacks.

define i16 @foo() {
entry:
  br label %while.cond

while.cond:                                       ; preds = %while.body.thread, %while.body, %entry
  %i.0 = phi i16 [ 0, %entry ], [ %inc4, %while.body.thread ], [ %inc, %while.body ]
  %j.0 = phi i16 [ 0, %entry ], [ %j.0, %while.body.thread ], [ %spec.select7, %while.body ]
  %cmp = icmp slt i16 %i.0, 10
  br i1 %cmp, label %while.body.thread, label %lor.end

while.body.thread:                                ; preds = %while.cond
  %inc4 = add nsw i16 %i.0, 1
  br label %while.cond, !llvm.loop !5

lor.end:                                          ; preds = %while.cond
  %cmp1.not = icmp eq i16 %j.0, 5
  br i1 %cmp1.not, label %while.end, label %while.body

while.body:                                       ; preds = %lor.end
  %inc = add nuw nsw i16 %i.0, 1
  %cmp2 = icmp eq i16 %inc, 12
  %spec.select7 = select i1 %cmp2, i16 5, i16 %j.0
  br label %while.cond, !llvm.loop !5

while.end:                                        ; preds = %lor.end
  ret i16 %i.0
}

!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4}

!0 = !{i32 7, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"wchar_size", i32 1}
!3 = !{i32 7, !"frame-pointer", i32 2}
!4 = !{!"clang version 16.0.0.prerel"}
!5 = distinct !{!5, !6}
!6 = !{!"llvm.loop.usefulinfo"}

The above loop is not in a canonical. It has a loop header (while.cond), an exit block (lor.end) and two latch blockes (while.body.thread, while.body).

The current implementation of LoopSimplify pass can transform it to a cannonical form.

define i16 @foo() {
entry:
  br label %while.cond.outer

while.cond.outer:                                 ; preds = %while.body, %entry
  %i.0.ph = phi i16 [ %inc, %while.body ], [ 0, %entry ]
  %j.0.ph = phi i16 [ %spec.select7, %while.body ], [ 0, %entry ]
  br label %while.cond

while.cond:                                       ; preds = %while.cond.outer, %while.body.thread
  %i.0 = phi i16 [ %inc4, %while.body.thread ], [ %i.0.ph, %while.cond.outer ]
  %cmp = icmp slt i16 %i.0, 10
  br i1 %cmp, label %while.body.thread, label %lor.end

while.body.thread:                                ; preds = %while.cond
  %inc4 = add nsw i16 %i.0, 1
  br label %while.cond, !llvm.loop !5

lor.end:                                          ; preds = %while.cond
  %cmp1.not = icmp eq i16 %j.0.ph, 5
  br i1 %cmp1.not, label %while.end, label %while.body

while.body:                                       ; preds = %lor.end
  %inc = add nuw nsw i16 %i.0, 1
  %cmp2 = icmp eq i16 %inc, 12
  %spec.select7 = select i1 %cmp2, i16 5, i16 %j.0.ph
  br label %while.cond.outer, !llvm.loop !5

while.end:                                        ; preds = %lor.end
  ret i16 %i.0
}

!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4}

!0 = !{i32 7, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"wchar_size", i32 1}
!3 = !{i32 7, !"frame-pointer", i32 2}
!4 = !{!"clang version 16.0.0.prerel"}
!5 = distinct !{!5, !6}
!6 = !{!"llvm.loop.usefulinfo"}

“llvm.loop.usefulinfo” is a custom metadata which describe a special property.

Basically, it breaks while.cond block into two blockes while.cond and while.cond.outer and creates nested loops (single loop → nested two loops). Using a nested structure, each loop would only have a single latch block and loops become canonical form.

The question is how to preserve metadata attached to the original latch block’s teminator.

Option 1: Preserve all metadata for both loop
Many other passes handle metadata by preserving all of them. However, it might lead to incorrect behavior. In this case, after loop-simplify, the inner loop does not hold the property “llvm.loop.usefulinfo”. It applies to the outer loop only.

Option 2: Drop all metadata for both loop
This seems the most conservative method. But it might drop useful metadata such as “llvm.loop.unroll.disable”".

Option 3: Preserve metadata only for the outer loop
In the above example, after transformation, the inner loop could drop “llvm.loop.usefulinfo”. But I do not know if it could be generalized. Especially, if the inner loop becomes more complex.

Any suggestion is welcome.
@bjope @uabelho

I would assume by default that the metadata still applies to both loops. For instance, the metadata llvm.loop.parallel_accesses still applies to both loops. This is also the default behavior for code that just duplicates loops, such as LoopVersioning, Unrolling, inlining, etc.

In a case-by-case, there might be more useful things to do depending on the metadata, and LoopSimplifty may want to handle those specifically. For instance, with llvm.loop.unroll.enable, you probably only want to unroll the inner loop, but not both.

“LoopID” is a misnomer, it is not a loop identifier. Otherwise every transformation that clones code had to specifically handle any llvm.loop that the cloned code contains. See LLVM Language Reference Manual — LLVM 16.0.0git documentation.

Thanks @Meinersbur for your reply.
This issue is not a bug.

According to Langref

Loop metadata nodes cannot be used as unique identifiers

  1. There is no need to update the LoopID.
  2. As you pointed out in the patch, different metadata should be attched to different layers of loop.
    llvm.loop.unroll.enable-> inner loop. isvectorized->both nested loop, etc. A downstream feature, llvm.loop.usefuldata should be handled by the downstream specifically.

I know, i wrote that myself after discovering myself that various passes just copy all instructions incl their metadata. The patch description seems to imply it had to be different.

We considered removing the “distinct”/first element pointing to itself requirement, but it doesn’t seem worth the compatibility issues.

I agree, although I think that LoopSimplify could still special-case some metadata it knows about. For instance, if partial unrolling the inner loop, you don’t need to also unroll the outer loop.

In any case, those loop metadata is supposed affect correctness, everything should still work even if a loop transformation is applied to both loops.

LoopUnroll might actually only ever unroll innermost loops, so the metadata on the outer loop might be just ignored (unless the inner loop is fully unrolled). A metadata design that would gracefully handle the situation could point to a “loop group” and LoopUnroll would only unroll the innermost of those loops. Something similar has been done for LoopVectorize and parallel_accesess because that indeed would affect correctness.