Debug metadata after simplifications

Hi all,

In order to identify loops I'm using the DILexicalScope metadata
attached to the loop latch, but with some combinations of optimisations
that metadata seems to disappear.

For example, when -simplifycfg removes a block because it only contains
a branch, and -loop-simplify recreates that block because it turned out
to be a back-edge of some loop, the metadata gets removed.

Specifically, using the following testcase:
  while (while_test()) {
    if (if_test()) {
      foo();
    } else {
      bar();
    }
  }

... Clang emits the following IR (after I slightly modified the code
generator so that it also emits debug metadata for unconditional
branches):
  while.cond:
    %call = call zeroext i1 (...)* @while_test(), !dbg !8
    br i1 %call, label %while.body, label %while.end, !dbg !8
  while.body:
    %call1 = call zeroext i1 (...)* @if_test(), !dbg !9
    br i1 %call1, label %if.then, label %if.else, !dbg !9
  if.then:
    call void (...)* @foo(), !dbg !11
    br label %if.end, !dbg !13
  if.else:
    call void (...)* @bar(), !dbg !14
    br label %if.end, !dbg !16
  if.end:
    br label %while.cond, !dbg !17
  while.end:
    ret i32 0, !dbg !18

If I run -simplifycfg, the if.then, if.else and if.end blocks get
transformed as follows:
  if.then:
    call void (...)* @foo(), !dbg !11
    br label %while.cond, !dbg !13
  if.else:
    call void (...)* @bar(), !dbg !14
    br label %while.cond, !dbg !16

Removal of the back-edge (and its metadata) is problematic for other
loop passes, which is why -loop-simplify creates an artificial back-edge
like this:
  if.then:
    call void (...)* @foo(), !dbg !11
    br label %while.cond.backedge, !dbg !13
  if.else:
    call void (...)* @bar(), !dbg !14
    br label %while.cond.backedge, !dbg !16
  while.cond.backedge:
    br label %while.cond

Ultimately I end up with a back-edge with no metadata whatsoever,
breaking my loop identification. What is the best way to work around
this? Modify -simplifycfg so that it does not simplify loop back-edges?
Or somehow add the metadata to the new back-edges?

Sincerely,

> In order to identify loops I'm using the DILexicalScope metadata
> attached to the loop latch, but with some combinations of optimisations
> that metadata seems to disappear.

That looks like line number information, yes?

> Ultimately I end up with a back-edge with no metadata whatsoever,
> breaking my loop identification. What is the best way to work around
> this? Modify -simplifycfg so that it does not simplify loop back-edges?
> Or somehow add the metadata to the new back-edges?

You're identifying loops in IR based on line number information?

That said, we could probably figure out a way to add line information to
the back edge if we're getting something wrong in our optimized line tables.
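
To make the "add line information to the back edge" option concrete, here is
a minimal sketch on a toy CFG model. It is not LLVM code: Branch, Block and
insertBackedgeBlock are hypothetical stand-ins, and the policy for choosing
the location (reuse the latches' location if they all agree, otherwise fall
back to one taken from the loop header) is an assumption for illustration,
not a description of what -loop-simplify does.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins for IR objects; these are NOT LLVM's classes.
struct Branch {
    std::string target;
    std::optional<int> dbg;  // 13 models "!dbg !13"; empty models no metadata
};

struct Block {
    std::string name;
    Branch term;  // each toy block ends in one unconditional branch
};

// Models the effect of -loop-simplify on a loop with several latches:
// create one "<header>.backedge" block and route every latch through it.
// Assumed policy: if all latches carry the same location, reuse it;
// otherwise fall back to a location supplied for the loop header.
Block insertBackedgeBlock(std::vector<Block*>& latches,
                          const std::string& header,
                          std::optional<int> headerLoc) {
    assert(!latches.empty() && "a loop needs at least one latch");

    std::optional<int> common = latches.front()->term.dbg;
    for (Block* latch : latches) {
        assert(latch->term.target == header && "latch must branch to header");
        if (latch->term.dbg != common) common.reset();
    }

    // The new branch gets a location instead of being left bare.
    Block backedge{header + ".backedge",
                   Branch{header, common ? common : headerLoc}};

    // Redirect the old latches to the new block; they keep their own locations.
    for (Block* latch : latches) latch->term.target = backedge.name;
    return backedge;
}
```

With the testcase above, if.then (!13) and if.else (!16) disagree, so the new
while.cond.backedge branch would take the header's location (!8 in the IR
shown). Whether the header is the right source for a lexical-scope lookup is
a separate question; the point is only that the new branch need not end up
with no metadata at all.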