[RFC] Setting the current debug loc when the insertion point changes

Hi everybody,

I’m investigating some correctness issues with debug line table information in optimized code. I’ve noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn’t always a correct or convenient thing to do, and it shouldn’t be the default behavior.

Correctness

The IRBuilder shouldn’t assume that the debug loc at its insertion point applies to any instructions it emits. It doesn’t have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.

To fix the issue, clients of IRBuilder should be required to either:

  1. Explicitly set the correct debug location before inserting instructions, or
  2. Opt-in to the current behavior of propagating the debug loc from the insertion point.

Convenience

The current behavior of IRBuilder can be convenient when it’s correct. But it can be really inconvenient when it’s incorrect.

If a client sets an IRBuilder’s debug loc correctly and makes a call that changes the builder’s insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There’s no way to fix this bug without changing IRBuilder’s behavior.

Solutions

  1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn’t change the current debug loc.
  2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.

I have a mild preference for idea #1 because it’s easier to read, IMO.

Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.

I’d appreciate your thoughts on this!

thanks,
vedant

Hi everybody,

I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior.

# Correctness

The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.

I agree with your analysis.

To fix the issue, clients of IRBuilder should be required to either:

1. Explicitly set the correct debug location before inserting instructions, or
2. Opt-in to the current behavior of propagating the debug loc from the insertion point.

# Convenience

The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect.

If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior.

# Solutions

1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc.
2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.

I have a mild preference for idea #1 because it's easier to read, IMO.

Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.

Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.

I also like the idea of renaming the method to make it really obvious what happens here. Could we *gasp* even fix the capitalization?

We should also think about how to map this to the C API.

-- adrian

Hi everybody,

I'm investigating some correctness issues with debug line table information
in optimized code. I've noticed something problematic in IRBuilder. Setting
the insertion point of an IRBuilder sets the builder’s current debug loc to
the one attached to the insertion point. IMO this isn't always a correct or
convenient thing to do, and it shouldn't be the default behavior.

# Correctness

The IRBuilder shouldn't assume that the debug loc at its insertion point
applies to any instructions it emits. It doesn't have enough information to
make that connection. This assumption leads to bugs like llvm.org/PR25630.
I.e it creates situations in which we silently push around incorrect
DILocations without realizing it, because the debug locs in the IR look nice
and contiguous.

I remember that one, and I got to your same conclusion at the time
when analyzing the bug, so thanks for writing this down :slight_smile:

To fix the issue, clients of IRBuilder should be required to either:

1. Explicitly set the correct debug location before inserting instructions,
or
2. Opt-in to the current behavior of propagating the debug loc from the
insertion point.

# Convenience

The current behavior of IRBuilder can be convenient when it's correct. But
it can be *really* inconvenient when it's incorrect.

If a client sets an IRBuilder's debug loc correctly and makes a call that
changes the builder's insertion point, all generated instructions will pick
up a bogus debug loc from the IP. E.g this is what happens with
SCEVExpander::expandCodeFor(). There's no way to fix this bug without
changing IRBuilder's behavior.

# Solutions

1. Add a method called SetInsertPointAndDebugLoc() which behaves identically
to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't
change the current debug loc.
2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which
defaults to false. Only change the current debug loc if UpdateDbgLoc is
true.

I have a mild preference for idea #1 because it's easier to read, IMO.

Whichever solution we pick, my plan is to first migrate all in-tree users of
SetInsertPoint() to the explicit API, and to then gradually fix passes which
rely on the current incorrect behavior.

My preference is as well for 1. I kind of dislike adding bool
parameters to API as it becomes easier to get them wrong.
Your plan makes sense to me, regardless. I'll be happy to review the patches.

Thanks!

Hi everybody,

I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior.

# Correctness

The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.

I agree with your analysis.

To fix the issue, clients of IRBuilder should be required to either:

1. Explicitly set the correct debug location before inserting instructions, or
2. Opt-in to the current behavior of propagating the debug loc from the insertion point.

# Convenience

The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect.

If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior.

# Solutions

1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc.
2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.

I have a mild preference for idea #1 because it's easier to read, IMO.

Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.

Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.

There's at least one correct use of the current SetInsertPoint() API in speculateSelectInstLoads() :). As you predicted, it expands an existing instruction into a sequence.

I'm not sure how many correct uses of the API we have. There are over 300 uses in just {llvm,clang}/lib. I think that prevents us from adding a mandatory argument to SetInsertPoint(): we wouldn't be able to promptly fix each use of the API.

OTOH, if we added a method like SetInsertPointAndDebugLoc(), we could incrementally eliminate as many uses of that method as possible.

I also like the idea of renaming the method to make it really obvious what happens here. Could we *gasp* even fix the capitalization?

Sure :).

We should also think about how to map this to the C API.

Wdyt of exposing SetInsertPointAndDebugLoc() via the C API, with a note explaining that it will eventually be deprecated?

vedant

Hi everybody,

I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior.

# Correctness

The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.

I agree with your analysis.

To fix the issue, clients of IRBuilder should be required to either:

1. Explicitly set the correct debug location before inserting instructions, or
2. Opt-in to the current behavior of propagating the debug loc from the insertion point.

# Convenience

The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect.

If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior.

# Solutions

1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc.
2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.

I have a mild preference for idea #1 because it's easier to read, IMO.

Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.

Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.

There's at least one correct use of the current SetInsertPoint() API in speculateSelectInstLoads() :). As you predicted, it expands an existing instruction into a sequence.

I'm not sure how many correct uses of the API we have. There are over 300 uses in just {llvm,clang}/lib. I think that prevents us from adding a mandatory argument to SetInsertPoint(): we wouldn't be able to promptly fix each use of the API.

I thought that your proposal was to add SetInsertPointAndDebugLoc() *and* convert all existing users over at the same time, and then audit them for correctness one-by-one (and potentially move them over to the new SetInsertPoint()). I don't see how this is cheaper than adding an extra parameter to SetInsertPoint(). The auditing needs to be done either way. Am I misunderstanding your proposal?

-- adrian

Hi everybody,

I’m investigating some correctness issues with debug line table information in optimized code. I’ve noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn’t always a correct or convenient thing to do, and it shouldn’t be the default behavior.

Correctness

The IRBuilder shouldn’t assume that the debug loc at its insertion point applies to any instructions it emits. It doesn’t have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.

I agree with your analysis.

To fix the issue, clients of IRBuilder should be required to either:

  1. Explicitly set the correct debug location before inserting instructions, or
  2. Opt-in to the current behavior of propagating the debug loc from the insertion point.

Convenience

The current behavior of IRBuilder can be convenient when it’s correct. But it can be really inconvenient when it’s incorrect.

If a client sets an IRBuilder’s debug loc correctly and makes a call that changes the builder’s insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There’s no way to fix this bug without changing IRBuilder’s behavior.

Solutions

  1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn’t change the current debug loc.
  2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.

I have a mild preference for idea #1 because it’s easier to read, IMO.

Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.

Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren’t many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.

There’s at least one correct use of the current SetInsertPoint() API in speculateSelectInstLoads() :). As you predicted, it expands an existing instruction into a sequence.

I’m not sure how many correct uses of the API we have. There are over 300 uses in just {llvm,clang}/lib. I think that prevents us from adding a mandatory argument to SetInsertPoint(): we wouldn’t be able to promptly fix each use of the API.

I thought that your proposal was to add SetInsertPointAndDebugLoc() and convert all existing users over at the same time, and then audit them for correctness one-by-one (and potentially move them over to the new SetInsertPoint()).

Yes.

I don’t see how this is cheaper than adding an extra parameter to SetInsertPoint(). The auditing needs to be done either way. Am I misunderstanding your proposal?

Sorry, I actually think I misunderstood your suggestion.

Did you mean that we should add an extra parameter to SetInsertPoint(), convert all in-tree users so they call SetInsertPoint(IP, IP->getDebugLoc()), and then audit all uses of the API? If so, I think that’s workable, but it presents a difficult migration problem. There’s an overload of SetInsertPoint which accepts a BasicBlock::iterator, so we’d have to detect that and write this in-line:

IRB.SetInsertPoint(BB, IP, (IP != BB->end()) ? IP->getDebugLoc() : IBR.getCurrentDebugLocation())

I do like that this approach forces clients of IRBuilder to be intentional about setting debug locs, but it looks like it could get messy.

Wdyt of adopting parts of both solutions? I.e, we’d have the following methods:

// Transition API

  • SetInsertPointAndDebugLoc(BasicBlock *)
  • SetInsertPointAndDebugLoc(Instruction *)
  • SetInsertPointAndDebugLoc(BasicBlock *, BasicBlock::iterator)

// Future API

  • SetInsertPoint(BasicBlock *, const DebugLoc &)
  • SetInsertPoint(Instruction *, const DebugLoc &)
  • SetInsertPoint(BasicBlock *, BasicBlock::iterator, const DebugLoc &)

We’d make this work by renaming SetInsertPoint to SetInsertPointAndDebugLoc and marking it as deprecated. This would make the transition easy (run sed in a loop), we’d get the future API we want, and it becomes easy to find unaudited API calls.

vedant

There's also the opposite problem of the IRBuilder sometimes not being able to find a debug location at the insertion point when it is required to add one (e.g. when inserting a CallInst into a function that has debug info)... another argument for making it explicit!

There's also the opposite problem of the IRBuilder sometimes not being able to find a debug location at the insertion point when it is required to add one (e.g. when inserting a CallInst into a function that has debug info)... another argument for making it explicit!

Right. It's not too onerous to type "setInsertPoint(..., DebugLoc())" if the location really is unknown, and it's a lot more explicit.

Wdyt of adopting parts of both solutions? I.e, we'd have the following methods:

// Transition API
- SetInsertPointAndDebugLoc(BasicBlock *)
- SetInsertPointAndDebugLoc(Instruction *)
- SetInsertPointAndDebugLoc(BasicBlock *, BasicBlock::iterator)

// Future API
- SetInsertPoint(BasicBlock *, const DebugLoc &)
- SetInsertPoint(Instruction *, const DebugLoc &)
- SetInsertPoint(BasicBlock *, BasicBlock::iterator, const DebugLoc &)

+1 to this solution. We shouldn't silently change the behavior of the existing SetInsertPoint API.

Good to hear. I think it would be even simpler if we skip renaming the existing users, and just added the future APIs. I'll send out a patch.

thanks,
vedant