Allowed operations for passes that report "no change"

I’m digging through a build failure [1], and it looks like the loop idiom recognizer adds some instructions [2], and then removes them again [3]. I don’t understand why yet, but the LegacyPassManager detects that the structural hash of the function has changed, and complains that the pass didn’t correctly report that it changed the function [4] (even though materially, it didn’t).
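
The kind of check described above can be sketched in a toy model. This is only an illustration under stated assumptions, not the LegacyPassManager code: `ToyFunction`, `toyStructuralHash`, and `reportIsConsistent` are hypothetical names, and a function is modeled as a plain list of instruction strings.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a function body: one string per instruction.
using ToyFunction = std::vector<std::string>;

// Hypothetical structural hash: folds the text of each instruction, in order.
std::size_t toyStructuralHash(const ToyFunction &F) {
  std::size_t H = 0;
  for (const std::string &I : F)
    H = H * 31 + std::hash<std::string>{}(I);
  return H;
}

// A pass returns true if it changed the function, false otherwise.
using ToyPass = std::function<bool(ToyFunction &)>;

// Runs the pass and checks its report against the hash:
// a pass that returns false must leave the hash unchanged.
bool reportIsConsistent(const ToyPass &P, ToyFunction &F) {
  const std::size_t Before = toyStructuralHash(F);
  const bool ReportedChange = P(F);
  const std::size_t After = toyStructuralHash(F);
  return ReportedChange || Before == After;
}

// Misreports: appends an instruction but claims "no change".
bool misreportingPass(ToyFunction &F) {
  F.push_back("%tmp = add i32 %a, 1");
  return false;
}

// Honest: makes the same change and reports it.
bool honestPass(ToyFunction &F) {
  F.push_back("%tmp = add i32 %a, 1");
  return true;
}
```

In this model, `reportIsConsistent(misreportingPass, F)` is false, which corresponds to the complaint in the build log, while the honest pass is accepted.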

This raises a broader question of what we really mean when we say that a pass is allowed to report it made no changes through its runOn* function(s):

A) The obvious, and most restrictive condition would be that the pass can only return false if it treated the Function/Module/etc that it visited purely as read-only. From what I can tell, a lot of passes conservatively assume this is the required condition.

B) Less obvious would be that we allow passes to add instructions speculatively, so long as they remove whatever they added before returning false. If one were to dump the IR before & after such a pass, you should see no change in the text/bitcode. If you compared the in-memory representations, you should see no semantic differences in the two, modulo “allowed” differences like unordered lists of pointers (I vaguely remember a few cases of this existing, but can’t remember the details).

C) Even less obvious would be that we allow a pass that removes instructions / globals, and then re-adds them. These changes would be semantics-preserving, but would of course not preserve the removed bits of IR’s addresses in memory.

I believe (A) to be the spirit of the contract between passes and the pass manager. (B) seems like a stretch, but depending on the analysis invalidation needs of the PM, might still be ok. (C) seems totally off base, and does not seem like a productive interpretation.

Does this match everyone else’s intuition, and is (B) even a valid interpretation?

Cheers,

Jon

1: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/16803/consoleFull#-1371525106d489585b-5106-414a-ac11-3ff90657619c
2: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp#L931
3: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp#L936
4: https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/LegacyPassManager.cpp#L1591

This concern was brought up in the relevant review (https://reviews.llvm.org/D81230) as well. I think we can safely go with (A) here. It’s unambiguous, and I don’t think there’s any practical benefit to be had from (B) or (C).

Regards,

Nikita

+1

Roman

Thanks for the perfect reference!

https://reviews.llvm.org/D84071

Jon

(B) may change the use-list order, which won’t show up in the usual printed IR but does change the order in which def-use chains are visited, and so can change the result of some algorithms.
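
One way this could happen, sketched in a toy model: a pass temporarily rewrites an operand and then restores it. The sketch assumes that a new use is linked at the front of the use list. That is a modeling assumption here, and `ToyUseListIR` and `rewriteAndRestore` are hypothetical names, not LLVM APIs.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <vector>

struct ToyInst {
  std::string Name;
  std::string Operand; // the single value this instruction uses
};

struct ToyUseListIR {
  std::vector<ToyInst> Insts;                             // program order
  std::map<std::string, std::list<std::string>> UseLists; // value -> users

  void addInst(const std::string &Name, const std::string &Operand) {
    Insts.push_back({Name, Operand});
    UseLists[Operand].push_front(Name); // assumption: new uses go in front
  }

  void setOperand(std::size_t Idx, const std::string &NewOperand) {
    ToyInst &I = Insts.at(Idx);
    UseLists[I.Operand].remove(I.Name); // unlink from the old use list
    I.Operand = NewOperand;
    UseLists[NewOperand].push_front(I.Name); // relink at the front
  }

  // Textual dump in program order; use-list order is not part of it.
  std::string print() const {
    std::string Out;
    for (const ToyInst &I : Insts)
      Out += I.Name + " = use " + I.Operand + "\n";
    return Out;
  }
};

// Speculatively rewrites the first instruction's operand, then restores it
// and reports "no change".
bool rewriteAndRestore(ToyUseListIR &F) {
  const std::string Original = F.Insts.at(0).Operand;
  F.setOperand(0, "%tmp");
  F.setOperand(0, Original);
  return false;
}
```

With two users `%u1` and `%u2` of `%x`, added in that order, the printed text is identical before and after `rewriteAndRestore`, but the use list of `%x` goes from `%u2, %u1` to `%u1, %u2`.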

I agree. If a pass modifies the IR in any way, even temporarily, it was changed (at some point).

I’d like to +1, but when doing the migration of the code base to enable the associated expensive check, I found situations where (B) was used.

For example CodeGenPrepare has a TypePromotionTransaction class which
is specifically designed to let you modify the IR, then roll back
those modifications and return false.
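
The modify-then-roll-back pattern can be sketched roughly as follows. This is a toy undo log under stated assumptions, not the actual TypePromotionTransaction interface: `ToyTransaction`, `insertAt`, `replaceAt`, and `speculateWithRollback` are hypothetical, and a function is again just a list of instruction strings.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using ToyFunction = std::vector<std::string>;

// Records an undo action for every modification so that all of them can be
// reverted in reverse order.
class ToyTransaction {
  ToyFunction &F;
  std::vector<std::function<void()>> UndoLog;

public:
  explicit ToyTransaction(ToyFunction &Fn) : F(Fn) {}

  void insertAt(std::size_t Pos, const std::string &Inst) {
    assert(Pos <= F.size() && "insert position out of range");
    F.insert(F.begin() + static_cast<std::ptrdiff_t>(Pos), Inst);
    UndoLog.push_back(
        [this, Pos] { F.erase(F.begin() + static_cast<std::ptrdiff_t>(Pos)); });
  }

  void replaceAt(std::size_t Pos, const std::string &Inst) {
    const std::string Old = F.at(Pos);
    F.at(Pos) = Inst;
    UndoLog.push_back([this, Pos, Old] { F.at(Pos) = Old; });
  }

  // Undo everything, newest modification first.
  void rollback() {
    for (auto It = UndoLog.rbegin(); It != UndoLog.rend(); ++It)
      (*It)();
    UndoLog.clear();
  }

  // Keep the modifications and forget how to undo them.
  void commit() { UndoLog.clear(); }
};

// Modifies the function speculatively; if the result is not profitable, rolls
// back and returns false, which is interpretation (B) from the original post.
bool speculateWithRollback(ToyFunction &F, bool Profitable) {
  ToyTransaction T(F);
  T.insertAt(0, "%ext = sext i8 %v to i32");
  T.replaceAt(1, "%r = add i32 %ext, 1");
  if (!Profitable) {
    T.rollback();
    return false;
  }
  T.commit();
  return true;
}
```

In the unprofitable case the instruction list ends up equal to the original, so a textual comparison sees no change, even though the function was modified in between.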

Jay.

Agreed, sometimes it is very convenient to create temporary IR, so various utilities that expect a piece of IR can be used.

Other examples are NewGVN, which creates temporary instructions, and some users of SCEVExpander, where the result of the expansion might be thrown away because it is not profitable.

With (A), such passes would have to report that they made changes.

I think with (B), the only potential effect would be slightly different use-list orders, but there should be no other observable effects if the pass removes the instructions again and also removes them from any analysis to which it might have added them. I don’t think we gain much in terms of extra checking from choosing (A) over (B), but it would have a negative impact on some passes in terms of unnecessary invalidation. But I might be missing something.

Cheers,
Florian