Hash of a module

I want to run a bunch of optimizations iteratively, that is, keep running them until things stop changing (to make sure all optimization opportunities are taken). As far as I know, there is no way to copy a module or compare modules by value, so it occurs to me that a practical solution might be to take a hash code of the module and see whether it changes.

A problem is that hash algorithms are designed to work on streams of bytes, not compound objects.

First attempt at a solution: iterate through all instructions in all functions and hash the instruction kinds. I can think of some possible changes that would fail to be captured by that.
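That first attempt might be sketched as follows. This is only an illustration with stand-in types, not the real llvm::Module/Function/Instruction API; the weakness described above is visible in the code: only opcodes feed the hash, so operand changes go unnoticed.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Stand-ins for LLVM IR containers (hypothetical, for illustration only).
struct Instruction { int opcode; };
struct Function { std::vector<Instruction> insts; };
struct Module { std::vector<Function> funcs; };

// Fold a value into a running hash (boost-style hash_combine).
static void hashCombine(std::size_t &seed, std::size_t v) {
    seed ^= v + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
}

// Hash only the instruction kinds; a pass that merely rewrites
// operands would leave this hash unchanged.
std::size_t hashModule(const Module &m) {
    std::size_t h = 0;
    for (const Function &f : m.funcs)
        for (const Instruction &i : f.insts)
            hashCombine(h, std::hash<int>()(i.opcode));
    return h;
}
```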

Is there any already known solution?

Hi Russell,

please take a look at the file lib/Transforms/IPO/MergeFunctions.cpp. It implements comparison and hashing of functions. You can probably reuse some of its code and extend the approach to modules. Alternatively, you could try to use code from tools/llvm-diff/ (DifferenceEngine.cpp in particular). Both implement similar functionality; one might be better suited to your particular use case.

There is code to create a copy of a module in lib/Transforms/Utils/CloneModule.cpp.


Thanks! The code to create a copy of a module is just what I needed; having done that, I’ve written my own code to compare modules for approximate equality, since I realized that, for my purposes, a fast, coarse-grained comparison suffices (coarse-grained changes are the most likely to create further optimization opportunities).
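A coarse-grained comparison along those lines might look like this sketch. Again, these are stand-in types rather than the real CloneModule/llvm API, and `roughlyEqual` is a hypothetical name; the idea is just to compare cheap structural summaries.

```cpp
#include <cstddef>
#include <vector>

// Stand-ins for LLVM IR containers (hypothetical, for illustration only).
struct Instruction { int opcode; };
struct Function { std::vector<Instruction> insts; };
struct Module { std::vector<Function> funcs; };

// Coarse equality: same number of functions, and the same
// instruction count per function. Cheap to compute, and coarse
// changes (inlining, DCE, unrolling) are exactly the ones that
// alter these counts.
bool roughlyEqual(const Module &a, const Module &b) {
    if (a.funcs.size() != b.funcs.size())
        return false;
    for (std::size_t i = 0; i < a.funcs.size(); ++i)
        if (a.funcs[i].insts.size() != b.funcs[i].insts.size())
            return false;
    return true;
}
```

The trade-off is deliberate: a fine-grained change that keeps every count identical is reported as "no change", ending the iteration early, which is acceptable when such changes rarely unlock further optimizations.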

Are you going to run some of the existing passes? Why can’t you just use the returned change-made value from the passes?


Yes, I’m running all the existing passes that I know how to run. I didn’t know they returned change-made. Thanks!
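Driving the iteration off the change-made flags might be sketched like this. The `Pass` alias and `runToFixedPoint` are hypothetical stand-ins, not the real pass-manager API; each "pass" is just a callable returning whether it changed anything.

```cpp
#include <functional>
#include <vector>

// Stand-in for a pass: a callable that reports whether it made a change.
using Pass = std::function<bool()>;

// Run every pass repeatedly until a full sweep reports no changes.
// Returns the number of sweeps performed.
int runToFixedPoint(const std::vector<Pass> &passes) {
    int sweeps = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        ++sweeps;
        for (const Pass &p : passes)
            changed |= p();  // trusts each pass's change-made flag
    }
    return sweeps;
}
```

Note that this loop is only as reliable as the flags it trusts, which is exactly the caveat raised in the next reply.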

There is a caveat here. I was experimenting with something similar and found that this status is not always trustworthy. I fixed one bug in prune-eh. There is also a bug in the reassociate pass: it returns true with no change made on the following instruction:
%0 = and i64 %b, %a
It happens because it performs two distinct transformations that nullify each other: canonicalizeOperands swaps the operands of the and, and then ReassociateExpression swaps them back.

This approach might work for your set of passes, but beware of the problem.


Oh, hmm, thanks for the warning, I should probably stick with the copy and compare technique then.

(canonicalizeOperands swaps arguments of an and and then ReassociateExpression swaps them back).

That feels like its own bug: canonicalize and reassociate having different opinions about canonical order. Just saying.


It definitely is. However, we seem to have a good number of these kinds of bugs. Nothing in our current test infrastructure reveals them, so they are probably relatively widespread. Might it be worth adding an assertion to the pass manager that the hash of a changed module differs from the hash of the original?
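Such an assertion might be sketched as follows. Everything here is a hypothetical stand-in (the `Module` alias, `hashModule`, `runChecked`), not the real pass-manager code, and the check holds only modulo hash collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-ins for illustration only.
using Module = int;  // placeholder for llvm::Module
std::size_t hashModule(const Module &m) { return std::hash<Module>()(m); }

// Run one pass, then cross-check its change-made flag against the
// module hash: a pass claiming "changed" should actually have
// changed the IR (modulo hash collisions).
bool runChecked(Module &m, const std::function<bool(Module &)> &pass) {
    std::size_t before = hashModule(m);
    bool changed = pass(m);
    assert(!changed || hashModule(m) != before);
    return changed;
}
```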

I’ve tried this kind of thing before. One thing to be aware of is that some code will not converge to a single fixed point under repeated optimization, and will instead cycle through a couple of different versions. So regardless of how you determine whether changes were made, the cycle case might need to be taken into account. I handled it by saying “stop if the number of IR instructions increases”, which ended up being a decent heuristic for my use case (I was starting from something very unoptimized, so there were a lot of optimizations available). This was probably a bug I was running into, and it may have been fixed since (I think this was with LLVM 3.3), so I’m not sure whether it will affect you.
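That growth-based stopping heuristic might be sketched like this. The `Summary` type and `iterateWithGrowthGuard` are hypothetical stand-ins: a module is summarized by its instruction count, and `optimize` stands for one full sweep of passes.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in: a module summarized by its instruction count.
struct Summary { std::size_t instCount; };

// Iterate toward a fixed point, but bail out if the IR starts
// growing, which is the "stop if the number of IR instructions
// increases" heuristic for breaking optimization cycles.
Summary iterateWithGrowthGuard(Summary m,
                               const std::function<Summary(Summary)> &optimize,
                               int maxIters = 100) {
    for (int i = 0; i < maxIters; ++i) {
        Summary next = optimize(m);
        if (next.instCount == m.instCount)
            break;  // converged, by this coarse measure
        if (next.instCount > m.instCount)
            break;  // growing: likely cycling, keep the smaller version
        m = next;
    }
    return m;
}
```

A hard iteration cap (`maxIters`) is included as a backstop, since a cycle between same-sized versions would otherwise be invisible to a count-based measure.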

I also ran into another case where the “changes made” flag was set despite no changes in the IR. My sense is that this feature is not used very often, so these kinds of things can sneak in. Anyway, here’s the patch; not sure if it is still relevant.


The opposite seems like it might be useful too: assert that the hash of an unchanged module remains unchanged.