Hi LLVM,
I'm currently setting up some tools to investigate how the order of optimization passes influences the performance of compiled programs - nothing exceptional here.
I noticed something inconvenient with opt, namely that splitting a single invocation into two does not always give the same output:
% llvm-stress > stress.ll
% opt -dse -verify -dce stress.ll -o stress-1.bc
% opt -dse stress.ll | opt -dce -o stress-2.bc
% diff stress-{1,2}.bc
Binary files stress-1.bc and stress-2.bc differ
The difference seems meaningful; it's ~180 bytes out of ~1400 bytes of output in my random case. However, I can't decode it, because disassembling the bitcode produces identical text files, even with annotations. (!)
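For reference, the disassembly comparison was along these lines (modulo exact file names):
% llvm-dis stress-1.bc -o stress-1.dis.ll
% llvm-dis stress-2.bc -o stress-2.dis.ll
% diff stress-1.dis.ll stress-2.dis.ll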
I made sure that the sequence for [-dse -verify -dce] is the concatenation of the individual sequences; this falls into place naturally because -dce has no dependencies. The verifier pass forces the creation of two function pass managers, just in case.
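The concatenation check itself was just something along these lines, using -debug-pass=Arguments (which prints to stderr):
% opt -debug-pass=Arguments -dse -verify -dce stress.ll -disable-output
% opt -debug-pass=Arguments -dse stress.ll -disable-output
% opt -debug-pass=Arguments -dce stress.ll -disable-output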
Now if I do the same thing but staying in text format, I get the same IR (up to module name):
% opt -S -dse -verify -dce stress.ll -o stress-1.ll
% opt -S -dse stress.ll | opt -S -dce -o stress-2.ll
% diff -y --suppress-common-lines stress-{1,2}.ll
; ModuleID = 'stress.ll' | ; ModuleID = '<stdin>'
Is there a specific behavior of opt that could explain this situation? What kind of difference could there be in the bitcode files that is lost in the translation to text format?
Cheers,
Sébastien Michelland
Are you passing -preserve-ll-uselistorder when you create the .ll files? It's off by default because the output tends to be sort of unreadable, but it could explain some of the differences you're seeing.
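Something like this, untested (the flag needs to be on every invocation that writes .ll for the order to survive the round-trip):
% opt -S -preserve-ll-uselistorder -dse -verify -dce stress.ll -o stress-1.ll
% opt -S -preserve-ll-uselistorder -dse stress.ll | opt -S -preserve-ll-uselistorder -dce -o stress-2.ll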
-Eli
Hi Eli,
Unfortunately the differences remain; I do not observe any significant change in the output, besides the fact that it's random.
I noticed that running opt without options on the random file changes the order of the predecessor references of basic blocks (sample below). Further invocations of opt are idempotent.
I don't know if this information is stored in the bitcode file as well.
< ; preds = %CF, %CF80, %CF78
> ; preds = %CF80, %CF, %CF78
FWIW, the conflicting section of the bitcode file is likely not a mere permutation, because the byte patterns don't match (some of the byte values of stress-1.bc are not present in stress-2.bc).
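The idempotence check was roughly:
% opt -S stress.ll | opt -S > once.ll
% opt -S stress.ll | opt -S | opt -S > twice.ll
% diff once.ll twice.ll
The diff is empty, i.e. the second and third applications agree (both final invocations read from stdin, so even the module ID matches).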
Thanks for your help 
Sébastien Michelland
Hi,
I would try running llvm-bcanalyzer on these .bc files; it may help to understand where the discrepancy is coming from.
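Something like (untested):
% llvm-bcanalyzer -dump stress-1.bc > dump-1.txt
% llvm-bcanalyzer -dump stress-2.bc > dump-2.txt
% diff dump-1.txt dump-2.txt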
Best,
Hi Mehdi,
Thank you for mentioning this tool; I was looking for something like this. By default the analyzer produces identical output on both files, but a complete -dump shows that the storage order of the symbol table is different.
This would explain why text files are not affected: the symbols are used directly in text form so there is no need for this table.
I suppose that settles the question of where. Out of curiosity, I'd like to know if there is a way to order the table in a canonical form. I found -preserve-bc-uselistorder, which seems more relevant (and seems to correspond, since the table lists all uses of each symbol), but no luck yet.
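For the record, roughly what I tried (the outcome did not change):
% opt -preserve-bc-uselistorder -dse -verify -dce stress.ll -o stress-1.bc
% opt -preserve-bc-uselistorder -dse stress.ll | opt -preserve-bc-uselistorder -dce -o stress-2.bc
% diff stress-{1,2}.bc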
At least now I'm sure that there is no semantic difference between the programs so it's a great help. 
Thanks,
Sébastien Michelland
Thanks for the update!
It may be desirable to sort the table before writing the bitcode out, adding Peter to the thread for his opinion.
Hello again,
> It may be desirable to sort the table before writing the bitcode out, adding Peter to the thread for his opinion.
Thanks for this!
Now it seems I was too optimistic about this result. I have instrumented the test suite to check it on a wider set of files and quickly discovered that it fails for larger optimization sequences.
In particular, the default -O3 set in which I'm interested is not easily reproduced. I'm attaching a script that demonstrates this.
It contains the extracted -O3 set in two groups, and checks that [opt -debug-pass=Arguments] reports the same pass sequence when called with -O3 and when called with the individual arguments. If a file name is provided, it also checks that the outputs are the same (or, in our case, different).
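In spirit, the script boils down to this (sketch only; GROUP1/GROUP2 stand for the two halves of the extracted -O3 pass list, and input.bc for the file under test):
% opt -debug-pass=Arguments -O3 input.bc -disable-output 2> o3.args
% opt -debug-pass=Arguments $GROUP1 $GROUP2 input.bc -disable-output 2> split.args
% diff o3.args split.args
% opt -O3 input.bc -o full.bc
% opt $GROUP1 input.bc | opt $GROUP2 -o split.bc
% diff full.bc split.bc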
Many real files fail to pass this test, for instance bilateral_grid.bc:
<https://github.com/llvm/llvm-test-suite/blob/master/Bitcode/Benchmarks/Halide/bilateral_grid/bilateral_grid.bc>
The diffs are very large even in text mode, and include lots of code.
I'm puzzled again. Any clue about the behavior of opt is very welcome.
Cheers,
Sébastien Michelland
not-associative.sh (4.06 KB)
There is a non-determinism problem with the uselists. The code causing this is almost identical in the IR writer and the bitcode writer. In some invocations of opt the uselists are shuffled; in others (same input) they are not. I haven’t nailed the root cause yet. It has the flavor of a stack memory corruption.
For a quick check that you are seeing the same issue, you could disable the shuffle code in the writers.
Gerolf
I finally got back to this. It is a known and endemic issue that pops up from time to time. The issues I’m aware of so far are related to unordered sets being used where a strict order is required. This may result in non-deterministic uselists being emitted by the bitcode/assembly writers.
There is no great way to go about proactive testing for this. Collecting the tests so far and running them as regression tests occasionally might serve as a feel-better bandage. Neither can I think of good checks in a verifier. These bugs show up from time to time, disappear, show up again, and essentially any commit could expose them or make them disappear. The medicine to take may well be supplying deterministic implementations of DenseMap, SmallPtrSet, and probably DenseSet - and limiting current usage to cases where order is irrelevant, like data flow analysis etc.
I pushed one fix for sccp, and will post one for adce later. Hopefully they will help in your case, but I doubt they are exhaustive.
FWIW, there is one bright spot here: I have no example (not yet…) where incorrect code is generated.
-Gerolf
Isn’t LLVM_REVERSE_ITERATION intended to catch issues where we depend on the iteration order? I don’t know if we have bots using this flag though.
Would building something like LLVM itself with a host clang built with and without this flag and comparing the output be enough?
This is a good idea. You would want to check the usual test suites first, both for debugging simplicity and to increase the probability of finding issues. The crux is that you usually need a random number of runs before you can spot the issue. I would not consider a simple a/b comparison proof of the absence of the issue, even with the forward/reverse perturbation. But I could see a bot running many a/b tests to flush them out over time.
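Concretely, a single a/b check could look something like this (sketch; build and file names are placeholders):
% cmake -G Ninja -DLLVM_REVERSE_ITERATION=ON -DCMAKE_BUILD_TYPE=Release ../llvm
% ninja opt
% ./bin/opt -O3 input.bc -o reverse.bc
% diff reverse.bc forward.bc    # forward.bc comes from a build configured without the flag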