Determine reason for failure at -O1

Hi Everyone,

We caught a report for a failed self test when using Clang 5.0 and 6.0
with -DDEBUG and -O1 (i.e., a "debug build"). The code in question is
located at
. It is the SSSE3 code path for CHAM64.

Other optimization levels are OK for Clang. GCC, ICC and MSVC are OK.
The code is valgrind, Sanitizer, Coverity and Enterprise Analysis clean.

An objdump is available at but it is not
very helpful to me.

My question is, how can we begin to troubleshoot the issue?

Thanks in advance,


I’d suggest approaching this like any other bug in source code (not in the compiler) - reduce the example, isolate the failure - until you either find a source bug, or a small, standalone example that seems to demonstrate a contradiction or bug in the way the compiler is interpreting the source.

Thanks Davis.

According to , the list of
optimizations applied at -O1 includes:

-globalopt -demanded-bits -branch-prob -inferattrs -ipsccp -dse
-loop-simplify -scoped-noalias -barrier -adce -deadargelim -memdep
-licm -globals-aa -rpo-functionattrs -basiccg -loop-idiom -forceattrs
-mem2reg -simplifycfg -early-cse -instcombine -sccp -loop-unswitch
-loop-vectorize -tailcallelim -functionattrs -loop-accesses -memcpyopt
-loop-deletion -reassociate -strip-dead-prototypes -loops -basicaa
-correlated-propagation -lcssa -domtree -always-inline -aa -block-freq
-float2int -lower-expect -sroa -loop-unroll
-alignment-from-assumptions -lazy-value-info -prune-eh -jump-threading
-loop-rotate -indvars -bdce -scalar-evolution -tbaa

I tried backing off some of the optimizations to begin to isolate:

$ CXX=clang++ CXXFLAGS="-g2 -O1 -no-loop-vectorize" make cham-simd.o
clang++ -g2 -O1 -no-loop-vectorize -fPIC -pthread -pipe -c cham-simd.cpp
clang-6.0: error: unknown argument: '-no-loop-vectorize'
make: *** [GNUmakefile:1069: cham-simd.o] Error 1

So I guess my question is, how do I disable a particular optimization?


You can use -opt-bisect-limit to narrow down the optimization that causes the issue.
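
For anyone following along, here is a rough sketch of how that bisection could be scripted. It assumes the failure is stable: the self test passes when only the first GOOD optional passes run and fails when the first BAD run. With Clang the option is passed through to LLVM as -mllvm -opt-bisect-limit=N; a limit of -1 should print the numbered passes without skipping any, which gives an upper bound to start from. The names first_bad_limit, selftest_with_limit and ./run-selftest.sh are made up for illustration and are not part of the project or of Clang.

```shell
#!/bin/sh
# Sketch only: binary search for the smallest -opt-bisect-limit value
# at which the self test starts to fail.

# first_bad_limit GOOD BAD CHECK
# Assumes "CHECK GOOD" succeeds and "CHECK BAD" fails.
# Prints the smallest limit for which CHECK fails.
first_bad_limit() {
    good=$1
    bad=$2
    check=$3
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$check" "$mid"; then
            good=$mid    # still passes with the first $mid passes enabled
        else
            bad=$mid     # already fails with the first $mid passes enabled
        fi
    done
    echo "$bad"
}

# Hypothetical check: rebuild only cham-simd.o with the given limit,
# then relink and run the self test. ./run-selftest.sh is a placeholder
# for whatever command relinks and reports pass/fail via its exit status.
selftest_with_limit() {
    clang++ -DDEBUG -g2 -O1 -mllvm -opt-bisect-limit="$1" \
        -fPIC -pthread -pipe -c cham-simd.cpp 2>/dev/null &&
    ./run-selftest.sh
}

# Example use (the upper bound 500 is a guess; take the real one from
# the output of a build with -opt-bisect-limit=-1):
#   first_bad_limit 0 500 selftest_with_limit
```

The number it prints is the index of the first pass whose execution makes the test fail; the matching line in the bisect output of a build at that limit names the pass and the function it ran on. That does not by itself say whether the pass is wrong or is exposing a problem in the source.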

We cleared the issue at . We did not
have a methodology. We teased out the solution through knob turning.

The aa80c7d4acb6 change modifies a loop to perform 4
encryption/decryption rounds per iteration rather than 8. Decryption at
8 rounds per iteration was the failing case. Dropping from 8 to 4 relieved some
register pressure by requiring 4 fewer loads of a mask used to shuffle
a key.

Thanks for the help and suggestions.


Were you able to determine if it was a compiler bug? (an isolated example would help to make that clear - it’s hard to know from the workaround whether that suppressed a legitimate compiler optimization that was tickling some bug in the code or working around a bug in the compiler)