I ran into this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with -enable-misched.
A small function compiled on its own is scheduled well, but once opt inlines that function into its caller, the inlined code is scheduled poorly.
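For illustration only (this is not the attached foo.c), a kernel of the shape I mean could look like the sketch below. It assumes a STREAM-style triad; the names triad, run_triad and N are hypothetical and chosen just for this example.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024 /* hypothetical array length, not taken from STREAM */

static double a[N], b[N], c[N];

/* Small kernel: the kind of function that is scheduled well on its own. */
static inline void triad(double *restrict dst, const double *restrict x,
                         const double *restrict y, double scalar, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = x[i] + scalar * y[i];
}

/* Caller: at -O3 the triad loop is likely to be inlined here, and the
 * inlined (and unrolled) loop body is what the scheduler then sees. */
double run_triad(double scalar, size_t n)
{
    assert(n <= N); /* stay inside the static arrays */

    for (size_t i = 0; i < n; ++i) {
        b[i] = 1.0;
        c[i] = 2.0;
    }

    triad(a, b, c, scalar, n);

    /* Sum the result so the computation cannot be optimized away. */
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```

With something like this, the loop to compare is the one inside run_triad after inlining, against the loop of triad compiled as a standalone function.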
So I wrote a small test case (foo.c, attached as a link) and compiled it in two different ways.
Method A:
$clang -O3 foo.c -static -S -o foo.s -mllvm -enable-misched -mllvm -unroll-count=4 --target=arm -mfloat-abi=hard -mcpu=cortex-a9 -fno-vectorize -fno-slp-vectorize
Method B:
$clang foo.c -S -emit-llvm -o foo.bc --target=arm -mfloat-abi=hard -mcpu=cortex-a9
$opt foo.bc -O3 -unroll-count=4 -o foo.opt.bc
$llc foo.opt.bc -o foo.opt.s -march=arm -mcpu=cortex-a9 -enable-misched
(P.S. I checked both with -debug-pass=Structure, so I think the two pipelines are equivalent.)
But the results differ:
In foo.s, LBB1_4 always reuses the same register for the computation, whereas LBB1_4 in foo.opt.s does not.
My question is: how can I get the method B result using only clang (method A)?
Or am I missing something here?
I would really appreciate any help or suggestions.
------- file link -------