I created a .ll file (say ‘test.ll’) and generated the assembly file ‘test.s’ as follows: clang -S -o test.s test.ll
The generated assembly is then compiled/linked with the rest of the package to obtain the executable. Works fine as described.
However, when optimization is requested, e.g. clang -O1 -S -o test.s test.ll, the generated assembly file is wrong.
What is the right way to generate an optimized object from an LLVM IR file?
clang used is version 13.0.1-++20220120110955+75e33f71c2da-1~exp1~20220120231006.63.
The command you provided looks exactly right to me. But “the generated assembly file is wrong” is too vague a description for anyone to be able to help you. In what way is it wrong?
- assembler reports errors
- assembles fine but linker reports errors
- links fine but program crashes
- links fine but program gives wrong results
Also, providing the test.ll file will help immensely.
I work on a project where the compiler currently doesn’t have an option for optimisation - it will come, but there are other things that are more important.
So I use precisely
clang -O1 somefile.ll -c, which, aside from generating an object file, is equivalent to the one posted in the first post here. It works fine for my work - and I’ve got .ll files on the order of 100-500 KB, the whole project is ~80k lines in LLVM-IR.
When it doesn’t work, it’s almost always because the .ll file itself is “bad” - the compiler didn’t generate the correct code.
Sounds like you have worked out how to generate an optimized object, then! If you have other questions feel free to ask.
Sorry about my sloppy post.
Please visit https://dalsoft.com/bug_report.html to find sources and explanations.
Would someone please look at the included file parP.ll and tell me if it is correctly created; note that no errors or warnings were reported while processing it.
I had a quick look, I can’t see anything OBVIOUSLY wrong. When I compile with clang on my machine, I get identical .s output.
What exactly do you see that is different from what you expect - do you get the wrong result?
I don’t follow exactly what the generated code is meant to do - I think you’re trying to chunk up 1000 additions per thread, right?
This looks a bit dubious:
%26 = alloca %___dcp_parBufType115, align 8 - it is in the middle of a function, and allocas should in general be at the start of the function (in the entry block).
Sorry, this is probably not super helpful.
The code is generated by the auto-parallelizer and attempts to perform the additions in parallel.
When the IR .ll file generated by the auto-parallelizer is compiled by clang without optimizations, the resulting executable produces the right result.
When the same IR .ll file is compiled by clang with optimizations requested ( -O1 ), the resulting executable produces the wrong result.
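The comparison described above can be made repeatable with a small helper. This is only a sketch: the function names and the `test.O0.s` / `test.O1.s` output naming are made up for illustration, and the link step against the rest of the package is left out because it depends on the project.

```python
import shlex


def clang_asm_command(ll_file, opt_level):
    """Build the clang command that turns an .ll file into assembly at -O<opt_level>."""
    # Hypothetical naming scheme: test.ll -> test.O1.s, so both variants can coexist.
    stem = ll_file[:-3] if ll_file.endswith(".ll") else ll_file
    out = f"{stem}.O{opt_level}.s"
    return ["clang", f"-O{opt_level}", "-S", "-o", out, ll_file]


def first_difference(output_a, output_b):
    """Return (line_number, a_line, b_line) for the first differing line, or None."""
    a_lines = output_a.splitlines()
    b_lines = output_b.splitlines()
    for i in range(max(len(a_lines), len(b_lines))):
        a = a_lines[i] if i < len(a_lines) else None
        b = b_lines[i] if i < len(b_lines) else None
        if a != b:
            return (i + 1, a, b)
    return None


# Print the two command lines to run by hand (or via subprocess.run).
for level in (0, 1):
    print(shlex.join(clang_asm_command("test.ll", level)))
```

After linking each assembly file into its own executable, feeding the captured output of both runs to `first_difference` shows where the optimized build first diverges.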
So, is the result “random garbage” or “kind of right, but not quite”?
Random garbage: you’re reading uninitialized addresses - valgrind may tell you where that is. It is most likely some index calculation that goes wrong.
“Kind of right, but not quite”: the result is in the right ballpark but not precisely right, e.g. getting 996 or 999 instead of 1000. I haven’t looked at your input values, but it may be an idea to have a fairly small test for which you can predict the correct value - fill with 1.0, for example, and then change half of them to 2.0 - half could be every other element, the low half, the high half, etc.
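A predictable input like that can be generated with a few lines. This is a sketch only: `make_case` and the pattern names are invented here, and how the values get into the parallelized program is up to the test harness.

```python
def make_case(n, pattern):
    """Return (values, expected_sum): n values of 1.0 with half of them set to 2.0."""
    values = [1.0] * n
    if pattern == "every_other":
        picked = range(0, n, 2)
    elif pattern == "low_half":
        picked = range(0, n // 2)
    elif pattern == "high_half":
        picked = range(n // 2, n)
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    for i in picked:
        values[i] = 2.0
    # Each picked element contributes one extra unit, so the sum is known exactly.
    expected = float(n + len(picked))
    return values, expected


values, expected = make_case(1000, "low_half")
print(expected)
```

Because all values are small integers stored as floats, the expected sum is exact, so any deviation in the program's result is a real error rather than rounding. Comparing which patterns fail may hint at which chunk of the index range is being skipped or read twice.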
In general -O1 isn’t very aggressive about optimizations, so it should be fairly safe, but of course that’s not ALWAYS the case.
I can assure you that you’re not doing something wrong in generating the optimised file itself. It’s something in your generated code that isn’t working correctly. This is the hard part about compiler work, when your compiler (or similar) generates code that is technically correct, but doesn’t do quite what you expected.
It was suggested by Leporacanthicus and others that the alloca should be placed in the entry block - that indeed solved the problem.
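For anyone generating IR from their own tool, a rough check for this pattern could look like the sketch below. It scans the textual IR line by line and reports every alloca that appears after the first basic block of a function. The regular expressions are approximations, not a real IR parser, and `find_late_allocas` is a made-up name. As far as I know, LLVM treats an alloca outside the entry block as a dynamic stack allocation, and passes such as mem2reg only consider entry-block allocas, which is why placement can matter once optimization is on.

```python
import re

# Start of a function definition, e.g. "define dso_local void @f(i32 %0) #0 {"
DEFINE_RE = re.compile(r'^\s*define\b.*?@(?P<name>[\w.$-]+|"[^"]*")\s*\(')
# A basic-block label on its own line, e.g. "entry:" or "12:"
LABEL_RE = re.compile(r'^\s*(?:[\w.$-]+|"[^"]*"):\s*$')
# An alloca instruction, e.g. "%26 = alloca %T, align 8"
ALLOCA_RE = re.compile(r'=\s*alloca\b')


def find_late_allocas(ir_text):
    """Return (function, line_number, text) for each alloca outside the entry block."""
    findings = []
    func = None               # function currently being scanned, or None
    block_index = 0           # 0 while still inside the entry block
    seen_instruction = False  # lets an explicit "entry:" label count as block 0
    for lineno, raw in enumerate(ir_text.splitlines(), start=1):
        line = raw.split(";", 1)[0].rstrip()  # drop trailing comments
        if func is None:
            m = DEFINE_RE.match(line)
            if m:
                func = m.group("name")
                block_index = 0
                seen_instruction = False
            continue
        stripped = line.strip()
        if stripped == "}":
            func = None
            continue
        if not stripped or stripped == "{":
            continue
        if LABEL_RE.match(line):
            if seen_instruction:
                block_index += 1  # a label after instructions starts a new block
            continue
        seen_instruction = True
        if block_index > 0 and ALLOCA_RE.search(line):
            findings.append((func, lineno, stripped))
    return findings


SAMPLE = """\
define void @f(i32 %n) {
entry:
  %a = alloca i32, align 4
  br label %loop

loop:
  %b = alloca i64, align 8
  br label %loop
}
"""

print(find_late_allocas(SAMPLE))  # [('f', 7, '%b = alloca i64, align 8')]
```

Running this over a generated file such as parP.ll (read with `open(path).read()`) would list candidates to hoist into the entry block. It does not prove that a reported alloca is a bug.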