Bug in optimization pass related to strcmp and big endian back-ends

Stripf_Timo · August 11, 2009, 8:13am

Hi all,

I'm working on an LLVM back-end right now and I think I found a bug in an optimization pass. When compiling the following code using llvm-gcc (the current 2.5 release) with -O2:

#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
    char* pStr = "I" + (argc > 100);
    printf("%d\n", strcmp(pStr, "I") == 0);
}

the strcmp call is replaced by a 16-bit load whose result is compared against the integer value of 'I':

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %0 = icmp sgt i32 %argc, 100 ; <i1> [#uses=1]
  %1 = zext i1 %0 to i32 ; <i32> [#uses=1]
  %2 = getelementptr [2 x i8]* @.str, i32 0, i32 %1 ; <i8*> [#uses=1]
  %tmp = bitcast i8* %2 to i16* ; <i16*> [#uses=1]
  %lhsv = load i16* %tmp, align 1 ; <i16> [#uses=1]
  %3 = icmp eq i16 %lhsv, 73 ; <i1> [#uses=1]
  %4 = zext i1 %3 to i32 ; <i32> [#uses=1]
  %5 = tail call i32 (i8*, ...)* @printf(i8* getelementptr ([4 x i8]* @.str1, i32 0, i32 0), i32 %4) nounwind ; <i32> [#uses=0]
  ret i32 undef
}

On little-endian machines the code works correctly, but on big-endian machines %lhsv must be compared against 73 << 8.
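
To make the two constants concrete, here is a minimal C sketch that assembles the two bytes of the string "I" (the character 'I' followed by the terminating zero) in both byte orders. The helper names load16_le and load16_be are made up for this illustration and are not part of LLVM; they only mimic what a 16-bit load would produce on each kind of machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value a little-endian 16-bit load would produce.
   The byte at the lower address is the least significant one. */
static uint16_t load16_le(const unsigned char *p) {
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: value a big-endian 16-bit load would produce.
   The byte at the lower address is the most significant one. */
static uint16_t load16_be(const unsigned char *p) {
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the bytes { 'I', 0 }, load16_le yields 73 and load16_be yields 73 << 8, which is why a comparison against the plain constant 73 only matches on a little-endian target.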

Kind regards

Timo Stripf

Eli_Friedman1 · August 11, 2009, 8:27am

If llvm-gcc thinks it's compiling for a little-endian target, the
optimizers will assume the target is little-endian... what are you
trying to do?

-Eli

Stripf_Timo · August 11, 2009, 9:24am

I thought the LLVM IR is target independent and that "llvm-gcc -c -emit-llvm -O2" produces target independent code.

I'm working on a back-end and use llvm-gcc to first generate the bc file. Afterwards I use llc including the new back-end to produce the assembler file.

-Timo

Richard_Pennington1 · August 11, 2009, 10:04am

Stripf, Timo wrote:

I thought the LLVM IR is target independent and that "llvm-gcc -c -emit-llvm -O2" produces target independent code.

I'm working on a back-end and use llvm-gcc to first generate the bc file. Afterwards I use llc including the new back-end to produce the assembler file.

-Timo

LLVM IR is very target dependent. The IR knows about things like endian-ness, alignment, etc.

I'm currently building newlib for several LLVM targets and I create separate bitcode for each target. Each module has the target triple and target data string specific to the target.

Unfortunately you need an llvm-gcc for the target you want to support. I think clang can generate code for multiple targets. Maybe you should try that.

-Rich

akorobeynikov · August 11, 2009, 10:38am

Hello

Unfortunately you need an llvm-gcc for the target you want to support.
I think clang can generate code for multiple targets. Maybe you should
try that.

Usually adding a new target to clang is a matter of a few lines of code.
Some more work is needed if your target uses, for example, complex
calling conventions (e.g. like x86_64), but this can easily be skipped
at first.

You might want to see how different targets are hooked into clang
(e.g. msp430, s390x, ppc, pic16, etc)

Dan_Gohman3 · August 11, 2009, 4:16pm

More precisely, LLVM IR generated from C is very target dependent. See
http://llvm.org/docs/FAQ.html#platformindependent
for details.

Dan

Nick_Lewycky · August 12, 2009, 2:34am

Stripf, Timo wrote:

I thought the LLVM IR is target independent

Yes.

and that "llvm-gcc -c -emit-llvm -O2" produces target independent code.

No.

I'm working on a back-end and use llvm-gcc to first generate the bc file. Afterwards I use llc including the new back-end to produce the assembler file.

LLVM IR contains a target-information line but is otherwise target independent. This does *not* mean that you can convert C to LLVM IR in a target independent way.

C code may contain "#ifdef __ppc__". Now what? Or how about "switch (x) { case sizeof(int): ... }". This question is a FAQ; see the LLVM Frequently Asked Questions (FAQ) documentation.
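
As a rough illustration of those two cases, the following C sketch shows target facts that are already decided before any IR exists. The function names arch_name and is_int_sized are invented for this example; the point is only that the preprocessor branch and the case label are both resolved for one specific target.

```c
#include <stddef.h>

/* Hypothetical helper: the preprocessor picks exactly one branch,
   so the emitted IR contains only one of the two strings. */
static const char *arch_name(void) {
#ifdef __ppc__
    return "ppc";
#else
    return "other";
#endif
}

/* Hypothetical helper: the case label is sizeof(int) for the target
   being compiled for; the front end turns it into a plain integer
   constant when it emits IR. */
static int is_int_sized(size_t x) {
    switch (x) {
    case sizeof(int):
        return 1;
    default:
        return 0;
    }
}
```

Once either function has been lowered to IR, the information about which branch or which constant another target would have chosen is gone.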

LLVM IR is portable in the sense that it will run the same on any platform. C is portable in the sense that you can detect things about the platform so you may correct for them. It turns out that these are two fundamentally incompatible paradigms.

Some LLVM optimizations take advantage of the information in the target info line which could change the behaviour of the program if your target system doesn't match the one described in the info line.

Nick

Stripf_Timo · August 12, 2009, 12:03pm

Alright, thank you all for your help and information, and sorry for describing it as a bug.

For a "fast" workaround I simply use llvm-gcc with -O0, modify the endian information within the ll file, and use opt to optimize the code. That way the debugging information is also not removed, and at the moment everything works fine for a non-trivial application. Later I'll also modify the front-end to support the back-end, but for now I think this is easier.

Just out of curiosity, are there any plans/ideas to increase the platform independence of optimization steps or the LLVM IR? It is clear that for C/C++ front-ends it is useless, but maybe it is useful for other platform-independent front-ends like Java. Maybe add some kind of is_big_endian function to express "is_big_endian ? 30 : 30<<8" within the IR that is replaced within the back-end.
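
Written in C rather than IR, the idea could look roughly like the sketch below. Everything here is hypothetical: is_big_endian, expected_I and equals_I are names invented for the illustration, the check runs on the host at run time instead of being replaced by the back-end, and the constant 73 (the value of 'I') is used so that it matches the strcmp case from the first post.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: report the byte order of the machine running
   this code by looking at the first byte of the 16-bit value 1. */
static int is_big_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Constant that a 16-bit load of the string "I" has to be compared
   against, chosen late instead of being baked in by the optimizer. */
static uint16_t expected_I(void) {
    return is_big_endian() ? (uint16_t)(73 << 8) : (uint16_t)73;
}

/* Equivalent of strcmp(s, "I") == 0 done with one 16-bit load.
   Assumption: s points to at least two readable bytes. */
static int equals_I(const char *s) {
    uint16_t v;
    memcpy(&v, s, sizeof v);
    return v == expected_I();
}
```

With the choice deferred like this, the same comparison gives the right answer on either byte order, at the cost of keeping the selection around until the target is known.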

-Timo

akorobeynikov · August 12, 2009, 12:32pm

Hello, Timo

Just out of curiosity, are there any plans/ideas to increase the platform independence of optimization steps or the LLVM IR? It is clear that for C/C++ front-ends it is useless, but maybe it is useful for other platform-independent front-ends like Java. Maybe add some kind of is_big_endian function to express "is_big_endian ? 30 : 30<<8" within the IR that is replaced within the back-end.

Both optimization steps and LLVM IR are target neutral by themselves.
It's the "information" encoded into the IR that is target-dependent (and
it's not only the endianness thing). That's why LLVM IR obtained from
Java bytecode should be target-neutral in theory.
