detailed comparison of generated code size for LLVM and other compilers

regehr · December 14, 2009, 3:33pm

See here:

http://embed.cs.utah.edu/embarrassing/

There is a lot of data there. Please excuse bugs and other problems. Feedback would be appreciated.

John Regehr

Chris_Lattner · December 14, 2009, 5:46pm

Hi John,

I agree with the gcc folks: it's really important to get frame pointer emission etc aligned across the various compilers. ICC defaulting to frame pointers off and GCC/llvm defaulting to them on will seriously warp the numbers. Also, while I agree that compilers should all optimize undefined behavior away as aggressively as possible, I also think it's not a very "interesting" test in the grand scheme of things.

Very interesting work though, I hope to have some time to dive in more later,

-Chris

Dale_Johannesen · December 14, 2009, 7:04pm

Where did the gcc folks express this opinion?

There are probably other differing defaults besides frame pointers. Stack protectors, perhaps.

Chris_Lattner · December 14, 2009, 7:05pm

I agree with the gcc folks: it's really important to get frame pointer
emission etc aligned across the various compilers. ICC defaulting to
frame pointers off and GCC/llvm defaulting to them on will seriously
warp the numbers.

Where did the gcc folks express this opinion?

The GCC mailing list.

There are probably other differing defaults besides frame pointers. Stack protectors, perhaps.

Yes, that's another good example,

-Chris

regehr · December 14, 2009, 8:23pm

There are probably other differing defaults besides frame pointers. Stack protectors, perhaps.

Yes, that's another good example,

Ok-- I assume I should just add '-fno-stack-protector' to the llvm-gcc and clang command lines? I had totally missed that something like this was turned on by default.

I'll re-run everything with this change, with frame pointers omitted, and with testcases that use uninitialized locals dropped, and post here again when that's done.

Thanks for the feedback,

John

Chris_Lattner · December 14, 2009, 8:37pm

Sounds great, thanks John!

-Chris

Dale_Johannesen · December 14, 2009, 9:27pm

There are probably other differing defaults besides frame pointers. Stack
protectors, perhaps.

Yes, that's another good example,

Ok-- I assume I should just add '-fno-stack-protector' to the llvm-gcc and
clang command lines? I had totally missed that something like this was
turned on by default.

Yes.

I'll re-run everything with this change, with frame pointers omitted, and
with testcases that use uninitialized locals dropped, and post
here again when that's done.

You'll probably need to do some digging to make sure the defaults are exactly comparable. Are SSE and/or MMX used? This affects floating point codegen quite a bit. Are the target CPUs the same?

I think some useful information is going to come out of this, but there's some cruft to clear out first.
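
One way to do that digging is to ask each compiler what it thinks its own defaults are. Below is a minimal sketch, assuming GCC-style predefined macros (`__SSE2_MATH__`, `__SSP__` and friends), which GCC and clang are expected to provide but other compilers such as ICC may not. The function names are hypothetical and only serve the illustration; `FLT_EVAL_METHOD` is the standard C99 macro from `<float.h>`.

```c
#include <float.h>

/* 1 if the compiler reports that scalar double math uses SSE2,
 * 0 otherwise (e.g. x87 math, or a compiler without this macro). */
int uses_sse2_math(void) {
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}

/* 1 if some form of stack protector is reported as enabled.
 * Assumption: the compiler follows the GCC macro convention. */
int has_stack_protector(void) {
#if defined(__SSP__) || defined(__SSP_ALL__) || defined(__SSP_STRONG__)
    return 1;
#else
    return 0;
#endif
}

/* Value of FLT_EVAL_METHOD (typically 2 for x87 extended-precision
 * evaluation and 0 for SSE math), or -99 if <float.h> does not
 * define it in the selected language mode. */
int fp_eval_method(void) {
#ifdef FLT_EVAL_METHOD
    return (int)FLT_EVAL_METHOD;
#else
    return -99;
#endif
}
```

Building this file with each compiler's exact benchmark command line and printing the three values would show whether the FP and stack-protector defaults actually match. It says nothing about frame pointers or the tuning target, which still have to be checked by hand.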

regehr · December 14, 2009, 9:49pm

You'll probably need to do some digging to make sure the defaults are exactly comparable. Are SSE and/or MMX used? This affects floating point codegen quite a bit. Are the target CPUs the same?

Good point. Would "generic i686" be a reasonable choice? Does that even mean anything these days?

Anyway-- I'd appreciate some guidance on what you folks would find most interesting and useful. Then I can do some homework and figure out how to get the rest of the compilers to assume an equivalent ISA.

Thanks,

John

akorobeynikov · December 14, 2009, 9:54pm

You'll probably need to do some digging to make sure the defaults are
exactly comparable. Are SSE and/or MMX used? This affects floating
point codegen quite a bit. Are the target CPUs the same?

In some tests I saw that the size difference was caused by different
defaults for FP codegen: x87 math or SSE2.

Chris_Lattner · December 14, 2009, 9:57pm

I'd recommend targeting (with both -march and -mtune) a simple and commonly available CPU type like "core2" or "pentium4". ICC should have both of these and gcc/llvm definitely do.

-Chris

Renato_Golin3 · December 14, 2009, 11:29pm

While I would say that, to be fair, the comparison should be made with
the same options (-O3 only or something of the sort), ICC is specific
to Intel and GCC is highly tuned too, which is not the case for LLVM.
Still, if the target is specified, I'd assume we should enable all
tested and proven (AFAP) optimizations for that particular platform by
default.

So, my question is: are those optimizations turned off by default
because they're experimental?

--renato

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

Eli_Friedman1 · December 14, 2009, 11:41pm

The issue here is more arbitrary differences due to different default
code generation choices; for example, clang defaults to generating
SSE2 code, while llvm-gcc defaults to using x87 FP.

-Eli

regehr · December 15, 2009, 12:34am

The issue here is more arbitrary differences due to different default
code generation choices; for example, clang defaults to generating
SSE2 code, while llvm-gcc defaults to using x87 FP.

Aha, this explains some apparently bizarre results such as the second one (018427, d) on this page:

http://embed.cs.utah.edu/embarrassing/dec_09/harvest/llvm-gcc-head_clang-head/

I had been wondering about this one.

John

akorobeynikov · December 15, 2009, 1:01am

Aha, this explains some apparently bizarre results such as the second one
(018427, d) on this page:

Right. However, I also saw the opposite case, with SSE2 code being 4x larger.

Chris_Lattner · December 15, 2009, 1:05am

"return 1.0" is an example that is larger with SSE codegen, because the ABI requires stuff in the FP stack. X86-64 doesn't have this issue.

-Chris
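
A minimal sketch of the case described above, for anyone who wants to reproduce it. The file name and build lines are assumptions (a GCC-compatible driver on 32-bit x86), not the setup used for the published numbers. Under the 32-bit ABI a `double` return value has to end up on the x87 stack, so a value produced with SSE math needs an extra trip through memory to get there.

```c
/* Hypothetical file name: ret_one.c
 * Example build lines (assumes a GCC-compatible driver, 32-bit x86):
 *   cc -m32 -Os -mfpmath=387 -c ret_one.c
 *   cc -m32 -Os -msse2 -mfpmath=sse -c ret_one.c
 * Compare the resulting function sizes, e.g. with `size ret_one.o`.
 */

/* Constant return: the result must be in st(0) under the 32-bit ABI. */
double one(void) {
    return 1.0;
}

/* Arithmetic that SSE math would do in an XMM register before the
 * result is moved to the x87 stack for the return. */
double twice(double x) {
    return x + x;
}
```

How large the difference is depends on the compiler version and flags, so treat any single measurement as a data point rather than a rule.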

Renato_Golin3 · December 15, 2009, 9:51am

This might be a very stupid question, but can we not choose to disable
SSE code generation on a case-by-case basis, even when those
optimizations are turned on?

In that case, I imagine that 1.0 is considered double and would
normally fill one or two registers, thus easy enough to return it via
registers. Unless, of course, the ABI mandates that SSE is
all-or-nothing...

cheers,
--renato


akorobeynikov · December 15, 2009, 9:54am

This might be a very stupid question, but can we not choose to disable
SSE code generation on a case-by-case basis, even when those
optimizations are turned on?

In that case, I imagine that 1.0 is considered double and would
normally fill one or two registers, thus easy enough to return it via
registers. Unless, of course, the ABI mandates that SSE is
all-or-nothing...

The ABI does not mandate the exclusive use of either SSE or x87 FP math.
It just requires FP values to *always* be returned via the FP stack.

Renato_Golin3 · December 15, 2009, 10:16am

Makes sense, probably avoiding duplicating hardware logic.

cheers,
--renato


akorobeynikov · December 15, 2009, 11:39am

Makes sense, probably avoiding duplicating hardware logic.

No. Surely it is so as not to break legacy code.
