Detailed comparison of generated code size for LLVM and other compilers

See here:

   http://embed.cs.utah.edu/embarrassing/

There is a lot of data there. Please excuse bugs and other problems. Feedback would be appreciated.

John Regehr

Hi John,

I agree with the gcc folks: it's really important to get frame pointer emission etc. aligned across the various compilers. ICC defaulting to frame pointers off and GCC/llvm defaulting to them on will seriously warp the numbers. Also, while I agree that compilers should all optimize undefined behavior away as aggressively as possible, I also think it's not a very "interesting" test in the grand scheme of things. :)

Very interesting work though, I hope to have some time to dive in more later,

-Chris

Where did the gcc folks express this opinion?

There are probably other differing defaults besides frame pointers. Stack protectors, perhaps.

I agree with the gcc folks: it's really important to get frame pointer
emission etc aligned across the various compilers. ICC defaulting to
frame pointers off and GCC/llvm defaulting to them on will seriously
warp the numbers.

Where did the gcc folks express this opinion?

The GCC mailing list.

There are probably other differing defaults besides frame pointers. Stack protectors, perhaps.

Yes, that's another good example,

-Chris

There are probably other differing defaults besides frame pointers. Stack protectors, perhaps.

Yes, that's another good example,

Ok-- I assume I should just add '-fno-stack-protector' to the llvm-gcc and clang command lines? I had totally missed that something like this was turned on by default.

I'll re-run everything with this change, with frame pointers omitted, and with testcases that contain uses of uninitialized locals dropped, and post here again when that's done.

Thanks for the feedback,

John

Sounds great, thanks John!

-Chris

There are probably other differing defaults besides frame pointers. Stack
protectors, perhaps.

Yes, that's another good example,

Ok-- I assume I should just add '-fno-stack-protector' to the llvm-gcc and
clang command lines? I had totally missed that something like this was
turned on by default.

Yes.

I'll re-run everything with this change, with frame pointers omitted, and
with testcases that contain uses of uninitialized locals dropped, and post
here again when that's done.

You'll probably need to do some digging to make sure the defaults are exactly comparable. Are SSE and/or MMX used? This affects floating point codegen quite a bit. Are the target CPUs the same?

I think some useful information is going to come out of this, but there's some cruft to clear out first.

You'll probably need to do some digging to make sure the defaults are exactly comparable. Are SSE and/or MMX used? This affects floating point codegen quite a bit. Are the target CPUs the same?

Good point. Would "generic i686" be a reasonable choice? Does that even mean anything these days?

Anyway-- I'd appreciate some guidance on what you folks would find most interesting and useful. Then I can do some homework and figure out how to get the rest of the compilers to assume an equivalent ISA.

Thanks,

John

You'll probably need to do some digging to make sure the defaults are
exactly comparable. Are SSE and/or MMX used? This affects floating
point codegen quite a bit. Are the target CPUs the same?

In some tests I saw that the size difference was caused by different
defaults for FP codegen: x87 math or SSE2.

I'd recommend targeting (with both -march and -mtune) a simple and commonly available CPU type like "core2" or "pentium4". ICC should have both of these and gcc/llvm definitely do.

-Chris

While I would say that, to be fair, the comparison should be made with
the same options (-O3 only or something of the sort), ICC is specific
to Intel and GCC is highly tuned too, which is not the case for LLVM.
Still, if the target is specified, I'd assume we should enable all
tested and proven (AFAP) optimizations for that particular platform by
default.

So, my question is: are those optimizations turned off by default
because they're experimental?

--renato

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

The issue here is more arbitrary differences due to different default
code generation choices; for example, clang defaults to generating
SSE2 code, while llvm-gcc defaults to using x87 FP.

-Eli

The issue here is more arbitrary differences due to different default
code generation choices; for example, clang defaults to generating
SSE2 code, while llvm-gcc defaults to using x87 FP.

Aha, this explains some apparently bizarre results such as the second one (018427, d) on this page:

http://embed.cs.utah.edu/embarrassing/dec_09/harvest/llvm-gcc-head_clang-head/

I had been wondering about this one.

John

Aha, this explains some apparently bizarre results such as the second one
(018427, d) on this page:

Right. However, I saw the opposite case, with SSE2 code being 4x larger.

"return 1.0" is an example that is larger with SSE codegen, because the ABI requires floating-point return values to be on the x87 FP stack. X86-64 doesn't have this issue.

-Chris

This might be a very stupid question, but can we not choose to disable
SSE code generation on a case-by-case basis, even when those
optimizations are turned on?

In that case, I imagine that 1.0 is considered double and would
normally fill one or two registers, thus easy enough to return it via
registers. Unless, of course, the ABI mandates that SSE is
all-or-nothing...

cheers,
--renato


This might be a very stupid question, but can we not choose to disable
SSE code generation on a case-by-case basis, even when those
optimizations are turned on?

In that case, I imagine that 1.0 is considered double and would
normally fill one or two registers, thus easy enough to return it via
registers. Unless, of course, the ABI mandates that SSE is
all-or-nothing...

The ABI does not mandate the exclusive use of either SSE or x87 FP math.
It just requires FP values to *always* be returned via the FP stack.

Makes sense, probably avoiding duplicating hardware logic.

cheers,
--renato


Makes sense, probably avoiding duplicating hardware logic.

No. Surely it is to avoid breaking legacy code :)