linking and stripping compared to GCC 4.4

Hi,

consider the following Boost.Python code:

// demo.cpp

#include <boost/python.hpp>

template <typename T, typename U>
T foo(T i, U j) { return i*j; }

#define ARGS , (arg("i")=0, arg("j")=0)

BOOST_PYTHON_MODULE(demo_ext)
{
  using namespace boost::python;
  def("fii", foo<int,int> ARGS);
  def("fil", foo<int,long> ARGS);
  def("fif", foo<int,float> ARGS);
  def("fid", foo<int,double> ARGS);
  def("fic", foo<int,char> ARGS);
  def("fli", foo<long,int> ARGS);
  def("fll", foo<long,long> ARGS);
  def("flf", foo<long,float> ARGS);
  def("fld", foo<long,double> ARGS);
  def("flc", foo<long,char> ARGS);
  def("ffi", foo<float,int> ARGS);
  def("ffl", foo<float,long> ARGS);
  def("fff", foo<float,float> ARGS);
  def("ffd", foo<float,double> ARGS);
  def("ffc", foo<float,char> ARGS);
  def("fdi", foo<double,int> ARGS);
  def("fdl", foo<double,long> ARGS);
  def("fdf", foo<double,float> ARGS);
  def("fdd", foo<double,double> ARGS);
  def("fdc", foo<double,char> ARGS);
}

Then compare

~> g++ -c -DBOOST_PYTHON_MAX_BASES=2 -fPIC -fno-strict-aliasing -DNDEBUG -march=native -O3 -DBOOST_ALL_NO_LIB -DBOOST_DISABLE_THREADS -I…/cctbx/boost -I/usr/include/python2.6 demo.cpp
~> g++ -o demo.so -shared demo.o -L…/cctbx/cctbx_build/lib -lboost_python
~> g++ -o demo_strip.so -s -shared demo.o -L…/cctbx/cctbx_build/lib -lboost_python
~> ls -lhS
total 540K
-rw-r--r-- 1 luc devcom 240K Nov 27 07:50 demo.o
-rwxr-xr-x 1 luc devcom 170K Nov 27 07:50 demo.so*
-rwxr-xr-x 1 luc devcom 121K Nov 27 07:50 demo_strip.so*
-rw-r--r-- 1 luc devcom 902 Nov 27 07:18 demo.cpp
~> g++ --version
g++ (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10)

to

~> clang++ -c -DBOOST_PYTHON_MAX_BASES=2 -fPIC -fno-strict-aliasing -DNDEBUG -O3 -DBOOST_ALL_NO_LIB -DBOOST_DISABLE_THREADS -I/Users/luc/Developer/cctbx/boost -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 demo.cpp
~> clang++ -o demo.so -w -bundle -undefined dynamic_lookup demo.o -L$cctbx_build_clang_pch/lib -lboost_python
~> clang++ -o demo_strip.so -s -w -bundle -undefined dynamic_lookup demo.o -L$cctbx_build_clang_pch/lib -lboost_python
~> ls -lhS
total 1184
-rwxr-xr-x 1 luc luc 203K Nov 27 16:56 demo.so
-rwxr-xr-x 1 luc luc 202K Nov 27 16:56 demo_strip.so
-rw------- 1 luc luc 178K Nov 27 16:56 demo.o
-rw-r--r--@ 1 luc luc 952B Nov 27 05:13 demo.cpp

~> clang++ --version
clang version 2.9 (trunk 116866)
Target: x86_64-apple-darwin10
Thread model: posix

First, demo.so is significantly smaller than demo.o with gcc 4.4, whereas it is the other way around with clang. Moreover, the -s stripping option significantly reduces the size of the .so further with g++, whereas clang++ -s has almost no effect (203K versus 202K).

Is there any way to make clang++ competitive with GCC 4.4 here?

Thanks for any insight,

Luc Bourhis

There certainly is, but only if we narrow down the problem to something manageable. Are there functions in the Clang-compiled demo.so that are significantly larger than their GCC-compiled counterparts? If so, we could look at those particular functions to see why we’re generating more code.

Also, for code size, -O2 or -Os are generally a better bet than -O3.

- Doug

Thanks for your answer.

> Are there functions in the Clang-compiled demo.so that are significantly larger than their GCC-compiled counterparts?

Could you give me a few pointers as to how I would find those sizes? I would then be more than happy to do a thorough investigation.

> Also, for code size, -O2 or -Os are generally a better bet than -O3.

clang++ -Os and -O3 give nearly the same size for the .o file here. The problem appears when clang++ links that .o to make the .so.
It is linking against a dylib Boost.Python library, by the way. But I may be missing your point completely here.

Luc
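
One possible way to get per-function sizes on the Linux side, sketched under assumptions: GNU nm can print symbol sizes with "nm -S --size-sort -C demo.so", giving lines of the form "<hex address> <hex size> <type> <name>". The small C++ helper below parses that text and returns the symbols sorted from largest to smallest. The names Symbol, parse_hex and parse_nm are made up for illustration, and the parser assumes GNU nm's output layout; the nm shipped with Mac OS X may not accept the same options.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One sized symbol from the nm listing (hypothetical helper type).
struct Symbol {
    std::uint64_t size;
    char type;
    std::string name;
};

// Accept only tokens made entirely of hexadecimal digits.
bool parse_hex(const std::string& text, std::uint64_t& value) {
    if (text.empty() || text.size() > 16) return false;
    for (unsigned char c : text)
        if (!std::isxdigit(c)) return false;
    value = std::stoull(text, nullptr, 16);
    return true;
}

// Parse lines shaped like "<hex address> <hex size> <type> <name>".
// Lines without a size (undefined symbols, unsized symbols) are skipped.
std::vector<Symbol> parse_nm(std::istream& in) {
    std::vector<Symbol> symbols;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string address_text, size_text, type_text, name;
        if (!(fields >> address_text >> size_text >> type_text)) continue;
        // The rest of the line is the name; demangled names contain spaces.
        std::getline(fields >> std::ws, name);
        if (type_text.size() != 1 || name.empty()) continue;
        std::uint64_t address = 0, size = 0;
        if (!parse_hex(address_text, address) || !parse_hex(size_text, size)) continue;
        symbols.push_back({size, type_text[0], name});
    }
    // Largest symbols first.
    std::sort(symbols.begin(), symbols.end(),
              [](const Symbol& a, const Symbol& b) { return a.size > b.size; });
    return symbols;
}
```

Feeding it the nm listing of the g++-built and the clang++-built demo.so in turn would give two sorted lists whose top entries can be compared by eye.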

Any reason why demo.cpp is 952 bytes for clang and 902 bytes for gcc?

Also, it looks like you’re comparing Mac OS X to Linux. The binary format on each is very different.

-Henry

Also you’re creating a bundle rather than a dylib. You want -dylib not -bundle.

-eric

> Any reason why demo.cpp is 952 bytes for clang and 902 bytes for gcc?

Different hard drives. They are the same file, I swear.

> Also, it looks like you're comparing Mac OS X to Linux.

I ran gcc 4.4 on Linux and clang on MacOS X indeed.

> The binary format on each is very different.

Fair enough. I did not take the time to install gcc 4.4 on my MacOS X machine. Nevertheless, this raises the question: is clang++ hampered by the MacOS X binary format to such an extent that the -s stripping option has no effect whatsoever, whereas g++ 4.4 on Linux can significantly reduce the binary size with the same option thanks to the superior ELF 64-bit format? Or could clang++ do better as a linker driver on MacOS X here?

Luc Bourhis

> Also, it looks like you're comparing Mac OS X to Linux. The binary format on each is very different.

> Also you're creating a bundle rather than a dylib. You want -dylib not -bundle.

Could you explain how a dylib would be superior to a bundle for the binary size problem at stake?

Luc Bourhis

It should make no difference here, but bundles aren’t quite dylibs (they can be unloaded, etc.), and as you were creating an ELF DSO on the Linux side I thought I’d mention it.

Mostly this appears to be a linker problem and I’m not sure why - a testcase would help immensely.

-eric

It's not just file format. The two have very different codegen for things like accessing globals (position independent code), ABI differences as well as other things. You really have to compare on the same machine in the same config to do a useful comparison.

-Chris
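
To make the point about position-independent access to globals concrete, here is a minimal sketch, not taken from the thread. The file name pic_global.cpp and all identifiers are hypothetical. The assumption, worth checking on each platform by reading the output of "g++ -O2 -fPIC -S" and "clang++ -O2 -fPIC -S", is that on x86-64 ELF an exported global is typically reached through the GOT because another module could interpose it, while an internal-linkage global is reached directly; Mach-O follows different rules for the same source.

```cpp
// pic_global.cpp (hypothetical file name)

// Exported global with default visibility. Under -fPIC on x86-64 ELF the
// compiler typically loads its address from the GOT (assumption to verify
// in the generated assembly).
int call_count = 0;

// Accesses the exported global.
int bump() { return ++call_count; }

namespace {
// Internal linkage: cannot be interposed, so a direct RIP-relative access
// is typically emitted instead (again an assumption to verify).
int private_count = 0;
}

// Accesses the internal-linkage global.
int bump_private() { return ++private_count; }
```

Comparing the assembly for bump and bump_private on each platform shows how much of a size difference comes from the object format and ABI rather than from the compiler's optimizer.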

> You really have to compare on the same machine in the same config to do a useful comparison

So I have installed clang trunk on the very same machine where I ran gcc 4.4. Compiling the very same demo.cpp, here is the result: 240K with g++ and 294K with clang++, i.e. 22% more. I stared at the nm output for 5 mins without much success.

Worth a bug report?

Luc Bourhis

It sounds like an important issue to track down, but unless there is a reduced testcase, no one is likely to have time to look at it in the near future.

-Chris