computed goto/labels as values in interpreters

After searching the list archives and bug reports, I wasn't able to determine
whether computed goto (labels as values), as used in the Erlang and other
interpreters (e.g. Guile), is supported in clang 3.1. With some BSDs working
to make clang/clang++ the base cc/c++, this is going to become even more
important.

Is clang 3.1 supposed to compile beam_emu.c correctly and with the
same or comparable optimization result?

https://github.com/erlang/otp/blob/master/erts/emulator/beam/beam_emu.c
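
For readers unfamiliar with the construct: "labels as values" is the GNU C
extension where `&&label` yields the address of a label and `goto *ptr` jumps
to it, which lets an interpreter dispatch opcodes through a table of label
addresses. The following is only a minimal smoke test, not anything taken from
beam_emu.c: the file name, the three-opcode "VM" and the variable names are
made up for illustration, and it assumes a `clang`, `gcc` or `cc` on PATH (or
a plain compiler name in `$CC`).

```shell
#!/bin/sh
# Smoke test: does the compiler accept GNU "labels as values" (computed goto)?
# All names below are hypothetical; the tiny opcode set is illustrative only.
workdir=$(mktemp -d)
cat > "$workdir/dispatch.c" <<'EOF'
#include <stdio.h>

enum { OP_INC, OP_DOUBLE, OP_HALT };

static int run(const unsigned char *pc)
{
    /* Table of label addresses, indexed by opcode (GNU C extension). */
    static void *dispatch[] = { &&op_inc, &&op_double, &&op_halt };
    int acc = 0;

    goto *dispatch[*pc++];          /* jump to the first instruction */
op_inc:
    acc += 1;
    goto *dispatch[*pc++];          /* each handler dispatches the next opcode */
op_double:
    acc *= 2;
    goto *dispatch[*pc++];
op_halt:
    return acc;
}

int main(void)
{
    static const unsigned char prog[] =
        { OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT };
    printf("%d\n", run(prog));
    return 0;
}
EOF

# Use $CC if set, otherwise the first compiler found on PATH.
compiler=${CC:-}
if [ -z "$compiler" ]; then
    for c in clang gcc cc; do
        if command -v "$c" >/dev/null 2>&1; then compiler=$c; break; fi
    done
fi

if [ -n "$compiler" ] && "$compiler" -O2 -o "$workdir/dispatch" "$workdir/dispatch.c"; then
    result=$("$workdir/dispatch")
    echo "computed goto works with $compiler: result=$result"
else
    echo "no compiler found, or it rejected labels as values"
fi
```

A successful build that prints a result only shows that the construct is
accepted and executes; it says nothing about compile time or code quality on
a file the size of beam_emu.c.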

Yes, it's supposed to.

-Eli

Yes, it seems to work, but with some interesting differences compared to
gcc. While gcc-4.7 takes 6s to recompile Erlang's emulator after a
'touch beam_emu.c', clang takes 2m11s and consumes around 141MB RES vs
40MB for gcc. I'm not sure why, but with clang the ar step also seemed to
take longer. Tests were conducted on i386 Linux 3.5.1 with
glibc-2.15 against Erlang's maint branch.

Next, a step which is known to be slow is building some of the generated
C++ files in lib/wx (the wxWidgets Erlang bindings). When linking erl_gl.so
with clang, it took forever to finish (I didn't time it). This is similar to
the ar archive step in erts/emulator. Is there anything inherently slow with
gcc-4.7, glibc-2.15, and binutils-2.22? Are ar and ld from binutils at
fault? While linking erl_gl.so, the RES of clang was at 265MB. It seemed
to be an infinite loop and I stopped it after more than 5 minutes.

Are these expected performance results?

IIRC, there's been some work in this area since 3.1. If you're still
seeing issues on trunk, please file a bug.

I'm not following this... linking shouldn't be spending any
substantial amount of time in the "clang" process. clang just invokes
ld.

If you have a reproducible infinite loop, please file a bug.

-Eli

Thanks for the info. From what I can see, beam_emu did get built and
I was able to run the emulator. I didn't do any performance tests because,
AFAIR, previously it didn't result in a runnable executable at all. This
result is enough of a success for now.

I'm not following this... linking shouldn't be spending any
substantial amount of time in the "clang" process. clang just invokes
ld.

Me neither, it's totally unexpected and surprising.

If you have a reproducible infinite loop, please file a bug.

I won't be able to conduct another set of tests including clang trunk
before the middle of next week, sorry.

If anyone wants to try in the meantime here are the steps:

$ git clone git://github.com/erlang/otp.git
$ cd otp
$ export ERL_TOP=$PWD
$ export PATH=$ERL_TOP/bin:$PATH
$ export CC=clang
$ export CXX=clang++
$ ./otp_build setup -a --prefix=$PWD/localinstall
$ make install
You can run and test the emulator without installing locally.
If you want to build only the emulator to look at beam_emu.c:
$ ./otp_build autoconf && ./otp_build configure --prefix=$PWD/localinstall
$ cd erts && make
erl_gl.so is built in lib/wx:
$ cd $ERL_TOP/lib/wx && make
This naturally requires wxWidgets.

Could the long and, in wxErlang's case, never-ending link phase be explained
by the use of the gold linker in this Linux distro's clang 3.1 package?
Are there known issues or incompatibilities with that?

I've built llvm and clang from git today and built otp.git maint branch
with both clang and gcc and didn't forget to time it.

clang:
$ export CC=clang
$ export CXX=clang++
$ export MAKEFLAGS=-j2
$ /usr/bin/time ./otp_build setup -a
1549.29user 90.32system 17:42.24elapsed 154%CPU (0avgtext+0avgdata 167100maxresident)k
8968inputs+314328outputs (36major+21665287minor)pagefaults 0swaps

gcc in a new shell after dropping/flushing the kernel's fs cache:
$ export MAKEFLAGS=-j2
$ /usr/bin/time ./otp_build setup -a
1355.43user 89.78system 14:22.01elapsed 167%CPU (0avgtext+0avgdata 73656maxresident)k
218288inputs+335936outputs (333major+19829177minor)pagefaults 0swaps

This seems to support the previous results.

I don't know whether this clang build uses the gold linker, it's
all just defaults detected by ./configure.

Can Eli or someone else confirm that clang/llvm is slower and uses
more memory when building Erlang/OTP?

Can you isolate which compilation unit is slower and/or uses more memory?

Thanks,
Rafael

I can try picking likely files and confirm whether they are affected.
That wouldn't mean they're the only ones. Would that help?
Is it important to compare the compile and link steps separately?
If it is, I'd have to see how I can measure that without messing
around too much.

It should help, yes. Even if there are multiple problems, fixing one
makes it easy to isolate others. I would suggest starting with the
file clang takes the longest to compile relative to gcc.

Cheers,
Rafael
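
One way to find the file with the worst clang-to-gcc ratio is to time each
candidate with both compilers outside of make. The function below is only a
sketch of that idea: `compare_compile`, `COMPILERS` and `EXTRA_FLAGS` are
made-up names, and `EXTRA_FLAGS` has to carry whatever -I/-D/-O options the
real build passes for the file in question. It relies on `date +%s`, so the
resolution is one second, which should be enough for differences on the scale
reported in this thread.

```shell
#!/bin/sh
# Time each source file with every compiler listed in $COMPILERS and print
# one "<compiler> <file> <seconds>s" line per pair.
# compare_compile, COMPILERS and EXTRA_FLAGS are hypothetical names.
compare_compile() {
    out=$(mktemp)
    for src in "$@"; do
        for cc in ${COMPILERS:-gcc clang}; do
            # Skip compilers that are not installed.
            command -v "$cc" >/dev/null 2>&1 || continue
            start=$(date +%s)
            # EXTRA_FLAGS is deliberately unquoted so it splits into words.
            "$cc" ${EXTRA_FLAGS:-} -c "$src" -o "$out" \
                || echo "$cc failed on $src" >&2
            end=$(date +%s)
            echo "$cc $src $((end - start))s"
        done
    done
    rm -f "$out"
}
```

It could be called as `EXTRA_FLAGS="-O2 -g" compare_compile a.c b.c`, or with
`COMPILERS="g++ clang++"` for the C++ files in lib/wx. This only measures the
compile step; link and ar time would need to be measured separately.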

With no MAKEFLAGS set and in the trees configured and already built
as described in my earlier email:

$ cd otp/gcc
$ export ERL_TOP=$PWD
$ export PATH=$ERL_TOP/bin:$PATH
$ cd erts
$ touch emulator/beam/beam_emu.c
$ /usr/bin/time make
11.09user 0.83system 0:18.99elapsed 62%CPU (0avgtext+0avgdata 63448maxresident)k
89712inputs+37832outputs (124major+265927minor)pagefaults 0swaps
$ cd ../lib/wx
$ touch c_src/*.c* c_src/gen/*.c* && /usr/bin/time make
113.29user 6.28system 2:00.54elapsed 99%CPU (0avgtext+0avgdata 623272maxresident)k
0inputs+38112outputs (0major+1684363minor)pagefaults 0swaps

# new shell
$ cd otp/clang
$ export ERL_TOP=$PWD
$ export PATH=$ERL_TOP/bin:$PATH
# even though it's all set to clang and clang++ from the ./configure step,
# let's make sure we have the same environment configured
$ export CC=clang
$ export CXX=clang++
$ cd erts
$ touch emulator/beam/beam_emu.c
$ /usr/bin/time make
270.64user 0.96system 4:42.57elapsed 96%CPU (0avgtext+0avgdata 164248maxresident)k
118008inputs+26656outputs (298major+284692minor)pagefaults 0swaps
$ cd ../lib/wx
$ touch c_src/*.c* c_src/gen/*.c* && /usr/bin/time make
# c_src/gen/wxe_funcs.cpp is generated code and seems to loop infinitely on compile
# killed the make process after about 32 mins of trying to build wxe_funcs.cpp
^Cmake[1]: *** [i686-pc-linux-gnu/wxe_funcs.o] Interrupt
make: *** [opt] Interrupt
Command terminated by signal 2
8.65user 0.56system 33:25.77elapsed 0%CPU (0avgtext+0avgdata 63860maxresident)k
48inputs+4104outputs (0major+177432minor)pagefaults 0swaps

excerpt from top for the offending clang invocation before killing:
PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+     COMMAND
20  0   339m  314m 21m  R  99.8  15.8  32:44.97  clang

llvm/clang is a fresh build from last night just after the compile fix
in projects/compiler-rt was committed:
$ clang -v
clang version 3.2 (http://llvm.org/git/clang.git 6defd9f31eec51278d056f1bff885018e2321373) (http://llvm.org/git/llvm.git ff1547890a5af47c215bf7e1f1da85bae6aabe4d)
Target: i386-pc-linux-gnu
Thread model: posix

Can you provide wxe_funcs.ii and the "clang -cc1" command line?

Thanks,
Rafael

I tried but seem to have forgotten something important as
clang++ failed to find <assert.h>:

$ clang++ -cc1 -I/usr/lib/wx/include/gtk2-unicode-release-static-2.8 \
    -I/usr/include/wx-2.8 -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES \
    -D__WXGTK__ -pthread -g -Wall -O2 -D_GNU_SOURCE -D_THREAD_SAFE \
    -D_REENTRANT -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" \
    -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" \
    -DPACKAGE_URL=\"\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 \
    -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 \
    -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 \
    -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DSIZEOF_VOID_P=4 \
    -DHAVE_GL_GL_H=1 -DHAVE_GL_SUPPORT=1 -DHAVE_GLINTPTR=1 \
    -DHAVE_GLINTPTRARB=1 -DHAVE_GLCHAR=1 -DHAVE_GLCHARARB=1 \
    -DHAVE_GLHALFARB=1 -DHAVE_GLINT64EXT=1 -DHAVE_WX_STC_STC_H=1 \
    gen/wxe_funcs.cpp
In file included from gen/wxe_funcs.cpp:22:
In file included from /usr/include/wx-2.8/wx/wx.h:15:
In file included from /usr/include/wx-2.8/wx/defs.h:521:
/usr/include/wx-2.8/wx/debug.h:18:11: fatal error: 'assert.h' file not found
#include <assert.h>
          ^
1 error generated.

Rafael, any idea?

Hi Carsten,

Just run clang++ normally with the extra option -###. That'll tell you
the exact cc1 command line you need.

Cheers,

James
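
As a rough illustration of the -### suggestion: the driver prints the commands
it would run to stderr without executing them, and the line containing -cc1
is the full frontend invocation, including the implicit system include paths
that a hand-written -cc1 command lacks. The source file below is a made-up
stand-in; in this thread it would be gen/wxe_funcs.cpp together with the
project's own -I and -D flags.

```shell
#!/bin/sh
# Hypothetical stand-in source file, used only to have something to compile.
src=$(mktemp -d)/example.cpp
printf '#include <cassert>\nint main() { assert(1); return 0; }\n' > "$src"

if command -v clang++ >/dev/null 2>&1; then
    # -### prints the driver's sub-commands instead of running them.
    # The output goes to stderr, hence the 2>&1 before the pipe.
    cc1_line=$(clang++ -### -O2 -c "$src" 2>&1 | grep -e '-cc1')
    echo "$cc1_line"
else
    echo "clang++ not found on PATH"
fi
```

The printed line can then be copied, adjusted and run directly to reproduce
the compile outside of make.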

Thanks, I was able to get the missing flags and had to add two -I paths.

I still cannot create the .ii files Rafael asked for. Any hints?

It's really trivial to build Erlang and you don't need any special dependencies
except that wxErlang naturally requires wxWidgets 2.8, if you want to try:
$ git clone git://github.com/erlang/otp.git
$ cd otp
$ export CC=clang CXX=clang++
$ ./otp_build setup -a
# if you have wxWidgets and OpenGL headers installed,
# you will see the native part of wxErlang building and infinite looping

Hi Carsten,

He's asking for preprocessed source. If you run clang -cc1 in '-E'
mode instead of the normal '-c', it will run only the preprocessor and
emit source that is easy to compile elsewhere without dependencies.

Cheers,

James
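
A small sketch of what producing preprocessed source looks like, using the
ordinary driver with -E rather than a hand-built -cc1 line (an assumption made
here for brevity; the driver accepts -E as well). The file names are made up;
for the case in this thread the input would be gen/wxe_funcs.cpp, the output
wxe_funcs.ii, and the command would need the same -I/-D flags as the real
compile so that every #include resolves.

```shell
#!/bin/sh
# Hypothetical stand-in input; names are for illustration only.
dir=$(mktemp -d)
printf '#define ANSWER 42\nint answer(void) { return ANSWER; }\n' > "$dir/example.c"

# Pick the first compiler available on PATH.
compiler=""
for c in clang gcc cc; do
    if command -v "$c" >/dev/null 2>&1; then compiler=$c; break; fi
done

if [ -n "$compiler" ]; then
    # -E stops after preprocessing: headers are inlined and macros expanded,
    # so the output compiles elsewhere without the original include tree.
    "$compiler" -E "$dir/example.c" -o "$dir/example.i"
    # Show that the macro has been expanded in the preprocessed output.
    grep 'return' "$dir/example.i"
fi
```

Compressing the resulting file before attaching it to a bug report keeps the
size manageable, since preprocessed C++ with large headers gets big quickly.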

Mailed it privately to Eli, Rafael and you as it's a large file
and may contain info not meant for the public.

Thanks for the help.

Sending the files again, now publicly. Can you test and try to fix
the infinite loop now?

Any news on the slow and memory intensive compile of beam_emu.c
mentioned earlier?

wxe_funcs.cpp.gz (88.9 KB)

wxe_funcs.cpp.preprocessed.gz (457 KB)