Help needed on debugging llvm

Hi,

http://llvm.org/bugs/show_bug.cgi?id=14185

I am stuck on the analysis. Does anyone have alternative suggestions for debugging LLVM? (Please refer to the comments on the bug for the work done so far.)

Hi Anitha,

try to reduce the problem to a small standalone testcase in the form of an LLVM IR (.ll) file.

Ciao, Duncan.

Yes. Unfortunately, that is the challenge at the moment.

Hi Duncan

I am hitting a build error about __builtin_iceil when compiling with dragonegg and the -ffast-math option. My dragonegg is built with gcc-4.7.0.
(I am compiling the namd SPEC benchmark here again.)

Any idea?

g++ -march=bdver2 -save-temps -fplugin=/home/anboyapa/install/bin/dragonegg.so -O2 -march=bdver2 -save-temps -fplugin=/home/anboyapa/install/bin/dragonegg.so -mno-fma -mfma4 -ffast-math -DSPEC_CPU_LP64 Compute.o ComputeList.o ComputeNonbondedUtil.o LJTable.o Molecule.o Patch.o PatchList.o ResultSet.o SimParameters.o erf.o spec_namd.o -o namd
spec_namd.o: In function `main':
spec_namd.C:(.text+0x2a3): undefined reference to `__builtin_iceil'
collect2: error: ld returned 1 exit status
specmake: *** [namd] Error 1

Just wanted to add that without -ffast-math, build goes fine.

Hi Anitha,

Hi Duncan
I am facing a build error about __builtin_iceil

it's surely just that dragonegg doesn't have any support for this builtin.
Please open a bug report with a minimal test case.

Ciao, Duncan.


Hi Anitha,

Hi Duncan
I am facing a build error about __builtin_iceil

it’s surely just that dragonegg doesn’t have any support for this builtin.

OK. I just verified that Target.cpp and x86_builtins do not have iceil support.

I have a tricky situation: I use dragonegg-generated LLVM IR as input to clang for some analysis (clang++, actually). Understandably, clang complains when it sees __builtin_iceil. Any idea how to resolve that as well?

clang++ -O2 -march=bdver2 -mno-fma -save-temps -mfma4 -ffp-contract=fast -DSPEC_CPU_LP64 Compute.o ComputeList.o ComputeNonbondedUtil.o LJTable.o Molecule.o Patch.o PatchList.o ResultSet.o SimParameters.o erf.o spec_namd.o -o namd
spec_namd.o: In function `main':
spec_namd.ll:(.text+0x2a3): undefined reference to `__builtin_iceil'

Please open a bug report with a minimal test case

Will do. Thanks

-Anitha

Hi Anitha,

adding dragonegg support for iceil would solve both problems.

Ciao, Duncan.

Cool. http://llvm.org/bugs/show_bug.cgi?id=14270

Thanks Duncan.

-Anitha

I could narrow the segfault down to a single function in one source file. The open question is whether that function itself is responsible for the segfault, or whether an optimization somewhere else is driving it. In the worst case it could be the latter. I have yet to dig deeper there.

Meanwhile, I have a question about “-fplugin-arg-dragonegg-emit-ir”. Let's say I use the following command:

[1]. g++ -O2 -march=bdver2 -fplugin=dragonegg.so -mno-fma -mfma4 -fplugin-arg-dragonegg-emit-ir -S -ffast-math <test.c> -o <test.ll>

Does the above command produce IR that is already optimized because of “-O2 -ffast-math -mno-fma -mfma4”?

[2]. If I feed the above generated <test.ll> to clang as follows:

clang -O3 -march=bdver2 -mno-fma -mfma4 -ffp-contract=fast <test.ll>

Does clang proceed to optimize test.ll according to “-O3 -ffp-contract=fast -mno-fma -mfma4”? (I am not sure whether -ffp-contract=fast has any effect there, but I could be wrong.)

Also thanks for the immediate fix for __builtin_iceil(). I can see that the issue got resolved.

-Anitha

Hi Anitha,

                http://llvm.org/bugs/show_bug.cgi?id=14185
                I am stuck on analysis. Does any one have alternate suggestions
                on debugging llvm? (Please refer to comments for the work done
                so far)

            try to reduce a small standalone testcase which is an LLVM IR (.ll)
            file

        Yes. Unfortunately, that is the challenge at the moment.

    did you reduce everything down to one problematic source file? If so, you can
    then start moving stuff out of that file to an auxiliary file until you only
    have left a minimal core that shows the problem. But maybe you did that
    already?

I could narrow the segfault down to a single function in one source file. The
open question is whether that function itself is responsible for the segfault,
or whether an optimization somewhere else is driving it. In the worst case it
could be the latter. I have yet to dig deeper there.

you should try to determine which compilation stage introduces the segmentation
fault (optimizers, codegen?). It sounds like you are trying to do so already,
more comments below.

Meanwhile, I have a question about "-fplugin-arg-dragonegg-emit-ir". Let's say
I use the following command:
[1]. g++ -O2 -march=bdver2 -fplugin=dragonegg.so -mno-fma -mfma4
-fplugin-arg-dragonegg-emit-ir -S -ffast-math <test.c> -o <test.ll>
Does the above command produce IR that is already optimized because of "-O2
-ffast-math -mno-fma -mfma4"?

Yes, it produces optimized IR due to -O2. If you want unoptimized IR then add
-fplugin-arg-dragonegg-llvm-ir-optimize=0

That way the output should be exactly the IR that dragonegg would normally
hand to the LLVM optimizers. GCC constant folding and other such optimizations
which get turned on at -O2 will still have happened (dragonegg turns off
almost all GCC optimizations by default, but turning everything off isn't
practical).

Running the output through "opt -O2" should then do the same optimizations as
dragonegg would have done. I say "should" because in my experience this isn't
always true, though it is supposed to be true.

Next comes the codegen stage, which you can emulate using llc (or clang like
you do below, but llc is more direct). It isn't that easy finding out exactly
what flags dragonegg passes to llc, so this might be a bit painful.

[2]. If I feed the above generated <test.ll> to clang as follows:
clang -O3 -march=bdver2 -mno-fma -mfma4 -ffp-contract=fast <test.ll>
Does clang proceed to optimize the test.ll w.r.t "-O3 -ffp-contract=fast
-mno-fma -mfma4" ? (I am not sure if -ffp-contract=fast has any effect there,
but I could be wrong though)

I don't know. You can always run it with and without -O3 to see if the output
changes. Likewise for the other options. The advantage of using llc is that
you have a better idea what is being done.

Also thanks for the immediate fix for __builtin_iceil(). I can see that the
issue got resolved.

Thanks for confirming.

Ciao, Duncan.
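
The three stages described in this message (front end, LLVM optimizers, code generation) can be laid out as separate commands so that each stage can be checked on its own. What follows is only a minimal dry-run sketch: it prints the commands instead of running them. The file names (test.c, test.unopt.ll, test.opt.ll, test.s) and the PLUGIN and SRC variables are placeholders, and the llc flags are an assumption for illustration, since the flags dragonegg really passes to the code generator are not known here.

```sh
#!/bin/sh
# Dry-run sketch: prints one command per compilation stage.
# PLUGIN and SRC are hypothetical placeholders, not paths from the thread.
PLUGIN=${PLUGIN:-dragonegg.so}
SRC=${SRC:-test.c}

stage_commands() {
    # Stage 1: GCC front end plus dragonegg, with the LLVM IR optimizers
    # switched off, so the output is what the optimizers would receive.
    echo "g++ -O2 -march=bdver2 -mno-fma -mfma4 -ffast-math -fplugin=$PLUGIN -fplugin-arg-dragonegg-emit-ir -fplugin-arg-dragonegg-llvm-ir-optimize=0 -S $SRC -o test.unopt.ll"
    # Stage 2: run the LLVM optimizers separately.
    echo "opt -O2 -S test.unopt.ll -o test.opt.ll"
    # Stage 3: code generation (flags assumed, may differ from dragonegg's).
    echo "llc -mcpu=bdver2 -mattr=-fma,+fma4 test.opt.ll -o test.s"
}

stage_commands
```

If the failure already shows up when test.unopt.ll is passed straight to stage 3, the optimizers are unlikely to be the cause; if it only shows up after stage 2, they become the prime suspect.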

Here is one of the many combinations I tried; it gives me some clue that code generation (or an optimization during CodeGen) is going wrong.

I have compiled the problematic function with DragonEgg to emit LLVM IR (say test.ll). This is then fed to llc as follows:

Original Buggy Case: [Turns off FMA3, Turns on FMA4]

llc -fp-contract=fast -O0 -mcpu=bdver2 -mattr=-fma,+fma4 test.ll -o ComputeNonbondedUtil.s

Correct Case: [Turns on FMA3]

llc -fp-contract=fast -O0 -mcpu=bdver2 -mattr=+fma test.ll -o ComputeNonbondedUtil.s

Each output is then compiled and linked as usual with clang. The first case, where FMA4 is turned on, reproduces the bug even though -O0 is used. This rules out any front-end specific possibilities.

However, it is a little surprising that the same bug does not show up when dragonegg is used for end-to-end compilation.

-Anitha
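
The A/B experiment in this message can be scripted so that the two llc runs differ only in the -mattr setting. This is a rough sketch under assumptions: the output names fma4.s and fma3.s and the IR variable are placeholders chosen here, and the tools are only invoked when llc and the IR file are actually present; otherwise the script just prints the two command lines.

```sh
#!/bin/sh
# IR file emitted by dragonegg; the default name is a placeholder.
IR=${IR:-test.ll}

# Print the llc command line for one attribute setting.
# $1 = value for -mattr, $2 = output assembly file
llc_cmd() {
    echo "llc -fp-contract=fast -O0 -mcpu=bdver2 -mattr=$1 $IR -o $2"
}

llc_cmd "-fma,+fma4" fma4.s   # configuration reported to reproduce the bug
llc_cmd "+fma"       fma3.s   # configuration reported to behave correctly

# Run both variants and show the start of the assembly diff,
# but only if llc and the IR file exist on this machine.
if command -v llc >/dev/null 2>&1 && [ -f "$IR" ]; then
    $(llc_cmd "-fma,+fma4" fma4.s) && $(llc_cmd "+fma" fma3.s) &&
        diff -u fma3.s fma4.s | head -n 40
fi
```

Keeping every other flag identical means any difference in behaviour can be attributed to the FMA3 versus FMA4 selection alone.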

That way the output should be exactly the same as the output dragonegg would
normally run the LLVM optimizers on, e.g. GCC constant folding and other such
optimizations which get turned on at -O2 will still have happened (dragonegg
turns off almost all GCC optimizations by default, but turning everything off
isn’t practical).

Running the output through “opt -O2” should then do the same optimizations as
dragonegg would have done. I say “should” because in my experience this isn’t
always true, though it is supposed to be true.

Ok.

Next comes the codegen stage, which you can emulate using llc (or clang like
you do below, but llc is more direct). It isn’t that easy finding out exactly
what flags dragonegg passes to llc, so this might be a bit painful.

Yes. This is probably what I need now: the code generation options used by dragonegg versus clang (or llc, as I referred to in my last email).

Thanks for all the suggestions.

-Anitha

Are you still having issues with FMA4? I wonder if PR15040 is related. A fix was just committed.

It seems to be so! I will look into it immediately.

Apologies for the late e-mail. I ran out of the time allotted to this PR
and moved on. Coincidentally, I came back to this PR only today for
further debugging.

Thanks!

Are you still having issues with FMA4? I wonder if PR15040 is related. A
fix was just committed.

Unfortunately r173176 does not fix this. I updated to trunk and reran;
the miscompare still persists.

Going by http://llvm.org/bugs/show_bug.cgi?id=14185#c12 , I am
inclined to think that it is an optimization issue. (If it were an
encoding issue, dragonegg would have failed as well.) Or maybe I am
missing something here.

-Anitha

Ah, I am taking back my above words w.r.t encoding. -no-integrated-as
does fix the issue! This definitely points towards FMA4 encoding in
clang's integrated assembler. This fits into the analysis as well -
dragonegg *might not* be using integrated assembler at all.

Here is a log of my run that ended on a happy note using
-no-integrated-as. Thanks Craig :)

/local/home/anitha/cpu2006/bin/specinvoke -E -d /local/home/anitha/cpu2006/benchspec/CPU2006/444.namd/run/run_peak_ref_llvm.0001 -c 1 -e compare.err -o compare.stdout -f compare.cmd
Success: 1x444.namd
Producing Raw Reports
mach: default
  ext: llvm
    size: ref
      set: int
      set: fp
        format: raw -> /local/home/anitha/cpu2006/result/CFP2006.102.ref.rsf
Parsing flags for 444.namd peak: done
Doing flag reduction: done
        format: flags -> /local/home/anitha/cpu2006/result/CFP2006.102.ref.flags.html
        format: ASCII -> /local/home/anitha/cpu2006/result/CFP2006.102.ref.txt
        format: HTML -> /local/home/anitha/cpu2006/result/CFP2006.102.ref.html, /local/home/anitha/cpu2006/result/CFP2006.102.ref.gif

The log for this run is in /local/home/anitha/cpu2006/result/CPU2006.102.log
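
One way to look at a suspected encoding difference directly is to assemble the same .s file twice, once with clang's integrated assembler and once with the system assembler, and compare the disassembly. The sketch below assumes that objdump is installed, that the clang in use accepts -integrated-as as the explicit counterpart of -no-integrated-as, and that the assembly file is named as in the earlier llc commands; the object and .dis file names are placeholders. Nothing is run unless the tools and the file are present.

```sh
#!/bin/sh
# Assembly file produced by llc; the default name follows the earlier commands.
ASM=${ASM:-ComputeNonbondedUtil.s}

# Print the command that assembles $ASM with the chosen assembler.
# $1 = "integrated" or "external"
assemble_cmd() {
    case "$1" in
        integrated) echo "clang -c -integrated-as $ASM -o integrated.o" ;;
        external)   echo "clang -c -no-integrated-as $ASM -o external.o" ;;
        *)          return 1 ;;
    esac
}

assemble_cmd integrated
assemble_cmd external

# Assemble both ways and diff the disassembly, only if everything is available.
if command -v clang >/dev/null 2>&1 && command -v objdump >/dev/null 2>&1 &&
   [ -f "$ASM" ]; then
    if $(assemble_cmd integrated) && $(assemble_cmd external); then
        objdump -d integrated.o > integrated.dis
        objdump -d external.o   > external.dis
        # diff exits non-zero when the files differ; that is expected here.
        diff -u integrated.dis external.dis || true
    fi
fi
```

Lines in the diff where the instruction bytes differ for the same mnemonic are the places to inspect; the first lines will always differ because objdump prints the object file name.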

Hi Anitha,

Ah, I am taking back my above words w.r.t encoding. -no-integrated-as
does fix the issue! This definitely points towards FMA4 encoding in
clang's integrated assembler. This fits into the analysis as well -
dragonegg *might not* be using integrated assembler at all.

you are right, dragonegg does not use the integrated assembler.

Ciao, Duncan.

Thanks for the confirmation, Duncan. The bug turned out to be a hard
nut to crack. I had taken the assembler for granted.

-Anitha