Help needed after hiatus

Hi,

I've restarted my Elsa/LLVM project after three months of having real life intrude. I upgraded my LLVM source to the current trunk. I had to make a few changes to my source, e.g. LLVMFoldingBuilder became IRBuilder and several instances of "new" became "Create".

Now, a test case that previously succeeded fails. I run the following script:

#!/bin/sh
if [ 1 -ne 0 ] ; then
   echo -n "ellsif:"
   /usr/bin/time -f "real %e user %U sys %S" ../elsa/ellsif/ellsif -v -O0 -g -o ellbzip2 -D_FILE_OFFSET_BITS=64 $* bzip2.c crctable.c randtable.c compress.c blocksort.c huffman.c decompress.c bzlib.c
   if [ $? -ne 0 ] ; then exit $?; fi
   echo -n "1:"; /usr/bin/time -f "real %e user %U sys %S" ./ellbzip2 -1 < sample1.ref > sample1.rb2
   echo -n "2:"; /usr/bin/time -f "real %e user %U sys %S" ./ellbzip2 -2 < sample2.ref > sample2.rb2
   echo -n "3:"; /usr/bin/time -f "real %e user %U sys %S" ./ellbzip2 -3 < sample3.ref > sample3.rb2
   echo -n "4:"; /usr/bin/time -f "real %e user %U sys %S" ./ellbzip2 -d < sample1.bz2 > sample1.tst
   echo -n "5:"; /usr/bin/time -f "real %e user %U sys %S" ./ellbzip2 -d < sample2.bz2 > sample2.tst
   echo -n "6:"; /usr/bin/time -f "real %e user %U sys %S" ./ellbzip2 -ds < sample3.bz2 > sample3.tst
   cmp sample1.bz2 sample1.rb2
   cmp sample2.bz2 sample2.rb2
   cmp sample3.bz2 sample3.rb2
   cmp sample1.tst sample1.ref
   cmp sample2.tst sample2.ref
   cmp sample3.tst sample3.ref
fi

At the -O0 optimization level, it works just fine:

~/bzip2-1.0.4] main% ./ellsamake
ellsif:<premain>: CommandLine Error: Argument 'machine-licm' defined more than once!
ellsif: CommandLine Error: Argument 'machine-licm' defined more than once!
   adding bzip2.c as a C file
   adding crctable.c as a C file
   adding randtable.c as a C file
   adding compress.c as a C file
   adding blocksort.c as a C file
   adding huffman.c as a C file
   adding decompress.c as a C file
   adding bzlib.c as a C file
Phase: Preprocessing
   preprocess bzip2.c to become a preprocessed C file
     /usr/bin/gcc -E -o bzip2.i bzip2.c
   preprocess crctable.c to become a preprocessed C file
     /usr/bin/gcc -E -o crctable.i crctable.c
   preprocess randtable.c to become a preprocessed C file
     /usr/bin/gcc -E -o randtable.i randtable.c
   preprocess compress.c to become a preprocessed C file
     /usr/bin/gcc -E -o compress.i compress.c
   preprocess blocksort.c to become a preprocessed C file
     /usr/bin/gcc -E -o blocksort.i blocksort.c
   preprocess huffman.c to become a preprocessed C file
     /usr/bin/gcc -E -o huffman.i huffman.c
   preprocess decompress.c to become a preprocessed C file
     /usr/bin/gcc -E -o decompress.i decompress.c
   preprocess bzlib.c to become a preprocessed C file
     /usr/bin/gcc -E -o bzlib.i bzlib.c
Phase: Translation
   compile bzip2.i to become an unoptimized LLVM bitcode file
   bzip2.i has been deleted
   compile crctable.i to become an unoptimized LLVM bitcode file
   crctable.i has been deleted
   compile randtable.i to become an unoptimized LLVM bitcode file
   randtable.i has been deleted
   compile compress.i to become an unoptimized LLVM bitcode file
   compress.i has been deleted
   compile blocksort.i to become an unoptimized LLVM bitcode file
   blocksort.i has been deleted
   compile huffman.i to become an unoptimized LLVM bitcode file
   huffman.i has been deleted
   compile decompress.i to become an unoptimized LLVM bitcode file
   decompress.i has been deleted
   compile bzlib.i to become an unoptimized LLVM bitcode file
   bzlib.i has been deleted
Phase: Optimization
   optimize bzip2.ubc to become an LLVM bitcode file
   optimize crctable.ubc to become an LLVM bitcode file
   optimize randtable.ubc to become an LLVM bitcode file
   optimize compress.ubc to become an LLVM bitcode file
   optimize blocksort.ubc to become an LLVM bitcode file
   optimize huffman.ubc to become an LLVM bitcode file
   optimize decompress.ubc to become an LLVM bitcode file
   optimize bzlib.ubc to become an LLVM bitcode file
Phase: Bitcode linking
   bclink bzip2.bc to become a file that has been linked
   bclink crctable.bc to become a file that has been linked
   bclink randtable.bc to become a file that has been linked
   bclink compress.bc to become a file that has been linked
   bclink blocksort.bc to become a file that has been linked
   bclink huffman.bc to become a file that has been linked
   bclink decompress.bc to become a file that has been linked
   bclink bzlib.bc to become a file that has been linked
   bzip2.bc was consumed by the bitcode linker
   crctable.bc was consumed by the bitcode linker
   randtable.bc was consumed by the bitcode linker
   compress.bc was consumed by the bitcode linker
   blocksort.bc was consumed by the bitcode linker
   huffman.bc was consumed by the bitcode linker
   decompress.bc was consumed by the bitcode linker
   bzlib.bc was consumed by the bitcode linker
   bclink ellbzip2.bc added to the file list
Phase: Generating
   generate ellbzip2.bc to become an assembly source file
Phase: Linking
   assemble ellbzip2.s to become a file that has been linked
     /usr/bin/gcc -fno-strict-aliasing -O3 -o ellbzip2 ellbzip2.s
   ellbzip2.s has been deleted
real 5.72 user 5.50 sys 0.13
1:real 0.03 user 0.03 sys 0.00
2:real 0.07 user 0.07 sys 0.00
3:real 0.17 user 0.16 sys 0.00
4:real 0.00 user 0.00 sys 0.00
5:real 0.02 user 0.01 sys 0.00
6:real 0.01 user 0.01 sys 0.00
[~/bzip2-1.0.4] main%

If I use a higher optimization level, the compilation fails (e.g. -O5):

[~/bzip2-1.0.4] main% ./ellsamake
ellsif:<premain>: CommandLine Error: Argument 'machine-licm' defined more than once!
ellsif: CommandLine Error: Argument 'machine-licm' defined more than once!
   adding bzip2.c as a C file
   adding crctable.c as a C file
   adding randtable.c as a C file
   adding compress.c as a C file
   adding blocksort.c as a C file
   adding huffman.c as a C file
   adding decompress.c as a C file
   adding bzlib.c as a C file
Phase: Preprocessing
   preprocess bzip2.c to become a preprocessed C file
     /usr/bin/gcc -E -o bzip2.i bzip2.c
   preprocess crctable.c to become a preprocessed C file
     /usr/bin/gcc -E -o crctable.i crctable.c
   preprocess randtable.c to become a preprocessed C file
     /usr/bin/gcc -E -o randtable.i randtable.c
   preprocess compress.c to become a preprocessed C file
     /usr/bin/gcc -E -o compress.i compress.c
   preprocess blocksort.c to become a preprocessed C file
     /usr/bin/gcc -E -o blocksort.i blocksort.c
   preprocess huffman.c to become a preprocessed C file
     /usr/bin/gcc -E -o huffman.i huffman.c
   preprocess decompress.c to become a preprocessed C file
     /usr/bin/gcc -E -o decompress.i decompress.c
   preprocess bzlib.c to become a preprocessed C file
     /usr/bin/gcc -E -o bzlib.i bzlib.c
Phase: Translation
   compile bzip2.i to become an unoptimized LLVM bitcode file
   bzip2.i has been deleted
   compile crctable.i to become an unoptimized LLVM bitcode file
   crctable.i has been deleted
   compile randtable.i to become an unoptimized LLVM bitcode file
   randtable.i has been deleted
   compile compress.i to become an unoptimized LLVM bitcode file
   compress.i has been deleted
   compile blocksort.i to become an unoptimized LLVM bitcode file
   blocksort.i has been deleted
   compile huffman.i to become an unoptimized LLVM bitcode file
   huffman.i has been deleted
   compile decompress.i to become an unoptimized LLVM bitcode file
   decompress.i has been deleted
   compile bzlib.i to become an unoptimized LLVM bitcode file
   bzlib.i has been deleted
Phase: Optimization
   optimize bzip2.ubc to become an LLVM bitcode file
ellsif: /home/rich/llvm-trunk-new/lib/VMCore/Value.cpp:63: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.
../elsa/ellsif/ellsif[0x899d61e]
../elsa/ellsif/ellsif[0x899d750]
[0x110420]
/lib/libc.so.6(abort+0x101)[0xa62f91]
/lib/libc.so.6(__assert_fail+0xee)[0xa5a93e]
../elsa/ellsif/ellsif[0x894eee4]
../elsa/ellsif/ellsif[0x821e061]
../elsa/ellsif/ellsif[0x890c215]
../elsa/ellsif/ellsif[0x890c863]
../elsa/ellsif/ellsif[0x891c54f]
../elsa/ellsif/ellsif[0x81b3b73]
../elsa/ellsif/ellsif[0x88004fd]
../elsa/ellsif/ellsif[0x87ff0da]
../elsa/ellsif/ellsif[0x87f9c27]
../elsa/ellsif/ellsif[0x87f9e9d]
../elsa/ellsif/ellsif[0x892ef3f]
../elsa/ellsif/ellsif[0x885d5f5]
../elsa/ellsif/ellsif[0x892ebc6]
../elsa/ellsif/ellsif[0x892ed7e]
../elsa/ellsif/ellsif[0x892edd1]
../elsa/ellsif/ellsif[0x80542ea]
../elsa/ellsif/ellsif[0x80578e9]
/lib/libc.so.6(__libc_start_main+0xe0)[0xa4e390]
../elsa/ellsif/ellsif[0x804d7a1]
Command terminated by signal 6
real 2.70 user 2.54 sys 0.12
[~/bzip2-1.0.4] main%

Strangely enough, I get the same results for -O2 .. -O4 but -O1 fails by getting stuck in the bitcode linker and consuming memory.

I assume I am creating some malformed bitcode and I'll track it down by elimination, but I was hoping that someone might have some insight that would help me narrow down my search.

-Rich

Hi,

I know my last question was very vague (i.e. "It stopped working, what went wrong?"), so here is a little more concrete example:

If I run the optimizer (opt) on this code snippet with -std-compile-opts the optimizer hangs.

; ModuleID = 'test.ubc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-s0:0:64-f80:32:32"
target triple = "i686-pc-linux-gnu"

declare void @BZALLOC(i32)

define void @f(i32) {
entry:
         %blockSize100k = alloca i32 ; <i32*> [#uses=2]
         store i32 %0, i32* %blockSize100k
         %n = alloca i32 ; <i32*> [#uses=2]
         load i32* %blockSize100k ; <i32>:1 [#uses=1]
         store i32 %1, i32* %n
         load i32* %n ; <i32>:2 [#uses=1]
         add i32 %2, 2 ; <i32>:3 [#uses=1]
         mul i32 %3, ptrtoint (i32* getelementptr (i32* null, i32 1) to i32) ; <i32>:4 [#uses=1]
         call void @BZALLOC( i32 %4 )
         br label %return

return: ; preds = %entry
         ret void
}

This is generated from this test program:

extern void BZALLOC(int s);
void f(int blockSize100k)
{
     int n = blockSize100k;
     BZALLOC( (n+2) * sizeof(unsigned int) );
}

Besides the optimizer hanging, the strange thing about this is that it doesn't hang if the blockSize100k variable is a local rather than a parameter or if the n+2 is changed to n+1 (!?!).

Is this code intrinsically incorrect, especially the getelementptr for sizeof(), or should I look at the optimizer? It seems to be hanging in the createInstructionCombiningPass.

-Rich

Hello, Richard

I know my last question was very vague (i.e. "It stopped working, what
went wrong?"),

Ah, I'm sorry - I completely forgot about your case, will look into it.

BTW, It's usually better to file a bug for this sort of thing.

The issue is around InstructionCombining:2507:
  // W*X + Y*Z --> W * (X+Z) iff W == Y
  if (I.getType()->isIntOrIntVector()) {
    Value *W, *X, *Y, *Z;
    if (match(LHS, m_Mul(m_Value(W), m_Value(X))) &&
        match(RHS, m_Mul(m_Value(Y), m_Value(Z)))) {

The issue starts with the lines:
        add i32 %2, 2 ; <i32>:3 [#uses=1]
        mul i32 %3, ptrtoint (i32* getelementptr (i32* null, i32 1) to
i32) ; <i32>:4 [#uses=1]

Roughly, the multiplication gets distributed, resulting in something
like (loaded value) * (ptrtointexpr) + 2 * (ptrtointexpr). This gets
matched by the match, which then reverses the transformation. This,
of course, gets matched by the code to distribute the multiply,
resulting in a never-ending cycle.

A side note: I know I've seen suggestions that "ptrtoint (i32*
getelementptr (i32* null, i32 1) to i32)" is a suitable replacement
for sizeof, but if it is supposed to be legal, the documentation for
getelementptr should make that clear. I'm pretty sure the equivalent
C, "((int*)0)+1", has undefined behavior.

-Eli

Hello, Eli

BTW, It's usually better to file a bug for this sort of thing.

I already filled a PR for this stuff. In any case instcombine shouldn't
cycle :slight_smile:

Eli Friedman wrote:
[snip]

BTW, It's usually better to file a bug for this sort of thing.

I just got a Bugzilla account. I will file a bug next time. Anton did it for me this time. (Thanks Anton!)

The issue is around InstructionCombining:2507:
  // W*X + Y*Z --> W * (X+Z) iff W == Y
  if (I.getType()->isIntOrIntVector()) {
    Value *W, *X, *Y, *Z;
    if (match(LHS, m_Mul(m_Value(W), m_Value(X))) &&
        match(RHS, m_Mul(m_Value(Y), m_Value(Z)))) {

The issue starts with the lines:
        add i32 %2, 2 ; <i32>:3 [#uses=1]
        mul i32 %3, ptrtoint (i32* getelementptr (i32* null, i32 1) to
i32) ; <i32>:4 [#uses=1]

Roughly, the multiplication gets distributed, resulting in something
like (loaded value) * (ptrtointexpr) + 2 * (ptrtointexpr). This gets
matched by the match, which then reverses the transformation. This,
of course, gets matched by the code to distribute the multiply,
resulting in a never-ending cycle.

Thanks for the explanation.

A side note: I know I've seen suggestions that "ptrtoint (i32*
getelementptr (i32* null, i32 1) to i32)" is a suitable replacement
for sizeof, but if it is supposed to be legal, the documentation for
getelementptr should make that clear. I'm pretty sure the equivalent
C, "((int*)0)+1", has undefined behavior.

I think that this is not undefined behavior unless the result is dereferenced. In particular, I think it would be a common way of implementing offsetof().

-Rich

If I run the optimizer (opt) on this code snippet with -std-compile-opts
the optimizer hangs.

wow, that's not polite :slight_smile:

BTW, It's usually better to file a bug for this sort of thing.

Absolutely. In any case, this is fixed, patch here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080512/062585.html

Thanks for letting us know about this!

A side note: I know I've seen suggestions that "ptrtoint (i32*
getelementptr (i32* null, i32 1) to i32)" is a suitable replacement
for sizeof, but if it is supposed to be legal, the documentation for
getelementptr should make that clear. I'm pretty sure the equivalent
C, "((int*)0)+1", has undefined behavior.

Is there a great need to do this? It is useful to have a page that explains useful idioms, but there is something to be said for keeping langref.html relatively focused on defining semantics, not giving lots of examples of use. If the wording around GEP can be improved, to make it more clear that restrictions on C don't apply, then we should definitely improve the wording of course.

-Chris

Chris Lattner wrote:

Absolutely. In any case, this is fixed, patch here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080512/062585.html

Thanks! It works.

-Rich