llc c backend can produce code that doesn't compile on gcc 4.x

Hello,

I would like to ask the llvm developers to have a look at http://llvm.org/bugs/show_bug.cgi?id=918 .
This bug has been reported 4 month ago but is none the less a somewhat serious one.

Below I have pasted the test case and output of the issue running on my ppc machine.

thank you
Eric

pb:~ eric$ cat testme.ll;llvm-as -f testme.ll;llc -march=c -f testme.bc;gcc -c testme.cbe.c ;gcc --version;llc --version
%structtype_s = type { i32 }
%fixarray_array3 = type [3 x %structtype_s]

define i32 %witness(%fixarray_array3* %p) {
%q = getelementptr %fixarray_array3* %p, i32 0, i32 0, i32 0
%v = load i32* %q
ret i32 %v
}
testme.cbe.c:106: error: array type has incomplete element type
testme.cbe.c:119: warning: conflicting types for built-in function ‘malloc’
powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc. build 5026)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Low Level Virtual Machine (http://llvm.org/):
llvm version 2.0cvs
Optimized build with assertions.
pb:~ eric$

Eric van Riet Paap wrote:

*testme.cbe.c:106: error: array type has incomplete element type*

The problem code boils down to:

  /* Structure forward decls */
  struct l_structtype_s;

  /* Typedefs */
  typedef struct l_structtype_s l_fixarray_array3[3];

which is illegal C, but perfectly valid C++, and g++ accepts it.

The structure contents are defined right afterwards, but I assume that
the typedefs are used when emitting the structure contents? We may have
to put fully defined structures first and typedefs second.

Nick Lewycky

Looks like it. It sounds like the CBE should build an ordering of types, based on their nesting properties, then emit them in nesting order. Because all recursive types have to go through a pointer in C, we should get a dag of types, which is easy to emit.

Anyone want to take a crack at it?

-Chris

A DAG traversal is already performed, but only for struct types. It is implemented in CWriter::printContainedStructs (lib/Target/CBackend/CBackend.cpp:1722).

The bug is in the calling function, printModuleTypes, which prints typedefs for all named types before performing the dependency-sensitive traversal. For array typedefs, this ordering is incorrect; the element type must be defined first. But consider something like [2 x { [2 x {int}] }]; these two stages must be interleaved.

The simplest solution is to avoid typedefs for array types. With names of array types removed from the symbol table, llc will simply output (for instance) 'l_structtype_s[3]' instead of the equivalent 'l_fixarray_array3'.

A more ambitious fix would be to merge the "typedef" and "structure contents" loops into a single, substantially more complex dependency-ordered traversal. This would preserve type names from the bytecode file—but to what end?

Here's the patch and test:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070115/042760.html

A workaround for this bug was to avoid naming array types in .ll files.

— Gordon

The simplest solution is to avoid typedefs for array types. With
names of array types removed from the symbol table, llc will simply
output (for instance) 'l_structtype_s[3]' instead of the equivalent
'l_fixarray_array3'.

Very nice catch and approach.

A more ambitious fix would be to merge the "typedef" and "structure
contents" loops into a single, substantially more complex dependency-
ordered traversal. This would preserve type names from the bytecode
file—but to what end?

Good question, I can't think of a reason! I applied your patch but generalized it slightly. The CBE now strips off any non-struct type names, to avoid issues with vectors (if the CBE starts supporting them in the future).

Thanks Gordon!

-Chris

The simplest solution is to avoid typedefs for array types. With names of array types removed from the symbol table, llc will simply output (for instance) ‘l_structtype_s[3]’ instead of the equivalent ‘l_fixarray_array3’.

Very nice catch and approach.

Thanks!

A more ambitious fix would be to merge the “typedef” and “structure contents” loops into a single, substantially more complex dependency-ordered traversal. This would preserve type names from the bytecode file—but to what end?

Good question, I can’t think of a reason! I applied your patch but generalized it slightly. The CBE now strips off any non-struct type names, to avoid issues with vectors (if the CBE starts supporting them in the future).

Excellent. I was going to suggest just that generalization, but didn’t feel comfortable enough with the Type hierarchy to be sure it was safe.

— Gordon