libstdc++ as bytecode, and compiling C++ to C

Emil:

I'm using LLVM 1.9 now. When I tried to do what you did I got the
following though:

$ llvm-g++ -emit-llvm -c x.cpp
$ llvm-link -o=linked.o x.o std/*.o sup/*.o
WARNING: Linking two modules of different target triples!
WARNING: Linking two modules of different target triples!
WARNING: Linking two modules of different target triples!
...

$ lli linked.o
lli((anonymous namespace)::PrintStackTrace()+0x19)[0x846d7f9]
lli(llvm::MachineFunctionPass::runOnFunction(llvm::Function&)+0x29)[0x811af59]
Segmentation fault

What could be the problem?

Thanks.

Napi

I don't know. =/

All my bytecode files are built for:

  target datalayout = "e-p:32:32"
  target endian = little
  target pointersize = 32
  target triple = "i386-portbld-freebsd7.0"

Build a small bytecode file on your end, disassemble it, and compare.
(llvm-dis < yourfile.o | head -5)

LLVMers, given the same endianness and pointersize, can one mix and
match LLVM bytecode files produced on different platforms?

--Emil

No, not in general. For example, on the Mac, printf is often #defined to printf$ldbl, which doesn't exist on Linux. System headers generally foil the ability to move stuff around like that.

-Chris

Chris Lattner wrote:

Yes. Many aspects of the target compiler can leak through. One trivial example is:

int X = sizeof(long);

gcc evaluates 'sizeof' before LLVM gets control, therefore the var will get initialized with 8 or 4 depending on the target.

Another example is code like:

#ifdef __POWERPC__
int X = 1;
#else
int X = 0;
#endif

There are many other examples.

-Chris

Chris Lattner wrote:

Many aspects of the target compiler can leak through.

So if one wants to use the LLVM system as a cross compiler, one
has to configure llvm-gcc as a cross compiler? Fair enough, I guess.

One trivial example is:

int X = sizeof(long);

So I assume this also means that while getelementptr insulates
llvm byte code from the details of target specific address calculations,
the target back end has to agree with gcc on how much space values
of each data type consume.

Chris Lattner wrote:
> Many aspects of the target compiler can leak through.

So if one wants to use the LLVM system as a cross compiler, one
has to configure llvm-gcc as a cross compiler? Fair enough, I guess.

Yes.

> One trivial example is:
>
> int X = sizeof(long);

So I assume this also means that while getelementptr insulates
llvm byte code from the details of target specific address calculations,
the target back end has to agree with gcc on how much space values
of each data type consume.

They agree through the llvm IR. LLVM integer and floating point types
are fixed in size. The front end compiler must use the appropriate type
to get the size that it wants. The only difference might be in structure
layout as LLVM doesn't support alignment attributes in structures.

Reid.

Pertti Kellomäki wrote:

Chris Lattner wrote:

Many aspects of the target compiler can leak through.

So if one wants to use the LLVM system as a cross compiler, one
has to configure llvm-gcc as a cross compiler? Fair enough, I guess.

I hope the C backend is still meant to generate portable code though.

Philipp

Pertti Kellomäki wrote:
> Chris Lattner wrote:
>> Many aspects of the target compiler can leak through.
>
> So if one wants to use the LLVM system as a cross compiler, one
> has to configure llvm-gcc as a cross compiler? Fair enough, I guess.

I hope the C backend is still meant to generate portable code though.

It generates C99. Its portability is no better or worse than any other
backend. Again, it is the front end that decides how portable the
outcome can be. If the front end uses non-portable constructs (sizeof,
ifdef code, etc.) then the code produced won't be portable either. To
get LLVM to generate portable code, you must start with a portable
language. This is the goal of HLVM (http://hlvm.org/) which is nowhere
near done yet.

Reid Spencer wrote:

Pertti Kellomäki wrote:

Chris Lattner wrote:

Many aspects of the target compiler can leak through.

So if one wants to use the LLVM system as a cross compiler, one
has to configure llvm-gcc as a cross compiler? Fair enough, I guess.

I hope the C backend is still meant to generate portable code though.

It generates C99. Its portability is no better or worse than any other
backend.

Does that mean that I will have to configure llvm as a cross-compiler
even when using the C backend?

I want to use LLVM to translate C++ into C and compile the resulting C
code using sdcc.
I already noticed some problems (resulting C code uses different data
types than input).
Philipp

Reid Spencer wrote:
> It generates C99. Its portability is no better or worse than any other
> backend.

Does that mean that I will have to configure llvm as a cross-compiler
even when using the C backend?

LLVM doesn't need to be configured as a cross compiler. It can generate
code for a variety of platforms on any platform. What you do need to do
is configure your front end to be a cross compiler. Then it will
generate the correct LLVM input for that platform (and consequently LLVM
will generate code for that platform) regardless of the platform on
which either LLVM or your front end are running.

I want to use LLVM to translate C++ into C and compile the resulting C
code using sdcc.

I'm not familiar with that compiler, but it should be fine as long as it
can handle C99. Simply target llvm-gcc for the platform you want to compile
for and the resulting code should be suitable for compilation by sdcc on
that platform.

I already noticed some problems (resulting C code uses different data
types than input).

Note that C and LLVM types are *not* the same things (despite the
similar names). We are in the process of making this abundantly clear.
The LLVM IR will soon use names like i8, i16, i32, and i64 (signless
integer quantities of specific sizes, regardless of platform).

Philipp

Hope that helps, Philipp.

Reid.

Reid Spencer wrote:

Note that C and LLVM types are *not* the same things (despite the
similar names). We are in the process of making this abundantly clear.
The LLVM IR will soon use names like i8, i16, i32, and i64 (signless
integer quantities of specific sizes, regardless of platform).

I had explicitly specified the size in the input code using a uint32_t
type, the resulting C code used short, which is smaller on my target
platform.

Philipp

Reid Spencer wrote:

What you do need to do
is configure your front end to be a cross compiler. Then it will
generate the correct LLVM input for that platform (and consequently LLVM
will generate code for that platform) regardless of the platform on
which either LLVM or your front end are running.

Is that needed for things like sizeof() and size of native datatypes only
or for other things, too?

Philipp

Hi Philipp,

Reid Spencer wrote:

> Note that C and LLVM types are *not* the same things (despite the
> similar names). We are in the process of making this abundantly clear.
> The LLVM IR will soon use names like i8, i16, i32, and i64 (signless
> integer quantities of specific sizes, regardless of platform).

I had explicitly specified the size in the input code using a uint32_t
type, the resulting C code used short, which is smaller on my target
platform.

One would think that it should result in a 32-bit unsigned type but it's
hard to say without something concrete. Can you supply a small,
pre-processed example of some code that results in a CBE short for an
input C/C++ uint32_t ?

Reid Spencer wrote:

> What you do need to do
> is configure your front end to be a cross compiler. Then it will
> generate the correct LLVM input for that platform (and consequently LLVM
> will generate code for that platform) regardless of the platform on
> which either LLVM or your front end are running.
>

Is that needed for things like sizeof() and size of native datatypes only
or for other things, too?

Yes, platform specific #ifdef too. When you tell the llvm-gcc front end
which target to generate code for then its output (input to llvm) will
be correct for that target and llvm will generate the corresponding code
for that target. If you don't set the target (i.e. configure a
cross-compiler) then llvm-gcc will be assuming the target is your
llvm-gcc host and the resulting C code won't work on the target (if it
differs from the llvm-gcc host).

Reid.

Reid Spencer wrote:

Hi Philipp,

Reid Spencer wrote:

Note that C and LLVM types are *not* the same things (despite the
similar names). We are in the process of making this abundantly clear.
The LLVM IR will soon use names like i8, i16, i32, and i64 (signless
integer quantities of specific sizes, regardless of platform).

I had explicitly specified the size in the input code using a uint32_t
type, the resulting C code used short, which is smaller on my target
platform.

One would think that it should result in a 32-bit unsigned type but it's
hard to say without something concrete. Can you supply a small,
pre-processed example of some code that results in a CBE short for an
input C/C++ uint32_t ?

Hmm the problem was a bit different. I just reproduced it.

I used this input file:

#include <stdint.h>

uint32_t test(uint32_t t)
{
  return(t + 42);
}

and got the following code:

unsigned test(unsigned ltmp_0_1) {
  return (ltmp_0_1 + 42u);
}

unsigned is 16 bit on my target platform.

Philipp

Hello, Philipp.

unsigned is 16 bit on my target platform.

Could you please show LLVM bytecode?

Reid Spencer wrote:

Hmm the problem was a bit different. I just reproduced it.

I used this input file:

#include <stdint.h>

uint32_t test(uint32_t t)
{
  return(t + 42);
}

and got the following code:

unsigned test(unsigned ltmp_0_1) {
  return (ltmp_0_1 + 42u);
}

unsigned is 16 bit on my target platform.

Sure, but what is it on the target that llvm-gcc is configured for? If
you're running llvm-gcc on a 32-bit platform without configuring it as a
cross-compiler then the above is correct. 32-bit unsigned is what is
expected on the target you're compiling for. This is exactly why it's
important to configure llvm-gcc as a cross-compiler for your target. If
you do, I'm sure that you'll find it will generate:

unsigned long test (unsigned long ltmp_0_1) {
  return (ltmp_0_1 + 42ul);
}

assuming that "unsigned long" is a 32-bit unsigned long on your target.

Reid.

Anton Korobeynikov wrote:

Hello, Philipp.

unsigned is 16 bit on my target platform.

Could you please show LLVM bytecode?

I've attached the .bc file and the .c source and output files.
I compiled using llvm-gcc (not configured as cross-compiler though, so
that might be the problem).

Nevertheless I don't really see why portable source shouldn't be
translated into portable source.

Philipp

test.bc (127 Bytes)

test.c (69 Bytes)

test2.c (2.86 KB)

Hi Philipp,

Anton Korobeynikov wrote:
> Hello, Philipp.
>
>> unsigned is 16 bit on my target platform.
> Could you please show LLVM bytecode?
>

I've attached the .bc file and the .c source and output files.

The LLVM Assembly looks like:
; ModuleID = '/tmp/test.bc'
target datalayout = "e-p:32:32"
target endian = little
target pointersize = 32
target triple = "i686-pc-linux-gnu"
deplibs = [ "c", "crtend" ]

implementation ; Functions:

uint %test(uint %t) {
entry:
        %tmp.1 = add uint %t, 42 ; <uint> [#uses=1]
        ret uint %tmp.1
}

Note the "target pointersize = 32"

This is how LLVM says "the source compiler thinks the target is 32-bit"

I compiled using llvm-gcc (not configured as cross-compiler though, so
that might be the problem).

Yes. That is the problem.

Nevertheless I don't really see why portable source shouldn't be
translated into portable source.

The source isn't portable. Look at how uint32_t is defined. Undoubtedly
it is in an ifdef that defines it as "unsigned int" on your 32-bit
platform. The code generated is for a 32-bit platform so the "uint" type
in LLVM and the "unsigned" type in the CBE output are correct (for a
32-bit platform). However, you're compiling the CBE output on a 16-bit
platform!

You *must* configure llvm-gcc as a cross-compiler for your 16-bit
platform.