I have a few questions I was saving for later and never got around to
asking, so I'll send a few emails to the list, one per question, to
make any follow-up discussion easier to track...
The first question is:
According to the language reference, LLVM IR is type safe. It means,
for instance, that you won't be able to perform ADD operations on two
different types or call functions with the wrong arguments, etc.
But, when declaring two types that happen to (supposedly) have the
same layout, LLVM ignores the second type and uses the first's name
instead.
In one module, it doesn't matter, but once you join different modules
with, possibly, different data layouts, the data types are not the
same any more.
Is this a declaration that you will never be able (with an error
message, assert or whatever) to join two IRs with different data
layouts? Or was it never thought that you could mix them?
In my view, that is the precise reason why we have the data layout.
Unions can't rely on them (why we don't have unions any more) and
compiler data (RTTI, VT, VTT, etc) are all statically created with the
correct size.
According to the language reference, LLVM IR is type safe. It means,
for instance, that you won't be able to perform ADD operations on two
different types or call functions with the wrong arguments, etc.
First, this is only partially correct. LLVM IR is typed, and most operations are type-safe. However, LLVM can represent type-unsafe code through at least the following:
1) LLVM has a cast instruction (and cast constant expression) that can cast one type to another. It's possible to take a float, cast it to an int, and add it to another int.
2) LLVM does not require garbage collection or region-based memory management. You can get implicit casting of values if you dereference a dangling pointer.
3) LLVM does not prevent a function from returning a pointer to stack-allocated memory. Dangling pointers to stack-allocated objects are possible.
That said, you can generate type-safe LLVM IR, and if you force your front-end to generate IR with certain restrictions, you can probably prove that it is type-safe.
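As a rough illustration of point 1, here is a minimal C sketch (not LLVM IR) of the same idea: the bits of a float are reinterpreted as an integer and then added to another integer. The function names are hypothetical, and the sketch assumes a 32-bit IEEE-754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer, roughly what a
   bitcast from float to i32 does in the IR. No numeric conversion happens.
   Assumption: float is a 32-bit IEEE-754 value on this platform. */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Add the reinterpreted float to an ordinary integer. The types line up,
   but the sum has no sensible meaning as a number. */
uint32_t add_float_bits(float f, uint32_t n) {
    return float_bits(f) + n;
}
```

Every operation here is well typed on its own, yet the addition mixes a float's bit pattern with an integer, which is the kind of code a typed IR can still express.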
But, when declaring two types that happen to (supposedly) have the
same layout, LLVM ignores the second type and uses the first's name
instead.
In one module, it doesn't matter, but once you join different modules
with, possibly, different data layouts, the data types are not the
same any more.
Is this a declaration that you will never be able (with an error
message, assert or whatever) to join two IRs with different data
layouts? Or was it never thought that you could mix them?
I think linking two LLVM bitcode files with different data layouts would be hard (especially given different endianness); I think LLVM 2.7 prints a warning when the data layouts don't match. However, I'll let people more knowledgeable about LLVM data layout answer this part of your question.
In the combined llvm IR, @p3 and @p won't match as expected.
Hi Devang,
That's not quite what I was thinking... Maybe I explained badly...
Imagine this:
-- a.ll --
%struct.x = type { i32, i32 }
%a = call i32 @func (%struct.x %b)
-- b.ll --
%struct.y = type { i32, i32 }
declare i32 @func (%struct.y)
Now, imagine that X and Y are completely different structures, they
don't reflect the same type in the code, but in the IR it got
flattened out, so the modules can't distinguish between X or Y.
If I distribute IR (with the same data layout, target triple, etc),
and you try to link against it, it will allow you to put apples in
place of bananas...
Does it make sense?
Type names don't have meaning. If you want this not to happen, you
can generate a different opaque type for each type in your language to
prevent merging.
I think you're just trying to do something that by design doesn't
work. Your complaint seems to be that LLVM's type system doesn't give
you nominal type safety with transparent types, and you're right, it
doesn't, because transparent types work structurally.
If you want nominal type safety, then you need something else. Two
possible options are name mangling the functions (as you mentioned C++
does), or use LLVM's opaque types, which work nominally.
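To make the name-mangling option concrete, here is a minimal C sketch. The "<function>__<type>" scheme and both helper names are hypothetical; this is not C++'s actual mangling, only an illustration of how folding the parameter's type name into the symbol turns a nominal mismatch into an unresolved symbol at link time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a symbol name of the form "<func>__<param_type>".
   Returns 1 on success, 0 if the result does not fit in `out`. */
int mangle(char *out, size_t cap, const char *func, const char *param_type) {
    int n = snprintf(out, cap, "%s__%s", func, param_type);
    return n >= 0 && (size_t)n < cap;
}

/* Return 1 if the symbol the caller asks for is the symbol the callee
   defines, i.e. both sides agree on the parameter's type *name*. */
int symbols_match(const char *func, const char *caller_type,
                  const char *callee_type) {
    char want[64], have[64];
    assert(func && caller_type && callee_type);
    if (!mangle(want, sizeof want, func, caller_type)) return 0;
    if (!mangle(have, sizeof have, func, callee_type)) return 0;
    return strcmp(want, have) == 0;
}
```

With this scheme a caller passing struct x would look for func__x, while a module that defines the function for struct y exports func__y, so the two no longer link even though the layouts are identical.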
LLVM's transparent types do, however, give you far more than C does.
Try linking these together:
-- a.c --
int foo(int);
int bar(int a) { return a * foo(a); }
-- b.c --
#include <stdio.h>
int foo(char *p) { return puts(p); }
HTH,
~ Scott
P.S. I don't think "transparent types" is an official LLVM term, but
it seemed a reasonable opposite for "opaque types".
This is a nominative vs. structural type system issue. You assume the
type system to be nominative, while LLVM uses a structural one. In
this type system Foo and Bar are the same type. There are various pros
and cons for both systems. For LLVM it seems appropriate to use
structural typing as it only uses types to calculate sizes, offsets
and alignments.
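A small C sketch of what "structural" means here: two structs with unrelated names but the same member types end up with the same size, offsets and alignment, so the bytes of one can be read as the other. The struct and function names are hypothetical, and the layout equality is what common ABIs do in practice rather than something this sketch proves in general.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two unrelated source-level types with the same member types. */
struct apple  { int weight; int ripeness; };
struct banana { int length; int curvature; };

/* Compare the only things a structural view cares about:
   size, member offsets and alignment. */
int same_layout(void) {
    return sizeof(struct apple) == sizeof(struct banana)
        && offsetof(struct apple, weight) == offsetof(struct banana, length)
        && offsetof(struct apple, ripeness) == offsetof(struct banana, curvature)
        && _Alignof(struct apple) == _Alignof(struct banana);
}

/* Copy the bytes of an apple into a banana. Nothing is lost or misread,
   because both are laid out as two consecutive ints. */
struct banana apple_as_banana(struct apple a) {
    struct banana b;
    assert(same_layout());
    memcpy(&b, &a, sizeof b);
    return b;
}
```

Under a structural view the two structs are interchangeable; only the names, which carry the source-level meaning, differ.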
Btw arguably this is not a type safety problem -- either way the code
is "safe" since you can't access a value using an incompatible view (e.g.
pointer as double).