portable sizeof


While playing with clang, I found that sizeof() is evaluated in the frontend,
generating LLVM code like

store i32 16, i32* %tmp1

(the size of struct Foo is 16 bytes).

Would it be better to generate code like this?

store i32 ptrtoint (%struct.Foo* getelementptr inbounds (%struct.Foo* null, i32 1) to i32), i32* %tmp1

This way, C code containing a sizeof could be compiled to LLVM bytecode
while the target platform is still mostly unknown (at least size_t has to be known,
since it determines whether i16, i32 or i64 is used).

In my example, Foo contains an int and a double, so sizeof(Foo) may also be 12 on some targets (for instance 32-bit x86 Linux, where a double inside a struct is only 4-byte aligned).
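A small sketch to see which layout a given target actually picks. It assumes `struct Foo` is `{ int i; double d; }` (the field names are hypothetical; the post only says it holds an int and a double) and uses `offsetof` and C11 `_Alignof` to expose the padding between the two fields.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t, offsetof */
#include <stdio.h>

/* Hypothetical field names; the post only says Foo holds an int and a double. */
struct Foo {
    int i;
    double d;
};

/* sizeof is always a multiple of the struct's alignment, checked at compile time. */
static_assert(sizeof(struct Foo) % _Alignof(struct Foo) == 0,
              "size must be a multiple of alignment");

/* Total size, including any padding the target ABI inserts. */
size_t foo_size(void) { return sizeof(struct Foo); }

/* Where d starts: with a 4-byte int, this is 8 if double is 8-byte aligned
   inside the struct and 4 if it is only 4-byte aligned. */
size_t foo_double_offset(void) { return offsetof(struct Foo, d); }

/* Alignment requirement of the whole struct. */
size_t foo_align(void) { return _Alignof(struct Foo); }

void print_foo_layout(void) {
    printf("sizeof(struct Foo) = %zu, offsetof(d) = %zu, alignof = %zu\n",
           foo_size(), foo_double_offset(), foo_align());
}
```

Calling `print_foo_layout()` from `main` typically reports a size of 16 with `d` at offset 8 on x86-64 Linux, and a size of 12 with `d` at offset 4 on 32-bit x86 Linux.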


I don't think we should do that. It would make the bytecode slightly more portable, at the expense of producing worse code, but it doesn't solve the underlying problem: sizeof is an integer constant expression, so it has to be folded to a constant in various places (a 'case' value, for example). The C -> LLVM IR conversion isn't portable anyway; once you've lowered to an LLVM IR type, the result is already non-portable.
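To make the "integer constant expression" point concrete, here is a sketch of several places where C requires the value of sizeof at compile time, so the frontend has to know the number rather than emit a deferred computation. `struct Foo`'s fields and the names `foo_buffer`, `FOO_SIZE` and `is_foo_sized` are hypothetical.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>

/* Hypothetical layout: an int and a double, as in the original question. */
struct Foo {
    int i;
    double d;
};

/* Array bounds at file scope must be integer constant expressions... */
static char foo_buffer[sizeof(struct Foo)];

/* ...so must static assertions... */
static_assert(sizeof(struct Foo) >= sizeof(int) + sizeof(double),
              "struct Foo must hold an int and a double");

/* ...enumerator values... */
enum { FOO_SIZE = sizeof(struct Foo) };

/* ...and case labels. The compiler needs the actual number here, e.g. to
   diagnose duplicate labels and possibly to build a jump table. */
int is_foo_sized(size_t n) {
    switch (n) {
    case sizeof(struct Foo):
        return 1;
    default:
        return 0;
    }
}

size_t foo_buffer_size(void) { return sizeof foo_buffer; }
```

None of these could be expressed with a `ptrtoint`/`getelementptr` expression left in the IR, because C semantics need the concrete value before IR generation is finished.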


Platform-specific behaviour comes a long time before then. Any #ifdef statements depending on architecture-specific built-in defines, for example, will be evaluated in the preprocessor. For example, the definition of the C99 standard intptr_t type depends on this.
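A sketch of how an intptr_t-style definition might be chosen from predefined macros. `my_intptr_t` is a hypothetical stand-in, not any particular libc's code, and the final branch assumes an ILP32 target; the `static_assert` catches targets where that assumption is wrong. Whichever branch is taken, only one typedef survives preprocessing, so clang only ever sees (and emits IR for) that one.

```c
#include <assert.h>  /* static_assert (C11) */

/* Hypothetical stand-in for a libc's intptr_t definition. Which typedef
   survives depends on macros the compiler predefines for the target. */
#if defined(_WIN64)
typedef long long my_intptr_t;  /* LLP64 (64-bit Windows): long is 32 bits */
#elif defined(__LP64__) || defined(_LP64)
typedef long my_intptr_t;       /* LP64 (e.g. 64-bit Linux, macOS) */
#else
typedef int my_intptr_t;        /* assumption: an ILP32 target */
#endif

/* Catch targets where the assumptions above are wrong. */
static_assert(sizeof(my_intptr_t) == sizeof(void *),
              "my_intptr_t must be pointer-sized");

my_intptr_t pointer_to_int(void *p) { return (my_intptr_t)p; }

void *int_to_pointer(my_intptr_t v) { return (void *)v; }
```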

Preprocessed C is intrinsically not portable. If you want to make clang emit portable IR from C then you will need to modify the IR significantly to be able to incorporate compile-time conditionals and a few other things.

The IR generated by clang even contains things like details of the calling convention used for passing and returning by-value structures, because LLVM doesn't hide this detail from front ends adequately.
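A sketch of the by-value struct case. The C signatures below are the same on every target, but the IR function signatures clang emits for them typically are not; the comments describe what clang usually produces, and `clang -S -emit-llvm` on two different targets is the way to confirm it. The struct and function names are hypothetical.

```c
#include <assert.h>  /* static_assert (C11) */

/* Small struct: 8 bytes with a 4-byte int. */
struct Pair {
    int a;
    int b;
};

/* Larger struct: at least 24 bytes. */
struct Big {
    double x, y, z;
};

static_assert(sizeof(struct Big) >= 3 * sizeof(double), "Big holds three doubles");

/* Typically, on x86-64 SysV clang returns this in a register (an i64 in the
   IR); on 32-bit x86 Linux it is returned through a hidden sret pointer. */
struct Pair make_pair(int a, int b) {
    struct Pair p = {a, b};
    return p;
}

/* Typically returned through a hidden sret pointer on both targets, so the
   IR function takes an extra pointer argument and returns void. */
struct Big make_big(double x, double y, double z) {
    struct Big big = {x, y, z};
    return big;
}
```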

You can, sometimes, if you're careful, write C code that can be compiled to platform-independent LLVM IR, but you should not expect to be able to do so in the general case.
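As an illustration of the "careful" case, here is a hypothetical function written to keep target dependence low: a 32-bit FNV-1a hash that uses only fixed-width integer types and avoids sizeof, long, size_t and by-value structs. Even so, it is a sketch rather than a guarantee of target-neutral IR: on a 64-bit target clang typically widens the loop index to pointer width before the getelementptr, and the module still carries a target datalayout.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */
#include <stdint.h>

/* 32-bit FNV-1a: only fixed-width types, no sizeof, long or size_t. */
uint32_t fnv1a32(const uint8_t *data, uint32_t len) {
    assert(data != NULL || len == 0);
    uint32_t hash = 2166136261u;   /* FNV-1a 32-bit offset basis */
    for (uint32_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 16777619u;         /* FNV 32-bit prime; wraps mod 2^32 */
    }
    return hash;
}
```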

I'm not sure why you think that it is more portable:

store i32 ptrtoint (%struct.Foo* getelementptr inbounds (%struct.Foo* null, i32 1) to i32), i32* %tmp1

By this point, the types are already defined. The struct.Foo type has already been defined and so this is a constant expression just as much as 16 is. You could rewrite the IR so that struct.Foo was different (this is nontrivial, because a huge number of things in the IR will depend on the types), but if you want to do that then you would be better off just embedding a metadata node indicating that the constant 16 is the size of struct.Foo. Actually, if you want some portable serialisation of the program, the unpreprocessed C code is a much better choice than the LLVM IR...
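The C-level analogue makes the same point. Pointer arithmetic on a null pointer isn't well-defined in C, but the distance between two consecutive array elements is, and it is computed exactly the way the `getelementptr ... null, i32 1` expression computes a size. This sketch (hypothetical `struct Foo` fields and `foo_stride` name) shows that once the layout is fixed, the "computed" size is just another spelling of sizeof.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* ptrdiff_t */

/* Hypothetical layout: an int and a double, as in the original question. */
struct Foo {
    int i;
    double d;
};

static_assert(sizeof(struct Foo) >= sizeof(int) + sizeof(double),
              "struct Foo must hold an int and a double");

/* Distance in bytes between arr[0] and arr[1]: the element stride, which is
   by definition the same number as sizeof(struct Foo). */
ptrdiff_t foo_stride(void) {
    struct Foo arr[2];
    return (char *)&arr[1] - (char *)&arr[0];
}
```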


-- Sent from my Difference Engine