Idea for Google Summer of Code: C Compiler for EFI Byte Code Implemented in LLVM

Hello all,

I am highly interested in implementing a C compiler for EFI Byte Code in LLVM and in participating in Google Summer of Code.

EFI is a much larger, more complex, OS-like replacement for the older BIOS
firmware interface present in all IBM PC-compatible personal computers.
The EFI specification also provides a processor-independent device driver environment (similar to a virtual machine), called EFI Byte Code or EBC.

The Intel(R) C Compiler for EFI Byte Code, the only C compiler for EFI Byte Code
(http://sx.intel.com/p-553-intel-c-compiler-for-efi-byte-code.aspx),
is not open source, and it is also commercial software.

I think the main issue is that the EFI C dialect is not ANSI-C compliant: the size of a pointer is determined at run time, and therefore the layout of a structure is not static. Does LLVM support this model?
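To illustrate the problem: the same declaration gets different field offsets and a different total size depending on the pointer width. The C sketch below models this for struct { void *X; uint32_t Y; void *Z; }. The field_kind enum and the field_size and layout_struct functions are hypothetical, and the sketch assumes every field is aligned to its own size (a common ABI rule, not a universal one).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field kinds used to model a C struct. */
enum field_kind { FIELD_PTR, FIELD_U32, FIELD_U8 };

/* Size of one field; only pointers depend on the target's pointer width. */
size_t field_size(enum field_kind k, size_t ptr_size)
{
    switch (k) {
    case FIELD_PTR: return ptr_size;
    case FIELD_U32: return 4;
    default:        return 1;
    }
}

/* Assigns offsets assuming each field is aligned to its own size and
   returns the total size, padded to the largest field alignment. */
size_t layout_struct(const enum field_kind *fields, size_t n,
                     size_t ptr_size, size_t *offsets)
{
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        size_t sz = field_size(fields[i], ptr_size);
        off = (off + sz - 1) / sz * sz;   /* round up to the alignment */
        offsets[i] = off;
        off += sz;
        if (sz > max_align)
            max_align = sz;
    }
    return (off + max_align - 1) / max_align * max_align;
}
```

With the fields { FIELD_PTR, FIELD_U32, FIELD_PTR }, a pointer size of 4 gives offsets 0, 4, 8 and a size of 12, while a pointer size of 8 gives offsets 0, 8, 16 and a size of 24, so no single compile-time layout fits both.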

I am also wondering whether this kind of idea is valuable to the LLVM
community, or whether there are other related ideas that are more valuable.

thanks

ching

I think the main issue is that the EFI C dialect is not ANSI-C compliant: the
size of a pointer is determined at run time, and therefore the layout of
a structure is not static. Does LLVM support this model?

Hi Ching,

The LLVM IR doesn't care about the size of your pointers, and this is
why the 'datalayout' is explicit in the object file. I don't
know, however, whether you can omit the layout definition and leave it for
run time.

If LLVM doesn't allow omitting the layout, it should, as we now have
a use case that needs it. If it does, it should be just a matter of
converting the current IR into EFI bytecode and creating intrinsics to
deal with the run-time variables. You could even benefit from compiling
the other languages LLVM supports into EFI bytecode...

I am also wondering whether this kind of idea is valuable to the LLVM
community, or whether there are other related ideas that are more valuable.

I think that an open source compiler to EFI byte code is not only
desirable, but necessary.

cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

Hello Renato and all,

I have a few questions about your mail:

2010/3/15 Renato Golin <rengolin@systemcall.org>

I think the main issue is that the EFI C dialect is not ANSI-C compliant: the
size of a pointer is determined at run time, and therefore the layout of
a structure is not static. Does LLVM support this model?

Hi Ching,

The LLVM IR doesn’t care about the size of your pointers, and this is
why the ‘datalayout’ is explicit in the object file. I don’t
know, however, whether you can omit the layout definition and leave it for
run time.

As you say the LLVM IR doesn’t care about the size of pointers, does that mean I
only need to implement the part that converts LLVM IR to EFI Byte Code (just
as we would for native assembly)?

If LLVM doesn’t allow omitting the layout, it should, as we now have
a use case that needs it. If it does, it should be just a matter of
converting the current IR into EFI bytecode and creating intrinsics to
deal with the run-time variables. You could even benefit from compiling
the other languages LLVM supports into EFI bytecode…

I’m sorry, but I am not very familiar with the structure of LLVM, so I am
somewhat confused. I would truly appreciate it if you could explain in more detail.
thanks.

ching

I think the main issue is that the EFI C dialect is not ANSI-C compliant: the
size of a pointer is determined at run time, and therefore the layout of
a structure is not static. Does LLVM support this model?

Hi Ching,

The LLVM IR doesn't care about the size of your pointers, and this is
why the 'datalayout' is explicit in the object file. I don't
know, however, whether you can omit the layout definition and leave it for
run time.

The layout of non-packed structures is, in this sense, left for
runtime: a definition lacking any alignment attributes will be
laid out in the backend according to the alignment rules at that point. The
IR layout definition can happily be cross-platform if you always
access the structures in a type-safe way (using GEP, never casts,
etc.).
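In native C terms, Andrew's point looks like the sketch below: naming the field (which a front end lowers to a GEP) lets the target decide the offset, while a literal byte offset bakes in one layout. The names pair, get_y, y_offset and Y_OFFSET_HARDCODED are hypothetical, and the offsets in the comments assume a mainstream ABI where the alignment of uint32_t divides the pointer size.

```c
#include <stddef.h>
#include <stdint.h>

struct pair {
    void *X;
    uint32_t Y;
};

/* Type-safe access: the compiler derives the offset of Y from the
   target's layout, much as LLVM derives a GEP offset from the datalayout. */
uint32_t get_y(const struct pair *p)
{
    return p->Y;
}

/* Also type-safe: offsetof is evaluated per target
   (4 with 32-bit pointers, 8 with 64-bit pointers on common ABIs). */
size_t y_offset(void)
{
    return offsetof(struct pair, Y);
}

/* Not type-safe: a literal copied from a 64-bit build. Code that reads
   (char *)p + Y_OFFSET_HARDCODED is wrong wherever pointers are 4 bytes. */
enum { Y_OFFSET_HARDCODED = 8 };
```

On a 64-bit host y_offset() and Y_OFFSET_HARDCODED happen to agree; on a 32-bit host only y_offset() is right. For an EBC target even offsetof cannot be resolved at compile time, which is the difficulty this thread is about.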

If LLVM doesn't allow omitting the layout, it should, as we now have
a use case that needs it. If it does, it should be just a matter of
converting the current IR into EFI bytecode and creating intrinsics to
deal with the run-time variables. You could even benefit from compiling
the other languages LLVM supports into EFI bytecode...

How does EFI describe structures if the pointer size can change? This
shouldn't be a harder problem than C struct -> llvm struct. I assume
the EFI bytecode has some way to describe them. What is it?
(Question is for Ching)

Andrew

How does EFI describe structures if the pointer size can change? This
shouldn't be a harder problem than C struct -> llvm struct. I assume
the EFI bytecode has some way to describe them. What is it?

EFI describes structures almost like C does.

There are EBC instructions that have two immediates: one for 32-bit pointers and one for 64-bit pointers.
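A minimal model of what Tristan describes: the compiler computes a field offset once for each pointer size and emits both, and the VM picks one at run time. The dual_offset struct and the select_offset function are hypothetical; they do not reproduce the actual EBC instruction encoding defined in the UEFI specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand carrying both candidate offsets. */
struct dual_offset {
    uint32_t off32;  /* offset when the VM runs with 4-byte pointers */
    uint32_t off64;  /* offset when the VM runs with 8-byte pointers */
};

/* Done by the VM at run time, based on the host pointer size. */
uint32_t select_offset(struct dual_offset op, size_t host_ptr_size)
{
    return host_ptr_size == 8 ? op.off64 : op.off32;
}
```

For Z in struct { void *X; uint32_t Y; void *Z; }, a compiler would emit { 8, 16 }, and select_offset returns 8 on a 32-bit VM and 16 on a 64-bit one.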

Hello Tristan and all,

I already know that if I want to implement this feature (C → EFI Byte Code) in GCC,
I would have to modify the GCC front end (parser) further to solve the problem (the size
of a pointer is determined at run time).

I have read a presentation about LLVM (http://llvm.org/pubs/2008-10-04-ACAT-LLVM-Intro.pdf).
This is the LLVM-GCC design diagram: (http://www.im.ntu.edu.tw/~b95030/llvm_gcc.png).
According to the above discussion, the LLVM IR doesn’t care about the size of pointers.
I am wondering how LLVM could support a dynamic pointer size model without modifying
the GCC front end.

thanks

ching

What do you mean by “variable sized pointers”? What does:

struct S {void *X; };

return for sizeof(struct S); ?

-Chris

It doesn’t, at least not for Intel’s EBC compiler. They error out on any sizeof that includes a pointer. A piece of EBC code can run in either a 32-bit or a 64-bit environment, and everything in the compiler either needs to cope with that (for instance, by conditionally choosing the size of offsets into structs) or give up on it and abort. That also means that you cannot compile code that depends on knowing pointer sizes in the preprocessor, etc.
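If a compiler takes the "give up" route, the key check is whether a type's size depends on the pointer width. Below is a sketch over a hypothetical, minimal type representation (type_kind, struct type and contains_pointer are illustrative names, not taken from any real front end); a compiler behaving as Louis says Intel's does would reject sizeof whenever contains_pointer returns true.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal type representation for a front end. */
enum type_kind { TY_INT, TY_POINTER, TY_ARRAY, TY_STRUCT };

struct type {
    enum type_kind kind;
    const struct type *elem;            /* element type; must be set for TY_ARRAY */
    const struct type *const *members;  /* member types for TY_STRUCT */
    size_t n_members;
};

/* True if the size of the type depends on the pointer width. */
bool contains_pointer(const struct type *t)
{
    switch (t->kind) {
    case TY_POINTER:
        return true;
    case TY_ARRAY:
        return contains_pointer(t->elem);
    case TY_STRUCT:
        for (size_t i = 0; i < t->n_members; i++)
            if (contains_pointer(t->members[i]))
                return true;
        return false;
    default:
        return false;
    }
}
```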

I suspect getting something like this to work would require substantial changes to any existing C frontend, since the language assumes knowledge of pointer size. On the other hand, it would allow for some neat tricks, since it would allow one to compile a significant subset of C code to a pointer-neutral intermediate form. Off the top of my head I can think of several potential uses for that, such as PNaCl <http://blog.chromium.org/2010/03/native-client-and-web-portability.html>.

Louis

What do you mean by “variable sized pointers”? What does:

struct S {void *X; };

return for sizeof(struct S); ?

It doesn’t, at least not for Intel’s EBC compiler. They error out on any sizeof that includes a pointer. A piece of EBC code can run in either a 32-bit or a 64-bit environment, and everything in the compiler either needs to cope with that (for instance, by conditionally choosing the size of offsets into structs) or give up on it and abort. That also means that you cannot compile code that depends on knowing pointer sizes in the preprocessor, etc.

Ok, that makes sense. It could be done by generalizing the notions of variably modified types (which are VLAs in C99) to include pointers.

I suspect getting something like this to work would require substantial changes to any existing C frontend, since the language assumes knowledge of pointer size. On the other hand, it would allow for some neat tricks, since it would allow one to compile a significant subset of C code to a pointer-neutral intermediate form. Off the top of my head I can think of several potential uses for that, such as PNaCl <http://blog.chromium.org/2010/03/native-client-and-web-portability.html>.

PNaCL is already (planned to be) built with LLVM/Clang. They just fix the pointer size at 32-bits, which also simplifies their SFI approach on 64-bit hosts.

-Chris

2010/3/20 Louis Gerbarg <lgerbarg@gmail.com>

Hello Tristan and all,

I already know that if I want to implement this feature (C → EFI Byte Code) in GCC,
I would have to modify the GCC front end (parser) further to solve the problem (the size
of a pointer is determined at run time).

I have read a presentation about LLVM (http://llvm.org/pubs/2008-10-04-ACAT-LLVM-Intro.pdf).
This is the LLVM-GCC design diagram: (http://www.im.ntu.edu.tw/~b95030/llvm_gcc.png).
According to the above discussion, the LLVM IR doesn’t care about the size of pointers.
I am wondering how LLVM could support a dynamic pointer size model without modifying
the GCC front end.

What do you mean by “variable sized pointers”? What does:

struct S {void *X; };

return for sizeof(struct S); ?

I have surveyed the UEFI Specification 2.3.
In my opinion, if the EBC VM is running on a 32-bit processor, the return value is 4;
if the EBC VM is running on a 64-bit processor, the return value is 8.

If the compiler errors out on any sizeof that includes a pointer, does that mean there is no issue with the pointer size being determined at run time?


thanks

ching

Yes, if it is an error, it makes it much more feasible to implement.

-Chris

Hello Tristan,

How does EFI describe structures if the pointer size can change? This
shouldn’t be a harder problem than C struct → llvm struct. I assume
the EFI bytecode has some way to describe them. What is it?

EFI describes structures almost like C does.

There are EBC instructions that have two immediates: one for 32-bit pointers and one for 64-bit pointers.

I have read the UEFI Specification 2.3 and surveyed the code of the EBC VM.
It seems that there is no relationship between the EBC byte code and the processor.
For example, with CMPI[32|64][w|d]eq {@}R1 {Index16}, Immed16|Immed32,
which form is used depends on the compiler, not the processor (meaning that even a 32-bit processor can support 64-bit operands).
Does that make it more feasible to implement?
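Ching's observation can be illustrated with a small model: the operand width is a property of the instruction the compiler emitted, not of the host CPU, and a 32-bit host can still evaluate a 64-bit compare using uint64_t. The op_width enum and the cmp_eq function are a simplified, hypothetical model of a CMPIeq-style compare (it ignores immediate sign extension and indexing), not the VM's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operand width selected by the compiler when it emitted the instruction. */
enum op_width { WIDTH_32, WIDTH_64 };

/* Compares a register value with an immediate at the encoded width.
   uint64_t arithmetic is available on 32-bit hosts too, so the result
   does not depend on the processor running the VM. */
bool cmp_eq(enum op_width w, uint64_t reg, uint64_t imm)
{
    if (w == WIDTH_32)
        return (uint32_t)reg == (uint32_t)imm;
    return reg == imm;
}
```

For example, a register holding 0x100000001 compares equal to 1 under WIDTH_32 (only the low 32 bits are compared) but not under WIDTH_64.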

thanks

ching

But it is not an error; otherwise it would be hard to use malloc()-like functions.

I don't see why it should be that difficult.

If sizeof becomes an intrinsic that is called at runtime to determine
the pointer size (probably stored in some global or read from a
configuration register), then the problem is solved. If the types'
sizes change too, this intrinsic could accept a parameter (enum?) with
the type of the type.
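A sketch of Renato's suggestion, assuming the runtime publishes the pointer size in a global before any EBC code runs. The names ebc_pointer_size, ebc_type and ebc_sizeof are hypothetical; in a real compiler the call would be an LLVM intrinsic lowered to EBC, not a C function.

```c
#include <stddef.h>

/* Hypothetical global set by the runtime at start-up:
   4 on a 32-bit host, 8 on a 64-bit host. */
size_t ebc_pointer_size = sizeof(void *);

/* The "type of the type" parameter, as an enum. */
enum ebc_type { EBC_T_INT8, EBC_T_INT32, EBC_T_INT64, EBC_T_POINTER };

/* Run-time replacement for sizeof on scalar types. */
size_t ebc_sizeof(enum ebc_type t)
{
    switch (t) {
    case EBC_T_INT8:  return 1;
    case EBC_T_INT32: return 4;
    case EBC_T_INT64: return 8;
    case EBC_T_POINTER:
    default:          return ebc_pointer_size;
    }
}
```

Under this scheme, sizeof(struct S) for struct S { void *X; } would lower to a call like ebc_sizeof(EBC_T_POINTER); larger structs would need their size assembled from such calls plus padding.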

cheers,
--renato


Hi Tristan,

But it is not an error; otherwise it would be hard to use malloc()-like functions.

I don’t understand why it would be hard to use malloc()-like functions.
The parameter passed to malloc is evaluated at run time,
so what is the issue with malloc when sizeof is determined at run time?

thanks

ching

It would be hard to use malloc *iff* sizeof of a structure that includes a pointer is flagged as an error
by the compiler.

EBC C compilers are clearly not ANSI-C conformant.
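When sizeof is allowed and evaluated at run time, allocation code stays ordinary; only the element stride becomes a run-time value. A sketch, assuming a hypothetical sizeof_S helper standing in for the compiler-generated size of struct S { void *X; } (which is just the pointer size); alloc_S_array and S_element are also illustrative names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the run-time sizeof(struct S),
   where struct S { void *X; }: the size equals the pointer size. */
size_t sizeof_S(size_t ptr_size)
{
    return ptr_size;
}

/* Allocates n elements of struct S for the given pointer size. */
void *alloc_S_array(size_t n, size_t ptr_size)
{
    size_t elem = sizeof_S(ptr_size);
    if (elem != 0 && n > SIZE_MAX / elem)
        return NULL;                      /* the product would overflow */
    return malloc(n * elem);
}

/* Address of element i: the stride is only known at run time. */
void *S_element(void *base, size_t i, size_t ptr_size)
{
    return (char *)base + i * sizeof_S(ptr_size);
}
```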

Tristan.

Hello, Chris

2010/3/20 Chris Lattner <clattner@apple.com>

What do you mean by “variable sized pointers”? What does:

struct S {void *X; };

return for sizeof(struct S); ?

It doesn’t, at least not for Intel’s EBC compiler. They error out on any sizeof that includes a pointer. A piece of EBC code can run in either a 32-bit or a 64-bit environment, and everything in the compiler either needs to cope with that (for instance, by conditionally choosing the size of offsets into structs) or give up on it and abort. That also means that you cannot compile code that depends on knowing pointer sizes in the preprocessor, etc.

Ok, that makes sense. It could be done by generalizing the notions of variably modified types (which are VLAs in C99) to include pointers.

I have read about sizeof and VLAs in C99,
and I found an example:
EXAMPLE 3 In this example, the size of a variable-length array is computed and returned from a function:
#include <stddef.h>
size_t fsize3(int n)
{
    char b[n+3];        // variable length array
    return sizeof b;    // execution time sizeof
}
int main()
{
    size_t size;
    size = fsize3(10);  // fsize3 returns 13
    return 0;
}
I also found some information about VLAs in clang
(http://clang.llvm.org/cxx_compatibility.html#vla).
Does LLVM/clang support sizeof being evaluated at run time?

thanks

ching

Yes, clang supports VLAs as defined in C99, and sizeof can return a dynamic value. C99 VLAs cannot occur in structs, though, and clang does not support them in structs.

-Chris

Hello Chris,

I have surveyed the EFI specification and asked an EFI engineer some questions.
The differences between EFI C and ANSI C are as follows:

  1. void*
    In EFI C, void* is 4 bytes on a 32-bit processor and 8 bytes on a 64-bit processor,
    and it can appear anywhere, just as in ANSI C.
    So the main problem is that the layout of a struct like
    struct S {
        void *X;
    };
    is not static.
  2. no floating-point support in EFI C
  3. no C++ support in EFI C
  4. no assembly support in EFI C; all assembly must be converted to C

Does LLVM support a model in which structure layout is determined at run time?
If not, do I need to modify the parser in clang to support this feature?

thanks

ching

Hello Chris,

I have surveyed the EFI specification and asked an EFI engineer some questions.
The differences between EFI C and ANSI C are as follows:
1. void*
    In EFI C, void* is 4 bytes on a 32-bit processor and 8 bytes on a 64-bit processor,
    and it can appear anywhere, just as in ANSI C.
    So the main problem is that the layout of a struct like
    struct S {
        void *X;
    };
    is not static.
2. no floating-point support in EFI C
3. no C++ support in EFI C
4. no assembly support in EFI C; all assembly must be converted to C

Ok, all of this is easy except #1.

Does LLVM support a model in which structure layout is determined at run time?

No.

If not, do I need to modify the parser in clang to support this feature?

No, please don't. This is something we specifically do not want to support. The issue is not the parser, the issue is that struct field offsets are no longer constant in this model.

-Chris