Idea for Google Summer of Code: C Compiler for EFI Byte Code implemented in LLVM

Hello Chris

If not, do I need to modify the parser in clang to support this feature??

No, please don’t. This is something we specifically do not want to support. The issue is not the parser, the issue is that struct field offsets are no longer constant in this model.

Would the fact that struct field offsets are no longer constant break the subsequent optimizations in LLVM?

thanks

ching

LLVM may not be a good match for this project, but there's prior art elsewhere; have a look at ANDF.

What about declaring that pointers are always 64 bits, for all
purposes other than final code generation of actual pointer
instructions? Would that solve the problem?

Yep. That would be a fine approach, and probably conformant to the spec.

-Chris

Hello Chris, Russell

What about declaring that pointers are always 64 bits, for all
purposes other than final code generation of actual pointer
instructions? Would that solve the problem?

Yep. That would be a fine approach, and probably conformant to the spec.

I have some questions about your discussion.
The main differences are as follows:

  1. void*
    In EFI C, void* is 4 bytes on a 32-bit processor and 8 bytes on a 64-bit processor,
    and it can appear anywhere, as in ANSI C.
    So the main problem is that a struct layout like
    struct S{
    void* X;
    };
    is not static.
  2. no floating-point support in EFI C
  3. no C++ support in EFI C
  4. no assembly support in EFI C; all assembly must be converted to C

In my opinion, the main compilation process is C --a--> LLVM IR --b--> EBC byte code.
So, as you say, differences 2, 3 and 4 are handled in step a, with void* assumed to be 64-bit in step a,
and difference 1 is then handled in step b?

thanks

ching

No, sizeof will report wrong values.

Tristan.

Which won't matter (as long as sizeof is consistent), because EFI is a closed system.

-Chris

Hello Chris,

Which won’t matter (as long as sizeof is consistent), because EFI is a closed system.

What does "closed system" mean?

Is it
1. not open source, or
2. the EBC binary only runs on a single EFI EBC interpreter and never interfaces with anything outside the VM?

thanks

ching

#2.

Hello Chris

I have surveyed the EFI specification and asked an EFI engineer some questions.
The differences between EFI C and ANSI C are as follows:

  1. void*
    In EFI C, void* is 4 bytes on a 32-bit processor and 8 bytes on a 64-bit processor,
    and it can appear anywhere, as in ANSI C.
    So the main problem is that a struct layout like
    struct S{
    void* X;
    };
    is not static.
  2. no floating-point support in EFI C
  3. no C++ support in EFI C
  4. no assembly support in EFI C; all assembly must be converted to C

Ok, all of this is easy except #1.

Does LLVM support a model in which structure layout is determined at run time?

No.

If not, do I need to modify the parser in clang to support this feature??

No, please don’t. This is something we specifically do not want to support. The issue is not the parser, the issue is that struct field offsets are no longer constant in this model.

I have read the EFI specification v1.10 and found that natural indexing addresses the dynamic structure layout problem:

19.4.4 Natural Units
Natural units are used when a structure has fields that can vary with the architecture of the
processor. Fields that precipitate the use of natural units include pointers and EFI INTN and
UINTN data types. The size of one pointer or INTN/UINTN equals one natural unit. The natural
units field in an index encoding is a count of the number of natural fields whose sizes (in bytes)
must be added to determine a field offset.
As an example, assume that a given EBC instruction specifies a 16-bit index of 0xA048. This
breaks down into:
• Sign bit (bit 15) = 1 (negative offset)
• Bits assigned to natural units (w, bits 14-12) = 2. Multiply by index size in bytes = 2 x 2 = 4 (A)
• c = bits 11-4 = 4
• n = bits 3-0 = 8
On a 32-bit machine, the offset is then calculated to be:
• Offset = (4 + 8 * 4) * -1 = -36
On a 64-bit machine, the offset is calculated to be:
• Offset = (4 + 8 * 8) * -1 = -68
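The worked example in the quoted specification text can be checked with a small decoder. This is only a sketch for the 16-bit index case, following the field layout quoted above; the function name `ebc_decode_index16` and the `ptr_size` parameter are made up for illustration and are not part of any EFI or LLVM API.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 16-bit EBC natural index into a byte offset.
 * ptr_size stands for sizeof(VOID *) on the executing machine (4 or 8).
 * Hypothetical helper, written from the quoted spec text only. */
int32_t ebc_decode_index16(uint16_t index, unsigned ptr_size)
{
    assert(ptr_size == 4 || ptr_size == 8);

    int      negative = (index >> 15) & 0x1;  /* bit 15: sign */
    unsigned w        = (index >> 12) & 0x7;  /* bits 14-12 */
    unsigned a        = w * 2;                /* width of n = w * index size in bytes */
    unsigned low12    = index & 0x0FFFu;      /* bits 11-0 hold c and n */
    unsigned n        = low12 & ((1u << a) - 1u);  /* natural units */
    unsigned c        = low12 >> a;                /* constant units */

    int32_t magnitude = (int32_t)(c + n * ptr_size);
    return negative ? -magnitude : magnitude;
}
```

With the index 0xA048 from the example, `ebc_decode_index16(0xA048, 4)` gives -36 and `ebc_decode_index16(0xA048, 8)` gives -68, matching the two offsets in the specification text.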

With this indexing model, the dynamic structure layout problem seems to be solved by the underlying EBC VM.
Although a data field sits at a different address on 32-bit and 64-bit processors,
both can use the same encoding.

Does this mean that issue 1 can be solved?

thanks

ching

No, EFI is not that closed: boot loaders interface with EFI.

Don't forget that EBC code can call native functions. If sizeof or field offsets mismatch, I fear that the
program won't work.

Tristan.
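The mismatch concern above can be made concrete with a toy layout calculation. The sketch below assumes a hypothetical `struct S { void *X; unsigned int Y; };` with a 4-byte `unsigned int` and natural alignment; the helper names are invented for illustration, and real ABIs may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of Y in the hypothetical
 *     struct S { void *X; unsigned int Y; };
 * assuming a 4-byte unsigned int and natural alignment. */
size_t offset_of_y(size_t ptr_size)
{
    assert(ptr_size == 4 || ptr_size == 8);
    return ptr_size;  /* Y directly follows X; ptr_size is a multiple of 4 */
}

/* sizeof(struct S): end of Y rounded up to the pointer alignment. */
size_t size_of_s(size_t ptr_size)
{
    size_t end = offset_of_y(ptr_size) + 4;
    return (end + ptr_size - 1) / ptr_size * ptr_size;
}

/* Nonzero when code compiled under "pointers are always 64 bits"
 * disagrees with native code about the layout of struct S. */
int layout_mismatch(size_t native_ptr_size)
{
    return offset_of_y(8) != offset_of_y(native_ptr_size)
        || size_of_s(8)   != size_of_s(native_ptr_size);
}
```

Under these assumptions, the two sides agree on a 64-bit machine but not on a 32-bit one, where native code expects Y at offset 4 and a struct size of 8 while the "always 64-bit" layout puts Y at offset 8 in a 16-byte struct.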

Hello Tristan

What about declaring that pointers are always 64 bits, for all
purposes other than final code generation of actual pointer
instructions? Would that solve the problem?

No, sizeof will report wrong values.

Could I modify the parser to make sizeof a function,
and use natural indexing (i.e. the natural units):

19.4 Natural Indexing
The natural indexing mechanism is the critical functionality that enables EBC to be executed
unchanged on 32- or 64-bit systems. Natural indexing is used to specify the offset of data relative
to a base address. However, rather than specifying the offset as a fixed number of bytes, the offset
is encoded in a form that specifies the actual offset in two parts: a constant offset, and an offset
specified as a number of natural units (where one natural unit = sizeof (VOID *)). These two
values are used to compute the actual offset to data at runtime. When the VM decodes an index
during execution, the resultant offset is computed based on the natural processor size. The encoded
indexes themselves may be 16, 32, or 64 bits in size. Table 19-4 describes the fields in a natural
index encoding.

Table 19-4. Index Encoding
Bit #       Description
N           Sign bit (sign), most significant bit
N-3…N-1     Bits assigned to natural units (w)
A…N-4       Constant units (c)
0…A-1       Natural units (n)

As shown in Table 19-4, for a given encoded index, the most significant bit (bit N) specifies the
sign of the resultant offset after it has been calculated. The sign bit is followed by three bits
(N-3…N-1) that are used to compute the width of the natural units field (n). The value (w) from
this field is multiplied by the index size in bytes to determine the actual width (A) of the natural
units field (n). Once the width of the natural units field has been determined, then the natural units
(n) and constant units (c) can be extracted. The offset is then calculated at runtime according to the
following equation:
Offset = (c + n * (sizeof (VOID *))) * sign

to evaluate sizeof(void*) at run time?

Maybe the following code could obtain sizeof(void*) on either a 32-bit or a 64-bit processor:

MOVI R1, 0
MOVI R2, Label
ADD32 R1, @R2 (1, 0)
Label:

And then use the natural indexing mechanism to solve the dynamic structure layout problem?

thanks

ching
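The natural-indexing proposal above amounts to having the code generator describe each field offset as a constant byte count plus a count of natural units, then pack that pair as in Table 19-4. Below is a rough sketch of such a packer for 16-bit indexes. The name `ebc_encode_index16` is hypothetical, and choosing the smallest workable w is an assumption of this sketch, not something the quoted specification text requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack (sign, c, n) into a 16-bit natural index.
 * Returns 0 on success, -1 if the values do not fit in 16 bits.
 * Hypothetical helper; picks the smallest w that works. */
int ebc_encode_index16(int negative, unsigned c, unsigned n, uint16_t *out)
{
    assert(out != NULL);

    /* w is a 3-bit field, but only 12 bits remain for c and n,
     * so the width of n (a = w * 2) can be at most 12. */
    for (unsigned w = 0; w <= 6; ++w) {
        unsigned a      = w * 2;    /* bits available for n */
        unsigned c_bits = 12 - a;   /* bits left over for c */
        if (n < (1u << a) && c < (1u << c_bits)) {
            *out = (uint16_t)(((negative ? 1u : 0u) << 15)
                              | (w << 12) | (c << a) | n);
            return 0;
        }
    }
    return -1;
}
```

For example, packing a negative offset with c = 4 and n = 8 reproduces the 0xA048 index from the specification example. A field that follows a single pointer, such as a member placed right after `void* X`, would be c = 0 and n = 1, which this sketch packs as 0x1001; the VM would then resolve it to 4 bytes on a 32-bit machine and 8 bytes on a 64-bit one. This covers only offsets inside EBC code; it does not by itself answer the sizeof and native-call concerns raised earlier in the thread.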