If not, do I need to modify the parser in clang to support this feature??
No, please don’t. This is something we specifically do not want to support. The issue is not the parser, the issue is that struct field offsets are no longer constant in this model.
Would the fact that struct field offsets are no longer constant break the subsequent optimization passes in LLVM?
What about declaring that pointers are always 64 bits, for all
purposes other than final code generation of actual pointer
instructions? Would that solve the problem?
Yep. That would be a fine approach, and probably conformant to the spec.
I have some questions about your discussion.
The main differences are as follows:
1. void*
   In EFI C, void* is 4 bytes on a 32-bit processor and 8 bytes on a 64-bit processor,
   and it can appear anywhere, as in ANSI C.
   So the main problem is that a struct layout like
   struct S {
       void* X;
   };
   is not static.
2. No floating-point support in EFI C.
3. No C++ support in EFI C.
4. No assembly support in EFI C; all assembly must be converted to C.
As I understand it, the overall compilation process is C --a--> LLVM IR --b--> EBC bytecode.
So, as you say, differences 2, 3, and 4 would be handled in step a, with void* assumed to be 64-bit in that step,
and difference 1 would then be handled in step b?
I have surveyed the EFI specification and asked an EFI engineer some questions.
The differences between EFI C and ANSI C are as follows:
1. void*
   In EFI C, void* is 4 bytes on a 32-bit processor and 8 bytes on a 64-bit processor,
   and it can appear anywhere, as in ANSI C.
   So the main problem is that a struct layout like
   struct S {
       void* X;
   };
   is not static.
2. No floating-point support in EFI C.
3. No C++ support in EFI C.
4. No assembly support in EFI C; all assembly must be converted to C.
Ok, all of this is easy except #1.
I am wondering whether LLVM supports a model in which structure layout is determined at run time?
No.
If not, do I need to modify the parser in clang to support this feature??
No, please don’t. This is something we specifically do not want to support. The issue is not the parser, the issue is that struct field offsets are no longer constant in this model.
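To make the point about field offsets concrete, here is a minimal sketch. The struct S2 and both helper functions are hypothetical names, not anything from clang or the EFI spec; S2 is just the struct S from the question with one extra field. In ordinary C the offset of Y is a constant the compiler folds at compile time, and that constant differs between a 4-byte-pointer target and an 8-byte-pointer target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct: the thread's struct S with one extra field after the pointer. */
struct S2 {
    void *X;
    int   Y;
};

/* Offset of Y for the target this file is compiled for.
 * offsetof is an integer constant expression, so the compiler folds it. */
size_t offset_of_Y(void) {
    return offsetof(struct S2, Y);
}

/* The offset Y would need on a target with the given pointer size,
 * assuming (as on common ABIs) that Y directly follows X with no padding. */
size_t expected_offset_of_Y(size_t pointer_size) {
    assert(pointer_size == 4 || pointer_size == 8);
    return pointer_size;
}
```

A single EBC image has to run on both kinds of processor, so neither constant can be baked into the generated code; that is the part that does not fit a model where offsets are fixed at compile time.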
I have read the EFI specification v1.10 and found that natural indexing appears to address the dynamic structure layout problem:
19.4.4 Natural Units
Natural units are used when a structure has fields that can vary with the architecture of the
processor. Fields that precipitate the use of natural units include pointers and EFI INTN and
UINTN data types. The size of one pointer or INTN/UINTN equals one natural unit. The natural
units field in an index encoding is a count of the number of natural fields whose sizes (in bytes)
must be added to determine a field offset.
As an example, assume that a given EBC instruction specifies a 16-bit index of 0xA048. This
breaks down into:
• Sign bit (bit 15) = 1 (negative offset)
• Bits assigned to natural units (w, bits 14-12) = 2. Multiply by index size in bytes = 2 x 2 = 4 (A)
• c = bits 11-4 = 4
• n = bits 3-0 = 8
On a 32-bit machine, the offset is then calculated to be:
• Offset = (4 + 8 * 4) * -1 = -36
On a 64-bit machine, the offset is calculated to be:
• Offset = (4 + 8 * 8) * -1 = -68
With this indexing model, the dynamic structure layout problem seems to be solved by the underlying EBC VM.
Although a data field sits at a different address on 32-bit and 64-bit processors,
both can use the same encoding.
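As a rough sketch of what "same encoding, different address" means for a struct field: the offset can be kept as a pair (c, n) and only turned into bytes once the pointer size is known. The type natural_offset and the function resolve_offset are hypothetical names for illustration, and the layout in the note below assumes natural alignment with no extra padding.

```c
#include <assert.h>
#include <stdint.h>

/* A field offset split into the two parts used by natural indexing. */
struct natural_offset {
    uint32_t constant_bytes; /* c: bytes that do not depend on the pointer size */
    uint32_t natural_units;  /* n: number of pointer-sized slots to skip */
};

/* Resolve the pair to a byte offset for a given natural unit size: c + n * unit. */
uint32_t resolve_offset(struct natural_offset off, uint32_t natural_unit_size) {
    assert(natural_unit_size == 4 || natural_unit_size == 8);
    return off.constant_bytes + off.natural_units * natural_unit_size;
}
```

For a hypothetical struct { void *A; void *B; uint32_t Tag; uint32_t Value; }, the field Value would be described as c = 4, n = 2, which resolves to byte 12 with 4-byte pointers and byte 20 with 8-byte pointers.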
What about declaring that pointers are always 64 bits, for all
purposes other than final code generation of actual pointer
instructions? Would that solve the problem?
No, sizeof will report wrong values.
Could I modify the parser so that sizeof becomes a function,
and use natural indexing (i.e. the Natural Units):
19.4 Natural Indexing
The natural indexing mechanism is the critical functionality that enables EBC to be executed
unchanged on 32- or 64-bit systems. Natural indexing is used to specify the offset of data relative
to a base address. However, rather than specifying the offset as a fixed number of bytes, the offset
is encoded in a form that specifies the actual offset in two parts: a constant offset, and an offset
specified as a number of natural units (where one natural unit = sizeof (VOID *)). These two
values are used to compute the actual offset to data at runtime. When the VM decodes an index
during execution, the resultant offset is computed based on the natural processor size. The encoded
indexes themselves may be 16, 32, or 64 bits in size. Table 19-4 describes the fields in a natural
index encoding.
Table 19-4. Index Encoding
Bit #       Description
N           Sign bit (sign), most significant bit
N-3…N-1     Bits assigned to natural units (w)
A…N-4       Constant units (c)
0…A-1       Natural units (n)
As shown in Table 19-4, for a given encoded index, the most significant bit (bit N) specifies the
sign of the resultant offset after it has been calculated. The sign bit is followed by three bits
(N-3…N-1) that are used to compute the width of the natural units field (n). The value (w) from
this field is multiplied by the index size in bytes to determine the actual width (A) of the natural
units field (n). Once the width of the natural units field has been determined, then the natural units
(n) and constant units (c) can be extracted. The offset is then calculated at runtime according to the
following equation:
Offset = (c + n * (sizeof (VOID *))) * sign
to evaluate sizeof(void*) at run time?
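For reference, the decoding described in section 19.4 can be sketched for the 16-bit case as follows. This is only my reading of the quoted text, not code from the spec or from any EBC VM; the function name decode_index16 and the natural_unit_size parameter are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 16-bit natural index into a byte offset.
 * natural_unit_size stands in for sizeof(VOID *) on the executing machine. */
int64_t decode_index16(uint16_t index, unsigned natural_unit_size) {
    assert(natural_unit_size == 4 || natural_unit_size == 8);

    unsigned negative = (index >> 15) & 0x1u; /* bit 15: sign */
    unsigned w        = (index >> 12) & 0x7u; /* bits 14..12: width selector */
    unsigned a        = w * 2u;               /* A = w * index size in bytes (2) */
    unsigned payload  = index & 0x0FFFu;      /* bits 11..0 hold c and n */

    unsigned n = payload & ((1u << a) - 1u);  /* low A bits: natural units */
    unsigned c = payload >> a;                /* remaining bits: constant units */

    /* Offset = (c + n * sizeof(VOID *)) * sign */
    int64_t magnitude = (int64_t)c + (int64_t)n * natural_unit_size;
    return negative ? -magnitude : magnitude;
}
```

With the 0xA048 index from the 19.4.4 example, this gives -36 for a 4-byte natural unit and -68 for an 8-byte one, matching the spec's numbers.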
Maybe the following code could be used to get sizeof(void*) on both 32-bit and 64-bit processors: