# Question: Bytecode Representation of Type Definitions Table

Distinguished LLVM Creators,

I've been looking through the bytecode representation of the type definition table and had a few questions about it. There's an enum in Type.h that defines all the bytecodes that represent the primitive types and a few other necessary things:

0 = 0x00 = Void
1 = 0x01 = Bool
2 = 0x02 = UByte
3 = 0x03 = SByte
4 = 0x04 = UShort (16 bits)
5 = 0x05 = Short (16 bits)
6 = 0x06 = UInt (32 bits)
7 = 0x07 = Int (32 bits)
8 = 0x08 = ULong (64 bits)
9 = 0x09 = Long (64 bits)
10 = 0x0a = Float (32 bits)
11 = 0x0b = Double (64 bits)
12 = 0x0c = Type definition
13 = 0x0d = Label
14 = 0x0e = Function
15 = 0x0f = Struct
16 = 0x10 = Array
17 = 0x11 = Pointer
18 = 0x12 = Opaque

As far as I can figure, the type definition table itself starts back at 0x0e and I'm thinking that's because the label is the last thing that wouldn't have to be only part of a derived type. But it still seems to make some of the low entries in the table ambiguous (at least to me!). I compiled a nice little hello world program into LLVM and then into bytecodes (see complete results attached). Here is the start of the type definition table:

Entry 0x0e: Pointer to type 0x0f
0000001a 11 0f

Entry 0x0f: Array of SByte [14] (presumably for "Hello World!\n" constant)
0000001c 10 03 0e

Entry 0x10: Pointer to type 0x12
0000001f 11
00000020 12

Entry 0x11: Pointer to SByte
00000021 11 03

Entry 0x12: Function returning Pointer ( UInt )
00000023 0e 11 01 06

Okay, so looking at entry 0x10: is it a pointer to Opaque or a pointer to a function returning Pointer ( UInt )? I'm guessing the latter. Similarly, entry 0x0e could be a pointer to Struct or a pointer to Array of SByte [14]. Again I'm guessing the latter. I'm worried this low table stuff isn't unambiguous in all cases, but then again I'm a nervous guy. If you could set my mind at ease with regard to the lack of ambiguity that would be great.

And what's with this Opaque type anyway? It's in the enum but I haven't found an instance of its use, unless of course it's used in entry 0x10. The whole missing Opaque thing makes me nervous too. It seems like it was just put there to be unclear. But seriously, is it used for anything now? Will it start to get used sometime?

Regards,

-- Robert.

Robert Mykland Voice: (831) 462-6725

hello.hexdump (26 KB)

hello.c (77 Bytes)

hello.s (10.2 KB)

> As far as I can figure, the type definition table itself starts back at
> 0x0e and I'm thinking that's because the label is the last thing that
> wouldn't have to be only part of a derived type.

Exactly right. The types starting with the function type never appear
explicitly in the table; they don't occupy a "slot". Derived types are only
used to build concrete types from other things.

> But it still seems to make some of the low entries in the table
> ambiguous (at least to me!). I compiled a nice little hello world
> program into LLVM and then into bytecodes (see complete results
> attached). Here is the start of the type definition table:

Ok.

> Entry 0x0e: Pointer to type 0x0f
> 0000001a 11 0f

Yes, since type 0x0F is '[14 x sbyte]', this is '[14 x sbyte]*'. Forward
references are required for things like recursive types.

> Entry 0x0f: Array of SByte [14] (presumably for "Hello World!\n" constant)
> 0000001c 10 03 0e

Yup.

> Entry 0x10: Pointer to type 0x12
> 0000001f 11
> 00000020 12

Yup: 'sbyte* (uint)*'

> Entry 0x11: Pointer to SByte
> 00000021 11 03

'sbyte*'

> Entry 0x12: Function returning Pointer ( UInt )
> 00000023 0e 11 01 06

'sbyte* (uint)'

> Okay, so looking at entry 0x10: is it a pointer to Opaque or a pointer to a
> function returning Pointer ( UInt )? I'm guessing the latter. Similarly,
> entry 0x0e could be a pointer to Struct or a pointer to Array of SByte
> [14]. Again I'm guessing the latter.

You're right. The parsing algorithm goes like this:

Read a byte. This defines the 'typeid' to use for the type. This is
one of the values from the Type.h file, including things like
structure, pointer, opaque, function, ... as well as the primitive
types.

If it's a derived type, extra information is read indicating what the
parameter types of a function are, what the pointee of a pointer is,
etc. These are type slot numbers, not primitive ID numbers. You cannot
refer to a "generic" structure or function or anything like that. Forward
references are allowed.

> I'm worried this low table stuff isn't unambiguous in all cases, but
> then again I'm a nervous guy. If you could set my mind at ease with
> regard to the lack of ambiguity that would be great.

It seems to work so far. It should be unambiguous, and we haven't had any
problems.

> And what's with this Opaque type anyway? It's in the enum but I haven't
> found an instance of its use, unless of course it's used in entry
> 0x10. The whole missing Opaque thing makes me nervous too. It seems like
> it was just put there to be unclear.

Opaque type is for a type that does not have a definition yet. In C, for
example, if you say 'struct foo;' and never provide the body, you get an
llvm type like:

%struct.foo = type opaque

This allows you to build definitions like '%struct.foo*', etc. Later, when
the type is resolved in the linking phase, all of these types are updated
to have their "true" values.

> But seriously, is it used for anything now? Will it start to get used
> sometime?

It is used extensively for a lot of things, including the "forward
referencing" of types in the bytecode and asm files...

-Chris