Here's my more detailed feedback, inline with your comments. The patch
for the changes to the documentation I've made is here:
Dear Reid and Chris,
I thought I should send this to the list in case anyone else is struggling
to interpret bytecode files with the new docs.
(1) First a bug I already mentioned to Reid. Unlike the other new
module headers module 0x01 still uses the old 32-bit and 32-bit format
instead of the new 5-bit and 27-bit format. Thus the first module in the
file will be 0x00000001 followed by a 32-bit size.
This is a doc bug. Its fixed per my previous email.
(2) Though it states in the docs that module size numbers do not
include padding bytes, they often do. Here's an incomplete list of which
do and which don't:
module 0x01 includes align bytes
GTP block size does not include align bytes
module global info does
function definitions sometimes do and sometimes don't
compaction tables do not
instruction list -- unknown
function symbol table does not
Hrm. This is not good. I'm going to revisit the whole need for alignment
in bytecode files. I need to check with Chris about why this is
necessary in the first place. There's definitely something screwball
here and it might take a code change (yet another bytecode format change
in 1.4) to fix it.
I'll provide more feedback on this point later.
(3) In Reid's documentation, his "opcode" link is bad. His doc does
not yet contain an opcode section. Presumably this would contain the info
from the include file Instruction.def.
This is also a doc bug. Its fixed by just referencing the
Instruction.def file on the cvsweb which will always contain the correct
list of instruction opcode values for the latest release. Note that that
might not be correct for *your* release
(4) Something is messed up about the symbol table definitions. The
number of entries and the type referred to by the symbol are correct, then
there is a third byte that was either a zero or a one in my simple
test. This third byte is not described in the docs I don't believe. Then
there is the size byte and the string.
The extra byte you're referring to is probably the typeid for the value
plane. It is correct and as designed.
Because types and values got disassociated in 1.3 (Type no longer
derives from Value), they are handled in separate data structures
internally and written to the symbol table block separately. The
documentation is only slightly deficient in this area as it didn't
correctly identify the type of lists used for the types and value
planes. The documentation has been updated to correctly reflect the
nature of the lists.
Robert: if you could, please review:
and let me know if it makes sense now.
(5) Labels used to have their own type. If this is still the case, its
not discussed in Reid's document. It looks like the new type slot for
label is 12, the same as raw function. Presumably this would be the secret
type slot between the last primitive type (11) and the new start of the
defined types table (13).
This is probably a result of the "Type != Value" change that happened in
1.3. In 1.2, we had (in Type.h):
DoubleTyID = 11
TypeTyID = 12
LabelTyID = 13
FunctionTyID = 14
We now have:
DoubleTyID = 11
LabelTyID = 12
FunctionTyID = 13
So, we eliminated TypeTyID (one of the main points of the change was to
get rid of this) and shifted down the values higher than it was before.
So, the new value for a label is 12 and, as you noted, the derived types
start at 13 now instead of 14. This is by design. The bytecode reader
accommodates this change and will read 1.2 bytecode files correct even
though all the primitive type IDs have changed.
Sorry if this causes you grief, but its important for the design of our
internal data structures that the type ids be contiguous. Hopefully this
is the last change in this area for a long time.
This change is briefly documented here:
in the "Type Derives From Value" section.
If I were sorting these into bugs vs. doc changes for 1.31, I would fix #1,
#2, and possibly #4 in the code depending on what the byte is used for.
I don't think any code changes are needed except possibly to clean up
The documentation changes have been made already.
It might be a really good bug hunting expedition before each release to
decode a few bytecode files by hand. In my experience this is the only way
to get physical protocols like this right.
Agreed, but its pretty labor intensive.
What we do have a start on is some regression tests that ensure that at
least backwards compatibility is retained. Its been on my "to do" list
for a while now .. guess I should get it done. We have a regression test
directory specifically set aside for this but we haven't checked in
enough test cases yet. This will get sorted out fairly soon since we
don't want to have any issues going forward.
As far as I know, the current LLVM bytecode reader will read 1.0, 1.1,
1.2 and 1.3 format bytecode files. If you find otherwise, please submit
a bug with your bytecode test case. I will fix any problems and add the
test case to the regression tests.
Hope this helps!
Immensely. Its difficult documenting this in a vacuum, checking it with
the code, looking at it with a hex dumper and "checking it twice". This
kind of feedback is very valuable and I *really* appreciate it.
Thanks again to Reid for the great docs and to Chris and crew for the whole
Sorry the docs weren't "perfect" for you. Hopefully they're better now.