Question about METADATA_BLOCKs in bitcode

I have a question about module-level METADATA_BLOCKs in the bitcode.
There are currently two blocks with the METADATA_BLOCK id at module
scope. The first has the module-level metadata values (consisting of
some combination of METADATA_* record codes except for METADATA_KIND).
The second consists only of METADATA_KIND records. The latter is used
only in the METADATA_ATTACHMENT block within function blocks (for
metadata attached to instructions).

For ThinLTO we want to delay the parsing of module level metadata
until all functions have been imported from that module (there is some
bookkeeping used to suture it up when we read it during a post-pass).
However, I do need the METADATA_KIND records since that is used when
parsing the function body being imported. In my ThinLTO prototype I
was simply skipping the first METADATA_BLOCK and always parsing the
second, since I knew from inspecting the BitcodeWriter that the second
was the one with the METADATA_KIND records.

When working on the implementation of this handling for the upstream
patches, I am reassessing how to do this perhaps more robustly. A few
options:

1) Should the two module-level blocks use different block ids since
they hold different types of records? I.e. a METADATA_BLOCK_ID
(holding the actual metadata) and a new METADATA_KIND_BLOCK_ID (for
the metadata kind records). That way it is more obvious what the two
separate blocks are for and it is also much more trivial to identify
and skip just the former.

2) Should I simply continue to assume that we will have two
METADATA_BLOCKs at module level and that the kinds will always be in
the latter?

3) Another option is to peek inside the METADATA_BLOCK to check the
first record's type, and if it isn't METADATA_KIND, skip the rest of
the block. Looks like EnterSubBlock already optionally returns the
size of the block in words, so I could use that to compute the bitcode
offset to jump to before reading the first record, in case it is the
wrong METADATA_BLOCK and I want to skip it.

Thoughts?

Thanks,
Teresa

I have a question about module-level METADATA_BLOCKs in the bitcode.
There are currently two blocks with the METADATA_BLOCK id at module
scope. The first has the module-level metadata values (consisting of
some combination of METADATA_* record codes except for METADATA_KIND).
The second consists only of METADATA_KIND records. The latter is used
only in the METADATA_ATTACHMENT block within function blocks (for
metadata attached to instructions).

For ThinLTO we want to delay the parsing of module level metadata
until all functions have been imported from that module (there is some
bookkeeping used to suture it up when we read it during a post-pass).
However, I do need the METADATA_KIND records since that is used when
parsing the function body being imported. In my ThinLTO prototype I
was simply skipping the first METADATA_BLOCK and always parsing the
second, since I knew from inspecting the BitcodeWriter that the second
was the one with the METADATA_KIND records.

When working on the implementation of this handling for the upstream
patches, I am reassessing how to do this perhaps more robustly. A few
options:

1) Should the two module-level blocks use different block ids since
they hold different types of records? I.e. a METADATA_BLOCK_ID
(holding the actual metadata) and a new METADATA_KIND_BLOCK_ID (for
the metadata kind records). That way it is more obvious what the two
separate blocks are for and it is also much more trivial to identify
and skip just the former.

This SGTM. The other options sound fragile (not generally correct).

Note that the reader will still have to understand the old schema.

I have a question about module-level METADATA_BLOCKs in the bitcode.
There are currently two blocks with the METADATA_BLOCK id at module
scope. The first has the module-level metadata values (consisting of
some combination of METADATA_* record codes except for METADATA_KIND).
The second consists only of METADATA_KIND records. The latter is used
only in the METADATA_ATTACHMENT block within function blocks (for
metadata attached to instructions).

For ThinLTO we want to delay the parsing of module level metadata
until all functions have been imported from that module (there is some
bookkeeping used to suture it up when we read it during a post-pass).
However, I do need the METADATA_KIND records since that is used when
parsing the function body being imported. In my ThinLTO prototype I
was simply skipping the first METADATA_BLOCK and always parsing the
second, since I knew from inspecting the BitcodeWriter that the second
was the one with the METADATA_KIND records.

When working on the implementation of this handling for the upstream
patches, I am reassessing how to do this perhaps more robustly. A few
options:

1) Should the two module-level blocks use different block ids since
they hold different types of records? I.e. a METADATA_BLOCK_ID
(holding the actual metadata) and a new METADATA_KIND_BLOCK_ID (for
the metadata kind records). That way it is more obvious what the two
separate blocks are for and it is also much more trivial to identify
and skip just the former.

This SGTM. The other options sound fragile (not generally correct).

Ok good, I think this is cleaner too.

Note that the reader will still have to understand the old schema.

I will keep support for reading the old format (ThinLTO won't support
it, but it needs recent bitcode anyway).

Thanks,
Teresa