RFC: ThinLTO File Format

As discussed in the high-level ThinLTO RFC (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/086211.html), we would like to add support for native object wrapped bitcode and ThinLTO information. Based on comments on the mailing list, I am adding support for ThinLTO in both normal bitcode files, as well as native-object wrapped bitcode.

The following RFC describes the planned file format of ThinLTO information both in the bitcode-only and native object wrapped cases. It doesn’t yet define the exact record format, as I would like feedback on the overall block design first.

I’ve also implemented the support for reading and writing the bitcode blocks in the following patch:
http://reviews.llvm.org/D11722

The ThinLTO data structures and the file APIs are described in a separate RFC I will be sending simultaneously, with pointers to the patches implementing them.

Looking forward to your feedback. Thanks!
Teresa

ThinLTO File Format

Bitcode ThinLTO Support
ThinLTO Bitcode Blocks
THINLTO_SYMTAB_BLOCK
THINLTO_MODULE_STRTAB_BLOCK
THINLTO_FUNCTION_SUMMARY_BLOCK
Bitcode Combined Function Summary
Native-Wrapped ThinLTO Support
Native-wrapped bitcode
Native-Wrapped Combined Function Summary

This document discusses the high-level file format used to represent ThinLTO function index/summary information. It covers the index created at the module level (produced by the phase-1 -c compiles) and the combined function index/summary generated by the phase-2 linker step of a ThinLTO compile. More information about ThinLTO compilation can be found in the Updated RFC at: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/086211.html

As discussed in that document and subsequent mailing list discussions, we will add support for ThinLTO with both normal bitcode-only intermediate files, as well as native-wrapped bitcode files. This document describes the ThinLTO format for both of these file types. The formats were designed to allow as much sharing of APIs and implementation as possible between the two file types.

The ThinLTO information is written to the per-module (translation unit) intermediate files during the phase-1 (-c) compile. These files are read by the phase-2 linker step, which aggregates their ThinLTO information into a combined function index/summary file and by default does not need to parse the rest of the module IR. The phase-3 parallel backend processes, each of which compiles a single module into a final object file, read the combined function index/summary file during importing, but do not need to look at the module’s own ThinLTO information.

Since, as noted above, the normal (non-ThinLTO) module IR and its ThinLTO information are typically not needed in the same compile step, the following design tries to minimize the parsing of the normal module bitcode IR required when reading the ThinLTO information, and vice versa.

Bitcode ThinLTO Support

This section describes the representation of ThinLTO for bitcode-only intermediate files.

ThinLTO Bitcode Blocks

There will be three ThinLTO bitcode blocks nested within an outer THINLTO_BLOCK, which itself is nested within the outer MODULE_BLOCK:

<MODULE_BLOCK>
  <THINLTO_BLOCK BlockID=19 …>
    <THINLTO_SYMTAB_BLOCK BlockID=20 …>
    </THINLTO_SYMTAB_BLOCK>
    <THINLTO_MODULE_STRTAB_BLOCK BlockID=21 …>
    </THINLTO_MODULE_STRTAB_BLOCK>
    <THINLTO_FUNCTION_SUMMARY_BLOCK BlockID=22 …>
    </THINLTO_FUNCTION_SUMMARY_BLOCK>
  </THINLTO_BLOCK>
</MODULE_BLOCK>

These block IDs are defined along with other LLVM bitcode IDs in include/llvm/Bitcode/LLVMBitCodes.h:

namespace llvm {
namespace bitc {

// The only top-level block type defined is for a module.
enum BlockIDs {
  // Blocks
  MODULE_BLOCK_ID = FIRST_APPLICATION_BLOCKID,

  // Module sub-block id’s
  // ... existing module sub-block IDs (elided) ...
  THINLTO_BLOCK_ID,

  // ThinLTO sub-block id’s.
  THINLTO_SYMTAB_BLOCK_ID,
  THINLTO_MODULE_STRTAB_BLOCK_ID,
  THINLTO_FUNCTION_SUMMARY_BLOCK_ID,
};

} // end namespace bitc
} // end namespace llvm

The outer THINLTO_BLOCK will contain a record with the version ID of the ThinLTO information, which will evolve as the importing algorithm is tuned.

The exact record formats within each of the THINLTO_*_BLOCKs are still TBD, but the following is an overview of what they will contain:

THINLTO_SYMTAB_BLOCK

This block contains a record for each function in the summary. The record will contain the ValueID of the corresponding function symbol in the VALUE_SYMTAB_BLOCK (which contains the function’s name string), as well as the bitcode offset of the corresponding function summary record. The latter enables fast seeking when the function summary section is read lazily.

THINLTO_MODULE_STRTAB_BLOCK

This block contains a record for each module with functions in the combined function index/summary file, holding the module ID and its path string (so that the module can be located during phase-3 importing). This block is not needed in the per-module function index/summary, as the module path is known by the linker when the file is loaded. Additionally, the unique module ID is assigned to each module by the phase-2 linker step when creating the combined index (used to attain consistent renaming during static promotion in the phase-3 backend).
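The following is a minimal sketch of how the phase-2 link step could assign unique module IDs while it builds the module string table. It assumes an in-memory table on the linker side. ModuleStrtab and its methods are hypothetical names. Sequential assignment starting at 1 is also an assumption; it leaves 0 free for the per-module convention described for the THINLTO_FUNCTION_SUMMARY_BLOCK below.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of the module string table: one
// (module ID, path) record per module. ID 0 is never assigned, so it can
// mean "no module string table" in a per-module summary.
class ModuleStrtab {
public:
  // Returns the ID for Path, assigning the next unused ID (starting at 1)
  // the first time the phase-2 link step sees that module.
  uint64_t getOrAssignId(const std::string &Path) {
    auto It = IdByPath.find(Path);
    if (It != IdByPath.end())
      return It->second;
    uint64_t Id = Paths.size() + 1;
    IdByPath.emplace(Path, Id);
    Paths.push_back(Path);
    return Id;
  }

  // Maps an ID back to its path, e.g. so phase 3 can locate the module
  // it wants to import from.
  const std::string &getPath(uint64_t Id) const {
    assert(Id >= 1 && Id <= Paths.size() && "invalid module ID");
    return Paths[Id - 1];
  }

private:
  std::map<std::string, uint64_t> IdByPath;
  std::vector<std::string> Paths; // Paths[Id - 1] is the path for Id
};
```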

THINLTO_FUNCTION_SUMMARY_BLOCK

This block contains a record for each function available for importing. At a minimum, it holds the index into the THINLTO_MODULE_STRTAB_BLOCK of the module containing the function, as well as the bitcode offset of the function’s FUNCTION_BLOCK within that module. The THINLTO_MODULE_STRTAB_BLOCK index will be 0 in the per-module function summary, since that block is not present there, but will be non-zero in the combined index/summary file (see Bitcode Combined Function Summary section below). The block will also hold information about the function that is useful in making importing decisions (e.g. its instruction count and profile entry count).

There are several reasons for this block organization:

  1. Nesting the ThinLTO subblocks within a parent THINLTO_BLOCK allows the ThinLTO information to be quickly skipped during frontend parsing in the phase-3 parallel backend compile steps, when the ThinLTO information in the module is not needed.

  2. Nesting within the MODULE_BLOCK allows the THINLTO_SYMTAB_BLOCK records to share the function name strings with the MODULE_BLOCK’s function symbols. These strings are saved in the VALUE_SYMTAB_BLOCK nested within the Module block. Note that it would be faster to parse ThinLTO blocks during the phase-2 linker step if they were not nested within the MODULE_BLOCK (which could be skipped in one step using the size in the MODULE_BLOCK entry), since the phase-2 parser is only interested in the ThinLTO blocks. But this block placement enables greater size efficiency.

  3. Separating the ThinLTO function symtab information from the rest of the function summary has a couple of benefits:

      a. Mirrors how the information is structured in the native-wrapped case, where the native object symbol table is leveraged for holding the symbol name plus index into the summary section. This in turn enables better sharing of the bitcode parsing code and interfaces (discussed in more detail below in the native-wrapped description).

      b. Enables lazy reading of the function’s summary information, delayed until we are considering importing that function, while allowing fast checking of whether the function is available for importing (via presence in the ThinLTO function symtab).
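The lazy-reading benefit in the list above can be illustrated with a small model. This is not the planned reader. The symtab is reduced to a map from function name to summary offset. The summary block is reduced to a flat vector of operands in which each record is a single instruction count. LazySummaryIndex and FunctionSummary are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decoded summary; a real one would hold more fields.
struct FunctionSummary {
  uint64_t InstCount = 0;
};

// Models the symtab (name -> offset of the summary record) and a summary
// block reduced to a flat vector with one operand per record.
class LazySummaryIndex {
public:
  LazySummaryIndex(std::map<std::string, size_t> Symtab,
                   std::vector<uint64_t> SummaryBlock)
      : Symtab(std::move(Symtab)), SummaryBlock(std::move(SummaryBlock)) {}

  // Fast availability check: presence in the symtab, no summary decoding.
  bool isImportable(const std::string &Name) const {
    return Symtab.count(Name) != 0;
  }

  // Decodes a summary only when importing is actually being considered,
  // seeking directly to its offset and caching the result.
  std::optional<FunctionSummary> getSummary(const std::string &Name) {
    auto It = Symtab.find(Name);
    if (It == Symtab.end())
      return std::nullopt;
    auto Cached = Decoded.find(Name);
    if (Cached != Decoded.end())
      return Cached->second;
    size_t Offset = It->second;
    assert(Offset < SummaryBlock.size() && "summary offset out of range");
    FunctionSummary S;
    S.InstCount = SummaryBlock[Offset];
    ++NumDecoded;
    Decoded.emplace(Name, S);
    return S;
  }

  unsigned numDecoded() const { return NumDecoded; }

private:
  std::map<std::string, size_t> Symtab;
  std::vector<uint64_t> SummaryBlock;
  std::map<std::string, FunctionSummary> Decoded;
  unsigned NumDecoded = 0;
};
```

Checking availability for many functions never touches the summary block. Only functions that reach the import decision pay the decoding cost.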

Because, as mentioned earlier, the THINLTO_BLOCK and the rest of the MODULE_BLOCK are not typically both needed in a single compile step, we will implement a ThinLTO-specific bitcode reader class (ThinLTOBitcodeReader) to handle parsing of the ThinLTO blocks. This bitcode reader will hold a pointer to the ThinLTO data structure to be populated with the ThinLTO information (these data structures are described in a separate “ThinLTO File API and Data Structures” RFC, which should be sent out at the same time). It will ignore all MODULE_BLOCK subblocks except the THINLTO_BLOCK, the BLOCKINFO_BLOCK containing abbrev IDs, and the VALUE_SYMTAB_BLOCK. The VALUE_SYMTAB_BLOCK parser is specialized/simplified: since no Value objects are created during ThinLTO parsing, we simply need to correlate each string with its ValueID in the VALUE_SYMTAB_BLOCK record.
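The sketch below shows the two simplifications in the ThinLTO reader. The first is deciding which MODULE_BLOCK sub-blocks to enter. The second is a VALUE_SYMTAB_BLOCK pass that only maps ValueIDs to names. BLOCKINFO_BLOCK_ID (0) and VALUE_SYMTAB_BLOCK_ID (14) are assumed to match LLVM's existing values, and THINLTO_BLOCK_ID (19) is the ID proposed above. The function names are hypothetical. A real reader would drive this logic from LLVM's BitstreamCursor rather than from pre-decoded records.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Block IDs used by this sketch (see lead-in for where they come from).
enum : unsigned {
  BLOCKINFO_BLOCK_ID = 0,
  VALUE_SYMTAB_BLOCK_ID = 14,
  THINLTO_BLOCK_ID = 19,
};

// The ThinLTO reader enters only these MODULE_BLOCK sub-blocks; every
// other sub-block (e.g. FUNCTION_BLOCKs) is skipped using its length.
bool thinLTOReaderEntersBlock(unsigned BlockID) {
  return BlockID == BLOCKINFO_BLOCK_ID || BlockID == VALUE_SYMTAB_BLOCK_ID ||
         BlockID == THINLTO_BLOCK_ID;
}

// Simplified VST handling: no Value objects are created, so each
// (ValueID, name) record is just stored in a map.
using VSTRecord = std::pair<uint64_t, std::string>;

std::map<uint64_t, std::string>
buildValueIdToName(const std::vector<VSTRecord> &Records) {
  std::map<uint64_t, std::string> NameOf;
  for (const auto &[ValueID, Name] : Records) {
    bool Inserted = NameOf.emplace(ValueID, Name).second;
    assert(Inserted && "duplicate ValueID in VST");
    (void)Inserted;
  }
  return NameOf;
}
```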

Bitcode Combined Function Summary

The combined function index/summary (thin archive) file created by the phase-2 linker step will also be bitcode. It will consist of a MODULE_BLOCK containing only a THINLTO_BLOCK, a BLOCKINFO_BLOCK, and a VALUE_SYMTAB_BLOCK. The THINLTO_BLOCK will contain all three subblocks, with the THINLTO_SYMTAB_BLOCK and the THINLTO_FUNCTION_SUMMARY_BLOCK holding the aggregated per-module ThinLTO information. As noted earlier, it will also contain a THINLTO_MODULE_STRTAB_BLOCK created from the linked modules. The combined index will exclude symbols that are undefined, duplicate (e.g. comdats) or unlikely to benefit from importing. The THINLTO_FUNCTION_SUMMARY_BLOCK offsets in the THINLTO_SYMTAB_BLOCK records are updated to reflect the new offset into the combined THINLTO_FUNCTION_SUMMARY_BLOCK, and the THINLTO_FUNCTION_SUMMARY_BLOCK records are updated to include the appropriate module index into the THINLTO_MODULE_STRTAB_BLOCK.
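Here is a rough sketch of the phase-2 merge. It assumes each per-module index has already been decoded into plain structs. The struct names are illustrative assumptions. So is the "first definition wins" rule for duplicates. The MaxInstCount cutoff stands in for "unlikely to benefit from importing". Record positions in the combined vector stand in for bitcode offsets. Each kept record is stamped with the index of its module.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-module entry as decoded by the phase-2 link step.
struct ModuleFunction {
  std::string Name;
  bool IsDefined;     // false for undefined (referenced-only) symbols
  uint64_t InstCount; // summary info used for the import cutoff
};

struct PerModuleIndex {
  std::string Path;
  std::vector<ModuleFunction> Functions;
};

// Combined summary record with its module index filled in.
struct CombinedRecord {
  uint64_t ModuleIndex; // 1-based index into ModulePaths
  uint64_t InstCount;
};

struct CombinedIndex {
  std::vector<std::string> ModulePaths;         // module strtab
  std::vector<CombinedRecord> Summaries;        // combined summary block
  std::map<std::string, uint64_t> SymtabOffset; // name -> record position
};

CombinedIndex combine(const std::vector<PerModuleIndex> &Modules,
                      uint64_t MaxInstCount) {
  CombinedIndex Out;
  for (const PerModuleIndex &M : Modules) {
    uint64_t ModuleIndex = 0; // assigned when the first function is kept
    for (const ModuleFunction &F : M.Functions) {
      bool Duplicate = Out.SymtabOffset.count(F.Name) != 0;
      if (!F.IsDefined || Duplicate || F.InstCount > MaxInstCount)
        continue;
      if (ModuleIndex == 0) {
        Out.ModulePaths.push_back(M.Path);
        ModuleIndex = Out.ModulePaths.size();
      }
      // The record's position in the combined block replaces the offset
      // it had in its own module's summary block.
      Out.SymtabOffset[F.Name] = Out.Summaries.size();
      Out.Summaries.push_back({ModuleIndex, F.InstCount});
    }
  }
  assert(Out.SymtabOffset.size() == Out.Summaries.size());
  return Out;
}
```

A module path is only recorded once one of its functions survives the filtering. This matches the string table holding one record per module with functions in the combined index.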

Native-Wrapped ThinLTO Support

This section describes the representation of ThinLTO for native-wrapped bitcode intermediate files. The discussion here uses ELF as an example, but should also apply to other formats such as COFF and Mach-O [1].

Native-wrapped bitcode

There is already support in LLVM for reading native-wrapped bitcode, where the bitcode is contained within an .llvmbc section. For ThinLTO, unlike in the earlier bitcode-only case, the ThinLTO information is not nested within the MODULE_BLOCK contained within the .llvmbc section. Instead, the native object will contain a symbol table, and special sections holding the additional ThinLTO information. These sections are the function summary section (.llvm_thinlto_funcsum) containing the function’s bitcode offset and summary information for importing decisions, as well as the module path string table (.llvm_thinlto_modstrtab).

For simplicity and consistency with the bitcode-only format and interfaces, the contents of the .llvm_thinlto_funcsum and .llvm_thinlto_modstrtab sections will be encoded with bitcode. The .llvm_thinlto_modstrtab section will contain bitcode for a single THINLTO_MODULE_STRTAB_BLOCK. The format and contents of this block will be identical to the equivalent block in the bitcode-only case. Similarly, the .llvm_thinlto_funcsum section will contain bitcode for a single THINLTO_FUNCTION_SUMMARY_BLOCK. Its format and contents will be identical to the equivalent block in the bitcode-only case; however, note that the bitcode offset for the FUNCTION_BLOCK is the offset within the .llvmbc section bitcode (which contains the function IR).

As with the symbol table in a normal object file, the symbol table for the native-object wrapped bitcode file will hold entries for both defined and undefined but referenced symbols. The entries for functions defined in the module specify the location of that function’s summary via the st_shndx (index of .llvm_thinlto_funcsum section) and st_value (bitcode offset within .llvm_thinlto_funcsum section). The st_size field will hold the size of the function summary entry in the .llvm_thinlto_funcsum section. Note that for functions that are deemed unlikely to benefit from importing (e.g. large and cold), the summary data will be suppressed and the symtab entry will simply have a zero offset and size.

The symbol’s visibility can be emitted in the st_other field which typically holds the visibility info. If a tool such as objcopy or ld -r modifies the symbol visibility, this change is recorded in the symbol table. The change will be propagated to the bitcode when the backend compiles the native-wrapped bitcode.

E.g.:

Section Headers:
  [Nr] Name                     Type       Address           Offset
       Size              EntSize           Flags  Link  Info  Align
  [ 0]                          NULL       0000000000000000  00000000
       0000000000000000  0000000000000000            0     0     0
  [ 1] .shstrtab                STRTAB     0000000000000000  0000024b
       0000000000000059  0000000000000000            0     0     1
  [ 2] .text                    PROGBITS   0000000000000000  00000040
       0000000000000000  0000000000000000  AX        0     0     16
  [ 5] .llvmbc                  PROGBITS   0000000000000000  00000040
       000000000000044c  0000000000000000  E         0     0     4
  [ 6] .llvm_thinlto_funcsum    PROGBITS   0000000000000000  00000040
       0000000000000400  0000000000000000  E         0     0     4
  [ 7] .llvm_thinlto_modstrtab  PROGBITS   0000000000000000  00000440
       0000000000000013  0000000000000000  E         0     0     4

Symbol table ‘.symtab’ contains 11 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS t1.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    8
     8: 0000000000000040    40 FUNC    GLOBAL DEFAULT    6 bar
     9: 0000000000000000    40 FUNC    GLOBAL DEFAULT    6 foo
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND blah

The section index and value in the symtab entry for ‘foo’ are 6 and 0x0, respectively, meaning that the function summary info for ‘foo’ can be found in section 6 (.llvm_thinlto_funcsum) at offset 0x0. Similarly, the function summary info for ‘bar’ can be found in .llvm_thinlto_funcsum at offset 0x40. The size refers to the size of the corresponding function summary entry.
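The sketch below shows how a reader could turn such a symbol into the byte range of its summary entry. Symbol is a simplified stand-in carrying only the three Elf64_Sym fields used here; it is not the real type. findSummary is a hypothetical helper. readelf prints Size in decimal, so foo's entry covers bytes 0 through 39 of the section.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Simplified stand-in for the Elf64_Sym fields used by this scheme.
struct Symbol {
  std::string Name;
  uint16_t st_shndx; // section index (0 for undefined symbols)
  uint64_t st_value; // offset of the summary entry within that section
  uint64_t st_size;  // size of the summary entry
};

// Byte range of a function's summary in .llvm_thinlto_funcsum.
struct SummaryRange {
  uint64_t Offset;
  uint64_t Size;
};

// Returns where the summary lives, or nullopt when the symbol does not
// point into the summary section (e.g. undefined) or its summary was
// suppressed (zero size).
std::optional<SummaryRange> findSummary(const Symbol &S,
                                        uint16_t FuncSumSectionIndex,
                                        uint64_t FuncSumSectionSize) {
  if (S.st_shndx != FuncSumSectionIndex || S.st_size == 0)
    return std::nullopt;
  assert(S.st_value + S.st_size <= FuncSumSectionSize &&
         "summary entry extends past the end of the section");
  return SummaryRange{S.st_value, S.st_size};
}
```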

Native-Wrapped Combined Function Summary

The combined function summary (thin archive) file created by the phase-2 linker step can also be in native object format. It will contain the symbol table, and just the .llvm_thinlto_funcsum and .llvm_thinlto_modstrtab sections, combined across all of the linked modules. The combined symbol table, .llvm_thinlto_funcsum and .llvm_thinlto_modstrtab sections will exclude symbols that are undefined, duplicate (e.g. comdats) or unlikely to benefit from importing. The offsets in the symbol table are updated to reflect the new offset into the .llvm_thinlto_funcsum section, and the .llvm_thinlto_funcsum entries are updated to include the appropriate module path index in the new .llvm_thinlto_modstrtab section.

Quick note that the companion RFC was apparently a little large - it is waiting for moderator approval.
Thanks,
Teresa

I think I’ve read all the feedback posted regarding your May proposal. I have yet to see a single response that wants native object wrapped bitcode.

If the only use for native object wrapped bitcode is for your project at Google, then it probably shouldn’t go into the tree against all of these objections.

Alex

Hi Alex,

After outlining some of the rationale for using native-wrapped, there were a couple of responses that indicated native-wrapped support was reasonable, but they preferred to see bitcode-only first (Philip and Rafael). This is essentially what this proposal and the patches do - I’ve implemented some of the basic support for looking for and parsing the native-wrapped sections, but the bitcode-only reading/writing support is more complete.

In fact, as described in this RFC, I designed the native-wrapped format to utilize the same bitcode encoding for most of the ThinLTO information, so it uses most of the same underlying bitcode interfaces anyway. The additional support required for native-wrapped is not tremendous, and is similar to existing support in the LLVM tree for reading native-wrapped bitcode.

We believe that there will be clang/llvm users who will find native-wrapped ThinLTO easier to use for the same reasons (e.g. compatibility with existing native toolchains), so I don’t expect this to be Google specific.

Thanks,
Teresa

Ping. Explicitly adding a few more people who commented on the earlier (high-level) ThinLTO RFC. I removed the body of the RFC here since the original was large and had trouble getting through the mailer. I also updated the patch mentioned below so that it was emailed to llvm-commits properly.

Thanks,
Teresa

Alex already made what I consider to be the most relevant point. I would suggest removing the unwanted functionality and asking again. [...] thus worth reviewing and/or talking about) once the native bitcode version is in tree and functional. Frankly, I consider the native wrapped bitcode to be an entirely orthogonal proposal that shouldn't be tied to ThinLTO at all.

Fair warning, I'm not going to be particularly involved either way. This is far enough from my own immediate interests that I can't spare the cycles. I would suggest that you collaborate closely with the Sony and Apple folks who are already *using* LTO to find a proposal they're happy with. Until you do that, you are unlikely to make much progress.

Philip

Alex already made what I consider to be the most relevant point. I would
suggest removing the unwanted functionality and asking again.

I find your definition of 'unwanted' too narrow -- there are certainly
users who may want this. It would be a more productive discussion if the
comments can be made on the details of the RFCs themselves.

David

I can remove the native wrapper portions of the associated patch and add that later. Most of the support as I mentioned is for the bitcode handling anyway, but I wanted to include a skeleton of the native wrapper part. For the RFC, I wanted to show the end goal including how the native wrapper support would look (it in fact mostly leverages the same bitcode encoding, so there isn’t a lot of difference, and hence there isn’t a whole lot of extra code needed to support that). The bulk of the RFC deals with the bitcode format, and I would love some feedback on that.

Thanks,
Teresa

I went ahead and replied to two of the review threads. I only considered the parts which would be left without the native wrapped bitcode support.

Philip

Saw that, thanks! Responding now. Will update the patch with some changes and the wrapper stuff removed later today or very early tomorrow.
Teresa

Hi all,

I updated the patches to remove the native object wrapper format. As suggested we will work on getting the ThinLTO framework in place using bitcode first, and then work on adding the native object support. As noted in this RFC and in the associated patch D11722, for now I have empty ThinLTO blocks with no records, since I wanted to get feedback on the overall block design first. The RFC discusses this in more detail, but one of the main ideas is to leverage the existing value symbol table block in the module to avoid duplicating function symbol strings, e.g.

I also wanted to call out another important design consideration here, since it is buried in the other RFC (ThinLTO File API and Data Structures), and has a big influence on the way I have designed the ThinLTO index and object file data structures. The ThinLTO index is read in compile/link steps when the rest of the Module IR is not, and vice versa. That is why I have separate data structures for reading/holding the ThinLTO index. The ThinLTO index in the module (generated during the initial -c compile step) is needed by other modules during the later parallel backend compile phase, and therefore it is only used in the linker plugin step to create the combined index file. The rest of the Module IR is not read during this step (eventually we may look at adding heavier weight whole program analysis under an option, but by default the Module, Functions, etc are not read or materialized). When the normal Module IR is read during the parallel backend compile step, the ThinLTO information in its own module is not read, as the importing pass will read the combined (global) index file instead. This is because a module is only interested in the ThinLTO index from other modules that it is considering importing from.

Right now I have 5 outstanding patches to put in the basic infrastructure/options for reading/writing the ThinLTO function indices:

D11721 [ThinLTO] Data structures for holding ThinLTO function index/summary

D11722 [ThinLTO] Bitcode reading/writing support for ThinLTO function summary/index

D11723 [ThinLTO] ThinLTO object file interfaces

D11907 LLVM support for -fthinlto option.

D11908 Clang support for -fthinlto.

Once the basic options support, data structs, and bitcode support go in, I can send patches for generating/emitting the function index and the combined function index (off by default, guarded by the -fthinlto option), and subsequently send patches for the function importing during the backend compile step. I’ve tried to break down the above infrastructure into small pieces for review, and plan to implement the rest via incremental patches.

Hope this clarifies the approach I’m taking! Looking forward to additional feedback on the approach and the patches.

Thanks,
Teresa

This RFC and the patches listed below are now obsolete. I have written up the bitcode format discussed with Duncan and others, which I’ve copied below; the link to the doc, which may have better formatting, is:
https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view.

Duncan, can you take a look and make sure this properly describes the format changes we discussed?

I just sent a patch which implements the part of the bitcode format changes that apply to the lazy function reader: http://reviews.llvm.org/D12536. The current patch does not contain any of the ThinLTO specific changes, which will be sent as a follow-on patch.

I’ve additionally created a site that contains links to all the existing ThinLTO related RFCs and patches, which I will try to keep updated:
https://sites.google.com/site/llvmthinlto

Thanks,
Teresa

ThinLTO Bitcode Format Design

Bitcode Format Changes to Support ThinLTO

During importing, ThinLTO needs the following information for a potentially imported function:

  • Summary information to determine import profitability (e.g. inst count, hotness, etc)

  • Location of function IR (path to module, bitcode offset)

  • Import source module unique identifier (for consistent renaming of promoted locals)

ThinLTO interaction with function summary/index:

  • Phase-1 compile (-c -fthinlto) produces summary info and records function bitcode offsets

  • Phase-2 link aggregates all function summary and index information into combined index file, assigns unique module ids, records module paths and ids in combined index file

  • Phase-3 parallel backend processes independently use combined index for function importing decisions and mechanics

Lazy Reader Support

The existing lazy function reader builds an index of function block bitcode offsets on the fly while initially walking through (and skipping) the function blocks. ThinLTO also needs an index of function blocks, but we don’t want to pay the cost of building it on the fly each time a function is imported. The bitcode offsets of the function blocks will therefore be added to the ValueSymbolTable (VST) function records, and the existing lazy function reader will be changed to use this index rather than building it on the fly. ThinLTO importing can then easily leverage the same infrastructure as lazy function reading.

Specifically, augment the existing lazy reader with the function bitcode index:

  • Build the lazy reader’s DeferredFunctionInfo map (which maps from Function* to function block bitcode offset) from the index rather than by parsing the function blocks, and use it during current lazy reading as well as ThinLTO importing.

  • This means that the bitcode index is needed before the function blocks in the bitcode (phase-ordering issue discussed below)

  • Include bitcode offset with ValueSymbolTable (VST) function records

The function offsets are needed in the bitcode before the function blocks (in order to use them for lazy function reading). However, the function offsets are not yet known when this part of the bitcode is being encoded/written, so some kind of backpatching is required. There are several approaches:

  1. Backpatching a bitcode offset precludes encoding it as a VBR, since we don’t know how many chunks are required. This means any backpatched bitcode offsets must be 64-bit fixed. Doing this for every function VST record can result in high overhead.

  2. Could encode the function blocks twice, once before the VST in a temp stream to get each bitcode offset into the stream of function blocks, then again into the final location in the real output stream (or copy over the pre-encoded function blocks at the final location). This has time overhead, but allows VBRs to be used for encoding the offsets (which are offsets from the start of the function blocks, not from the start of the file).

  3. Encode the VST after the function blocks, but place a new forward declaration VST record at the point where we previously had the VST (before the function blocks). Only the one forward decl record needs to be backpatched with a 64-bit fixed offset (can likely get by with a 32-bit word offset as the real VST block should be word aligned). The reader needs to be taught to jump to and parse the real VST when seeing the forward decl VST, and to jump back after reading it.

Proposal #3 above (forward decl VST) is the approach that was agreed to and is being implemented. The first patch will implement this new forward decl VST, add the bitcode offsets to the real VST, and change the lazy reader to use the bitcode offsets from the VST instead of building up the DeferredFunctionInfo on the fly.
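A toy word-oriented writer can illustrate the trade-off between approaches 1 and 3. This is a sketch and not LLVM's BitstreamWriter. vbr6Chunks shows that a VBR field's width depends on its value; the 6-bit chunk width is just an example. WordWriter reserves a single fixed 32-bit word and backpatches it once the real VST's position is known. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 6-bit VBR chunks (5 payload bits + 1 continuation bit each)
// needed to encode Value. The count depends on the value itself, so space
// for a VBR cannot be reserved before the value is known (approach 1).
unsigned vbr6Chunks(uint64_t Value) {
  unsigned Chunks = 1;
  while (Value >= 32) { // does not fit in one chunk's 5 payload bits
    Value >>= 5;
    ++Chunks;
  }
  return Chunks;
}

// Toy writer for approach 3: one fixed 32-bit placeholder word is emitted
// where the VST used to be, then patched after the function blocks.
class WordWriter {
public:
  // Emits a zero placeholder and returns its word index for patching.
  size_t emitPlaceholder() {
    Words.push_back(0);
    return Words.size() - 1;
  }
  // Stands in for emitting Count words of real content.
  void emitWords(size_t Count) { Words.resize(Words.size() + Count, 0); }
  uint32_t currentWordOffset() const {
    return static_cast<uint32_t>(Words.size());
  }
  void backpatch(size_t Index, uint32_t Value) {
    assert(Index < Words.size() && "patching past end of stream");
    Words[Index] = Value;
  }
  uint32_t wordAt(size_t Index) const { return Words.at(Index); }

private:
  std::vector<uint32_t> Words;
};
```

Suppose there are 9 words of module-level content, the placeholder, and 10 words of function blocks. The patched value is then 20, which matches the VSTOFFSET in the example below.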

The new VST bitcode (used by the lazy function reader even without ThinLTO) is shown below:

A.c:
  A1() {…}
  static A2() {…}

BitOffset
0      <MODULE_BLOCK>
         // MODULE_CODE_VSTOFFSET: [wordoffset(32-bit fixed)]
         <VSTOFFSET 20/>                 // 20*32 = 640
320      <FUNCTION_BLOCK>
         </FUNCTION_BLOCK>
480      <FUNCTION_BLOCK>
         </FUNCTION_BLOCK>
640      <VALUE_SYMTAB>
           // VST_FNENTRY: [valueid, funcoffset(VBR), namechar x N]
           <FNENTRY 0, 10, “A1”/>        // 10*32 = 320
           <FNENTRY 1, 15, “A2”/>        // 15*32 = 480
         </VALUE_SYMTAB>
       </MODULE_BLOCK>
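The following sketch builds the lazy reader's offset map directly from these VST_FNENTRY records. It assumes, as in the example, that funcoffset counts 32-bit words from the same origin as the BitOffset column. The map is keyed by value id because a standalone example has no Function objects. The real DeferredFunctionInfo is keyed by Function*. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A VST_FNENTRY record: [valueid, funcoffset (in 32-bit words), name].
struct VSTFnEntry {
  uint64_t ValueID;
  uint64_t FuncWordOffset;
  std::string Name;
};

// Builds value id -> FUNCTION_BLOCK bit offset straight from the VST,
// without walking and skipping every FUNCTION_BLOCK.
std::map<uint64_t, uint64_t>
buildDeferredFunctionInfo(const std::vector<VSTFnEntry> &VST) {
  std::map<uint64_t, uint64_t> BitOffsetByValueID;
  for (const VSTFnEntry &E : VST) {
    // Word offset -> bit offset, e.g. 10 words -> bit 320.
    bool Inserted =
        BitOffsetByValueID.emplace(E.ValueID, E.FuncWordOffset * 32).second;
    assert(Inserted && "duplicate value id in VST");
    (void)Inserted;
  }
  return BitOffsetByValueID;
}
```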

ThinLTO-Specific Bitcode Changes

In addition to the VST changes above, for ThinLTO importing additional bitcode blocks are needed. These will initially only be generated under -fthinlto, unless other use cases are identified. The bitcode changes are summarized below.

Per-Module Bitcode

This pertains to bitcode generated by the phase-1 compile step (-fthinlto -c). It includes one new block that holds summary information for the functions in that module, summarized below:

  • Function summary info encoded in new FUNCTION_SUMMARY_BLOCK.
      ◦ One record per function with summary data containing: VST value id, islocal flag for phase-2 renaming decisions, summary info for importing decisions (e.g. instruction count). The summary information will evolve over time.

Note that the summary block is only used to create the combined index in phase 2. It is not used when compiling that module through the phase 3 backend. The earlier example is expanded with the function summary block below:

BitOffset
0      <MODULE_BLOCK>
         // MODULE_CODE_VSTOFFSET: [wordoffset(32-bit fixed)]
         <VSTOFFSET 20/>                 // 20*32 = 640
320      <FUNCTION_BLOCK>
         </FUNCTION_BLOCK>
480      <FUNCTION_BLOCK>
         </FUNCTION_BLOCK>
640      <VALUE_SYMTAB>
           // VST_FNENTRY: [valueid, funcoffset(VBR), namechar x N]
           <FNENTRY 0, 10, “A1”/>        // 10*32 = 320
           <FNENTRY 1, 15, “A2”/>        // 15*32 = 480
         </VALUE_SYMTAB>
         <FUNCTION_SUMMARY_BLOCK>
           // FS_ENTRY: [valueid, islocal, instcount]
           <ENTRY 0, 0, 10/>
           <ENTRY 1, 1, 15/>
         </FUNCTION_SUMMARY_BLOCK>
       </MODULE_BLOCK>
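For illustration, the FS_ENTRY operands in this example could be produced as plain integer lists. These are similar to the operand lists that LLVM's BitstreamWriter::EmitRecord takes. FunctionInfo, makeFSEntry and buildFunctionSummaryBlock are hypothetical names. The field order follows the FS_ENTRY comment above. In this example, A1 is value id 0, not local, with 10 instructions; the static A2 is value id 1, local, with 15.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-function data gathered during the phase-1 compile.
struct FunctionInfo {
  uint64_t ValueID;   // same value id as the function's VST_FNENTRY
  bool IsLocal;       // local (static) functions need renaming if imported
  uint64_t InstCount; // summary info for import decisions
};

// Operands of one FS_ENTRY record: [valueid, islocal, instcount].
std::vector<uint64_t> makeFSEntry(const FunctionInfo &F) {
  return {F.ValueID, F.IsLocal ? 1u : 0u, F.InstCount};
}

// One FS_ENTRY per function, in the order the functions are given.
std::vector<std::vector<uint64_t>>
buildFunctionSummaryBlock(const std::vector<FunctionInfo> &Functions) {
  std::vector<std::vector<uint64_t>> Records;
  Records.reserve(Functions.size());
  for (const FunctionInfo &F : Functions)
    Records.push_back(makeFSEntry(F));
  assert(Records.size() == Functions.size());
  return Records;
}
```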

Combined Index File Bitcode

This pertains to the combined index file generated by the phase-2 link step, which is encoded as bitcode. This file contains a single MODULE_BLOCK with only 3 subblocks: the VST, a module path strtab, and a function summary block. Also note that the VST only contains entries for functions, and the record type used in the combined index is changed to include the VBR-encoded bitcode offset of the corresponding summary record in the summary block. This allows lazy reading of summary records from the combined index file during importing. The summary offset replaces the bitcode offset of the function block, which is not needed in the combined index (it is obtained from the importee module’s VST when importing from that module).

  • Module paths encoded in new MODULE_STRTAB_BLOCK.
      ◦ One record per module containing: unique module id assigned during phase-2 link and module path string

  • Function summary info encoded in new FUNCTION_SUMMARY_BLOCK.
      ◦ One record per function containing: VST value id, module id, summary info for importing decisions

  • VST:
      ◦ One record per function containing: value id, function summary offset, function name string

Note that a VST forward decl record is not needed in the combined index, as the VST can be connected to the summary records later via the value ids (eager parsing of summary) or via the summary record offsets (lazy parsing of summary). When reading the summary eagerly, we just need to build a temporary map from value id to summary structure.

BitOffset
0      <MODULE_BLOCK>
         <MODULE_STRTAB_BLOCK>
           // MST_ENTRY: [modid, namechar x N]
           <ENTRY 1, “A.o”/>
           <ENTRY 2, “B.o”/>
         </MODULE_STRTAB_BLOCK>
         <FUNCTION_SUMMARY_BLOCK>
           // FS_ENTRY: [valueid, modid, islocal, instcount]
500        <ENTRY 0, 2, 0, 100/>
550        <ENTRY 1, 2, 0, 20/>
600        <ENTRY 2, 1, 0, 10/>
650        <ENTRY 3, 1, 1, 15/>
         </FUNCTION_SUMMARY_BLOCK>
         <VALUE_SYMTAB>
           // VST_FNENTRY: [valueid, funcsumoffset, namechar x N]
           <FNENTRY 2, 600, “A1”/>
           <FNENTRY 3, 650, “A2”/>
           <FNENTRY 0, 500, “B1”/>
           <FNENTRY 1, 550, “B2”/>
         </VALUE_SYMTAB>
       </MODULE_BLOCK>

Note that the value ids are reassigned here to be unique as they are no longer correlated with uses outside of the function summary records. They are not strictly necessary for correlating VST entries with function summary entries, but enable some sanity checking.
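The sketch below follows the eager path described earlier. Summary records are parked in a temporary map keyed by value id and matched with VST entries afterwards. Struct and function names are hypothetical, and records are assumed to be already decoded. The result is keyed by name only for brevity. Local functions with the same name in different modules would need a module-qualified key.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Decoded records of the combined index file.
struct FSEntry { // FS_ENTRY: [valueid, modid, islocal, instcount]
  uint64_t ValueID, ModID;
  bool IsLocal;
  uint64_t InstCount;
};
struct CombinedFnEntry { // VST_FNENTRY: [valueid, funcsumoffset, name]
  uint64_t ValueID, SummaryOffset;
  std::string Name;
};

struct JoinedSummary {
  std::string ModulePath;
  bool IsLocal;
  uint64_t InstCount;
};

// Eager reading: the summary block precedes the VST, so summaries go into
// a temporary value-id map first and are joined with names (and module
// paths) once the VST has been read.
std::map<std::string, JoinedSummary>
readCombinedIndexEagerly(const std::map<uint64_t, std::string> &ModPaths,
                         const std::vector<FSEntry> &Summaries,
                         const std::vector<CombinedFnEntry> &VST) {
  std::map<uint64_t, const FSEntry *> ByValueID; // temporary map
  for (const FSEntry &S : Summaries)
    ByValueID[S.ValueID] = &S;

  std::map<std::string, JoinedSummary> Index;
  for (const CombinedFnEntry &E : VST) {
    auto It = ByValueID.find(E.ValueID);
    assert(It != ByValueID.end() && "VST entry without a summary record");
    const FSEntry &S = *It->second;
    Index[E.Name] = {ModPaths.at(S.ModID), S.IsLocal, S.InstCount};
  }
  return Index;
}
```

A lazy reader would skip the temporary map and instead seek to each entry's SummaryOffset only when that function is considered for import.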

Overall, I really like the direction of the new proposal. The integration with existing lazy reading mechanisms is good to see.

w.r.t. the function summary section, I’d suggest that this doesn’t need to be ThinLTO specific either. It also sounds like this isn’t fully fleshed out yet, so what I might suggest is that we explicitly design this strictly as a cache of the information stored elsewhere in the file. If we did so, we could freely evolve the format without worrying about maintaining backwards compatibility by just ignoring the contents of the summary section (by rebuilding it) unless it’s an exact match. If we wrote the interface carefully, this could be entirely invisible to the consumers of the file.

w.r.t. the summary file, this feels like it has a lot in common with bitcode linking. Is there infrastructure which could be shared here?

Philip

Overall, I really like the direction of the new proposal. The integration
with existing lazy reading mechanisms is good to see.

Thanks, glad to hear that.

w.r.t. the function summary section, I'd suggest that this doesn't need to
be ThinLTO specific either.

True. While the new format proposal talks about it in the context of
ThinLTO, since that is the first consumer, based on earlier feedback
the sections, data structures, and interfaces are being stripped of
the "ThinLTO" prefix.

It also sounds like this isn't fully fleshed
out yet, so what I might suggest is that we explicitly design this strictly
as a cache of the information stored elsewhere in the file. If we did so,
we could freely evolve the format without worrying about maintaining
backwards compatibility by just ignoring the contents of the summary section
(by rebuilding it) unless it's an exact match. If we wrote the interface
carefully, this could be entirely invisible to the consumers of the file.

Point taken about backwards compatibility. To address this we can
include a special summary version ID record in the summary section.
The summary format ID can be incremented when the format is changed in
an incompatible way. E.g. we can provide backwards compatibility when
adding new summary fields to the end of the existing summary records
(or adding new record types), where we just fall back to default
values for the missing summary items when reading the old bitcode. If
any existing fields are changed, that's when the summary version ID
needs to be bumped. Note that the main bitcode version ID should not
be changed so that the non-summary block bitcode remains usable.

At least after the initial bring up/tuning of ThinLTO (or if any other
consumers of the summary section are added) we should try to make
backwards compatible changes as much as possible.

When the summary version ID changes, we can ignore the old summary
data or, if appropriate, exit with a clear message.
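Here is a minimal sketch of the compatibility scheme described here. It assumes the summary version record has already been read before the summary records. CurrentSummaryVersion, the ProfileCount field, and readSummaryRecord are hypothetical. Trailing fields missing from older records fall back to defaults. A version mismatch causes the summary to be ignored, while the rest of the bitcode stays usable.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical summary format version understood by this reader.
constexpr uint64_t CurrentSummaryVersion = 2;

struct Summary {
  uint64_t InstCount = 0;
  uint64_t ProfileCount = 0; // hypothetical field appended later
};

// Reads one summary record's operands. Older records that stop before a
// newly appended field keep its default. A different summary version may
// mean an incompatible layout, so the summary is ignored (nullopt)
// instead of being misread.
std::optional<Summary> readSummaryRecord(uint64_t SummaryVersion,
                                         const std::vector<uint64_t> &Ops) {
  if (SummaryVersion != CurrentSummaryVersion)
    return std::nullopt;
  assert(!Ops.empty() && "summary record needs at least an inst count");
  Summary S;
  S.InstCount = Ops[0];
  if (Ops.size() > 1)
    S.ProfileCount = Ops[1];
  return S;
}
```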

It is possible for the per-module summary to eventually be rebuilt on
the fly like you suggested when the version ID doesn't match. Note
that for the ThinLTO phase-2 link step where the per-module summary is
read, this would require invoking a full bitcode read which normally
isn't needed in this step. Also note that this won't work for the
summary sections in the combined summary file: They will eventually
hold additional information determined during the phase-2 link step
which should eventually perform some inexpensive IPA (possibly based
just on the info in the summary sections so that no additional bitcode
must be read), so it can't be reconstructed from looking at the single
file corresponding to that function.

w.r.t. the summary file, this feels like it has a lot in common with bitcode
linking. Is there infrastructure which could be shared here?

The linking to create the basic summary file is significantly simpler
than (and doesn't include) normal bitcode linking, since by default it
would just read the summary section and VST FNENTRY records (for the
value id and name), and it simply assigns module ids and does a basic
merge of the resulting function index data structures. So while the
interfaces to read and link when creating the combined index look
similar to the regular bitcode read and link, they operate on separate
data structures (e.g. an index vs a Module), and use a simplified
bitcode reader implementation (e.g. that doesn't try to construct
actual GVs).

Thanks,
Teresa