This RFC describes the data structures that hold the ThinLTO function index/summary used to support function importing. It also describes the high-level APIs for reading and writing this information. As discussed in the high-level ThinLTO RFC (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/086211.html), we would like to add support for native object wrapped bitcode and ThinLTO information. Based on comments on the mailing list, I am adding support for ThinLTO in both normal bitcode files and native object wrapped bitcode.
I’ve implemented support for the data structures in http://reviews.llvm.org/D11721, and support for some of the APIs (not the libLTO APIs, but the underlying ThinLTOObjectFile interfaces used by both libLTO and directly by gold) in http://reviews.llvm.org/D11723.
The file format is described in a separate RFC I am sending simultaneously, which contains a pointer to the patch implementing the bitcode reading/writing support.
Looking forward to your feedback. Thanks!
Teresa
RFC: ThinLTO File API and Data Structures
This RFC covers a proposed API for ThinLTO clients (e.g. gold plugin, linkers, llvm-lto) to use when reading and writing ThinLTO information from bitcode files (raw or wrapped). The APIs are meant to hide the underlying format of the files, much as the existing LTOModule and IRObject interfaces hide the format when reading bitcode (i.e. they work transparently on native wrapped bitcode).
See the following thread for background on ThinLTO and motivation for the APIs:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/086310.html
Current Interfaces
Currently linkers such as ld64, lld and some proprietary linkers (e.g. Sony), as well as the llvm-lto tool, utilize libLTO to read bitcode intermediate files. Specifically, they use the LTOModule class and interfaces, or the C interface to these (lto_module_* routines). The gold plugin performs many of the same steps, but interfaces directly with the underlying IRObjectFile class, encapsulating the following in the IRObjectFile::create static method.
- Check if file/mem contains bitcode (including native object wrapped bitcode)
  - static bool LTOModule::isBitcode* and lto_module_is_object* interfaces
  - Reads file into a MemoryBuffer and invokes static IRObjectFile::findBitcodeInMemBuffer which:
    - If memory buffer is bitcode, return as is
    - If memory buffer is object file, invoke ObjectFile::createObjectFile, then IRObjectFile::findBitcodeInObject to look for .llvmbc section, if found return memory buffer for section contents
  - Return true if findBitcodeInMemBuffer returns non-null
- Read and parse bitcode from file/mem, build Module, returning LTOModule/lto_module_t object
  - static LTOModule * LTOModule::createFrom* and lto_module_create_from* interfaces
  - Same steps as the isBitcode interface above, but if findBitcodeInMemBuffer returns a memory buffer:
    - Parse into a new Module object (or get a lazy bitcode parser Module) via BitcodeReader interfaces.
    - Create IRObjectFile object (derived from SymbolicFile), save Module here
    - Create LTOModule object, save IRObjectFile here
    - Return LTOModule object
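The bitcode lookup in the first step can be modeled roughly as below. This is a self-contained sketch, not LLVM code: `Buffer`, `ObjectModel`, `isRawBitcode`, `findBitcode` and `isBitcode` are hypothetical stand-ins for MemoryBuffer, the result of ObjectFile::createObjectFile, and the findBitcodeInMemBuffer/isBitcode logic. The only detail taken from the real format is the raw bitcode magic ('B', 'C', 0xC0, 0xDE); bitcode wrapper headers are not handled here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Hypothetical stand-in for an already-parsed native object file:
// maps section names to section contents.
struct ObjectModel {
  std::map<std::string, Buffer> Sections;
};

// Raw LLVM bitcode starts with the magic bytes 'B' 'C' 0xC0 0xDE.
bool isRawBitcode(const Buffer &B) {
  return B.size() >= 4 && B[0] == 'B' && B[1] == 'C' && B[2] == 0xC0 &&
         B[3] == 0xDE;
}

// Models the lookup: return the buffer as is if it is raw bitcode, else
// return the contents of the object's .llvmbc section if present.
// Obj is null when the buffer could not be parsed as an object file.
std::optional<Buffer> findBitcode(const Buffer &File, const ObjectModel *Obj) {
  if (isRawBitcode(File))
    return File;
  if (Obj) {
    auto It = Obj->Sections.find(".llvmbc");
    if (It != Obj->Sections.end())
      return It->second;
  }
  return std::nullopt;
}

// Models the isBitcode check: true if the lookup found something.
bool isBitcode(const Buffer &File, const ObjectModel *Obj) {
  return findBitcode(File, Obj).has_value();
}

// Usage: a raw bitcode buffer and a wrapped one are both accepted.
void exampleBitcodeLookup() {
  Buffer Raw = {'B', 'C', 0xC0, 0xDE};
  ObjectModel Wrapped;
  Wrapped.Sections[".llvmbc"] = Raw;
  Buffer Native = {0x7f, 'E', 'L', 'F'};
  assert(isBitcode(Raw, nullptr));
  assert(isBitcode(Native, &Wrapped));
}
```

The point of the sketch is that callers never see which of the two containers was used, which is the property the new ThinLTO interfaces are meant to preserve.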
The classes referred to above look like (only the relevant members shown):
include/llvm/LTO/LTOModule.h:
struct LTOModule {
…
std::unique_ptr<object::IRObjectFile> IRFile;
…
};
include/llvm/Object/IRObjectFile.h:
class IRObjectFile : public SymbolicFile {
std::unique_ptr<Module> M;
…
};
include/llvm/Object/ObjectFile.h:
/// ObjectFile - This class is the base class for all object file types.
/// Concrete instances of this object are created by createObjectFile, which
/// figures out which type to create.
class ObjectFile : public SymbolicFile {
…
};
include/llvm/IR/Module.h:
class Module {
…
};
New Interfaces for ThinLTO
High-level API Descriptions
The following main interfaces (e.g. in libLTO) will be needed to support ThinLTO (detailed interface descriptions and data structures can be found further down):
1. This interface writes the ThinLTO function index/summary for a module to the intermediate file containing its bitcode (either bitcode only or native object wrapped), and is invoked by the compiler during intermediate object emission.
   Inputs:
   - ThinLTO function index/summary
   - Output stream to write it into
2. This interface writes the combined function index/summary to a file. This is the combined index/summary created by the plugin step from all modules. The output file format will depend on the format of the intermediate files (bitcode only or native object wrapped).
   Inputs:
   - ThinLTO function index/summary
   - Output file path
3. This interface checks if a file or memory buffer contains a ThinLTO function index/summary. The file (possibly in a memory buffer) can contain either bitcode only or native object wrapped bitcode.
   Input:
   - File or memory buffer to check
   Output:
   - Boolean
4. This interface will read and parse the ThinLTO function index/summary from an intermediate file for a module (either bitcode only or native object wrapped).
   Inputs:
   - File or memory buffer to check
   Output:
   - ThinLTO function index/summary for module (format discussed below)
5. This interface will read and parse the combined function index/summary file. The file format will depend on the format of the intermediate files (bitcode only or native object wrapped).
   Inputs:
   - Combined index/summary file path
   Output:
   - ThinLTO function index/summary object (format discussed below)
6. This interface will read and parse the combined function index/summary file, parsing and populating the summary information for a single given function. The file format will depend on the format of the intermediate files (bitcode only or native object wrapped).
   Inputs:
   - Combined index/summary file path
   - Function name
   Output:
   - ThinLTO function index/summary object (format discussed below)
Some design considerations for the new interfaces:
- Interfaces 1, 5 and 6 are used internally by the compiler.
- Interfaces 2, 3 and 4 are used by the linker, and therefore are callable via libLTO and from the gold-plugin.
- Just as the bitcode file format is transparently handled by the existing interfaces described above, the new ThinLTO interfaces should be similarly independent of the underlying format (bitcode only vs native object wrapped).
- Interfaces 5 and 6 are similar, and may not both be needed eventually, depending on tradeoffs investigated when tuning the ThinLTO implementation.
- The interfaces for reading and writing the function index/summary should look very similar regardless of whether we are doing the per-module versions (1, 4) or the combined index/summary file (2, 5, 6). This is both for consistency and to allow the implementation to share as much code as possible. This design goal also means the format of the index/summary in the module and in the combined file should be similar.
ThinLTO Function Index/Summary Data Structures
In order to save time and memory in the plugin step, we don’t want to parse the entire bitcode for a module during the plugin step (which only needs to read the function index/summary for merging into a combined index/summary). We also don’t need to parse the ThinLTO information out of the module’s IR when constructing the Module object during the backend compile step (the ThinLTO importing step will read the combined index file, not the module’s own ThinLTO index).
Therefore any data structure created to encapsulate ThinLTO function index and summary information should be independent of LTOModule and IRObjectFile, as those structures are created when we read/parse the module’s normal (non-ThinLTO) bitcode, and result in the creation of a Module object. This also simplifies the implementation of reading/writing interfaces that can be shared between bitcode only and native object wrapped bitcode formats, since the latter will represent the ThinLTO information in a separate section and in the object’s symbol table.
For a description of the bitcode and native-wrapped file formats, see the separate “ThinLTO File Format” RFC.
The proposed new data structures are outlined below. As mentioned earlier in the interface design considerations, they are totally independent of where the function index/summary sits in the intermediate file (whether in a bitcode block or a native wrapped section/symtab):
- ThinLTOModulePathStringTable: String table to hold/own module path strings, which additionally holds the module ID assigned to each module during the plugin step. This can simply be a typedef StringMap<uint64_t>, since the StringMap makes a copy of and owns inserted strings.
- ThinLTOFunctionInfo: Class to hold function’s bitcode index and summary info. Includes:
  - Module path StringRefs (module strings owned by ThinLTOModulePathStringTable)
  - Bitcode index of function in the module
  - ThinLTOFunctionSummary: Function summary information to aid in importing decisions (e.g. instruction count, profile count).
  - Transient information used while reading/writing function summary from file (specifically the function’s offset into encoded function summary section)
- ThinLTOFunctionMap: Mapping from function name to corresponding ThinLTOFunctionInfo(s)
  - Implemented via StringMap class, which makes a copy of and owns inserted (function name) strings.
  - There may be more than one ThinLTOFunctionInfo for a given function name in the combined function index/summary map due to COMDATs. While the plugin step that creates the combined function map may decide to select one representative COMDAT instance, we don’t want the design to preclude holding multiple, as it may be advantageous to import a particular COMDAT from different modules in different backend instances.
  - Therefore, use StringMap<std::vector<ThinLTOFunctionInfo>>
- ThinLTOFunctionSummaryIndex: Class to hold ThinLTOModulePathStringTable and ThinLTOFunctionMap and encapsulate methods for operating on them
  - Includes method for combining from another given ThinLTOFunctionSummaryIndex instance (used when creating combined index/summary map).
  - Used to hold both per-module and combined function index/summary.
- Class ThinLTOObjectFile
  - Analogous to IRObjectFile, but holds ThinLTOFunctionSummaryIndex instead of Module
  - Derived from SymbolicFile
- New libLTO class ThinLTO
The new classes referred to above will look like (only the relevant members shown):
include/llvm/LTO/LTOThinLTO.h:
struct ThinLTO {
…
std::unique_ptr<object::ThinLTOObjectFile> ThinLTOFile;
…
};
include/llvm/Object/ThinLTOObjectFile.h:
class ThinLTOObjectFile : public SymbolicFile {
std::unique_ptr<ThinLTOFunctionSummaryIndex> Index;
…
};
include/llvm/IR/ThinLTOInfo.h:
typedef std::vector<ThinLTOFunctionInfo> ThinLTOFunctionInfoList;
typedef StringMap<ThinLTOFunctionInfoList> ThinLTOFunctionMap;
typedef StringMap<uint64_t> ThinLTOModulePathStringTable;
class ThinLTOFunctionSummaryIndex {
ThinLTOFunctionMap FunctionMap;
ThinLTOModulePathStringTable ModulePathStringTable;
…
};
class ThinLTOFunctionInfo {
StringRef ModulePath;
uint64_t BitcodeIndex;
ThinLTOFunctionSummary *FunctionSummary;
uint64_t FunctionSummarySecOffset; // Used during parsing
uint64_t FunctionSummarySecSize; // Used during parsing
…
};
class ThinLTOFunctionSummary {
// TBD (includes function size, hotness, …)
};
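To make the ownership and COMDAT points concrete, here is a rough standalone model of the index. It is only a sketch under several assumptions: std::map and std::string_view stand in for StringMap and StringRef, the `InstCount` field is invented because the summary contents are still TBD, and the method names (`addModulePath`, `addFunction`, `mergeFrom`, `lookup`, `moduleId`) are hypothetical, not the names used in the patch. It also assumes each per-module index describes a single module.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Placeholder summary; the real contents are TBD.
struct FunctionSummary {
  std::uint64_t InstCount = 0; // assumed field, for illustration only
};

struct FunctionInfo {
  std::string_view ModulePath; // view into the owning index's path table
  std::uint64_t BitcodeIndex = 0;
  FunctionSummary Summary;
};

class FunctionSummaryIndex {
public:
  using FunctionInfoList = std::vector<FunctionInfo>;

  // Interns Path (if new) and returns a view of the table-owned copy.
  // std::map is node-based, so the view stays valid across later inserts.
  std::string_view addModulePath(const std::string &Path, std::uint64_t Id) {
    return ModulePaths.emplace(Path, Id).first->first;
  }

  // Appends rather than replaces, so COMDAT copies of the same function
  // coming from different modules are all retained.
  void addFunction(const std::string &Name, FunctionInfo Info) {
    Functions[Name].push_back(Info);
  }

  // Merges a per-module index into this (combined) index, re-pointing each
  // entry's module path at this index's own string table.
  void mergeFrom(const FunctionSummaryIndex &Other, std::uint64_t ModuleId) {
    assert(&Other != this && "cannot merge an index into itself");
    for (const auto &[Name, Infos] : Other.Functions)
      for (FunctionInfo Info : Infos) { // copy each entry on purpose
        Info.ModulePath = addModulePath(std::string(Info.ModulePath), ModuleId);
        Functions[Name].push_back(Info);
      }
  }

  const FunctionInfoList *lookup(const std::string &Name) const {
    auto It = Functions.find(Name);
    return It == Functions.end() ? nullptr : &It->second;
  }

  std::uint64_t moduleId(const std::string &Path) const {
    return ModulePaths.at(Path);
  }

private:
  std::map<std::string, FunctionInfoList> Functions;
  std::map<std::string, std::uint64_t> ModulePaths;
};
```

Merging two per-module indexes that both define `foo` leaves two entries under that name in the combined index, each tagged with its own module path, so a backend can still pick which copy to import.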
Detailed API Descriptions
With the above data structures in mind, this section shows the refined and more detailed interface specifications.
Note that the bitcode reader interfaces are not shown here, since this document focuses on the higher-level API and data structures. But this will use a new ThinLTOBitcodeReader class which will save and populate the newly created ThinLTOFunctionSummaryIndex object (similar to how the BitcodeReader class saves and populates a Module object). The ThinLTOBitcodeReader class contains methods for parsing the ThinLTO bitcode blocks in a bitcode-only file, as well as the bitcode-encoded ThinLTO summary and module string table sections in the native-wrapped case. This is also discussed in the separate “ThinLTO File Format” RFC.
API Details:
1. This interface writes the ThinLTO function index/summary for a module to the intermediate file containing its bitcode (either bitcode only or native object wrapped), and is invoked by the compiler during intermediate object emission (so no libLTO interfaces).
   Inputs:
   - ThinLTO function index/summary object
   - Output stream to write it into
   Interfaces:
   - static void WriteThinLTOBlock(const ThinLTOFunctionSummaryIndex *Index, BitstreamWriter &Stream)
   - void llvm::WriteThinLTOToStreamer(const ThinLTOFunctionSummaryIndex *Index, MCStreamer &MCS)
   Notes:
   - The former (taking a BitstreamWriter) is used when writing bitcode-only output (invoked from WriteModule under the appropriate option); the latter (taking an MCStreamer) is used when writing wrapped output to the given native object streamer.
2. This interface writes the combined function index/summary to a file. This is the combined index/summary created by the plugin step from all modules. The output file format will depend on the format of the intermediate files (bitcode only or native object wrapped).
   Inputs:
   - ThinLTO function index/summary
   - Output file path
   - Output triple
   Outputs:
   - Error status (message or boolean)
   Interfaces:
   - void ThinLTO::writeToFile(const char* path, const char* triple, std::string &errMsg) /* libLTO C++ */
   - lto_bool_t lto_thinlto_write_to_file(const char* path, const char* triple) /* libLTO C */
   - std::error_code ThinLTOObjectFile::writeToFile(const char* path, const char* triple)
   Notes:
   - The triple is non-null in the case where we are writing native-wrapped bitcode, and is used to construct the appropriate MCStreamer object.
   - The gold plugin will interact directly with ThinLTOObjectFile instead of the libLTO ThinLTO class, as it does with IRObjectFile.
   - The libLTO interfaces will invoke the ThinLTOObjectFile method.
   - Leverages the same underlying writers as #1.
3. This interface checks if a file or memory buffer contains a ThinLTO function index/summary. The file (possibly in a memory buffer) can contain either bitcode only or native object wrapped bitcode.
   Input:
   - File or memory buffer to check
   Output:
   - Boolean
   Interfaces:
   - static bool ThinLTO::hasThinLTOIndexInFile(const char *path) /* libLTO C++ */
   - static bool ThinLTO::hasThinLTOIndexInFile(const void *mem, size_t length) /* libLTO C++ */
   - lto_bool_t lto_has_thinlto_index(const char *path) /* libLTO C */
   - lto_bool_t lto_has_thinlto_index_in_memory(const void *mem, size_t length) /* libLTO C */
   - static bool ThinLTOObjectFile::hasThinLTOInMemBuffer(MemoryBufferRef Object)
   Notes:
   - The libLTO routines invoke the ThinLTOObjectFile function that checks for ThinLTO function index/summary information using format-specific readers.
   - The gold plugin will interact directly with ThinLTOObjectFile instead of the libLTO ThinLTO class, as it does with IRObjectFile.
4. This interface will read and parse the ThinLTO function index/summary from an intermediate file (possibly in memory) for a module (either bitcode only or native object wrapped).
   Inputs:
   - File or memory buffer to check
   Output:
   - ThinLTO function index/summary for module (format discussed above)
   - Error status (libLTO C interfaces set sLastErrorString)
   Interfaces:
   - static ThinLTO * ThinLTO::createFromFile(const char *path, std::string &errMsg) /* libLTO C++ */
   - static ThinLTO * ThinLTO::createFromOpenFile(int fd, const char *path, size_t size, std::string &errMsg) /* libLTO C++ */
   - static ThinLTO * ThinLTO::createFromBuffer(const void *mem, const char *path, size_t length, std::string &errMsg) /* libLTO C++ */
   - thin_lto_t lto_thinlto_create(const char *path) /* libLTO C */
   - thin_lto_t lto_thinlto_create_from_fd(int fd, const char *path, size_t size) /* libLTO C */
   - thin_lto_t lto_thinlto_create_from_memory(const void *mem, const char *path, size_t length) /* libLTO C */
   - static ErrorOr<std::unique_ptr<ThinLTOObjectFile>> object::ThinLTOObjectFile::create(MemoryBufferRef Object, bool ReadFuncSummaryData = true)
   Notes:
   - The optional ReadFuncSummaryData boolean flag to ThinLTOObjectFile::create is used to support the interface discussed below in #6, but is true in this context.
   - The libLTO routines invoke the ThinLTOObjectFile function that reads the given function index/summary by invoking format-specific readers/parsers. The resulting ThinLTOObjectFile is saved in a new ThinLTO object.
   - The gold plugin will interact directly with ThinLTOObjectFile instead of the libLTO ThinLTO class, as it does with IRObjectFile.
5. This interface will read and parse the combined function index/summary file. The file format will depend on the format of the intermediate files (bitcode only or native object wrapped). Invoked from the compiler during ThinLTO importing (so no libLTO interfaces).
   Inputs:
   - Combined index/summary file in memory buffer
   - Boolean indicating whether to read function summary data (in addition to symbol table), true by default and in this context.
   Output:
   - ThinLTO function index/summary object (wrapped in ErrorOr)
   Interface:
   - static ErrorOr<std::unique_ptr<ThinLTOObjectFile>> object::ThinLTOObjectFile::create(MemoryBufferRef Object, bool ReadFuncSummaryData = true)
   Notes:
   - This is the same ThinLTOObjectFile interface shown above in #4.
   - The optional ReadFuncSummaryData boolean flag is used to support the interface discussed below in #6, but is true in this context.
6. This interface will read and parse the combined function index/summary file, parsing and populating the summary information for a single given function. The file format will depend on the format of the intermediate files (bitcode only or native object wrapped).
   Inputs:
   - Combined index/summary file in memory buffer
   - Function name
   - ThinLTOObjectFile object (partially populated)
   Output:
   - Error status
   Side Effect:
   - The ThinLTOFunctionInfo entry for the given function is populated in the given ThinLTOObjectFile
   Interfaces (2 steps):
   - static ErrorOr<std::unique_ptr<ThinLTOObjectFile>> object::ThinLTOObjectFile::create(MemoryBufferRef Object, false /*bool ReadFuncSummaryData = true*/)
   - std::error_code ThinLTOObjectFile::findThinLTOFunctionInfoInMemBuffer(MemoryBufferRef Object, StringRef FunctionName)
   Notes:
   - This usage model requires first reading the ThinLTO symbol table information using the first interface ThinLTOObjectFile::create (same interface as in #4/#5), but with ReadFuncSummaryData=false. In that case the resulting ThinLTOObjectFile object is not fully populated. Specifically, the ThinLTOFunctionInfo entries are not yet populated with the bitcode index and function summary information.
   - Subsequent invocations to read specific function summaries use the second interface.
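The two-step model for the last interface can be sketched as follows. This is a standalone illustration rather than the proposed implementation: `EncodedIndex`, `EncodedRecord`, `IndexFile`, `findFunctionInfo`, `lookup` and the `Populated` flag are hypothetical names, the "file" is modeled as an in-memory symbol table plus a vector of records, and `create` here ignores errors instead of returning ErrorOr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical stand-in for the combined index file held in a memory buffer.
struct EncodedRecord {
  std::uint64_t BitcodeIndex;
  std::uint64_t InstCount;
};
struct EncodedIndex {
  std::map<std::string, std::size_t> SymbolTable; // name -> record offset
  std::vector<EncodedRecord> SummarySection;
};

struct FunctionInfo {
  std::size_t SummaryOffset = 0; // transient: known from the symbol table
  bool Populated = false;        // assumed flag, for illustration only
  std::uint64_t BitcodeIndex = 0;
  std::uint64_t InstCount = 0;
};

class IndexFile {
public:
  // Step 1: always reads the symbol table; reads summaries only on request.
  // (Population errors are ignored here for brevity.)
  static IndexFile create(const EncodedIndex &Enc,
                          bool ReadFuncSummaryData = true) {
    IndexFile F;
    for (const auto &[Name, Offset] : Enc.SymbolTable) {
      FunctionInfo &Info = F.Functions[Name];
      Info.SummaryOffset = Offset;
      if (ReadFuncSummaryData)
        populate(Enc, Info);
    }
    return F;
  }

  // Step 2: populates the entry for one function on demand.
  std::error_code findFunctionInfo(const EncodedIndex &Enc,
                                   const std::string &Name) {
    auto It = Functions.find(Name);
    if (It == Functions.end())
      return std::make_error_code(std::errc::invalid_argument);
    return populate(Enc, It->second);
  }

  const FunctionInfo *lookup(const std::string &Name) const {
    auto It = Functions.find(Name);
    return It == Functions.end() ? nullptr : &It->second;
  }

private:
  static std::error_code populate(const EncodedIndex &Enc, FunctionInfo &Info) {
    if (Info.Populated)
      return {};
    if (Info.SummaryOffset >= Enc.SummarySection.size())
      return std::make_error_code(std::errc::result_out_of_range);
    const EncodedRecord &R = Enc.SummarySection[Info.SummaryOffset];
    Info.BitcodeIndex = R.BitcodeIndex;
    Info.InstCount = R.InstCount;
    Info.Populated = true;
    return {};
  }

  std::map<std::string, FunctionInfo> Functions;
};

// Usage: read only the symbol table, then pull in one function's summary.
void exampleLazyRead() {
  EncodedIndex Enc;
  Enc.SummarySection = {{100, 12}, {200, 34}};
  Enc.SymbolTable = {{"foo", 0}, {"bar", 1}};
  IndexFile Lazy = IndexFile::create(Enc, /*ReadFuncSummaryData=*/false);
  assert(!Lazy.lookup("bar")->Populated);
  std::error_code EC = Lazy.findFunctionInfo(Enc, "bar");
  assert(!EC && Lazy.lookup("bar")->Populated);
}
```

Whether this lazy path pays off compared with reading everything up front (interface 5) is one of the tradeoffs left open above.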