new LTO C interface

Hello. I work at Apple on our linker. We are working to improve support for LLVM
in our tools. A while back Devang created <llvm/LinkTimeOptimizer.h>, a C++
interface that allows the linker to process LLVM bitcode files along with native
Mach-O object files.

For the next step we’d like our other tools like nm, ar, and lipo to be able to
transparently process bitcode files too. But those tools are all written in C. So, I’ve
reworked the LTO interface as a C interface and broken out the steps so that it
can be used by other (non-linker) tools.

Below is the proposed interface. The project (llvm/tools/lto2 for now) will build
a shared object that can be used by other tools.

I’d be interested to know if anyone else has a use for this.

One area we know we need to augment the API is to allow various optimizations
to be selected on the linker command line.

-Nick

//===-- llvm-c/lto.h - LTO Public C Interface -------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This header provides public interface to an abstract link time optimization
// library. LLVM provides an implementation of this interface for use with
// llvm bitcode files.
//
//===----------------------------------------------------------------------===//

#ifndef __LTO__H__
#define __LTO__H__

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    LTO_SYMBOL_ALIGNMENT_MASK         = 0x0000001F, /* log2 of alignment */
    LTO_SYMBOL_PERMISSIONS_MASK       = 0x000000E0,
    LTO_SYMBOL_PERMISSIONS_CODE       = 0x000000A0,
    LTO_SYMBOL_PERMISSIONS_DATA       = 0x000000C0,
    LTO_SYMBOL_PERMISSIONS_RODATA     = 0x00000080,
    LTO_SYMBOL_DEFINITION_MASK        = 0x00000700,
    LTO_SYMBOL_DEFINITION_REGULAR     = 0x00000100,
    LTO_SYMBOL_DEFINITION_TENTATIVE   = 0x00000200,
    LTO_SYMBOL_DEFINITION_WEAK        = 0x00000300,
    LTO_SYMBOL_DEFINITION_UNDEFINED   = 0x00000400,
    LTO_SYMBOL_SCOPE_MASK             = 0x00001800,
    LTO_SYMBOL_SCOPE_INTERNAL         = 0x00000800,
    LTO_SYMBOL_SCOPE_HIDDEN           = 0x00001000,
    LTO_SYMBOL_SCOPE_DEFAULT          = 0x00001800
} lto_symbol_attributes;

typedef enum {
    LTO_DEBUG_MODEL_NONE  = 0,
    LTO_DEBUG_MODEL_DWARF = 1
} lto_debug_model;

typedef enum {
    LTO_CODEGEN_PIC_MODEL_STATIC         = 0,
    LTO_CODEGEN_PIC_MODEL_DYNAMIC        = 1,
    LTO_CODEGEN_PIC_MODEL_DYNAMIC_NO_PIC = 2
} lto_codegen_model;

// opaque reference to a loaded object module
typedef struct LTOModule* lto_module_t;

// opaque reference to a code generator
typedef struct LTOCodeGenerator* lto_code_gen_t;

#ifdef __cplusplus
extern "C" {
#endif

//
// returns a printable string
//
extern const char*
lto_get_version();

//
// returns the last error string or NULL if last operation was successful
//
extern const char*
lto_get_error_message();

//
// validates if a file is a loadable object file
//
extern bool
lto_module_is_object_file(const char* path);

//
// validates if a file is a loadable object file compilable for requested target
//
extern bool
lto_module_is_object_file_for_target(const char* path,
                                     const char* target_triplet_prefix);

//
// validates if a buffer is a loadable object file
//
extern bool
lto_module_is_object_file_in_memory(const uint8_t* mem, size_t length);

//
// validates if a buffer is a loadable object file compilable for requested target
//
extern bool
lto_module_is_object_file_in_memory_for_target(const uint8_t* mem, size_t length,
                                               const char* target_triplet_prefix);

//
// loads an object file from disk
// returns NULL on error (check lto_get_error_message() for details)
//
extern lto_module_t
lto_module_create(const char* path);

//
// loads an object file from memory
// returns NULL on error (check lto_get_error_message() for details)
//
extern lto_module_t
lto_module_create_from_memory(const uint8_t* mem, size_t length);

//
// frees all memory for a module
// upon return the lto_module_t is no longer valid
//
extern void
lto_module_release(lto_module_t mod);

//
// returns triplet string which the object module was compiled under
//
extern const char*
lto_module_get_target_triplet(lto_module_t mod);

//
// returns the number of symbols in the object module
//
extern uint32_t
lto_module_get_num_symbols(lto_module_t mod);

//
// returns the name of the ith symbol in the object module
//
extern const char*
lto_module_get_symbol_name(lto_module_t mod, uint32_t index);

//
// returns the attributes of the ith symbol in the object module
//
extern lto_symbol_attributes
lto_module_get_symbol_attribute(lto_module_t mod, uint32_t index);

//
// instantiates a code generator
// returns NULL if there is an error
//
extern lto_code_gen_t
lto_codegen_create();

//
// frees all memory for a code generator
// upon return the lto_code_gen_t is no longer valid
//
extern void
lto_codegen_release(lto_code_gen_t);

//
// add an object module to the set of modules for which code will be generated
// returns true on error (check lto_get_error_message() for details)
//
extern bool
lto_codegen_add_module(lto_code_gen_t cg, lto_module_t mod);

//
// sets what if any format of debug info should be generated
// returns true on error (check lto_get_error_message() for details)
//
extern bool
lto_codegen_set_debug_model(lto_code_gen_t cg, lto_debug_model);

//
// sets what code model to generate
// returns true on error (check lto_get_error_message() for details)
//
extern bool
lto_codegen_set_pic_model(lto_code_gen_t cg, lto_codegen_model);

//
// adds to a list of all global symbols that must exist in the final
// generated code. If a function is not listed there, it might be
// inlined into every usage and optimized away.
//
extern void
lto_codegen_add_must_preserve_symbol(lto_code_gen_t cg, const char* symbol);

//
// writes a new file at the specified path that contains the
// merged contents of all modules added so far.
// returns true on error (check lto_get_error_message() for details)
//
extern bool
lto_codegen_write_merged_modules(lto_code_gen_t cg, const char* path);

//
// generates code for all added modules into one object file
// On success returns a pointer to a generated mach-o buffer and
// length set to the buffer size. Client must free() the buffer
// when done.
// On failure, returns NULL (check lto_get_error_message() for details)
//
extern const uint8_t*
lto_codegen_compile(lto_code_gen_t cg, size_t* length);

#ifdef __cplusplus
}
#endif

#endif
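
To illustrate how a client such as nm might read the packed attribute word returned by lto_module_get_symbol_attribute(), here is a minimal sketch. The enum values are copied from the header above; the example_* helper names are hypothetical and not part of the proposed interface.

```c
#include <assert.h>
#include <limits.h>

/* Values copied from the proposed lto_symbol_attributes enum. */
typedef enum {
    LTO_SYMBOL_ALIGNMENT_MASK       = 0x0000001F, /* log2 of alignment */
    LTO_SYMBOL_PERMISSIONS_MASK     = 0x000000E0,
    LTO_SYMBOL_PERMISSIONS_CODE     = 0x000000A0,
    LTO_SYMBOL_PERMISSIONS_DATA     = 0x000000C0,
    LTO_SYMBOL_PERMISSIONS_RODATA   = 0x00000080,
    LTO_SYMBOL_DEFINITION_MASK      = 0x00000700,
    LTO_SYMBOL_DEFINITION_REGULAR   = 0x00000100,
    LTO_SYMBOL_DEFINITION_TENTATIVE = 0x00000200,
    LTO_SYMBOL_DEFINITION_WEAK      = 0x00000300,
    LTO_SYMBOL_DEFINITION_UNDEFINED = 0x00000400,
    LTO_SYMBOL_SCOPE_MASK           = 0x00001800,
    LTO_SYMBOL_SCOPE_INTERNAL       = 0x00000800,
    LTO_SYMBOL_SCOPE_HIDDEN         = 0x00001000,
    LTO_SYMBOL_SCOPE_DEFAULT        = 0x00001800
} lto_symbol_attributes;

/* Hypothetical helper: alignment in bytes, decoded from the log2 field. */
unsigned long example_symbol_alignment(lto_symbol_attributes attrs)
{
    unsigned int log2_align =
        (unsigned int)attrs & (unsigned int)LTO_SYMBOL_ALIGNMENT_MASK;
    /* Guard the shift below against an out-of-range exponent. */
    assert(log2_align < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << log2_align;
}

/* Hypothetical helper: nonzero if the symbol is only referenced, not defined. */
int example_symbol_is_undefined(lto_symbol_attributes attrs)
{
    return ((unsigned int)attrs & (unsigned int)LTO_SYMBOL_DEFINITION_MASK)
           == (unsigned int)LTO_SYMBOL_DEFINITION_UNDEFINED;
}

/* Hypothetical helper: the scope field, comparable to LTO_SYMBOL_SCOPE_*. */
unsigned int example_symbol_scope(lto_symbol_attributes attrs)
{
    return (unsigned int)attrs & (unsigned int)LTO_SYMBOL_SCOPE_MASK;
}
```

Each field is extracted by masking first and then comparing against the named constant, since the permission and scope values share bits with one another.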

I would probably just use the C++ interface version, but I could certainly see a C interface like that being very useful for binding purposes.

Hi Nick,

I don’t have any comments on the substance of the APIs (I’m not an expert in this area), just some style notes. Overall, the capitalization style is inconsistent with the bulk of the C bindings, which are more Carbon than GNU.

#include <stdint.h>

#include <stdbool.h>

Note that MSVC++ still doesn’t support C99, and is a target for LLVM. I’d suggest bool → int and uint8_t* → void* to resolve the dependency.

#include <stddef.h>

extern const char*
lto_get_error_message();

I’ve tried not to create thread-unsafe designs in the rest of the bindings. I return a malloc’ed error message through an optional output parameter and provide a generic dispose function (LLVMDisposeMessage). Copying CFError’s design might be smarter still.
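
A rough sketch of the output-parameter pattern, for illustration only. The function names example_check_buffer and example_dispose_message are hypothetical, and the validity check is a placeholder rather than anything the real library does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a message onto the heap for the caller. */
static char* example_copy_message(const char* text)
{
    size_t size = strlen(text) + 1;
    char* copy = (char*)malloc(size);
    if (copy != NULL)
        memcpy(copy, text, size);
    return copy;
}

/*
 * Hypothetical entry point. Returns 0 on success and nonzero on failure.
 * On failure, if out_message is non-NULL, *out_message receives a heap
 * copy of the error text that the caller must pass to
 * example_dispose_message(). No global error state is touched, so two
 * threads can call this at the same time without seeing each other's errors.
 */
int example_check_buffer(const void* mem, size_t length, char** out_message)
{
    if (out_message != NULL)
        *out_message = NULL;

    /* Placeholder check only; a real implementation would parse the file. */
    if (mem == NULL || length == 0) {
        if (out_message != NULL)
            *out_message = example_copy_message("empty buffer");
        return 1;
    }
    return 0;
}

/* Frees a message with the same allocator that created it. */
void example_dispose_message(char* message)
{
    free(message);
}
```

Callers that do not care about the text can pass NULL for out_message and look only at the return value.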

extern bool
lto_module_is_object_file_in_memory(const uint8_t* mem, size_t length);

extern bool
lto_module_is_object_file_in_memory_for_target(const uint8_t* mem, size_t length,
                                               const char* target_triplet_prefix);

extern lto_module_t
lto_module_create_from_memory(const uint8_t* mem, size_t length);

Why not void*? Saves casting.

//
// generates code for all added modules into one object file
// On success returns a pointer to a generated mach-o buffer and
// length set to the buffer size. Client must free() the buffer
// when done.
// On failure, returns NULL (check lto_get_error_message() for details)
//

extern const uint8_t*
lto_codegen_compile(lto_code_gen_t cg, size_t* length);

The return value should be non-const. free takes a void*, not a const void*.

Windows people like to play hideous macro tricks with malloc, so I’d provide a corresponding dispose method, keeping the API self-contained.
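
Something along these lines, as a sketch only. The names example_codegen_compile and example_dispose_buffer are made up for illustration, and the "object file" bytes are a stand-in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a compile call. Returns a non-const buffer
 * owned by the caller and stores its size in *length, or returns NULL
 * (with *length set to 0) if allocation fails.
 */
void* example_codegen_compile(size_t* length)
{
    /* Placeholder bytes; a real implementation would emit object code. */
    static const unsigned char fake_object[] = { 0xDE, 0xAD, 0xBE, 0xEF };
    void* buffer;

    assert(length != NULL);
    buffer = malloc(sizeof fake_object);
    if (buffer == NULL) {
        *length = 0;
        return NULL;
    }
    memcpy(buffer, fake_object, sizeof fake_object);
    *length = sizeof fake_object;
    return buffer;
}

/*
 * Paired dispose function. The buffer is freed inside the library, by the
 * same allocator that produced it, so the client never calls free() itself.
 */
void example_dispose_buffer(void* buffer)
{
    free(buffer);
}
```

With this pairing the client does not need to know or match whichever malloc the library was built against.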

extern const char*
lto_module_get_target_triplet(lto_module_t mod);

LLVM nomenclature is triple, not triplet.

extern uint32_t
lto_module_get_num_symbols(lto_module_t mod);

extern const char*
lto_module_get_symbol_name(lto_module_t mod, uint32_t index);

extern lto_symbol_attributes
lto_module_get_symbol_attribute(lto_module_t mod, uint32_t index);

Why uint32_t instead of size_t?

//
// frees all memory for a code generator
// upon return the lto_code_gen_t is no longer valid
//
extern void
lto_codegen_release(lto_code_gen_t);

Existing bindings use the term dispose to avoid any possible retain/release confusion.

— Gordon

More stylistic nitpicking for consistency’s sake…

  1. __LTO__ → LLVM_C_LTO
  2. Do we need those #includes?
  3. Rather than using underscores in function names, e.g. lto_foo_bar, use capital letters and start with the LLVM prefix, e.g. LLVMLTOFooBar.
  4. lto_codegen_release → lto_codegen_release_memory to be clearer and more consistent.
  5. Use C comments /* … */?
  6. Please start comments with capital letters and end sentences with periods! Or perhaps not since it’ll drive Chris nuts. :)

Evan

I’ll work on another round with the feedback so far.

1. __LTO__ -> LLVM_C_LTO

This one is actually interesting. The names started out with llvm in them, but then we realized that this was a generic interface from the linker to some foreign object file format. So, some other compiler could create an LTO shared object that exports the same lto_ interface and it would just work with the linker. Therefore, although this implementation is llvm specific, the interface is not.

Ok, that makes sense. Ignore my comments about naming convention, etc. then. However, does this mean the interface should not be part of llvm if it's truly compiler neutral?

Evan

The goal is for it to be compiler neutral, but since LLVM is the only compiler that supports it, it is mostly a theoretical exercise. When the second compiler comes along that wants to support LTO, we can figure out where it should live. Until then, living in the LLVM repo makes sense to me.

-Chris

I’ve updated the header (enclosed).

lto.h (6.44 KB)

Hi Nick,

My turn. :) Just one question. Will the comments you have here be doxygen-friendly? (N.B., I’m not an expert on doxygen.) One thing I notice in other code is that the function or variable names are repeated in the comments. There might also be a special C-style comment quoting for doxygen. Something like this:

/**
 * lto_module_is_object_file_in_memory - Checks if a buffer is a loadable object file.
 */
extern bool
lto_module_is_object_file_in_memory(const void* mem, size_t length);

and so on.

-bw

Thanks for catching that. I’ve added the extra asterisk to the start of each comment.

-Nick