[RFC] An ABI lowering library for LLVM

A key premise of LLVM is that it provides an abstraction over different targets. A frontend just needs to worry about generating LLVM IR, and then LLVM takes care of everything else, generating highly efficient code for a wide variety of targets.

The main area where this premise currently breaks down is target-specific ABI differences, especially concerning the call ABI. Part of the call ABI is handled by LLVM, but a large part of it is a frontend responsibility. There are many different ABIs, with many complex and subtle rules. Failing to implement them correctly will result in miscompilations.

Every LLVM-based frontend that wants to expose a C FFI currently has to implement these ABI rules by hand. Clang implements them in CodeGen/Targets.

We regularly get questions on how to support C FFI on Discord, and for some reason nobody is ever happy about “you have to re-implement these ten thousand lines of Clang code” being the answer. :)

Why doesn’t LLVM “just” handle this?

LLVM handles part of the call ABI lowering, such as the assignment of scalar arguments/returns to registers. Why can’t LLVM simply do the right thing for all argument types?

The primary reason for this is that the LLVM type system is not expressive enough to make all ABI decisions. Here are a few examples:

  • LLVM does not have a representation for unions at all.
  • LLVM does not have type-level alignment annotations. For example, two otherwise identical structs, but one with an explicit alignment attribute that matches its default alignment, do not have the same ABI. In fact, alignment has to be computed in at least three different ways to satisfy various ABI rules.
  • Do you think __int128 and _BitInt(128) have the same ABI? Think again.
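To make the alignment example concrete, here is a small C++ sketch. Both structs have the same size and alignment, so a description that keeps only those two facts cannot tell them apart, even though the source-level types differ in a way some ABIs care about. `IRVisibleLayout` and `irVisibleLayout` are hypothetical names used only for illustration; they are not LLVM APIs.

```cpp
#include <cstddef>

// A plain struct with the natural alignment of its only member.
struct Plain {
  long long value;
};

// Same member, but with an explicit alignment request that equals the
// natural alignment, so size and alignment are unchanged.
struct alignas(alignof(long long)) ExplicitlyAligned {
  long long value;
};

// Hypothetical summary of what survives once only size and alignment are kept.
struct IRVisibleLayout {
  std::size_t size;
  std::size_t align;
};

template <typename T>
constexpr IRVisibleLayout irVisibleLayout() {
  return IRVisibleLayout{sizeof(T), alignof(T)};
}

// The two summaries are indistinguishable, although the types are not.
static_assert(irVisibleLayout<Plain>().size ==
                  irVisibleLayout<ExplicitlyAligned>().size,
              "same size");
static_assert(irVisibleLayout<Plain>().align ==
                  irVisibleLayout<ExplicitlyAligned>().align,
              "same alignment");
```

Whether a given target actually treats the two differently depends on its ABI rules; the sketch only shows that the distinguishing information is gone once the explicit alignment is dropped.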

Why don’t we extend LLVM’s type system to handle these cases? LLVM generally tries to omit any type information that is not semantically relevant. LLVM will represent both int and unsigned as an i32. This kind of inherent canonicalization is important for optimization purposes.

While directly extending the type system would be a bad idea, it would in principle be possible to convey additional information using attributes/metadata at call-sites and declarations only, similar to how we have the zeroext and signext attributes to distinguish unsigned/signed integers for ABI purposes.
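As a rough illustration of the signedness case, a frontend has to remember the source-level signedness and turn it into an extension request at the call boundary, because the IR integer type no longer carries it. The sketch below uses hypothetical names (`CIntegerType`, `Extension`, `extensionFor`) and assumes a target that wants integers narrower than 32 bits extended; real targets differ on when extension is required.

```cpp
// Hypothetical description of a C integer type as the frontend sees it.
struct CIntegerType {
  unsigned bits;
  bool isSigned;
};

// Mirrors the choice between no attribute, signext, and zeroext.
enum class Extension { None, SignExt, ZeroExt };

// Assumption: integers narrower than `registerBits` must be extended.
// This threshold is illustrative and not taken from any specific ABI.
constexpr Extension extensionFor(CIntegerType ty, unsigned registerBits = 32) {
  return ty.bits >= registerBits
             ? Extension::None
             : (ty.isSigned ? Extension::SignExt : Extension::ZeroExt);
}
```

Under these assumptions `signed char` and `unsigned char` both lower to an 8-bit integer, but request different extensions.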

Doing this in full generality, providing all the information necessary for struct and union passing, would be significantly harder though, and would essentially introduce a second shadow type system into LLVM IR.

An additional consideration here is that it is beneficial for optimization purposes if cases where large structures need to be passed in memory are explicitly represented as such in LLVM IR. For example, this allows optimizing away redundant copies of the memory.

LLVM’s IR design is generally very hostile towards working with aggregate SSA values, and our historical trend is to reduce reliance on struct types in IR. Making these first-class citizens would be a substantial shift in optimization philosophy, which goes far beyond ABI questions.

Proposal

The proposal is to introduce an LLVM ABI lowering library (LLVMABI), which provides information to frontends on how to correctly produce LLVM IR for a specific target. The initial focus of the library would be on call ABI lowering, as this is the hardest part. It can be extended to handle other ABI aspects as well.

A high-level sketch of what this would look like:

  • The library will have its own type system, which is independent of both the LLVM IR type system and the Clang type system. The types will encode exactly as much information as is necessary for correct ABI lowering.
  • The library will provide per-target implementations of ABIInfo, extracting what Clang currently does.
  • The main result of ABI classification will be something like ABIArgInfo, which specifies whether the argument is passed directly, indirectly, etc.
  • The frontend is then responsible for generating LLVM IR based on the ABIArgInfo.
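To give a feel for the shape of such an interface, here is a minimal C++ sketch. Everything in it is an assumption made for illustration: the `abi_sketch` namespace, the `Type`, `PassKind`, and `ArgInfo` names, and above all the toy classification rules, which do not correspond to any real target. The actual design is still to be worked out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace abi_sketch {  // hypothetical namespace, not an existing LLVM library

enum class TypeKind { Integer, Float, Pointer, Record };

// Hypothetical ABI-level type: only what classification might need.
struct Type {
  TypeKind kind;
  uint64_t sizeInBytes;
  uint64_t alignInBytes;
  bool isSigned;                     // meaningful for integers only
  std::vector<const Type *> fields;  // meaningful for records only
};

// Loosely modelled on the kinds of answers Clang's ABIArgInfo gives.
enum class PassKind { Direct, Extend, Indirect, Ignore };

struct ArgInfo {
  PassKind kind;
  bool signExtend;  // meaningful for PassKind::Extend only
};

// Toy rules for an imaginary target: empty types are ignored, records
// larger than 16 bytes go through memory, small integers are extended.
inline ArgInfo classifyArgument(const Type &ty) {
  assert(ty.alignInBytes != 0 && "alignment must be known");
  if (ty.sizeInBytes == 0)
    return ArgInfo{PassKind::Ignore, false};
  if (ty.kind == TypeKind::Record && ty.sizeInBytes > 16)
    return ArgInfo{PassKind::Indirect, false};
  if (ty.kind == TypeKind::Integer && ty.sizeInBytes < 4)
    return ArgInfo{PassKind::Extend, ty.isSigned};
  return ArgInfo{PassKind::Direct, false};
}

}  // namespace abi_sketch
```

A frontend would build a `Type` for each argument, call the classifier, and then emit IR according to the returned kind, for example passing a pointer to a temporary for `Indirect`.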

Some notes:

  • Clang will be switched to use the new ABI lowering library. I think it’s very important that Clang makes use of it, not just 3rd-party frontends, otherwise we’ll certainly get divergences.
  • Yes, this does mean that Clang will have to lower to an additional type system. My hope is that this will not add a lot of additional overhead if it is cached, but that’s one of the things that remains to be seen.
  • While the motivation here is purely about the C ABI, I think it will be unavoidable to also support the parts of the C++ ABI that relate to the calling convention. The C++ ABI is a minor modification of the C ABI, and I don’t think they can be usefully separated.
  • Layering: I think this library could be implemented completely independently of IR, depending only on Support. But having it depend on IR would probably make it more useful, as we could also provide LLVM IR types where relevant.

Implementation

I’d like to offer a prototype of this as a GSoC project. At this point, I’m mainly looking for some feedback on the general direction, and any insights people familiar with ABI lowering may have. The details of the design will have to be ironed out later.

There is an old llvm-abi project, also discussed here, which started implementing this concept out-of-tree. It’s many years out of date now and only supports x86, but may serve as inspiration. I think it’s important that the ABI lowering library is in-tree and used by Clang to guarantee continued maintenance.

An alternative approach to the call ABI problem has recently been explored in Ideas about C calling convention lowering to LLVM IR. The approach there is to have the frontend attach additional ABI classification metadata, that allows LLVM to perform the ABI lowering.

I can’t authoritatively comment on the limitations of either approach (nor on the finer points of ABIs and calling conventions), but as someone who has spent an unreasonable amount of time fiddling with C++ FFI, I would be extremely (extremely) interested in supporting this work in any way possible.

The primary issue here is that you need to replicate the entire C/C++ type system in order to compute the correct lowering. This includes the types themselves, computing the size/alignment of types, and structure layout. And that’s a lot of work. (We might be able to exclude some parts of the C++ type system related to classes which are always passed indirectly, but we need most of it. There are some weird edge cases, like alignment attributes on typedefs.) Maybe we should use the clang AST library to represent the type system, instead of reimplementing it from scratch?

Does this library need to expose a C API, so it can be used by compilers not written in C++?

Right, duplicating the C/C++ type system is the main cost of this proposal. My hope is that the type system we need for ABI lowering can be substantially simpler than what is implemented in clang/AST/Type. Partially because we’re dealing with resolved types at that point (don’t need all the template handling etc) and partially because many things don’t affect the call ABI (like qualifiers).

This would mean that frontends which want to make use of this have to add a libclang dependency, which is probably a no-go for anything that only wants to offer C FFI (rather than something that requires ability to parse C++ headers).

Yes. It’s not an immediate priority, but longer term this should have a C API, just like anything that is needed for frontend IR generation.

Re type system duplication: I’d consider using a traits class to wrap existing type representations to reduce the runtime costs. The design should allow for covering Clang’s representation as well as a simpler design, which could be provided by the library (and exposed as a C API). Clang is already rather slow, so if a type conversion turns out to have a somewhat substantial cost, I’d rather avoid that.

Re general usefulness/LLVM integration: I think there’s use in such a library beyond the scope of producing LLVM-IR signatures, in fact, primarily the computation of actual registers and stack offsets is missing. I think it would make sense to include this – and possibly in the long term even to move this out of LLVM. One limitation of LLVM is the lack of flexibility w.r.t. custom calling conventions, so this would be a good opportunity to improve this. (The back-end doesn’t do much with the information anyway, it is only used during instruction selection and discarded afterward.)

I’d hope that the layout parts of the C++ ABI can be kept separate, and clang-only. Maybe that’s asking too much, but my only interactions with the layout part came when the C++ committee decided to change the rules. That feels like it would have nothing at all to do with call ABI.

  • The library will have its own type system, which is independent of
    both the LLVM IR type system and the Clang type system. The types
    will encode exactly as much information as is necessary
    for correct ABI lowering.

I’ll admit that I’m not a full connoisseur of calling convention ABIs,
but I wonder how much of a type system we actually need for ABI purposes.
Like, we still need an enum to distinguish between int, unsigned, float,
etc., sure, but do we need to actually represent a struct as a full
struct, or can we get away with a function that takes in a sufficient
description of a struct and spits out an “acts as i64/v128/v256/etc. for
calling ABI purposes”? I worry a lot about how much effort it would be
to convert a struct type to the ABI struct type, and wonder if we can
sufficiently describe the ABI without having to actually make a full
ABIStructType class.

In C++, at least, the struct definition informs the calling convention:

If a C++ object has either a non-trivial copy constructor or a non-trivial
destructor, it is passed by invisible reference (the object is replaced in the
parameter list by a pointer that has class POINTER)

https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

So it’s not that simple.
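As a rough illustration of the quoted rule, standard type traits can approximate the check. `passedByInvisibleReference` is a hypothetical helper, and it only covers the two conditions named in the quote; the rule compilers actually apply has more cases, so treat this as a sketch rather than the real test.

```cpp
#include <string>
#include <type_traits>

// Approximation of the quoted rule: a non-trivial copy constructor or a
// non-trivial destructor forces passing by invisible reference.
template <typename T>
constexpr bool passedByInvisibleReference() {
  return !std::is_trivially_copy_constructible<T>::value ||
         !std::is_trivially_destructible<T>::value;
}

// Two ints, trivially copyable and destructible.
struct Point {
  int x;
  int y;
};

// Same size as a single int, but the user-provided destructor makes it
// non-trivial.
struct Handle {
  int fd;
  ~Handle() {}
};

// The std::string member makes both the copy and the destructor non-trivial.
struct Name {
  std::string text;
};

static_assert(!passedByInvisibleReference<Point>(), "plain data stays by value");
static_assert(passedByInvisibleReference<Handle>(), "non-trivial destructor");
static_assert(passedByInvisibleReference<Name>(), "non-trivial member");
```

The point is that `Handle` would look like a struct containing one `int` to a size-and-fields description, yet it is passed differently, so the ABI type description needs to carry this bit of information.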

Thanks for proposing this, it sounds promising. I wouldn’t worry about the cost of the representation shift as long as the function prototype representations are short-lived and not stored in some kind of global, lives-forever context like LLVMContext or ASTContext.

If we can omit as much detail as possible from the ABI type system (think opaque pointers and no C++ direct/virtual base class details), it should be compact, easy to produce, and easy to analyze. Recent ABI bugs involving std::tuple could be avoided if the new representation has a uniform subobject abstraction, instead of having to remember to explicitly iterate over C++ base classes in addition to regular C struct fields.
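One possible reading of a “uniform subobject abstraction” is sketched below: base classes and fields are both stored as “a type at an offset”, so a single recursive walk reaches every scalar. All names (`AbiType`, `Subobject`, `Leaf`, `flatten`) are hypothetical and chosen for this illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class ScalarKind { Integer, Float, Pointer };

struct AbiType;

// A base class and a named field are both just a type at an offset.
struct Subobject {
  uint64_t offset;
  const AbiType *type;
};

struct AbiType {
  bool isRecord;
  ScalarKind scalar;                  // meaningful only when !isRecord
  uint64_t size;                      // size in bytes
  std::vector<Subobject> subobjects;  // meaningful only when isRecord
};

// A scalar found somewhere inside a record, with its absolute offset.
struct Leaf {
  uint64_t offset;
  ScalarKind kind;
  uint64_t size;
};

// Collects all scalar leaves; bases need no special handling because
// they are ordinary subobjects.
inline void flatten(const AbiType &ty, uint64_t base, std::vector<Leaf> &out) {
  if (!ty.isRecord) {
    out.push_back(Leaf{base, ty.scalar, ty.size});
    return;
  }
  for (const Subobject &sub : ty.subobjects) {
    assert(sub.type != nullptr && "subobject must have a type");
    flatten(*sub.type, base + sub.offset, out);
  }
}
```

With this shape, a classifier that iterates over `subobjects` cannot forget the bases, because there is no separate list for them to be forgotten in.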

One challenge worth taking seriously up front is inalloca. Going forward, it’s highly likely that we will never see another calling convention that requires this kind of deconstructed call sequence, where the address of future argument slots is made available to user code. However, exposing these memory locations to a mid level optimizer could unlock some valuable performance opportunities, which we touched on in @jyknight’s RFC.

A lot of targets have special rules for “homogeneous aggregate” values. x86-64 does something even more complicated where a struct can be passed in a mix of integer and fp registers, depending on the field types. Both of these support C++ base classes. Structs also have two different values for alignment, computed using different algorithms. In some cases, it’s even relevant whether a class has a copy assignment operator.
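To illustrate the homogeneous aggregate idea, here is a minimal sketch of the check in the style of the AArch64 procedure call standard, where a homogeneous floating-point aggregate has one to four members that all share one floating-point type. It assumes the struct has already been flattened into a list of scalar leaves with base classes folded in; `ScalarKind` and `isHomogeneousFloatAggregate` are hypothetical names, and vector members and other target-specific details are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ScalarKind { Int32, Int64, Float, Double };

inline bool isFloatingPoint(ScalarKind kind) {
  return kind == ScalarKind::Float || kind == ScalarKind::Double;
}

// True if the flattened members are 1..maxMembers scalars of one and the
// same floating-point type.
inline bool isHomogeneousFloatAggregate(const std::vector<ScalarKind> &leaves,
                                        std::size_t maxMembers = 4) {
  assert(maxMembers > 0 && "limit must allow at least one member");
  if (leaves.empty() || leaves.size() > maxMembers)
    return false;
  if (!isFloatingPoint(leaves.front()))
    return false;
  for (ScalarKind kind : leaves)
    if (kind != leaves.front())
      return false;
  return true;
}
```

A struct of three floats qualifies, while a struct mixing `float` and `double`, or one with five floats, does not.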

And we never know if someone is going to come along with a new target that depends on some new property we never thought about.

I’m wondering whether an ABI library is necessary. The contract between the frontend and the backend is essentially broken and non-existent. What has happened is that each target has come up with its own solution, and they have distributed the responsibility for the lowering differently.

The most notable example of this is the System V ABI for x86-64 (typically Linux), where the frontend gets the responsibility for almost everything, namely:

  1. split up aggregates in registers manually.
  2. track remaining registers for the ABI until “byval” must be used.
  3. decompose aggregates into simpler structures, for example a float gets folded into an integer.

The one to one mapping between language parameters and the IR parameters is basically gone.
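The register tracking in step 2 can be sketched roughly as follows. The register counts (six integer and eight SSE argument registers) are those of the x86-64 System V ABI; the `RegisterNeed` and `RegisterBudget` names and the restriction to at most two eightbytes per argument are simplifications made up for this example.

```cpp
#include <cassert>

// How many registers of each class one argument needs.
struct RegisterNeed {
  unsigned gpr;
  unsigned sse;
};

// Tracks the argument registers still available for one call.
class RegisterBudget {
public:
  // Returns true and consumes registers if the whole argument fits.
  // Otherwise returns false and consumes nothing: the argument is then
  // passed in memory as a whole.
  bool tryAllocate(RegisterNeed need) {
    assert(need.gpr + need.sse <= 2 && "sketch handles at most two eightbytes");
    if (need.gpr > freeGpr_ || need.sse > freeSse_)
      return false;
    freeGpr_ -= need.gpr;
    freeSse_ -= need.sse;
    return true;
  }

  unsigned freeGpr() const { return freeGpr_; }
  unsigned freeSse() const { return freeSse_; }

private:
  unsigned freeGpr_ = 6;  // rdi, rsi, rdx, rcx, r8, r9
  unsigned freeSse_ = 8;  // xmm0 through xmm7
};
```

Note that an argument that does not fit leaves the remaining registers free, so a later, smaller argument can still use them.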

Now compare this with ARM and you’ll see that they have made it much simpler and the abstraction is better. For values up to a certain size you use an array with a number of integers. You don’t need to track when the registers run out because the backend does this for you. I thank the people who implemented the ARM version for this as they understood how to make life a little bit simpler. Microsoft x86-64 is also not that bad.
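As a sketch of that “array of integers” scheme, assume an AArch64-like target where a non-homogeneous aggregate of up to 16 bytes is presented as an array of 64-bit integers and anything larger is passed indirectly. `Coercion` and `coerceSmallAggregate` are hypothetical names, and special cases such as homogeneous aggregates, empty structs, and 16-byte-aligned types are deliberately ignored.

```cpp
// Result of the simplified coercion: either "pass indirectly" or
// "pass as an array of numWords 64-bit integers".
struct Coercion {
  bool indirect;
  unsigned numWords;
};

// Assumption: 16 bytes is the cut-off, and words are 8 bytes wide.
constexpr Coercion coerceSmallAggregate(unsigned sizeInBytes) {
  return sizeInBytes > 16 ? Coercion{true, 0}
                          : Coercion{false, (sizeInBytes + 7) / 8};
}
```

The frontend does not need to know how many registers are left; it only reports the shape, and the backend decides between registers and stack.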

With this in mind, I think it is necessary to set up rules for how the frontend should present aggregates to the backend so that the backend can lower them properly. The source of the problem we want to solve is that each target made up its own solution.

Why not implement this as an MLIR Dialect?

  • It seems natural to have a C-Call form in MLIR that correctly implements the calling convention for each platform.
  • MLIR gives you the ability to create a custom type system.
  • The users (other compiler writers) just have to annotate arguments with a C type that correctly describes the values they are passing, and the lowering pass generates the correct code in the LLVM dialect and saves the compiler writers from reimplementing the lowering.

This seems a lot more convenient. Note that your LLVM ABI library would probably still be a prerequisite for building such a dialect, and could be reused by others for LLVM IR just as you described. I’m mostly just suggesting what I see as the logical conclusion of your idea, which I guess is a C MLIR dialect.

Interesting idea. I don’t think we can use traits in the sense LLVM usually uses that word, because those require a templated, header-only implementation, and I don’t think that’s feasible here. We could use virtual inheritance though, with some caveats (for example, to make the memory management work out, we’d want sub-type iteration to be callback-based rather than iterator-based).
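A minimal sketch of the callback-based variant, reading “virtual inheritance” here as an abstract base class with virtual methods (that reading is an assumption). `TypeView`, `FrontendType`, and `FrontendTypeView` are hypothetical names. The adapter creates a temporary view per field that only lives for the duration of the callback, which is the memory-management point: nothing has to be allocated to keep an iterator valid.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Abstract view of a type, implemented by each frontend over its own
// representation.
class TypeView {
public:
  using FieldVisitor =
      std::function<void(uint64_t offset, const TypeView &field)>;

  virtual ~TypeView() = default;
  virtual uint64_t sizeInBytes() const = 0;
  virtual bool isRecord() const = 0;
  // Callback-based iteration: the field view is only valid inside `visit`.
  virtual void forEachField(const FieldVisitor &visit) const = 0;
};

// Stand-in for a frontend's existing type representation.
struct FrontendType;
struct FrontendField {
  uint64_t offset;
  const FrontendType *type;
};
struct FrontendType {
  uint64_t size;
  std::vector<FrontendField> fields;  // empty for scalars
};

// Adapter that wraps the frontend type without converting it.
class FrontendTypeView final : public TypeView {
public:
  explicit FrontendTypeView(const FrontendType &ty) : ty_(ty) {}

  uint64_t sizeInBytes() const override { return ty_.size; }
  bool isRecord() const override { return !ty_.fields.empty(); }

  void forEachField(const FieldVisitor &visit) const override {
    for (const FrontendField &field : ty_.fields) {
      assert(field.type != nullptr && "field must have a type");
      FrontendTypeView child(*field.type);  // temporary, stack-allocated
      visit(field.offset, child);
    }
  }

private:
  const FrontendType &ty_;
};

// Example consumer written purely against the abstract interface.
inline uint64_t countScalarLeaves(const TypeView &ty) {
  if (!ty.isRecord())
    return 1;
  uint64_t count = 0;
  ty.forEachField([&count](uint64_t, const TypeView &field) {
    count += countScalarLeaves(field);
  });
  return count;
}
```

The cost is a virtual call and a `std::function` invocation per field, which is the kind of overhead that would have to be measured against a one-time conversion into a separate type system.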

My general inclination here is to try the separate type system first, and fall back to this variant if it turns out to add significant cost.

The function prototype itself would be short-lived, but my assumption was that the argument/return types would use the “usual” uniqued type system representation, with a cache on the translation from Clang to ABI types. If we make the types ephemeral, we wouldn’t be able to cache the translation, which may be problematic for complex record types.
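The caching idea could look roughly like this, assuming the frontend types are uniqued so that their address can serve as the cache key. `FrontendType`, `AbiType`, and `AbiTypeCache` are hypothetical stand-ins, and the “translation” here is a trivial copy where a real one would walk a record’s fields.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Stand-in for a uniqued frontend type (identity is its address).
struct FrontendType {
  uint64_t size;
  uint64_t align;
};

// Stand-in for the library's own type representation.
struct AbiType {
  uint64_t size;
  uint64_t align;
};

// Translates each frontend type at most once and hands out stable
// references to the result.
class AbiTypeCache {
public:
  const AbiType &get(const FrontendType &ty) {
    assert(ty.align != 0 && "alignment must be known");
    auto found = cache_.find(&ty);
    if (found != cache_.end())
      return *found->second;
    ++translations_;
    auto inserted = cache_.emplace(
        &ty, std::make_unique<AbiType>(AbiType{ty.size, ty.align}));
    return *inserted.first->second;
  }

  unsigned translations() const { return translations_; }

private:
  std::unordered_map<const FrontendType *, std::unique_ptr<AbiType>> cache_;
  unsigned translations_ = 0;
};
```

The cache owns the translated types, so its lifetime determines whether they count as short-lived or as part of a long-lived context.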

@LLamas The differences in the lowering of the x86-64 ABI and the ARM ABIs reflect the differences in the ABIs. The ARM ABI lowering is simpler because the ABI is simpler. The baseline struct ABI just reinterprets the struct in GPR registers. But even then, you still have to take various special cases into account, such as homogeneous aggregates and alignment-adjusted types.

The x86-64 SysV ABI is more complex than that, mainly because it allows passing aggregates split across GPR and FPR registers. This classification has to occur in the frontend (for example because it applies to unions as well, which are not modeled in LLVM IR).
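A heavily simplified sketch of that classification: each 8-byte chunk (“eightbyte”) of a small aggregate gets a register class, and a chunk that contains any integer field becomes INTEGER even if it also contains a float. The names (`RegClass`, `Leaf`, `classifyEightbytes`) are hypothetical, and the sketch assumes naturally aligned scalar fields of size 1, 2, 4, or 8, ignoring x87, vector, and packed cases that the real rules cover.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RegClass { NoClass, Integer, Sse, Memory };

// A scalar field at an absolute offset inside the aggregate.
struct Leaf {
  uint64_t offset;
  uint64_t size;
  bool isFloatingPoint;
};

// Merge rule restricted to the classes used in this sketch:
// INTEGER wins over SSE, and NO_CLASS yields to anything.
inline RegClass mergeClasses(RegClass a, RegClass b) {
  if (a == RegClass::NoClass) return b;
  if (b == RegClass::NoClass) return a;
  if (a == RegClass::Integer || b == RegClass::Integer)
    return RegClass::Integer;
  return RegClass::Sse;
}

// Returns one class per eightbyte, or a single Memory entry if the
// aggregate is too large for registers in this simplified model.
inline std::vector<RegClass> classifyEightbytes(uint64_t sizeInBytes,
                                                const std::vector<Leaf> &leaves) {
  if (sizeInBytes > 16)
    return {RegClass::Memory};
  std::vector<RegClass> eightbytes((sizeInBytes + 7) / 8, RegClass::NoClass);
  for (const Leaf &leaf : leaves) {
    // Naturally aligned fields of at most 8 bytes never straddle eightbytes.
    assert(leaf.size > 0 && leaf.size <= 8 && leaf.offset % leaf.size == 0 &&
           leaf.offset + leaf.size <= sizeInBytes);
    RegClass &slot = eightbytes[leaf.offset / 8];
    slot = mergeClasses(
        slot, leaf.isFloatingPoint ? RegClass::Sse : RegClass::Integer);
  }
  return eightbytes;
}
```

For example, a struct of a `double` followed by an `int` comes out as one SSE and one INTEGER eightbyte, while a struct of a `float` followed by an `int` fits in a single INTEGER eightbyte. Because a union overlays fields at the same offset, the same merge applies to it, which is why the classification needs the source-level type.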

I think the register counting part doesn’t have to be handled in the frontend, but doing it improves optimization quality because byval arguments are exposed at the IR level and can be optimized accordingly. I’m not entirely sure on this point; it’s possible that it is also required for other reasons.

In any case, the details here don’t matter all that much, the bottom line is that each ABI tends to have its own special rules that require information that is not represented in LLVM IR, and no matter how you turn this, this is not a problem you can fix by just cleaning up the ABI lowering contract – at least not without substantially increasing the scope of LLVM’s type system.

This is a great idea @nikic, thanks for driving this. In ClangIR we have a target lowering library that postpones ABI lowering from codegen, and we’re also using the same information you propose to provide. It would be great if we could also be a consumer of this library in the future!

This should be possible. I think the main constraint this adds is that the result produced by the library should be entirely independent of LLVM IR types (which Clang’s ABIArgInfo currently uses to represent some cases), as in the ClangIR case you wouldn’t be lowering to LLVM IR directly.