[RFC] TargetInfo Library

Problem statement

There are a lot of frontends sitting on top of LLVM that have to
understand the ABI of their targets. They either develop the code
independently, partially copy it from Clang, try to follow the standards,
e.g., AAPCS and psABI, or use a mix of these options. IIRC, there were
even cases of inconsistencies between C-like compilers.

What are the subtle differences between AArch64 on Linux, Darwin,
Windows, and ARM64EC? Do the differences change for ARM32?

Does your toy language support bitfields?

Are you an expert on the differences between ARM and RISC-V scalable vectors?

Proposal

We could add a new library to llvm/Frontend, e.g., targetinfo, with its own
type system. llvm/Frontend already contains libraries for OpenMP and
OpenACC that are shared between Flang, Clang, and the future MLIR-Clang.

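#include <cstddef>
#include <memory>
#include <span>
#include <string>

#include "llvm/Support/Error.h"
#include "llvm/TargetParser/Triple.h"

// Types from the TargetInfo library's own type system (not llvm::Type).
class Type;
class FunctionType;
class CallWithLayoutAndCode;

class TargetInfo {
public:
  virtual ~TargetInfo() = default;

  virtual bool isCharSigned() const = 0;

  virtual size_t getSizeOf(const Type *) const = 0;
  virtual size_t getAlignmentOf(const Type *) const = 0;

  virtual bool isSupported(const Type *) const = 0;

  // virtual XXX queryLockFreeAtomicCapabilities(...) = 0;

  virtual CallWithLayoutAndCode
  getCall(const FunctionType *signature,
          std::span<const Type *const> arguments) const = 0;
};

llvm::Expected<std::unique_ptr<TargetInfo>>
getTargetInfo(const llvm::Triple &triple,
              std::span<const std::string> cpuFeatureFlags);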

Odd options that change type layout on x86:

  • -malign-double, -mno-align-double
  • -m96bit-long-double, -m128bit-long-double
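A minimal sketch of how these flags could feed the size and alignment answers on 32-bit x86 (i386 System V), following GCC's documentation of the flags: -malign-double moves double and long long from 4-byte to 8-byte (two-word) alignment, and -m128bit-long-double grows long double from 12 to 16 bytes. X86_32Options, parseX86_32Options, and the other helpers are hypothetical, and the model deliberately ignores how the two flags interact for long double alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layout-affecting options for 32-bit x86 (i386 System V).
struct X86_32Options {
  bool alignDouble = false;    // -malign-double / -mno-align-double
  bool longDouble128 = false;  // -m128bit-long-double / -m96bit-long-double
};

// Apply the flags in command-line order, so the last occurrence wins.
X86_32Options parseX86_32Options(const std::vector<std::string> &flags) {
  X86_32Options opts;
  for (const std::string &flag : flags) {
    if (flag == "-malign-double")
      opts.alignDouble = true;
    else if (flag == "-mno-align-double")
      opts.alignDouble = false;
    else if (flag == "-m128bit-long-double")
      opts.longDouble128 = true;
    else if (flag == "-m96bit-long-double")
      opts.longDouble128 = false;
  }
  return opts;
}

// Round offset up to a multiple of align (which must be a power of two).
std::size_t alignTo(std::size_t offset, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "alignment must be a power of two");
  return (offset + align - 1) & ~(align - 1);
}

// Alignment of double (and long long) in structs: 4 bytes by default, 8 with -malign-double.
std::size_t alignOfDouble(const X86_32Options &opts) { return opts.alignDouble ? 8 : 4; }

// sizeof(long double): 12 bytes (96 bits) by default, 16 with -m128bit-long-double.
std::size_t sizeOfLongDouble(const X86_32Options &opts) { return opts.longDouble128 ? 16 : 12; }

// sizeof(struct { int i; double d; }): the flag moves d from offset 4 to offset 8.
std::size_t sizeOfIntDoubleStruct(const X86_32Options &opts) {
  std::size_t align = alignOfDouble(opts);    // also the struct's alignment
  std::size_t offsetOfD = alignTo(4, align);  // int occupies bytes [0, 4)
  return alignTo(offsetOfD + 8, align);       // pad to the struct's alignment
}
```

With no flags, struct { int i; double d; } comes out as 12 bytes; with -malign-double it becomes 16, which is exactly the kind of silent difference a frontend gets wrong if it hard-codes layouts.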

Type System

TypeBuilder with builtins, pointers, enums, unions, bitfields, structs, functions, arrays, vectors, scalable vectors, …
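To make the idea concrete, here is a minimal sketch of such a type system covering only builtins, structs, and arrays, with C-style layout: each field is placed at the next multiple of its alignment, and the struct is padded to its largest member alignment. Type, TypeBuilder, and the free functions getSizeOf/getAlignmentOf are hypothetical. A real TargetInfo would take builtin sizes and alignments from the target (and from flags like -malign-double) rather than from the caller, and would also need pointers, enums, unions, bitfields, vectors, and so on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical minimal type node (the library's own type system, not llvm::Type).
struct Type {
  enum class Kind { Builtin, Struct, Array };
  Kind kind;
  std::size_t size = 0;              // Builtin only
  std::size_t align = 1;             // Builtin only
  std::vector<const Type *> fields;  // Struct members, in declaration order
  const Type *element = nullptr;     // Array element type
  std::size_t count = 0;             // Array length
};

// Owns every type it creates; std::deque keeps the returned pointers stable.
class TypeBuilder {
  std::deque<Type> types;

public:
  const Type *getBuiltin(std::size_t size, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0 && "alignment must be a power of two");
    return &types.emplace_back(Type{Type::Kind::Builtin, size, align, {}, nullptr, 0});
  }
  const Type *getStruct(std::vector<const Type *> fields) {
    return &types.emplace_back(Type{Type::Kind::Struct, 0, 1, std::move(fields), nullptr, 0});
  }
  const Type *getArray(const Type *element, std::size_t count) {
    return &types.emplace_back(Type{Type::Kind::Array, 0, 1, {}, element, count});
  }
};

std::size_t alignTo(std::size_t offset, std::size_t align) {
  return (offset + align - 1) / align * align;
}

std::size_t getAlignmentOf(const Type *t);

// C-style layout: fields at aligned offsets, total size padded to the struct alignment.
std::size_t getSizeOf(const Type *t) {
  switch (t->kind) {
  case Type::Kind::Builtin:
    return t->size;
  case Type::Kind::Array:
    return getSizeOf(t->element) * t->count;
  case Type::Kind::Struct: {
    std::size_t offset = 0;
    for (const Type *field : t->fields)
      offset = alignTo(offset, getAlignmentOf(field)) + getSizeOf(field);
    return alignTo(offset, getAlignmentOf(t));
  }
  }
  return 0;  // unreachable
}

// A struct is as aligned as its most-aligned member; an array as its element.
std::size_t getAlignmentOf(const Type *t) {
  switch (t->kind) {
  case Type::Kind::Builtin:
    return t->align;
  case Type::Kind::Array:
    return getAlignmentOf(t->element);
  case Type::Kind::Struct: {
    std::size_t align = 1;
    for (const Type *field : t->fields)
      align = std::max(align, getAlignmentOf(field));
    return align;
  }
  }
  return 1;  // unreachable
}
```

For example, with a 1-byte char and a 4-byte int, struct { char; int; char; } comes out as 12 bytes with 4-byte alignment.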

It would be beneficial for at least MLIR-Clang, Flang, Rust, Swift, my
new toy language, and out-of-tree targets.

C API

For the Rust compiler, …

Recent PRs

Wrong cast of u16 to usize on aarch64

Initial support of lowering derived type passed by value

Make 128 bit integers be aligned to 8 bytes.

Add comdat to globals when translating to LLVM IR on Windows

Define MVT for v3i8

The hard part of the C ABI is lowering calls. You need to come up with some sort of common representation for all the aspects of the C type system (struct declarations, etc.) that are relevant to the calling convention, and then provide some sort of lowering. That is a lot of work, and nobody has stepped up to do it.

A common library just to compute the size of C integer/float types isn’t really that useful. I mean, it’s a bit of a pain to replicate clang’s TargetInfo.h if you want to try to support every single target that clang supports, but in practice nobody has considered it a significant obstacle, as far as I know.


The integers are more like a teaser. For the moment, the main user may be Flang. Development would be lazy, i.e., driven by need, and focused on prominent targets.

If you look at the Rust issue, there was a bug in how to fit a u16 into a usize parameter. They had to inspect the assembly to find the root cause. The bug dates from 2016. Sign extension, zero extension, or no extension? Clang's answer is probably buried in an IRBuilder call.

How about:

virtual CallWithLayoutAndCode getCall(functionSignature, arguments) = 0;

Then the expansion from u16 to usize is in a C++ data structure instead of an IRBuilder method invocation.
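A minimal sketch of what keeping the extension decision in a C++ data structure could look like for small integer arguments. Extension mirrors LLVM IR's signext/zeroext parameter attributes; ArgLowering, lowerIntegerArg, and applyExtension are hypothetical, and the rule encoded in lowerIntegerArg (the caller extends integers narrower than 32 bits) is an illustrative assumption, not the rule of AArch64 or any other real ABI.

```cpp
#include <cassert>
#include <cstdint>

// How the caller must widen a small integer argument; mirrors LLVM IR's
// signext/zeroext parameter attributes.
enum class Extension { None, Sign, Zero };

// Hypothetical per-argument piece of a getCall result: plain data, no IRBuilder.
struct ArgLowering {
  unsigned bits;        // width of the source-level integer
  Extension extension;  // what the caller must do with the upper bits
};

// Illustrative rule only (an assumption, not a real ABI's rule): the caller
// extends integers narrower than 32 bits according to their signedness.
ArgLowering lowerIntegerArg(unsigned bits, bool isSigned) {
  assert(bits > 0 && bits <= 64 && "unsupported integer width");
  if (bits >= 32)
    return {bits, Extension::None};
  return {bits, isSigned ? Extension::Sign : Extension::Zero};
}

// Apply a recorded extension to the low `bits` bits of raw, producing the
// 32-bit register contents. For Extension::None the upper bits are
// unspecified by the ABI; this sketch simply clears them.
std::uint32_t applyExtension(std::uint32_t raw, unsigned bits, Extension ext) {
  assert(bits > 0 && bits <= 32 && "value must fit in a 32-bit register");
  std::uint32_t mask = bits == 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
  std::uint32_t value = raw & mask;
  bool signBitSet = ((value >> (bits - 1)) & 1u) != 0;
  if (ext == Extension::Sign && bits < 32 && signBitSet)
    value |= ~mask;
  return value;
}
```

A frontend could then map Extension::Zero to a zeroext attribute in LLVM IR, or to an explicit extension in MLIR, without the decision itself living in either.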

I think it would make sense to review a complete RFC with an ABI lowering abstraction design unified over multiple platforms, etc., not just a simple “teaser” wrapper over a bunch of integers. The latter certainly matter; however, they are maybe 0.5% of all the complexity of ABI-aware call lowering.

I don’t mean to dissuade you from doing this work (after all, getting a proper C ABI layer into LLVM is one of the longest-standing major feature requests), but I have to echo previous concerns here: this probably isn’t worth tackling until you have tackled support for lowering calls. The main reason is that the API in your initial post focuses on the size/alignment of types. Given that you’ll probably need some representation of those types to generate function calls correctly, it makes sense for size/alignment queries to be made with respect to those representations rather than as indicated in your API.

(A side comment is that many custom languages that use LLVM as a backend are likely to prefer to access it via the LLVM-C API, which suggests that whatever final API you come up with should be amenable to exposure via LLVM-C rather than relying heavily on C++ features.)

We basically agree. I showed a getCall method above. It is meant to give you the details needed to lower a function invocation.

For example, the Rust compiler uses the LLVM-C API. As the need arises, there will be a C API to query the TargetInfo library/database.
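A minimal sketch of the usual LLVM-C pattern for exposing such a C++ interface: an opaque handle type plus extern "C" create/query/dispose functions. All names here (LLVMTargetInfoRef, LLVMCreateFixedTargetInfo, LLVMTargetInfoIsCharSigned, LLVMDisposeTargetInfo, FixedTargetInfo) are hypothetical and not part of the existing LLVM-C API; in practice the declarations would go into a C header and the definitions into a .cpp file.

```cpp
#include <cassert>

// --- C++ side: a cut-down stand-in for the TargetInfo interface ---
class TargetInfo {
public:
  virtual ~TargetInfo() = default;
  virtual bool isCharSigned() const = 0;
};

// Hypothetical target whose char signedness is fixed at construction.
class FixedTargetInfo final : public TargetInfo {
  bool charSigned;

public:
  explicit FixedTargetInfo(bool charSigned) : charSigned(charSigned) {}
  bool isCharSigned() const override { return charSigned; }
};

// --- C side: opaque handle plus plain functions, callable from Rust, etc. ---
extern "C" {
typedef struct LLVMOpaqueTargetInfo *LLVMTargetInfoRef;

LLVMTargetInfoRef LLVMCreateFixedTargetInfo(int charSigned) {
  TargetInfo *ti = new FixedTargetInfo(charSigned != 0);
  return reinterpret_cast<LLVMTargetInfoRef>(ti);
}

// Returns 1 if plain char is signed on the target, 0 otherwise.
int LLVMTargetInfoIsCharSigned(LLVMTargetInfoRef ref) {
  assert(ref && "null TargetInfo handle");
  return reinterpret_cast<const TargetInfo *>(ref)->isCharSigned() ? 1 : 0;
}

// The virtual destructor makes deleting through the base pointer safe.
void LLVMDisposeTargetInfo(LLVMTargetInfoRef ref) {
  delete reinterpret_cast<TargetInfo *>(ref);
}
}
```

Only C types (pointers to an incomplete struct, int) cross the boundary, which is what keeps this usable from languages that bind to LLVM-C rather than to the C++ API.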

TargetInfo will need its own type system with builtins, pointers, enums, unions, bitfields, structs, functions, …
You need the type system to describe the function signature and the types of the argument list for getCall. Size and alignment can then be queried on those types, so they do not pollute the API of the TargetInfo library.

I agree this would be very helpful for frontends! But as others have already said, this is not trivial, with call lowering (including function signatures) being the hardest part.

With the frontend I’m working on, I’ve found:

  • I can trivially know things like “size of int”, “size of long”, “signedness of char”, etc. simply by using libclang. It reports the size and signedness of C integers, so I can use that.
    Also remember that “short”, “int”, and “long” are C types, not universal types. Java, for example, has similarly named types, but they have different meanings.
  • What is difficult are details like this Compiler Explorer example: a struct with two fields is returned in a single register, which is done in IR by using a single i64.

Basically, a frontend will need to do these things given a particular signature:

  • Make a function signature type in LLVM IR (for function pointers, declaring functions, etc.).
  • Make a call.
  • Extract the normal LLVM values from a function’s parameters, and encode the return value at the end.

For example, for a call that would mean packing the arguments to the function (with bitshifts, allocas, or whatever), doing the call itself, and then extracting the return value from this call. For example, it would need to convert the i64 back into a {i32, i32} struct in the example above.
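For the {i32, i32} example, the packing and unpacking amount to the following bit manipulation, shown on plain integers rather than as IR (in IR it would be zext/shl/or on one side and trunc/lshr on the other). This is a sketch assuming a little-endian target where the first field lands in the low 32 bits, as on x86-64; Pair, packPair, and unpackPair are hypothetical names. packPair is what the callee does before returning, and unpackPair is what the caller does after the call.

```cpp
#include <cassert>
#include <cstdint>

// The frontend's view of the C struct `struct Pair { int32_t a; int32_t b; };`.
struct Pair {
  std::int32_t a;
  std::int32_t b;
};

// Coerce Pair into the single 64-bit integer returned in one register.
// Little-endian assumption: `a` (lower address) goes into the low 32 bits.
std::uint64_t packPair(Pair p) {
  std::uint64_t lo = static_cast<std::uint32_t>(p.a);
  std::uint64_t hi = static_cast<std::uint32_t>(p.b);
  return lo | (hi << 32);
}

// After the call: split the 64-bit value back into the two fields.
Pair unpackPair(std::uint64_t bits) {
  Pair p;
  p.a = static_cast<std::int32_t>(static_cast<std::uint32_t>(bits));
  p.b = static_cast<std::int32_t>(static_cast<std::uint32_t>(bits >> 32));
  return p;
}
```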
I’m not sure what such an API would look like. Either it could directly give you a FunctionType and do things like build the entire call given an IRBuilder, or it could be decoupled from the IR and only give you some sort of high-level instructions how to convert a high-level type into a LLVM type and back. Or maybe some other way I haven’t thought about.

I updated the original RFC based on feedback.

Two options:

call.runOn(irBuilder);
  • Flang and MLIR-Clang may want to do call lowering on MLIR instead of LLVM IR.
  • This API is hard to expose to C.

call.getProlog();
for (auto param : call.getParams()) {
  // do something
}
call.getEpilog();
  • We invent our own instructions.
  • It will be easier to expose to C.
  • It will work with LLVM IR, MLIR, and …
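A minimal sketch of the getProlog/getParams/getEpilog direction, where getCall returns plain data describing “our own instructions” and each consumer (an LLVM IR builder, an MLIR builder, or a C binding) interprets it. The action kinds, ArgAction, CallPlan, and describe are hypothetical; describe just renders the plan as text, standing in for a real IR builder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical ABI "instructions", independent of LLVM IR and MLIR.
struct ArgAction {
  enum class Kind {
    Direct,       // pass the value as-is
    ZeroExtend,   // widen to `bits` with zero fill
    SignExtend,   // widen to `bits` with sign fill
    CoerceToInt,  // reinterpret the aggregate as an integer of `bits` bits
    Indirect      // pass a pointer to a caller-owned copy
  };
  Kind kind;
  unsigned bits = 0;
};

// What getCall might return: one action per parameter plus one for the result.
struct CallPlan {
  std::vector<ArgAction> params;
  ArgAction result{ArgAction::Kind::Direct};
};

std::string intType(unsigned bits) {
  assert(bits > 0 && "width-changing actions need a bit width");
  return "i" + std::to_string(bits);
}

// One possible consumer: render an action as text.
std::string describe(const ArgAction &action) {
  switch (action.kind) {
  case ArgAction::Kind::Direct:
    return "direct";
  case ArgAction::Kind::ZeroExtend:
    return "zext to " + intType(action.bits);
  case ArgAction::Kind::SignExtend:
    return "sext to " + intType(action.bits);
  case ArgAction::Kind::CoerceToInt:
    return "coerce to " + intType(action.bits);
  case ArgAction::Kind::Indirect:
    return "indirect";
  }
  return "unknown";
}

// Walk the plan the way a frontend would: parameters first, then the result.
std::vector<std::string> describe(const CallPlan &plan) {
  std::vector<std::string> lines;
  for (const ArgAction &param : plan.params)
    lines.push_back("param: " + describe(param));
  lines.push_back("result: " + describe(plan.result));
  return lines;
}
```

Because the plan is plain data, exposing it through a C API is mostly a matter of flattening the vectors into arrays of structs.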

I’m sorry, but I still do not see a proposal here. Maybe you could take the x86-64 psABI and use it to drive a design, and then show the final result?

I have been playing a bit with the API for a TargetInfo library:

I ended up with Actions based on AAPCS.

Can you please explain what the handling of the different FP ABIs will look like? Soft vs. SoftFP vs. Hard? How will homogeneous aggregates be handled?

You will find calls to isHomogenous in getCall:
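For readers unfamiliar with homogeneous aggregates, here is a simplified, self-contained sketch of what such a check can look like under the AAPCS64 rule of 1 to 4 members of the same floating-point type after flattening nested structs and arrays. It is not the code referenced above; the Type model, HomogeneousInfo, and this isHomogeneous signature are hypothetical, and the sketch ignores short-vector aggregates (HVAs), bitfields, empty members, and C++ base classes.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical minimal type model for this sketch.
struct Type {
  enum class Kind { Half, Float, Double, Int, Struct, Array };
  Kind kind;
  std::vector<const Type *> fields;  // Struct members
  const Type *element = nullptr;     // Array element
  unsigned count = 0;                // Array length
};

struct HomogeneousInfo {
  Type::Kind base;   // the shared floating-point member type
  unsigned members;  // number of members after flattening
};

// Flatten t, counting FP members; fail on a non-FP member, a second FP kind,
// or more than four members.
static bool collect(const Type *t, std::optional<Type::Kind> &base, unsigned &members) {
  switch (t->kind) {
  case Type::Kind::Half:
  case Type::Kind::Float:
  case Type::Kind::Double:
    if (base && *base != t->kind)
      return false;
    base = t->kind;
    return ++members <= 4;
  case Type::Kind::Int:
    return false;
  case Type::Kind::Struct:
    for (const Type *field : t->fields)
      if (!collect(field, base, members))
        return false;
    return true;
  case Type::Kind::Array:
    for (unsigned i = 0; i < t->count; ++i)
      if (!collect(t->element, base, members))
        return false;
    return true;
  }
  return false;
}

// Simplified AAPCS64 homogeneous floating-point aggregate check.
std::optional<HomogeneousInfo> isHomogeneous(const Type *t) {
  assert(t && "null type");
  if (t->kind != Type::Kind::Struct && t->kind != Type::Kind::Array)
    return std::nullopt;  // only aggregates qualify
  std::optional<Type::Kind> base;
  unsigned members = 0;
  if (!collect(t, base, members) || !base || members == 0)
    return std::nullopt;
  return HomogeneousInfo{*base, members};
}
```

Under this rule, struct { float x, y, z; } is an HFA with three float members, while a struct mixing float and double, or an array of five doubles, is not.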

I guess Soft vs. Hard FP will be command-line parameters passed to getTarget, and then it is the target’s job:
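A minimal sketch of that split for ARM32, assuming the float ABI arrives as a Clang/GCC-style -mfloat-abi= option among the parameters passed to getTarget. FloatABI, parseFloatABI, and the two query functions are hypothetical. The distinction they encode: soft and softfp both pass floating-point arguments in core registers (softfp merely allows FP instructions inside functions), while hard passes them in VFP registers.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// ARM32 floating-point ABI variants.
enum class FloatABI {
  Soft,    // FP done in software; FP arguments in core (integer) registers
  SoftFP,  // FP instructions allowed; FP arguments still in core registers
  Hard     // FP arguments in VFP registers (AAPCS-VFP variant)
};

// Hypothetical: pick the float ABI from the options handed to getTarget,
// using the -mfloat-abi= spelling. The last occurrence wins; an unknown
// value yields std::nullopt so the caller can report an error.
std::optional<FloatABI> parseFloatABI(const std::vector<std::string> &options,
                                      FloatABI defaultABI) {
  const std::string prefix = "-mfloat-abi=";
  FloatABI abi = defaultABI;
  for (const std::string &opt : options) {
    if (opt.compare(0, prefix.size(), prefix) != 0)
      continue;  // not a float-ABI option
    std::string value = opt.substr(prefix.size());
    if (value == "soft")
      abi = FloatABI::Soft;
    else if (value == "softfp")
      abi = FloatABI::SoftFP;
    else if (value == "hard")
      abi = FloatABI::Hard;
    else
      return std::nullopt;
  }
  return abi;
}

// Only the hard-float ABI changes where getCall places FP arguments.
bool passesFloatsInVFPRegisters(FloatABI abi) { return abi == FloatABI::Hard; }

// SoftFP and Hard both allow VFP instructions inside function bodies.
bool mayUseFPInstructions(FloatABI abi) { return abi != FloatABI::Soft; }
```

In this model, getCall only needs passesFloatsInVFPRegisters; whether FP instructions may be used is a code-generation concern rather than a calling-convention one.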