Questions about C calling conventions

I am writing a compiler that emits LLVM IR, and I want to use the same calling conventions as C so that it is compatible with C programs.

Part of the problem is handled by the default C calling convention present in LLVM IR, but part of it is handled by Clang.

If I run clang++ -S -emit-llvm -o on the following source code on x86_64

struct X{
  int x; 
  int y; 
  int j; 
  int k;
}; 

struct X f() { 
  struct X x; 
  return x;
}

the emitted IR is

%struct.X = type { i32, i32, i32, i32 }

; Function Attrs: noinline nounwind optnone uwtable
define dso_local { i64, i64 } @f() #0 {
  %1 = alloca %struct.X, align 4
  %2 = bitcast %struct.X* %1 to { i64, i64 }*
  %3 = load { i64, i64 }, { i64, i64 }* %2, align 4
  ret { i64, i64 } %3
}

while if I add an extra field

struct X{
  int x; 
  int y; 
  int j; 
  int k;
  int m;
}; 

struct X f() { 
  struct X x; 
  return x;
}

the resulting IR is

%struct.X = type { i32, i32, i32, i32, i32 }

; Function Attrs: noinline nounwind optnone uwtable
define dso_local void @f(%struct.X* noalias sret(%struct.X) align 4 %0) #0 {
  ret void
}

The struct gained an extra field, and because of the parameter passing rules it is no longer returned in registers: it is now returned through a pointer passed by the caller (the sret parameter).
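To make the difference between the two lowerings concrete, here is a minimal C sketch. It assumes the x86-64 System V rule of thumb that plain-integer aggregates larger than 16 bytes are returned in memory; the real classification has more cases, and other targets use different thresholds. The names X4, X5, returned_via_hidden_pointer and f_sret are made up for illustration.

```c
#include <stddef.h>

/* The two structs from the question (hypothetical names X4 and X5). */
struct X4 { int x, y, j, k; };     /* 16 bytes when int is 4 bytes */
struct X5 { int x, y, j, k, m; };  /* 20 bytes when int is 4 bytes */

/* Simplified assumption for x86-64 System V: an aggregate of plain ints
 * is returned in memory (through a hidden pointer) once it exceeds
 * 16 bytes. This is NOT the full classification algorithm. */
int returned_via_hidden_pointer(size_t size_in_bytes) {
    return size_in_bytes > 16;
}

/* Hand-written C equivalent of the sret form: the caller allocates the
 * result and passes its address; the callee fills it in and returns void. */
void f_sret(struct X5 *out) {
    out->x = 1;
    out->y = 2;
    out->j = 3;
    out->k = 4;
    out->m = 5;
}
```

A caller of f_sret declares a struct X5 on its own stack and passes its address, which is what the sret signature in the IR above expresses.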

I understand that at least part of this process is driven by this file, https://github.com/llvm/llvm-project/blob/main/clang/include/clang/CodeGen/CodeGenABITypes.h , which converts a type into an llvm::Type that respects the calling conventions, but I have not found a comprehensive guide explaining all the design decisions regarding the IR calling conventions.

  • How do I know which rules related to parameter passing, and calling conventions in general, are correctly handled by LLVM IR and which are not?
  • Why has it been implemented this way instead of letting the IR handle all these intricacies in the backend?
  • Do I have an alternative that neither relies on the implementation of Clang nor re-implements the whole logic for each target, to figure out how my functions should be lowered to LLVM IR?

The calling conventions for the different targets are implemented in https://github.com/llvm/llvm-project/tree/main/clang/lib/CodeGen/Targets. To the best of my knowledge, there is currently no way around essentially re-implementing this code for all targets you care about if you want to have a C FFI.

“Why has it been implemented this way instead of letting the IR handle all these intricacies in the backend?”

Calling conventions generally require more information than is available from LLVM IR types. This ranges from basic things like LLVM integers not having signedness (which is why we need the zeroext/signext ABI attributes) to outright arcane rules (e.g. the struct passing ABI may change based on the presence of an alignment attribute, even if it does not actually change alignment).

There have been discussions in the past about extracting Clang’s ABI handling into a separate reusable component, but these haven’t gone anywhere (yet).
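As a small illustration of the signedness point, the following C sketch shows why an 8-bit value needs extra information before it can be widened. The helper extend_i8 is hypothetical; it only mimics what a sign extension (signext) or zero extension (zeroext) of the same bit pattern produces.

```c
#include <stdint.h>

/* Widen the same 8-bit pattern to 32 bits in two ways.
 * is_signed != 0 mimics signext (the C type was signed char),
 * is_signed == 0 mimics zeroext (the C type was unsigned char).
 * LLVM's i8 carries no such flag, hence the ABI attributes. */
int32_t extend_i8(uint8_t bits, int is_signed) {
    int32_t wide = (int32_t)bits;      /* zero-extended value, 0..255 */
    if (is_signed && (bits & 0x80)) {
        wide -= 256;                   /* reinterpret as two's complement */
    }
    return wide;
}
```

The bit pattern 0xFF widens to -1 in one case and to 255 in the other, so a callee that assumes one extension while the caller performed the other reads a different value.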

Everything Nikita said is correct. I just wanted to add that this, C FFI compatibility for non-Clang frontends, is basically the biggest, longest outstanding, missing feature in LLVM. The 2014 talk, Skip the FFI: Embedding Clang for C interoperability, remains the best rundown of the issues that I’m aware of.

This is a hard problem that shouldn’t be taken lightly, but I think this is something that LLVM as a project has to prioritize if it wants to remain relevant as the default, conventional CPU backend of choice for emerging ahead-of-time-compiled languages.

I see, thank you both. All in all, it does not seem like a trivial problem at all.

Is there a particular place or working group I should pay attention to in order to keep up to date with developments on this topic?

I’m facing the same problem. For the moment, I would even be willing to generate inefficient trampoline functions written in C if there were a portable way to pass even a single void* to C from LLVM. Is that possible?

If not, maybe creating the code to do something like that could be the seed from which a fully optimized LLVM C ABI compatibility layer grows? I rely on those more experienced with the codebase/community to assess the likelihood.

(makes note to watch Jordan and John’s talk all the way through tomorrow)

“I’m facing the same problem. For the moment, I would even be willing to generate inefficient trampoline functions written in C if there were a portable way to pass even a single void* to C from LLVM. Is that possible?”

At the moment, what is working out fairly well for me is to turn every argument into a pointer to that argument, and to turn the return value into a pointer argument too. From my observations, it works correctly on x86-64 Linux and x86-64 Apple machines. It should work everywhere too.

For a more general solution, I was considering that what would be required is a mechanism that allows the user to map an MLIR function type X onto a Clang function declaration AST Y, and that keeps track of which subtype of X has been turned into which argument of Y.

From there, a generic mechanism unrelated to the particular source MLIR dialect would be able to lower the AST into an LLVM function and figure out the signature, while keeping track of which argument got mapped to which register.

At that point the user would get back an LLVM function type lifted back into the LLVM dialect, while knowing how each argument got lowered.

If implemented correctly, there would be a path to refactor Clang too, so that Clang’s function declaration lowering works the same way, by mapping the Clang AST onto an intermediate layer that performs this mapping down to LLVM IR. And by isolating this mechanism in a library of its own that does not depend on Clang (rather, Clang depends on this library), there would be a way to have this functionality in any MLIR dialect without linking against Clang.
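A minimal sketch of what such a pointer-only trampoline could look like on the C side. The names Pair, swap_pair and swap_pair_tramp are made up for illustration; the only assumption is that the trampoline is compiled by a C compiler for the same target, so that the by-value call inside it follows the real C ABI.

```c
/* Hypothetical C API that passes and returns a struct by value. */
struct Pair { int first; int second; };

struct Pair swap_pair(struct Pair p) {
    struct Pair r = { p.second, p.first };
    return r;
}

/* Trampoline: every argument becomes a pointer to that argument, and the
 * return value becomes a pointer argument as well. The C compiler handles
 * the by-value ABI details of the inner call. */
void swap_pair_tramp(struct Pair *ret, const struct Pair *p) {
    *ret = swap_pair(*p);
}
```

The non-C frontend would then only need to declare and call a function that takes pointers and returns void, at the cost of an extra call and some stack traffic.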

That would be easy for me because it’s the parameter passing model of the language we are implementing.

“From my observations, it works correctly on x86-64 linux and x86-64 apple machines.”

Sure, and our naïve translations of simple C APIs directly into LLVM that pass pointers and integers seem to work fine on those platforms, and ARM64 as well. But, so far I have zero reason to believe this is true:

“It should work everywhere too.”

and lots of what I’ve read seems to indicate otherwise. What makes you so sure of that?

Answering my own question, perhaps it’s this part of John and Jordan’s talk?

Yes, that part of the talk is the reason why I went for that solution.

The basic diagnostic I would use is “is there a register for this type?” That is to say, I would expect the following types to lower as C ABI on all LLVM targets:

  • void (as a return type)
  • pointers
  • register-sized integer types
  • sub-register-sized power-of-2 integer types, IF you add signext/zeroext as appropriate
  • any floating-point or vector types the target hardware supports, e.g., if you’re compiling for x86 + SSE, <4 x float> should lower correctly but <8 x float> might not unless you enable AVX as well.

I would explicitly expect that any use of struct types would not do the right thing. i128 also definitely doesn’t match the C ABI in some cases (e.g., the Windows x86-64 ABI). iN where N is not 8, 16, 32, or 64 is in general not likely to be correct. Variable argument functions might be safe to call if all of the parameters are in the above format. C’s complex types are “please just no” in ABI rules. I wouldn’t trust LLVM’s floating-point ABI in soft-float scenarios, and floating-point types other than float and double (and x86_fp80 or ppc_fp128 on x86 and PPC, respectively) are also dubious.
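One way to act on this heuristic is to expose a scalar-only wrapper next to a struct-taking C function, so that the boundary seen from LLVM IR uses only types from the list above. This is a sketch, assuming a hard-float target where double has a register; Vec2, dot2 and dot2_flat are hypothetical names.

```c
/* Hypothetical C API that takes structs by value. */
struct Vec2 { double x; double y; };

double dot2(struct Vec2 a, struct Vec2 b) {
    return a.x * b.x + a.y * b.y;
}

/* Scalar-only wrapper: the struct fields are passed as separate doubles,
 * so no aggregate crosses the language boundary. The structs are rebuilt
 * on the C side, where the C compiler applies the struct-passing rules. */
double dot2_flat(double ax, double ay, double bx, double by) {
    struct Vec2 a = { ax, ay };
    struct Vec2 b = { bx, by };
    return dot2(a, b);
}
```

This only moves the struct handling into C; it does not help when the struct layout itself has to be shared with the other language.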

@jcranmer Please excuse my overly-literal brain, but I’m having trouble with your use of the phrase “I would expect.” Do you mean to:

  • simply inform me about what you would like to be true
  • inform me about what is true in LLVM today
  • inform me about what is guaranteed to be true in LLVM now and in the future

or something else entirely?

Thanks

Register-sized operands being passed “correctly” is guaranteed to be true, to the extent that anything about LLVM codegen is guaranteed (I’m not aware of any documentation that specifically requires this to be the case).

I did verify for all the targets I could find on Compiler Explorer that C int is passed as i32 or i32 signext in LLVM IR. I haven’t done the same level of verification for the other components, but I believe them to all be true statements of LLVM targets today.
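For anyone who wants to repeat this kind of check locally, a tiny probe file like the following can be compiled with clang -S -emit-llvm -o - probe.c (adding --target=... for other targets) and the resulting define lines inspected. The file name and function names are made up, and which attributes appear on each parameter depends on the target.

```c
/* probe.c: small functions whose lowered signatures are easy to read
 * in the emitted IR. Only the parameter and return types matter here. */

/* Register-sized (or smaller) plain int. */
int add_int(int a, int b) {
    return a + b;
}

/* Sub-register signed integer: look for signext in the IR. */
short add_short(short a, short b) {
    return (short)(a + b);
}

/* Sub-register unsigned integer: look for zeroext in the IR. */
unsigned char next_byte(unsigned char c) {
    return (unsigned char)(c + 1);
}
```

Comparing the signatures across targets gives a quick view of where extension attributes are required, though it says nothing about structs or other aggregate types.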