Argument Lowering Redux

Hi,
Having just found an ABI conformance bug in a compiler front-end, I am curious: is the state of target-independent argument lowering in LLVM still the same as when this thread took place? http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-February/thread.html#59387

Vadim

I'm very interested in this myself. Basically every single frontend
has to plug in the same code to make this work, which seems to
indicate to me that we need a place to put it to share amongst the
frontends.

This comes up periodically. The consensus last time was that LLVM should have some sort of ABIBuilder class that would take C types (as most ABIs are defined in terms of C types) and allow building function definitions and calls from them. Inside the function, you'd get an alloca containing each C parameter. Unfortunately, no one has had the combination of time, inclination, and expertise to do it.

Each back end would then be responsible for maintaining its corresponding ABIBuilder, to ensure that what it expects in the IR is what the ABIBuilder produces.

The other problem is that the current approach in LLVM IR complicates optimisations. It's bad enough that clang and the x86 back end both have to somehow know that an i64 return value is returned in two integer registers and so is the correct thing to use (on some platforms) for a struct of two i32s. It's even worse that an optimisation has to know that the i64 that a function is returning might actually be a structure.
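To make the i64 example concrete, here is a minimal C sketch of the coercion being described: a struct of two 32-bit integers reinterpreted as a single 64-bit integer and back. The names `pair`, `pair_to_i64` and `i64_to_pair` are made up for illustration; this only mimics at source level what a front end would emit as IR, and it is not taken from clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The source-level type: a struct of two 32-bit integers. */
struct pair { int32_t a, b; };

/* The coercion only makes sense if the struct fills exactly one i64. */
static_assert(sizeof(struct pair) == sizeof(uint64_t),
              "pair must occupy exactly 64 bits");

/* Roughly what happens at a return site: the struct's bytes are
   reinterpreted as one 64-bit integer, and the "struct-ness" is lost. */
static uint64_t pair_to_i64(struct pair p) {
    uint64_t bits;
    memcpy(&bits, &p, sizeof bits);
    return bits;
}

/* The caller has to know that the i64 is really a struct and undo it. */
static struct pair i64_to_pair(uint64_t bits) {
    struct pair p;
    memcpy(&p, &bits, sizeof p);
    return p;
}
```

Anything that sees only the `uint64_t` has no way of telling it apart from a genuine 64-bit integer, which is the knowledge leak complained about above.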

It is probably the ugliest bit of LLVM IR currently, that we have front ends that all either consume C or know how to map their values to C types, and have back ends that must implement ABIs defined in specifications in terms of C types, but we insist on mapping this knowledge to something different in the middle.

David

I am not pretending to understand all corner cases in this, but as I was reading that old thread, a question popped up in my mind:
Couldn’t LLVM provide an early IR transform pass that lowers “high-level” argument definitions into the current target-dependent form, converting by-value structs into sret arguments as needed? It seems to me that, at least for structs, all the information such a pass would require is representable in the current LLVM IR. Of course, under this proposal, unions would need to be re-introduced into IR in some form (perhaps as structs tagged with a “union” flag?). However, if they are immediately lowered into structs, the rest of the optimization pipeline would not need to change.

Vadim

Huh, I didn't know that. So who treats "complex" differently than struct {
float re, im; }? Fortran?

Would any of the following be feasible then?:
- if this special case is widespread, add a special attribute for "complex"
to LLVM IR,
- ignore this issue, and have Fortran compilers do manual lowering of
"complex" arguments, but at least all "sane" languages could enjoy
automatic lowering.

Vadim

I am not pretending to understand all corner cases in this, but as I was
reading that old thread, a question popped up in my mind:
Couldn't LLVM provide an early IR transform pass that lowers
"high-level" argument definitions into the current target-dependent
form, converting by-value structs into sret arguments as needed? It
seems to me that, at least for structs, all the information such a pass
would require is representable in the current LLVM IR.

That's not the case. For example, many ABIs specify a specific way to
pass and return values of the "complex" type, which at the LLVM level just
looks like a struct with two float or double fields.

Huh, I didn't know that. So who treats "complex" differently than struct
{ float re, im; }? Fortran?

Not just Fortran. For example, for X86_64 clang returns complex<float> as a
vector, but complex<double> as a structure. For X86 I think it returns them
using an extra pointer argument.
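A small C sketch of why this matters, assuming a C11 compiler with `<complex.h>`: the two types below have the same size and the same two-float layout, so a struct-level view cannot tell them apart, yet an ABI is free to pass and return them differently. The names `cfloat`, `add_complex` and `add_struct` are illustrative only.

```c
#include <assert.h>
#include <complex.h>

/* A plain struct with the same layout as a complex float. */
struct cfloat { float re, im; };

/* Same size, so at the "struct of two floats" level they look identical. */
static_assert(sizeof(float complex) == sizeof(struct cfloat),
              "complex float and struct cfloat should have the same size");

/* Returns a value of the C complex type. */
float complex add_complex(float complex x, float complex y) {
    return x + y;
}

/* Returns a value of the look-alike struct type. */
struct cfloat add_struct(struct cfloat x, struct cfloat y) {
    struct cfloat r = { x.re + y.re, x.im + y.im };
    return r;
}
```

Compiling both functions for several targets and comparing the emitted IR or assembly is one way to see whether a given ABI treats the two return types the same.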

Modern ABIs are pretty complex. Moreover, they are usually worded in
high-level terms (C types), which requires special handling on the
frontend side. One would probably need to become familiar with the
X86-64, ARM and AArch64 ABIs to understand the whole problem (e.g.
split register / stack passing, homogeneous aggregates, etc.). Passing
of 'complex' becomes easy after this ;)
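To give a flavour of the rules involved, here is a heavily simplified C sketch in the spirit of the x86-64 System V "eightbyte" classification. It is an assumption-laden toy, not the real algorithm: it handles only scalar integer and floating-point fields, ignores x87, vector and bit-field cases, and the names `arg_class`, `field` and `classify` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

enum arg_class { CLASS_NONE, CLASS_INTEGER, CLASS_SSE, CLASS_MEMORY };

/* One scalar field of an aggregate (hypothetical description format). */
struct field {
    size_t offset;   /* byte offset inside the aggregate */
    size_t size;     /* size in bytes */
    int    is_float; /* nonzero for float/double, zero for integers/pointers */
};

/* Classifies an aggregate of total_size bytes into at most two eightbytes.
   Returns the number of eightbytes used, or 0 if it is passed in memory. */
static size_t classify(const struct field *fields, size_t nfields,
                       size_t total_size, enum arg_class out[2]) {
    assert(fields != NULL || nfields == 0);
    out[0] = out[1] = CLASS_NONE;

    /* Simplification: anything larger than two eightbytes goes to memory. */
    if (total_size > 16) {
        out[0] = CLASS_MEMORY;
        return 0;
    }
    for (size_t i = 0; i < nfields; i++) {
        enum arg_class c = fields[i].is_float ? CLASS_SSE : CLASS_INTEGER;
        size_t first = fields[i].offset / 8;
        size_t last  = (fields[i].offset + fields[i].size - 1) / 8;
        /* Merge rule: INTEGER wins over SSE within the same eightbyte. */
        for (size_t w = first; w <= last && w < 2; w++) {
            if (out[w] == CLASS_NONE || c == CLASS_INTEGER)
                out[w] = c;
        }
    }
    return (total_size + 7) / 8;
}
```

Under these assumptions a `struct { float re, im; }` comes out as a single SSE eightbyte, a `struct { double d; int i; }` as one SSE plus one INTEGER eightbyte, and three doubles end up in memory. Note that the input is a list of typed fields, which is exactly the C-level information the thread says is lost once everything is plain LLVM IR.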

This is the approach taken by WHIRL (which is well worth studying by anyone looking at how to design a compiler IR - it's not perfect, but does have some nice ideas). The front end for Pro64-derived compilers generates a very high-level representation, containing C types and C-like flow control structures (and some things like multiple entry points for Fortran). This is then progressively lowered towards something that looks more like assembly. Different optimisations are run at different layers.

There are a few problems with adopting this as-is for LLVM without some significant changes:

- The LLVM IR type system is not rich enough to express unions or the difference between _Complex float and struct {float i,r;} in C (for example).

- The IR modification infrastructure is not set up to make it easy to change the type of a value. Changing the type signature of a function requires creating a new function and then copying all of the instructions into it. This would be very expensive.

The LLVM model has a notion of canonical forms, which are fragile undocumented implicit contracts between producers and consumers of IR. These serve roughly the same purpose as the layers in WHIRL. This specifies (used in the loosest possible sense of the word) the IR that the back end expects to correspond to particular C types and, unfortunately, knowledge of this is very leaky and ends up having to permeate the entire optimisation stack.

David

It's not about languages, but about ABIs.

Jonas