What was the IR made for precisely?

Date: Tue, 1 Nov 2016 11:31:05 +0000
From: David Chisnall via llvm-dev <llvm-dev@lists.llvm.org>
Subject: Re: [llvm-dev] What was the IR made for precisely?

>
>> From: "Chris Lattner via llvm-dev" <llvm-dev@lists.llvm.org>
>> To: "David Chisnall" <David.Chisnall@cl.cam.ac.uk>
>> Cc: llvm-dev@lists.llvm.org, "ジョウェットジェームス" <b3i4zz1gu1@docomo.ne.jp>
>> Sent: Friday, October 28, 2016 2:13:06 PM
>> Subject: Re: [llvm-dev] What was the IR made for precisely?
>>
>>
>>>
>>>>
>>>> I would need to sum up all the rules and ABIs and sizes for all the
>>>> targets I need and generate different IR for each, am I correct?
>>>
>>> This is a long-known limitation of LLVM IR and there are a lot of
>>> proposals to fix it. It would be great if the LLVM Foundation would
>>> fund someone to do the work, as it isn’t a sufficiently high
>>> priority for any of the large LLVM consumers and would make a huge
>>> difference to the utility of LLVM for a lot of people.
>> …
>>> I think it would be difficult to do it within the timescale of the
>>> GSoC unless the student was already an experienced LLVM developer.
>>> It would likely involve designing some good APIs (difficult!),
>>> refactoring a bunch of Clang code, and creating a new LLVM library.
>>> I’ve not seen a GSoC project on this scale succeed in any of the
>>> open source projects that I’ve been involved with. If we had a good
>>> design doc and a couple of engaged mentors then it might stand a
>>> chance.
>>
>> Is there a specific design that you think would work? One of the
>> major problems with this sort of proposal is that you need the entire
>> clang type system to do this, which means it depends on a huge chunk
>> of the Clang AST. At that point, this isn’t a small library that
>> clang uses, this is a library layered on top of Clang itself.
>
> Given that ABIs are defined in terms of C (and sometimes now C++)
> language constructs, I think that something like this is the best of
> all bad options. Really, however, it depends only on the AST and
> CodeGen, and maybe those (along with 'Basic', etc.) could be made into
> a separately-compilable library. Along with an easy ASTBuilder for C
> types and function declarations we should be able to satisfy this use
> case.

Indeed. Today, I can go and get the MIPS, ARM, x86-64, or whatever ABI
specification and it defines how all of the C types map to in-memory
types and where arguments go. We currently have no standard for how
any of this is represented in IR, and I have to look at what clang
generates if I want to generate C-compatible IR (and this is not stable
over time - the contract between clang and the x86 back end has changed
at least once that I remember). The minimum that you need to be able
to usefully interoperate with C is:

- The ability to map each of the C types (int, long, float, double) to
the corresponding LLVM type.

- The ability to generate an LLVM struct that corresponds to a
particular C struct (including loads and stores from struct members)

- The ability to construct functions that have a C API signature and
call functions that have such a signature.
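
To make the first requirement concrete, here is a minimal sketch of a
lookup keyed on the target's data model. It is not an existing LLVM or
Clang API: DataModel, CPrim and llvmTypeFor are hypothetical names, and
only the one width that differs between the common data models (long)
is modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag for the target's data model (not an LLVM type).
enum class DataModel { ILP32, LP64, LLP64 };

// Hypothetical set of C primitives, limited to a few common ones.
enum class CPrim { Int, Long, LongLong, Float, Double };

// Returns the LLVM IR spelling of a C primitive under a data model.
inline std::string llvmTypeFor(CPrim t, DataModel m) {
    switch (t) {
    case CPrim::Int:      return "i32";
    // long is 64 bits only under LP64; ILP32 and LLP64 keep it at 32.
    case CPrim::Long:     return m == DataModel::LP64 ? "i64" : "i32";
    case CPrim::LongLong: return "i64";
    case CPrim::Float:    return "float";
    case CPrim::Double:   return "double";
    }
    assert(false && "unhandled CPrim");
    return "";
}
```

A front end targeting both x86-64 Linux (LP64) and 64-bit Windows
(LLP64) would get i64 and i32 respectively for long, which is exactly
the kind of per-target difference that has to be tracked by hand today.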

We’ve discussed possible APIs for this at the Cambridge LLVM Socials a
couple of times. I think that the best proposal was along the
following lines:

A new CABIBuilder that handles constructing C ABI elements. This would
have the primitive C types as static fields and would allow you to
construct a C struct type by passing C types (primitives or other
structs, optionally with array sizes). From this it would construct an
LLVM struct and provide IRBuilder-like methods for constructing GEPs to
specific fields (and probably loads and stores to bitfields).
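
The struct half of such a builder could look roughly like the sketch
below. Everything in it is hypothetical (CType, CStructBuilder and their
methods are illustration only, not a proposed or existing interface). It
assumes the caller supplies each field's size and alignment for the
chosen target, applies the usual natural-alignment padding rule, and
ignores bitfields and packed structs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a C type on one target.
struct CType {
    std::string llvmName;  // LLVM IR spelling, e.g. "i32"
    std::size_t size;      // size in bytes on the target
    std::size_t align;     // alignment in bytes on the target
};

// Hypothetical builder: lays fields out in order with natural padding.
class CStructBuilder {
public:
    // Appends a field and returns its index (usable as a GEP index).
    std::size_t addField(const CType &t) {
        assert(t.align != 0 && (t.align & (t.align - 1)) == 0 &&
               "alignment must be a power of two");
        std::size_t offset = roundUp(size_, t.align);
        offsets_.push_back(offset);
        fields_.push_back(t);
        size_ = offset + t.size;
        if (t.align > align_) align_ = t.align;
        return fields_.size() - 1;
    }
    std::size_t offsetOf(std::size_t i) const { return offsets_.at(i); }
    // Total size, including tail padding up to the struct alignment.
    std::size_t sizeInBytes() const { return roundUp(size_, align_); }
    // Literal LLVM struct type, e.g. "{ i8, i32, double }".
    std::string llvmType() const {
        std::string s = "{ ";
        for (std::size_t i = 0; i < fields_.size(); ++i) {
            if (i) s += ", ";
            s += fields_[i].llvmName;
        }
        return s + " }";
    }
private:
    static std::size_t roundUp(std::size_t v, std::size_t a) {
        return (v + a - 1) / a * a;
    }
    std::vector<CType> fields_;
    std::vector<std::size_t> offsets_;
    std::size_t size_ = 0;
    std::size_t align_ = 1;
};
```

With x86-64 sizes, adding char, int and double in that order gives
offsets 0, 4 and 8 and a total size of 16 bytes. A real builder would
take the sizes and alignments from the target description rather than
from the caller.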

The same approach would be used for functions and calls. Once you’ve
built the CFunctionType from C structs and primitives for arguments,
you would have an analogue of IRBuilder’s CreateCall / CreateInvoke
that would take the IR types that correspond to the C types and marshal
them correctly.

On the other side of the call (constructing a C ABI function by passing
a set of C types to the builder), you’d get an LLVM Function that takes
the arguments in whatever form LLVM expects and then stores them into
allocas, which would be returned to the front end building the callee,
so the front-end author would never need to look at the explicit
parameters.
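
The callee side can be sketched for the trivial case where every
parameter is a scalar passed directly. CParam and emitPrologue are
hypothetical names, and the code emits textual IR instead of using
IRBuilder to stay self-contained. It assumes the opaque-pointer (ptr)
syntax of recent LLVM releases, which postdates this thread. Structs
that the ABI coerces or passes indirectly are deliberately not handled,
and they are where the real work lies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one directly passed C parameter.
struct CParam {
    std::string name;      // source-level parameter name
    std::string llvmType;  // LLVM IR spelling, e.g. "i32"
    unsigned align;        // alignment in bytes on the target
};

// Emits the header and entry block of a function that spills each
// parameter into a stack slot named "<param>.addr". The front end would
// then work only with those slots. The body and the closing brace are
// left to the caller.
inline std::string emitPrologue(const std::string &fn,
                                const std::string &retTy,
                                const std::vector<CParam> &params) {
    std::string ir = "define " + retTy + " @" + fn + "(";
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i) ir += ", ";
        ir += params[i].llvmType + " %" + params[i].name;
    }
    ir += ") {\nentry:\n";
    for (const CParam &p : params) {
        const std::string a = std::to_string(p.align);
        ir += "  %" + p.name + ".addr = alloca " + p.llvmType +
              ", align " + a + "\n";
        ir += "  store " + p.llvmType + " %" + p.name + ", ptr %" +
              p.name + ".addr, align " + a + "\n";
    }
    return ir;
}
```

The point of the design is that the front end only ever sees the
".addr" slots. How the values arrived (in registers, split across
several IR arguments, or behind a pointer) stays inside the builder.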

You’d need a small subset of Clang’s AST for this (none of the stuff
for builtins, nothing for C++ / Objective-C, and so on) and several of
the bits of CodeGen (in particular, CGTargetInfo contains a bunch of
stuff that really should be in LLVM, for example with respect to
variadics). It’s a big bit of refactoring work, and a lot of it would
probably need to end up duplicated in both clang and LLVM (though it
should be easy to automate the testing).

Another alternative is to expose these APIs from Clang itself, so
if you need them then you will have to link clang’s Basic, AST and
CodeGen libraries (which are only about 10MB in a release build and
could be dynamically linked if they’re used by multiple things). This
approach would also make it easier to extend the interfaces to allow
header parsing and C++ interop (which would be nice for using things
like shared_ptr across FFI boundaries).

David

I'd prefer not to expose them via Clang; that would just add another
dependency for those of us who generate our own IR and feed it directly
to LLVM without Clang. Right now, we have a converter from our own
backend IR to LLVM IR so we can port our entire compiler suite of
BASIC, COBOL, Pascal, Fortran, C. We've had to peek at what Clang
generates to figure out the correct mapping for us.

Even with some CABIBuilder, I think we might have to go deeper.
For example, we aren't using IRBuilder at the moment, since we end up
building the IR in pieces and stitching it all back together at the end
of the conversion process.