Inspired by the Deegen talk at the LLVM dev meeting (blog post) where one of the issues mentioned was that clang doesn’t expose ghccc to source code, as well as past internal conversations about similar issues with protobuf table driven parsing, I’d like to expose ghccc to Clang via __attribute__((preserve_none)).
ghccc is a calling convention that has no callee-saved registers and uses as many registers as possible to pass parameters. This means chained musttail calls do not have to preserve registers: there’s no point in saving registers for a caller we never return to. Some interpreters/parsers, like those mentioned above, use chained musttail calls to speed up hot interpreter/parser logic.
There are already preserve_all and preserve_most calling conventions exposed to Clang (meaning all/most registers are callee-saved); preserve_none follows that naming.
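To make the chained-musttail pattern concrete, here is a minimal sketch in C of a tiny bytecode interpreter whose handlers tail-call the dispatcher. The opcode set, `dispatch`, `run_program`, and the `PRESERVE_NONE`/`MUSTTAIL` macros are all made up for illustration. The macros expand to nothing on compilers that lack the proposed `preserve_none` attribute or `musttail`, so the code should still build there, just without the guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Use the attributes only where the compiler reports them; otherwise
 * fall back to the default calling convention and ordinary returns. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical opcodes for the sketch. */
enum { OP_HALT = 0, OP_INC = 1, OP_DOUBLE = 2, N_OPS };

/* Every function in the chain shares one signature and one convention,
 * which is what musttail needs. */
PRESERVE_NONE static int64_t dispatch(const uint8_t *pc, int64_t acc);

PRESERVE_NONE static int64_t op_inc(const uint8_t *pc, int64_t acc) {
    MUSTTAIL return dispatch(pc + 1, acc + 1);
}

PRESERVE_NONE static int64_t op_double(const uint8_t *pc, int64_t acc) {
    MUSTTAIL return dispatch(pc + 1, acc * 2);
}

PRESERVE_NONE static int64_t dispatch(const uint8_t *pc, int64_t acc) {
    assert(*pc < N_OPS);
    switch (*pc) {
    case OP_INC:    { MUSTTAIL return op_inc(pc, acc); }
    case OP_DOUBLE: { MUSTTAIL return op_double(pc, acc); }
    default:        return acc; /* OP_HALT: the only real return in the chain */
    }
}

/* Entry point with the ordinary C calling convention. */
int64_t run_program(const uint8_t *code) {
    return dispatch(code, 0);
}
```

Since no handler ever returns to the handler that called it, nothing is lost by having no callee-saved registers, and `pc` and `acc` can stay in argument registers for the whole chain.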
ghccc is more like the combination of preserve_none plus fastcc conventions. Instead of exposing ghccc as preserve_none, is it better to formally define and implement preserve_nonecc? @weiguozhi
no_caller_saved_registers is documented to not be a calling convention, because it is designed to compose with other calling conventions, as in the provided example:
__attribute__ ((no_caller_saved_registers, fastcall))
void f (int arg1, int arg2) {
...
}
I assume that must be preserved to maintain GCC compatibility.
Maybe what we want is an ABI-unstable (similar to fastcc) calling convention that preserves no CSRs (David’s suggested preserve_none_cc). If it’s ABI unstable, we can implement that today on x86 with ghccc, with the possibility of changing it to some other convention in the future.
If we’re not preserving any registers, then why not pass everything via registers? i.e. I don’t see the benefit of a separate preserve_none CC that doesn’t also use fastcc.
I’d be happy to mark this as unstable for people to first experiment with.
One thing to note: calling conventions (in the Clang type system) are packed into a bit-field that is critical to keep small for build-time performance reasons:
We don’t have many bits to spare and once we run out, that’s the end of adding new calling conventions until we figure out some way to recapture space or not degrade performance. @erichkeane did this dance a few years ago, so we may not have many bits left we can steal. Whoever the unlucky winner is to add the N+1 calling convention may have a significant amount of work involved.
Personally, I am opposed to adding new calling conventions to the type system without significant justification. Calling convention mismatches are a source of security bugs and one-off calling conventions don’t seem to justify the expense of implementing and maintaining them. I’m not saying I’m opposed to this proposal (I’ve not thought about it enough to have that strong of an opinion on it), but I’m wary of it. I’d appreciate more details justifying the need to expose this in Clang and just how much of a user community the calling convention is expected to have.
Ack Aaron’s concern about the limited number of bits: with the current implementation, it does look like a precious resource. I also agree with the concerns about potential bugs.
On the other hand, these should not be big issues for introducing new CCs (even experimental ones):
1. There are 5 bits available, and we still have ~10 free slots to use.
2. A 16-bit ExtInfo does not seem to bring much (build-time) performance benefit over a 32-bit field. If that is the case, we may have a bigger problem elsewhere.
3. We can use the next 16 bits for the CC if the 5th bit is one. Fortunately, there seem to be only a few helper functions that need to be changed to deal with the escape.
Regarding bugs – shaking out bugs/wrong assumptions in the common code shared by all CCs is a good thing for the product.
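The escape idea in the list above (use a wider field for the CC when the 5th bit is set) could look roughly like the following C sketch. This is not Clang's actual ExtInfo: `SketchExtInfo`, `sketch_set_cc`, `sketch_get_cc`, and the field layout are hypothetical, and where the extended value would really be stored is left open in the thread. Here it is simply a separate member.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only. */
typedef struct {
    unsigned cc_inline  : 5;  /* bit 4 = escape flag, bits 0-3 = common CCs */
    unsigned other_bits : 11; /* stand-in for the remaining flags */
    uint16_t cc_extended;     /* only meaningful when the escape flag is set */
} SketchExtInfo;

enum { CC_ESCAPE = 1 << 4, CC_INLINE_MAX = CC_ESCAPE - 1 };

/* Store small CC numbers inline; spill larger ones to the extended field. */
SketchExtInfo sketch_set_cc(SketchExtInfo info, unsigned cc) {
    assert(cc <= UINT16_MAX);
    if (cc <= CC_INLINE_MAX) {
        info.cc_inline = cc;
        info.cc_extended = 0;
    } else {
        info.cc_inline = CC_ESCAPE;
        info.cc_extended = (uint16_t)cc;
    }
    return info;
}

/* Read the CC back, following the escape flag when it is set. */
unsigned sketch_get_cc(SketchExtInfo info) {
    return (info.cc_inline & CC_ESCAPE) ? info.cc_extended : info.cc_inline;
}
```

In this sketch only the two accessors know about the escape, which is the sense in which few helpers would need to change. The extra member does take up storage of its own.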
As far as #1: We have to be sympathetic to our downstreams, many of which use these bits for their own uses, so any additions here are likely to break downstreams.
#2: It’s not just performance, it is space. ExtInfo is stored frequently in the AST, so it needs to be as small as possible. We did an evaluation of this in the past, and expanding it any more in size causes us to lose template instantiation depth significantly, and in a way that most would be against.
I would be absolutely against increasing the size of ExtInfo in any way.
Given many of them are target-specific, one could heavily compress that list by using overlapping encodings for different targets (assuming we can throw away the attribute for an unsupported calling convention as we parse that attribute rather than later on). You could even still have the interface deal with the uncompressed thing and transparently (de)compress it on access to the structure itself (with an assertion on compressing that you’re not asking for an invalid combination).
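A rough C sketch of the overlapping-encoding idea, under heavy assumptions: two targets, a handful of conventions, and a per-target slot table. `SketchTarget`, `SketchCC`, `kSlots`, `sketch_compress`, and `sketch_decompress` are invented names and do not correspond to anything in Clang.

```c
#include <assert.h>

/* Hypothetical targets and conventions, for illustration only. */
typedef enum { TARGET_X86_64, TARGET_AARCH64, N_TARGETS } SketchTarget;
typedef enum {
    CC_C,
    CC_X86_VECTORCALL,
    CC_X86_REGCALL,
    CC_AARCH64_VECTORCALL,
    CC_AARCH64_SVE_PCS,
    N_CCS
} SketchCC;

#define SLOTS_PER_TARGET 3

/* The same small code means a different convention on each target. */
static const SketchCC kSlots[N_TARGETS][SLOTS_PER_TARGET] = {
    [TARGET_X86_64]  = { CC_C, CC_X86_VECTORCALL, CC_X86_REGCALL },
    [TARGET_AARCH64] = { CC_C, CC_AARCH64_VECTORCALL, CC_AARCH64_SVE_PCS },
};

/* Map a convention to its compact per-target code; asserts on a
 * convention that is not valid for the target. */
unsigned sketch_compress(SketchTarget target, SketchCC cc) {
    for (unsigned i = 0; i < SLOTS_PER_TARGET; i++) {
        if (kSlots[target][i] == cc)
            return i;
    }
    assert(0 && "calling convention not valid for this target");
    return 0;
}

/* Map a compact code back to the full convention for the given target. */
SketchCC sketch_decompress(SketchTarget target, unsigned code) {
    assert(code < SLOTS_PER_TARGET);
    return kSlots[target][code];
}
```

Here five conventions fit in a two-bit code per target, but every decode needs to know which target the code belongs to.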
I tried that at one point, and it is actually way harder than you’d think. We have the ability to use ‘aux target’ (and presumably some day multiple aux-targets), so you need to be able to discriminate between the target and aux-targets lists, which ends up not saving you anything.
Perhaps someone else can come up with a better mechanism than I could, but I tried exactly as you’re proposing and it ended up being worse size-wise in many cases, particularly mixed with any of the x86 targets.
Just a note from the user-perspective. We in Chrome are very much interested in having __attribute__((preserve_none)). We use conservative scanning for garbage collection and currently have assembly stubs that spill callee-saved registers to the stack (to make sure that no pointers escape).
With something like
__attribute__((preserve_none)) void ScanStack();
we would have the guarantee that the caller has saved all the context in the parent frame and we could get rid of the stubs.
I’d also like to add that CPython would get immediate utility from this addition as well. We are working on a Deegen-style copy-and-patch template JIT for the 3.13 release cycle, and currently have to jump through quite a few hoops in the prototype to make this work (compile first to LLVM IR, fix up the calling convention, then recompile).
Exposing ghccc (or a similar tail-call-and-register-pinning-friendly calling convention) to C code would remove time, complexity, and hacks from our JIT builds.
I looked into the ExtInfo question, and I noticed that we spend a whole ExtInfo bit on no_caller_saved_registers support. At first you would think this is exactly what we want, and we mentioned it further up the thread, but unfortunately, it does the exact opposite of what this use case needs: it causes functions to save and restore everything, including registers that are normally volatile or scratch according to the normal convention. This functionality was added to support interrupt handlers in 318a6eae06.
If we need to free up more space in ExtInfo, we should move some of these infrequently used bits over to FunctionTypeExtraBitfields. If we did that, then we could consider implementing a new composable attribute no_callee_saved_registers with the opposite semantics of no_caller_saved_registers. Or, just have more calling conventions.
Based on subsequent feedback on the thread, I think this calling convention probably will have enough usage to warrant its inclusion.
FWIW, I kind of prefer this design to preserve_none; I think no_callee_saved_registers is easier to reason about and I like the named pairing with no_caller_saved_registers.
Just to reiterate a point made earlier in this thread: turning off callee-saved registers is only half the utility here. One other huge benefit of ghccc is that it uses all of these newly-free registers for argument-passing, which makes for really effective register pinning across tail calls. So we (CPython) would be +0 on the flag and +1 on exposing the whole calling convention… which, as a bonus, seems to use fewer valuable bits.
Regarding attribute pairing, preserve_none can also be paired with the preserve_all or preserve_most clang attributes that are already there. Usability-wise, they are similar: Compiler Explorer
If composability with other calling conventions is really useful, no_callee_saved_registers might be better.