Should there be an ABI sanitizer?

Here’s a crazy idea:

LLVM/Clang could have an ABI sanitizer which inserts instrumentation to check stack alignment on function entry, that callee-saved registers are preserved across function calls, etc.

This would help catch issues when interoperating with hand-written assembly, JIT’ed code, etc.

Do folks think such a tool would be generally useful?

2 Likes

That is an interesting idea. I do think that the amount of hand-written assembly on most platforms is relatively tiny (i.e. <1%). Now, this could be useful for certain projects (e.g. ffmpeg), but I think that broad applicability might be low? Perhaps some motivating cases might be useful. Then the question is how complex would such a hook be? PEI is already pretty gnarly, and it seems like the hook would belong there. Doing it earlier as a regular LLVM pass (and inject a function call) would be possible and likely nicer.

I agree this is a pretty niche use case, but if the effort isn’t too high maybe it would still be worth it.

The back story is Please Restore Our Registers When You’re Done With Them | Random ASCII – tech blog of Bruce Dawson
Basically Chromium was crashing due to some external software not preserving XMM7 across calls.
That external software is beyond our control, but in the process, the author audited and found a couple of instances in code we could fix:
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3972137
Save xmm7 in DyadicBilinearQuarterDownsampler_sse by randomascii · Pull Request #3586 · cisco/openh264 · GitHub

There could be a lot more of these waiting to pop up as tricky bugs.

Generally, if producing hand-written assembly, something like this is quite essential to have - handwritten assembly which isn’t tested properly is a submarine bug waiting to happen…

Specifically for ffmpeg, they (and related projects such as x264 and the dav1d AV1 decoder) already do have something similar: They have an unit test framework for testing the assembly implementations - both with respect to functional and non-functional aspects - called checkasm. It passes the same inputs to an assembly function and the corresponding reference C function and verifies that the outputs match (for whatever is defined as the relevant outputs of the function), and it can do microbenchmarking of the functions.

It also wraps the tested assembly functions in such an assembly wrapper function which does such ABI testing. In addition to checking that callee saved registers are restored properly, it also tries to trigger other tricky cases, like passing undefined bits in the upper half of 64 bit parameter registers (for cases where only the lower half is defined), checks other registers (e.g. floating point mode setting registers), checks for stack buffer overflows, etc.

But you’re right that this only covers the projects that are mature enough to use such a framework (and even for those projects, it doesn’t help if there are old functions that aren’t hooked up to be tested in the framework - dav1d should have all functions covered, while ffmpeg does have some older assembly that predates checkasm). So some sort of testing of those ABI aspects for all other projects would be useful. Initially, I think it would end up with quite a lot of overhead if that is enabled on every single function call (as it’s only relevant for a small fraction of the functions) - but maybe it’s still in the same ballpark as other sanitizers?

The thing is, if the function is written in assembly, we cannot really instrument it on the callee side, it would have to be an interpositioning trampoline that wraps each call. I don’t think that there is a good way to identify such a call (perhaps we can rely on an extern declaration as a heuristic?), so this feels like it could get expensive (particularly for functions which are recursive and may rely on tail call elimination).

The problem here is that this type of verification is extremely useful in a certain set of cases. Perhaps an explicit call-site opt-in mechanism could make this cost reasonable. But that does reduce the benefit - the entire idea that existing code would need to adapt makes it less enticing.

I haven’t thought about this a lot, but what I envisioned when I wrote it was to instrument all call sites (where we can’t see and trust the callee) to check that callee-saved registers are preserved. But maybe that would be too expensive in practice.

BOLT can analyse binaries and maybe it can identify misuse in calling conventions.

For PowerPC calling conventions will be available with td files:
https://reviews.llvm.org/D137504

1 Like

It is a good mention of analyzing binary with BOLT. For the .td file it looks a good abstraction but for further checking one tough point I got into when I was thinking about verifying the prologue and epilogue generation with it, especially after custom instrumentations, is that it is not representing the stack structure like what data goes to which slot (e.g. offset to registers like sp) and the FrameLowering may just manually simulate one and even have to comment a diagram like this one in ARM target I have seen. My idea is that with having this kind of specification it could be easier for it to be used by multiple modules so that codegen just generates following this structure and linker could check that the callee is reading what they are expecting instead of having to convert codegen lines into check lines?

I am starting to think that there are two approaches to the challenges.

  • The traditional sanitizers instrument the code.
  • BOLT would be kind of a post-mortem checker. Did the compiler and linker worked well?

With BOLT it’s relatively straightforward to insert stack alignment checks and ABI-specific register preservation checks into any function. Unlike sanitizer, it can cover all functions in the binary regardless of the origin (source or external libraries), and can be targeted to specific functions with existing mechanisms (-funcs/-skip-funcs that accept regexes). It can likely be implemented in a single (late) pass in BOLT.

It seems to me that sanitizing an ABI should make it easy to find overruns on the stack that damage the “contract between callee and caller.” And for the sake of argument, this is a job for the architect in cooperation with the compiler porter to get right. The average programmer should not have to deal with ABI holes of specification or design.

It also seems to me that for your typical RISC-like ISA, that you need around 14 storage containers and 14 compares to verify that the stack was not damaged. Before the call, the registers are saved, after the call, the registers returned are compared against the saved registers. A mismatch indicates a violation of the contract. This is straight forward but expensive and must be verified in a position where the values of the preserved registers are known and visible. This is always the caller and can be in the callee only in very unusual circumstances.

On the other hand, there are a couple of new ISAs that proactively prevent damaging the contract between callee and caller. Not surprisingly, both use a second stack for preserved registers where the called subroutine has no LD or ST access to these preserved registers; so, the called subroutine cannot read or write to the preserved registers now on the “other” stack. Only the standard prologue and epilogue instructions can access this stack.

In both of these new ISAs, there is no way to create a misaligned stack using the normal means to allocated and deallocate space on the stack.

So such a tool would have to scan the code of the subroutine to find ADDs/SUBs/LDs of the stack pointer and verify alignment is required at each occurrence (also for frame pointer). It would have to verify that all preserved registers are written before being modified, and if modified, has to be loaded back from the stack before return. Such a tool would have to be cognizant of constructors, destructors, longjumps, varargs, and maybe be cognizant some ATOMIC stuff as well.

1 Like