Project on Heap Canaries (or something like them, based on randstruct) for Linux

Hello LLVM community,

This is an odd topic, but I wanted to “test the water” and get some ideas of how to best go about the subject, as well as better understand some of the challenges involved.

Many may be familiar with __randomize_layout, a struct attribute that is being used as part of the Linux KSPP to prevent data-only attacks on critical OS components. One drawback is that even with a randomized layout, an adversary with a read gadget can “infer” the correct address to write to within an allocated mutable data structure.

Following up on recent work from ARM, myself, and a bunch of other cool (imo) people regarding protecting these data structures using mechanisms operating at the page granularity (e.g. POE, see [RFC PATCH v3 00/15] pkeys-based page table hardening - Kevin Brodsky, or HVCI [RFC PATCH v3 0/5] Hypervisor-Enforced Kernel Integrity - CR pinning - Mickaël Salaün, and more), we still run into this problem of 20,000+ different types in the kernel, with all sorts of different semantics and allocation patterns.

After talking with Kees Cook and Kevin Brodsky about the subject in some more depth and spending many hours in thought on the topic, I have the following idea.

Bootstrap off of __randomize_layout with a flag that also introduces a canary value, not into the data structure but into a separate set of read-only code pages (these may be protected by the kernel or by HVCI), and XOR this canary value into fields of the data structure that we would like to mark “immutable” / protect. Then, have the code refer to this canary value when accessing protected fields of this data structure, while memcpy and other movement/allocation/de-allocation operations proceed as expected.
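
A minimal user-space sketch of the XOR idea, to make the access pattern concrete. Everything here is hypothetical: the struct cred_like type, the accessor names, and the placeholder canary value are illustrations only, and `const` merely stands in for a page that the kernel or hypervisor would actually make read-only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-type canary. In the scheme described it would live in a
 * read-only page guarded by the kernel or hypervisor; `const` only
 * approximates that here. The value is an arbitrary placeholder. */
static const uint64_t cred_canary = 0x5bd1e9955bd1e995ULL;

/* Hypothetical protected type: `uid` is stored XOR-masked, `flags` is plain. */
struct cred_like {
    uint64_t uid_masked;
    uint32_t flags;
};

/* Accessors the compiler would emit in place of direct field access. */
static inline void cred_set_uid(struct cred_like *c, uint64_t uid)
{
    c->uid_masked = uid ^ cred_canary;
}

static inline uint64_t cred_get_uid(const struct cred_like *c)
{
    return c->uid_masked ^ cred_canary;
}

/* Whole-object copies need no unmasking because the canary is per type,
 * not per object. */
static inline void cred_copy(struct cred_like *dst, const struct cred_like *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*dst));
}
```

With this layout, a write gadget that stores a raw 0 into uid_masked makes cred_get_uid() return the canary rather than 0. Note that plain XOR garbles a forged value but does not by itself detect the tampering.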

The adversary, then, to use a “write gadget” which aims to accomplish a data-only attack, would also need to change the canary value in the read-only memory page, but this is presumed difficult, and so they would need to leak the canary value in order to determine the “right” update. To avoid this there are several possible mechanisms, but my feeling is the seed value used for randstruct could also be used to partially randomize the location of the “ground truth” canary within the “canary region”, forcing the adversary to try all N (~20,000) canaries that are allocated for the data structures in the kernel.
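
One way the seed-based placement could look, as a rough sketch only. The names canary_slot and CANARY_SLOTS and the notion of a numeric type id are assumptions for illustration, and the mixer is the public splitmix64 finalizer, swapped in here rather than whatever randstruct uses internally.

```c
#include <assert.h>
#include <stdint.h>

#define CANARY_SLOTS 20000u /* roughly one slot per kernel type */

/* splitmix64 finalizer, used here only as a convenient 64-bit mixer. */
static uint64_t mix64(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Map (build seed, hypothetical type id) to a slot in the canary region. */
static uint32_t canary_slot(uint64_t seed, uint32_t type_id)
{
    uint32_t slot = (uint32_t)(mix64(seed ^ type_id) % CANARY_SLOTS);
    assert(slot < CANARY_SLOTS);
    return slot;
}
```

A plain hash like this can send two types to the same slot; a real implementation would more likely derive a seeded permutation of the slots so every type gets its own.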

I expect there may be significant issues related to type-casting, specifically, and that said protected types would need a warning on implicit conversion at the pointer level, and during direct assignment.

Is something like this even feasible? I will also do some digging to figure out how much it makes sense, but I wanted to post here and engage the Clang community on the ideas, get some critiques, additional ideas, etc.

Hopefully this all makes sense.

Regards,
Maxwell Bland


Update on my original post here.

I’ve since “dug in” and written up a lot of code to do this, and it actually, like, seems to work for user-land binaries, which is cool to report. Motorola’s open source software workflows are nonexistent, though, so I do not know if I will ever be able to post any plugins to the WWW.

Some notes if you want to build a GPL-2.0 solution:
(1) My earlier idea, the whole “bootstrap off of __randomize_layout” thing, was dumb. You just need a recursive AST traversal pass and two LLVM passes to identify struct accesses and then generate additional integrity check handlers based on the existing data-flow inference steps. Then, just live-patch-inject the integrity checks. You still may need to worry about type-casting and aliased memory, but for the majority of real-world cases where this might be useful, e.g. the kernel, manual exceptions / LLVM pass identification can solve these issues. I’m also thinking you could probably do more than I did and maybe build off of ASAN with an AST routine to throw an error on aliasing. I’ve not gotten to that part yet, since I need to finish up a (thankfully GPL-2.0!) LSM and just “see what happens” on a real device when trying to use a system like this. Also, the downside is that using llc and opt in this way requires you to preserve all the existing passes/compiler flags you use for the original, uninstrumented binary.

(2) QBDI and other contemporary frameworks are kind of limited in what they can do/where they can be deployed when it comes to ARM64 instrumentation. BUT no program needs to be greater than 128MB in size. (-; The b instruction in aarch64 can only jump +/-128MB (+/-0x8000000) because the offset added to the current program counter is a signed 26-bit immediate, counted in 4-byte instructions, that must fit into the 4 byte machine code instruction. You see what I’m saying? It’s easy to reinstrument arm64 binaries if you generate an object file that lives within the same 128MB range as the original code: just add an immediate branch.
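
For anyone who wants to try the branch patching, here is a small sketch of encoding the immediate branch and checking that the target is reachable. The function name a64_encode_b is made up for illustration. The encoding itself is the standard A64 B form: opcode bits 0b000101 followed by a signed 26-bit immediate counted in 4-byte instructions, which gives a reach of +/-128MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `b target` for an instruction placed at address `pc`.
 * Returns false when the target is not 4-byte aligned relative to pc
 * or lies outside the direct-branch range [-2^27, 2^27 - 4]. */
static bool a64_encode_b(uint64_t pc, uint64_t target, uint32_t *insn)
{
    int64_t off = (int64_t)(target - pc);

    assert(insn != NULL);
    if (off & 3)
        return false;
    if (off < -(INT64_C(1) << 27) || off >= (INT64_C(1) << 27))
        return false;

    /* Keep the low 26 bits of the word offset (two's complement). */
    *insn = 0x14000000u | ((uint32_t)((uint64_t)off >> 2) & 0x03ffffffu);
    return true;
}
```

If this returns false, the instrumentation object is too far from the patch site and you would need a veneer (load the address into a register and use br) instead of a single b.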

Cheers,
Maxwell Bland