RFC: Deactivation symbols

With pointer field protection (PFP), we change the in-memory representation of certain pointer fields so that it differs from the regular representation of a pointer. This works except in one case: code may take the address of a field and pass it outside the translation unit, to code that does not know the field uses a different in-memory representation. That code will read an incorrect value from the field, and will corrupt the field if it writes to it. To address this, we must disable pointer field protection for any field whose address escapes the translation unit. (If the address is merely taken but does not escape, it is in many cases possible to compile the translation unit so that the as-if rule hides the difference in representation.) This disablement must take effect in every translation unit in the program. This RFC proposes to implement it using special symbols in the object file, which we refer to as deactivation symbols.

A deactivation symbol identifies specific instructions to be replaced globally across the program. Typically the instructions are replaced with NOPs, that is, deactivated (hence the name), by applying relocations to them. PFP can use this to disable protection for a field program-wide, because replacing a field's representation-altering instructions with NOPs is equivalent to disabling PFP for that field. Hence, PFP uses one deactivation symbol per field. A translation unit that escapes the address of a field, and therefore needs to deactivate the instructions controlled by that field's deactivation symbol, shall define the symbol as an absolute (SHN_ABS) symbol whose value is the encoding of the NOP instruction on the target architecture, such as 0xd503201f on AArch64 or 0x90 on x86.
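
To make the symbol-table side concrete, here is a minimal Python sketch that packs the two kinds of Elf64_Sym entries involved, assuming a little-endian ELF64 target: the absolute definition emitted by a translation unit that escapes the field's address, and the weak undefined reference used by every other translation unit. The ELF constants come from the ELF specification; the string-table offset is a placeholder, and the binding of the definition (STB_WEAK here) is an assumption, since the RFC does not specify it.

```python
import struct

# ELF constants (values from the ELF specification).
STB_GLOBAL, STB_WEAK = 1, 2
STT_NOTYPE = 0
STV_HIDDEN = 2
SHN_UNDEF, SHN_ABS = 0, 0xFFF1

AARCH64_NOP = 0xD503201F

def elf64_sym(name_off, bind, shndx, value, other=STV_HIDDEN):
    """Pack a little-endian Elf64_Sym: st_name, st_info, st_other,
    st_shndx, st_value, st_size (24 bytes)."""
    st_info = (bind << 4) | STT_NOTYPE
    return struct.pack("<IBBHQQ", name_off, st_info, other, shndx, value, 0)

# TU that escapes the field's address: absolute definition whose value is
# the NOP encoding. ASSUMPTION: weak binding; name offset 1 is a placeholder.
definition = elf64_sym(1, STB_WEAK, SHN_ABS, AARCH64_NOP)

# Every other TU: a hidden weak undefined reference to the same symbol.
reference = elf64_sym(1, STB_WEAK, SHN_UNDEF, 0)
```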

Each instruction controlled by a deactivation symbol is relocated against that symbol, which appears in the object file as a weak undefined symbol. For fixed-length ISAs there is one relocation per instruction, with the same size as the instruction width, such as 4 bytes on AArch64. For variable-length ISAs, each “granule” of the instruction shall have its own relocation; we define an ISA’s granule size as the largest number of bytes that every instruction length is divisible by, such as 1 byte for x86 and 2 bytes for Thumb-2 and RISC-V. The relocation shall have the following semantics: if the symbol is defined, relocate the place by storing the absolute value of the symbol; otherwise, leave the place untouched. These semantics are almost, but not quite, the same as those of existing relocation types such as R_X86_64_8S and R_AARCH64_ABS32. In both cases, the existing relocation types unconditionally overwrite the place even if the symbol is undefined (an undefined weak symbol resolves to zero), which is not what we want because it would overwrite the existing instruction with all zeros. Hence, the deactivation symbols feature requires new relocation types. The strawman proposal is to introduce relocation types whose names include INST (e.g. R_AARCH64_INST32), but I don’t have a strong opinion about the name.
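
The proposed relocation semantics can be modeled in a few lines of Python. This is a sketch of what a linker would do to a section's bytes, not code from the linked pull requests; the helper names are hypothetical and the instruction bytes are placeholders rather than real encodings. It shows one relocation per instruction on AArch64 and one 1-byte relocation per granule on x86.

```python
AARCH64_NOP = 0xD503201F  # 4-byte NOP; AArch64 instructions are little-endian
X86_NOP = 0x90            # 1-byte NOP

def apply_inst_reloc(section, offset, width, sym_value):
    """Hypothetical model of an INST-style relocation.

    sym_value is None if the weak deactivation symbol is undefined,
    otherwise its absolute (SHN_ABS) value.
    """
    if sym_value is None:
        return  # symbol undefined: leave the place untouched
    # Symbol defined: store its absolute value into the place.
    section[offset:offset + width] = sym_value.to_bytes(width, "little")

def relocate_instruction(section, offset, length, granule, sym_value):
    """Apply one INST relocation per granule of a single instruction."""
    assert length % granule == 0
    for off in range(offset, offset + length, granule):
        apply_inst_reloc(section, off, granule, sym_value)

# AArch64: a single 4-byte relocation covers the whole instruction.
text = bytearray(b"\x78\x56\x34\x12")            # placeholder instruction
relocate_instruction(text, 0, 4, 4, None)        # symbol undefined
assert text == b"\x78\x56\x34\x12"               # untouched
relocate_instruction(text, 0, 4, 4, AARCH64_NOP) # symbol defined
assert text == b"\x1f\x20\x03\xd5"               # now a NOP

# x86: a 3-byte instruction gets three 1-byte relocations.
text = bytearray(b"\xaa\xbb\xcc")                # placeholder instruction
relocate_instruction(text, 0, 3, 1, X86_NOP)
assert text == b"\x90\x90\x90"                   # three 1-byte NOPs
```

Relocating per granule rather than per instruction means the NOP value only needs to be one granule wide, so a single symbol value works for every instruction length on the ISA.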

I also considered proposing to change the semantics of the existing relocation types instead. Existing object files likely already have zeros at the place before relocation, so leaving the place untouched when the symbol is undefined would preserve their behavior. However, programs using deactivation symbols would then break silently when linked by a linker that does not implement the new semantics: such a linker would overwrite the controlled instructions with zeros, whereas a new relocation type makes the link fail loudly, which is what we want. So I think it is best to go with new relocation types.
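
The failure mode is easy to see in a small Python model of a linker that predates the proposal. The sketch assumes RELA relocations with a zero addend and simplifies R_AARCH64_ABS32 to "write S + A unconditionally" (overflow checks are omitted); R_AARCH64_INST32 is the strawman name from above, and the instruction bytes are a placeholder.

```python
AARCH64_NOP = 0xD503201F

def apply_abs32_legacy(section, offset, sym_value):
    """Simplified existing ABS32 behavior: the place is always written.
    An undefined weak symbol resolves to 0 (addend assumed to be 0)."""
    value = 0 if sym_value is None else sym_value
    section[offset:offset + 4] = value.to_bytes(4, "little")

def apply_inst32(section, offset, sym_value):
    """Proposed INST32 behavior: write only if the symbol is defined."""
    if sym_value is not None:
        section[offset:offset + 4] = sym_value.to_bytes(4, "little")

def old_linker(rtype, section, offset, sym_value):
    """A linker that knows ABS32 but has never heard of INST32."""
    if rtype == "R_AARCH64_ABS32":
        apply_abs32_legacy(section, offset, sym_value)
    else:
        raise ValueError(f"unsupported relocation type {rtype}")

insn = bytes.fromhex("78563412")  # placeholder instruction word

# Reusing ABS32: the old linker silently zeroes the instruction.
text = bytearray(insn)
old_linker("R_AARCH64_ABS32", text, 0, None)
assert text == bytes(4)

# New semantics leave it alone when the symbol is undefined.
text = bytearray(insn)
apply_inst32(text, 0, None)
assert text == insn

# A new relocation type: the old linker refuses to link instead.
try:
    old_linker("R_AARCH64_INST32", bytearray(insn), 0, None)
except ValueError as err:
    print(err)
```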

To attach a deactivation symbol to an instruction at the IR level, the proposal is to attach an operand bundle to the call instruction representing the intrinsic or function call. For PAC instructions, this can be a call to the llvm.ptrauth.sign or llvm.ptrauth.auth intrinsic. Because deactivation replaces instructions with NOPs, it shall not in general be valid to attach deactivation symbols to arbitrary call instructions, since the semantics of replacing arbitrary instructions with NOPs are not well defined. Instead, deactivation symbols shall only be supported on specific intrinsic or function calls as specified by the target, and the semantics of each such call shall be defined specifically. For example, we may define llvm.ptrauth.sign and llvm.ptrauth.auth to have the following semantics:

  • If the deactivation symbol is not defined, the intrinsics have their usual semantics.
  • If the deactivation symbol is defined, the intrinsics return their first argument.
  • The deactivation symbol may only be defined to NOP. Any other definition will result in undefined behavior.
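
These three rules can be written down as a tiny executable model. It is a sketch only: `toy_sign` is a hypothetical stand-in that is nothing like real PAC, the key and discriminator operands of llvm.ptrauth.sign are omitted, and undefined behavior is modeled by raising an exception.

```python
AARCH64_NOP = 0xD503201F

def toy_sign(ptr):
    """Hypothetical stand-in for signing: sets some high bits. NOT real PAC."""
    return ptr | (0x5A << 48)

def sign_with_ds(ptr, ds_value):
    """Model of llvm.ptrauth.sign carrying a deactivation-symbol bundle.

    ds_value is None if the deactivation symbol is undefined, else its value.
    """
    if ds_value is None:
        return toy_sign(ptr)  # rule 1: usual semantics
    if ds_value != AARCH64_NOP:
        # rule 3: any non-NOP definition is UB; the model flags it loudly
        raise RuntimeError("undefined behavior: deactivation symbol is not NOP")
    return ptr                # rule 2: returns its first argument
```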

On architectures such as AArch64, which use the same register for the return value as for the first argument, we can, with few backend changes, define the semantics of normal call instructions so that the first argument is returned if the deactivation symbol is defined. This is done by placing the deactivation symbol relocation on the BL instruction and disabling tail calls (a tail call is a plain branch, so replacing it with a NOP would fall through to whatever code follows instead of returning to the caller). This is sufficient for EmuPAC, for example. It may be possible to support something similar on architectures such as x86 that use different registers for the first argument and the return value, but that would likely require the backend to emit an extra MOV instruction before the call.
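
A toy register-file model shows why the shared register matters. AAPCS64 passes the first argument and returns the result in X0, while the x86-64 System V ABI passes the first integer argument in RDI and returns in RAX. The sketch below ignores every other effect of a call (clobbered registers, stack, flags), and `emu_sign` is a hypothetical out-of-line routine.

```python
def run_call(regs, arg_reg, ret_reg, callee, deactivated, mov_before_call=False):
    """Simulate a call whose BL/CALL may have been replaced by a NOP."""
    regs = dict(regs)  # work on a copy of the register file
    if mov_before_call:
        regs[ret_reg] = regs[arg_reg]  # extra MOV ret_reg, arg_reg
    if not deactivated:
        regs[ret_reg] = callee(regs[arg_reg])  # the call writes the result
    return regs[ret_reg]

def emu_sign(x):
    """Hypothetical out-of-line 'signing' routine (toy transformation)."""
    return x + 100

# AArch64: X0 is both the first argument and the return value.
aarch64 = {"x0": 5}
print(run_call(aarch64, "x0", "x0", emu_sign, deactivated=False))  # 105
print(run_call(aarch64, "x0", "x0", emu_sign, deactivated=True))   # 5

# x86-64 SysV: without the extra MOV, a deactivated call leaves RAX stale.
x86 = {"rdi": 5, "rax": 999}
print(run_call(x86, "rdi", "rax", emu_sign, deactivated=True))     # 999
print(run_call(x86, "rdi", "rax", emu_sign, deactivated=True,
               mov_before_call=True))                              # 5
```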

Deactivation symbols have hidden visibility and are scoped to the linkage unit, and applying deactivation symbol relocations at runtime would likely be prohibitively expensive. Hence, multiple linkage units that use deactivation symbols are not necessarily compatible with each other, and the intended usage model is a single linkage unit for the whole program (similar to LLVM CFI). However, because it is uncommon for a given field’s address to escape, the practical likelihood of incompatibility when using multiple linkage units is fairly low. For such use cases it may be sufficient to use a tool that checks the symbol tables of the provided linkage units for compatibility, i.e. that each deactivation symbol is consistently defined or undefined, with a consistent value, across them. If an incompatibility is detected, the linkage units would need to be rebuilt with an attribute that disables PFP for the affected field. Alternatively, the tool could make the linkage units compatible by retroactively applying deactivation symbol relocations, using the relocation sections emitted by --emit-relocs.
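
The core of such a compatibility check could be as simple as the following Python sketch. It assumes the deactivation symbols have already been extracted from each linkage unit's symbol table into a dictionary; the extraction step, the symbol names and the unit names are all hypothetical. A unit that does not mention a symbol at all is treated as irrelevant to that symbol.

```python
from typing import Dict, Optional

# Hypothetical input format: per linkage unit, deactivation symbol name ->
# value (int) if defined, or None if only referenced as weak undefined.
SymbolTable = Dict[str, Optional[int]]

def find_incompatibilities(units: Dict[str, SymbolTable]):
    """Return {symbol: {unit: state}} for every inconsistent symbol."""
    all_symbols = set()
    for table in units.values():
        all_symbols.update(table)
    problems = {}
    for sym in sorted(all_symbols):
        # Only compare the units that actually reference this symbol.
        states = {unit: table[sym] for unit, table in units.items() if sym in table}
        if len(set(states.values())) > 1:
            problems[sym] = states
    return problems

units = {
    "libA.so": {"__ds_S_f": 0xD503201F, "__ds_T_g": None},
    "libB.so": {"__ds_S_f": None, "__ds_T_g": None},
    "app":     {"__ds_T_g": None},
}
# __ds_S_f is defined in libA.so but undefined in libB.so: incompatible.
print(find_incompatibilities(units))
```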

The pull requests that add deactivation symbol IR, backend and linker functionality may be found here, here and here.


Could you give an example of the problem here?

Is it that the thing you call may have side effects, and so not calling skips the instructions you wanted to skip, but also the side effects?

Which would make it a situation where you could attach it to any call, but you would have to do the work of verifying that skipping it was ok.

Overall, I think this works like this:

  • If the compiler sees that a field’s address escapes the translation unit, it adds this deactivation symbol to it.
  • This becomes a relocation which causes the linker to NOP out part of the code that would change the representation.
  • Which preserves the expectations other objects have about the value of that address.

If you are NOP-ing these specific instructions, is there a way to know for sure that you are not NOP-ing out instructions in a copy of the signing function that is shared between fields? If they all call sign_my_pointer and you NOP out the PAC instruction in there, you have now disabled it for all of them.

I expect this comes back to your decision to restrict this to specific intrinsics, ones that you know will be inlined.

Could the machine outliner cause problems here?

In general I think we want the result of a call (by which I mean not just function calls but also intrinsic calls) to be well defined whether or not the deactivation symbol is defined. This breaks down if the destination register differs from the source register, because the destination would hold an unknown value after the instruction is deactivated. This applies not only to architectures whose ABI does not return the value in one of the argument registers, but also, on AArch64, to most intrinsics. To pick an example at random, consider ADDP:

%tmp1 = call <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8> %lhs, <16 x i8> %rhs)

This intrinsic call may be compiled to something like

ADDP V0.16B, V1.16B, V2.16B

It would not be sensible to place a deactivation symbol on this instruction: if the instruction were deactivated, V0 would end up containing an undefined value (whatever it happened to hold before) instead of the value of one of the operands. So, for example, if we wanted to support deactivation symbols on ADDP, we would need to modify the code generator to tie the destination register to one of the source registers whenever a deactivation symbol operand bundle is present; that operand would then be defined as the value returned if the deactivation symbol is defined.
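
The difference between the untied and tied forms can be checked with a small Python model of the vector ADDP (pairwise add over the concatenation of the two sources, lower half from the first source). Vectors are shortened to four byte lanes for readability, and the register names simply mirror the assembly above; this is an illustration, not code generator logic.

```python
def addp(lhs, rhs):
    """Pairwise add of adjacent lanes of lhs:rhs, byte lanes wrapping mod 256."""
    concat = lhs + rhs
    return [(concat[2 * i] + concat[2 * i + 1]) & 0xFF for i in range(len(lhs))]

def execute_addp(regs, dst, src1, src2, deactivated):
    """Run ADDP dst, src1, src2, or a NOP in its place; return dst's value."""
    regs = dict(regs)
    if not deactivated:
        regs[dst] = addp(regs[src1], regs[src2])
    return regs[dst]

regs = {"v0": [9, 9, 9, 9],          # stale contents
        "v1": [1, 2, 3, 4], "v2": [10, 20, 30, 40]}

# Untied: ADDP V0, V1, V2. Deactivated, V0 keeps its stale value.
print(execute_addp(regs, "v0", "v1", "v2", deactivated=False))  # [3, 7, 30, 70]
print(execute_addp(regs, "v0", "v1", "v2", deactivated=True))   # [9, 9, 9, 9]

# Tied: ADDP V1, V1, V2. Deactivated, the result is the first operand.
print(execute_addp(regs, "v1", "v1", "v2", deactivated=True))   # [1, 2, 3, 4]
```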

Correct.

In this example the code generator wouldn’t place the deactivation symbol relocation on the PAC instruction in sign_my_pointer. Instead it would be placed on the BL sign_my_pointer instruction in the callers according to which field is being accessed. This is how Emulated PAC works, for example.

The code generator would be expected not to merge PAC instructions (or BL instructions for Emulated PAC) that pertain to different fields. The change that I made to MachineInstr::isIdenticalTo ensures this.
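
As an illustration of why including the deactivation symbol in the identity check keeps per-field instructions apart, here is a toy Python model. It is an analogy, not LLVM's MachineInstr API: the `Instr` class, the merging loop, and the symbol names `__ds_S_a` and `__ds_S_b` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Instr:
    opcode: str
    operands: Tuple[str, ...]
    deactivation_symbol: Optional[str] = None  # None: no bundle attached

def is_identical_to(a: Instr, b: Instr) -> bool:
    # The deactivation symbol participates in the comparison, so otherwise
    # identical instructions that pertain to different fields never match.
    return (a.opcode == b.opcode and a.operands == b.operands
            and a.deactivation_symbol == b.deactivation_symbol)

def merge_identical(instrs):
    """Keep one representative per group of identical instructions."""
    kept = []
    for i in instrs:
        if not any(is_identical_to(i, k) for k in kept):
            kept.append(i)
    return kept

bl_a = Instr("BL", ("sign_my_pointer",), "__ds_S_a")
bl_a2 = Instr("BL", ("sign_my_pointer",), "__ds_S_a")
bl_b = Instr("BL", ("sign_my_pointer",), "__ds_S_b")
# Field a's two calls may merge; field b's call stays separate.
print(len(merge_identical([bl_a, bl_b, bl_a2])))  # 2
```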

Kind of, but inlining is somewhat orthogonal, it has more to do with whether replacing with NOP ends up doing something reasonable.

The machine outliner compares instructions with isIdenticalTo (see this and this), which, because of the change that I made to that function, should already prevent it from outlining instructions with different deactivation symbols.

Understood, thanks for explaining.

So if you were to have “emulated SIMD”, where you deactivate anything using Neon, you would have to find a single-instruction way to do the equivalent of ADDP, or replace it with a call to an emulation routine. That would probably require some extra steps compared with the single NOP you can use for PAC.

Makes sense considering these two facts.