[RFC] Supporting pointers with external storage in llvm.memcpy (and friends)

Background

LLVM recently introduced pointers with external state as a specific case of non-integral pointers. This IR concept is used to represent pointer values for which some kind of out-of-band runtime information is required, a canonical example being the validity tag bits for CHERI capabilities.

One important constraint of this kind of pointer is that memory accesses to regions containing them must be properly typed:

When a pointer type has external state, all roundtrips via memory must be performed as loads and stores of the correct type since stores of other types may not propagate the external data. Therefore it is not legal to convert an existing load/store (or a llvm.memcpy / llvm.memmove intrinsic) of pointer types with external state to a load/store of an integer type with the same bitwidth, as that may drop the external state.

While the IR specification mentions that llvm.memcpy / llvm.memmove require special handling, no facility is currently provided in upstream LLVM to handle external-storage pointers in these builtins, other than completely disabling transformations that could produce them. This proposal aims to address that.

Proposed Change

We propose, based on implementation experience in the CHERI and CHERIoT LLVM downstreams, creating a new enum-valued call site attribute to LLVM IR, which may only be applied to llvm.memcpy, llvm.memmove, and their respective inline variants:

  • copy_external_pointer_state(true): Required preservation of external pointer state. Code generation must assume that the copied region may contain pointers with external state and take appropriate steps to preserve them.
  • copy_external_pointer_state(false): No preservation of external pointer state is required. Code generation may assume that the copied region does not contain pointers with external state, and no extra steps need to be taken to preserve them.

Backends for targets with external state pointers may reject calls to llvm.memcpy/llvm.memmove that do not carry this annotation. As such, this is in-effect a tri-state value: attribute absent, copy_external_pointer_state(true), and copy_external_pointer_state(false).

Why is the attribute-absent case useful?

At first glance it may appear that the copy_external_pointer_state(true) could be made the default behavior, simplifying the implementation to a single optimization attribute and a conservative default. Unfortunately, our experience has been that this is an area with a high likelihood of regression: most transformations are written without regard to external pointer state targets.

As these regressions would otherwise manifest as difficult to track-down performance regressions, in the CHERI / CHERIoT downstreams we find it valuable to be able to distinguish the case of an omitted attribute (due to a new oblivious transformation) from an intentional conservative attribute. The CHERI backend asserts at compile time that all calls to llvm.memcpy/llvm.memmove have the copy_external_pointer_state attribute set in one direction or the other.

Does my frontend, optimizer, or backend need to care?

We will be upstreaming changes from the CHERI / CHERIoT downstreams to set, propagate, and preserve these attributes properly across Clang and all of the relevant LLVM IR optimizations. Backends that do not have pointers with external state should require no functional changes, and upstreamed support for CHERI / CHERIoT backends will naturally incorporate support for lowering the flag appropriately for them.

I do not like giving special semantics to the absence of an attribute. I also really do not like attributes that impose new constraints on operations. Well-formed attributes only go in the relaxing direction, and everything introducing structural requirements ends up being a hassle (e.g., convergent, noduplicate and strictfp) are all problematic today.

This is just bug hiding; I think it would be better to treat this as a wholly separate kind of operation.

Should this just be an entirely separate intrinsic then?

1 Like

I understand the concern here. Here’s one idea to avoid assigning special semantics to the absence of the attribute: could we define it in the LangRef to be mandatory for a memcpy-like intrinsic to have the attribute set (either true or false) if the the DataLayout contains any external-state address spaces? This could be enforced in the Verifier.

One alternative approach that we considered but discarded was adding an i2 parameter to the relevant intrinsics. That would require a huge amount of test churn that we thought would be undesirable, but we could potentially revisit that idea.

I think it’s the opposite? The entire purpose of the tri-state is to be able to report an error in a CHERI-enabled backend when the attribute is missing. The alternative of only having a relaxing attribute would result in many more hidden performance regression bugs.

We strongly wish to avoid having to add a lot of duplicate pattern matches and transformations scattered throughout the optimizers and code generator.

Could you please provide some more information on what problem this is solving? Is this about about more efficient memcpy lowering in the backend? Is it about preventing incorrect middle-end transforms, like replacing memcpy with an integer load+store?

Can you explain how this proposal will interact with the byte type that will be introduced in the future? Once that is introduced[1], memcpy cannot be legally transformed to integer load+store, it must be converted to byte load+store instead, where the byte type should be able to carry pointers with external state as well. Would that effectively solve your problem or not?

How do you actually determine that a memcpy can be copy_external_pointer_state(false)? Is this primarily going to be used for struct copies, where an member-wise copy is lowered to a memcpy and the members are all non-pointers? Does struct TBAA play any role here?


  1. Well, it’s already incorrect now, there’s just no alternative for it. ↩︎

Certainly! The purpose here is to allow more efficient inline lowering of memcpy and friends in the backend. It’s not about blocking middle-end transforms, but about retaining enough information to allow the backend to lower transform-create memcpy efficiently.

Specifically for the case of CHERI, a generalized capability-preserving memcpy is large and undesirable to inline [0]. We only want to inline a memcpy if we know either that we do not need to preserve capabilities, or that src and dest are sufficiently statically aligned to guarantee safety of an inline expansion.

[0]: Capability loads & stores have dynamic alignment requirements that necessitate a pre/post loop, similar to a vectorized memcpy.

Firstly, no, it would be orthogonal to the issue here, as we’re primarily concerned with the reverse transformation and preserving information through it.

Secondly, and this should probably be discussed in a separate thread, @davidchisnall indicates that he raised concerns that the byte type is fundamentally incompatible with CHERI when it was proposed. I’ve not been able to track down that discussion, but you can see his recent comments here and here.

The places where we are generating it in CHERI-LLVM today are in LoopIdiomRecognize, SimplifyLibCalls, and SROA, all in situations where we synthesize new memcpy’s out of operations that already weren’t capability-preserving.

We do not currently emit them directly in the frontend for struct copies, but I believe it would be suitable for that use as well.

1 Like

Thanks for the clarifications!

If you’re only interested in the performance and not correctness aspect of this, we should defer the interaction with the byte type to a separate discussion.

However, if we’re approaching this from a pure performance perspective, I think that the problem you’re solving here is too narrow. The fundamental issue here is that we may, for various reasons, want to convert a “typed copy” (as a sequence of typed load+store instructions) into an untyped memcpy. This transform loses information that currently cannot be recovered (except through analysis of related loads/stores, which is fragile).

I think the proper solution is to be able to annotate memcpys with metadata which specify that it’s actually a typed copy using these types at these offsets. This solves your problem, but it also solves many other problems:

  • If you have a memcpy that copies capabilities, but not only capabilities, you can use a more efficient expansion. E.g. if you have a pointer plus 8 integers, you only have to use a capability-aware copy for the former, but not the latter. With a single true/false flag, you’d be required to use a capability-aware copy for everything.
  • It can preserve information about padding bytes which can but do not have to be copied. If the memcpy gets expanded, the backend might chose to not copy those bytes if it is more efficient.
  • It provides information on the type expansion to use when converting a memcpy back into loads and stores, e.g. during SROA or in InstCombine. This is already a problem now (making a bad choice in SROA can have substantial impact on optimization) and will become more significant once we take poison-propagation issues more seriously (the aforementioned byte type).

This would be similar to !tbaa.struct metadata, except that it would not be bound to TBAA, which makes it unsuitable for use with non-C/C++ languages. Instead it would be bound to IR types (plus maybe optional TBAA references). Ideally we would also remove !tbaa.struct in favor of that – at least from a cursory look, I’m not clear where we’re actually using that right now (could it be entirely dead?)

To accommodate the LIR case, we’d want a way to express that the specified types repeat (as we generally don’t have a fixed memcpy size there, or it may be very large).

1 Like

I see the upside to what you’re proposing, but it would at a minimum require some infrastructural changes to support types-as-metadata.

I’ll have to give it some more thought (I’m traveling over the next several days), but the schema for the metadata seems tricky to get right if you want it to be general enough to handle the full range of possibilities.

The usual workaround for that is to use something like ptr addrspace(1) poison instead – that is, use a dummy poison value that just exists to carry the type information. (Of course, proper “types as metadata” support would be nicer, just not strictly necessary…)