[RFC] Tracking values through integral address space casts for improved alignment reasoning

This RFC is to get the opinion of and promote awareness for a ValueTracking feature I am working on: [ValueTracking] Allow tracking values through Integral AddrSpaceCasts for improved alignments by jrbyrnes · Pull Request #70483 · llvm/llvm-project · GitHub .

The feature will enable tracking values through integral AddrSpaceCasts, which allows for improved reasoning about alignments. Since the semantics of AddrSpaceCasts are target specific, I thought this was the proper channel for this discussion.

The main motivation is, as mentioned, to allow for improved reasoning about alignments.

As we can see in this short example Example 1 , InferAlignment Pass is able to reason about the alignment of the load. The global array it is loading from is 16 byte aligned, and we know the element that we are loading has an index that is a multiple of 16 due to the shl. Therefore, the load is 16 byte aligned and the compiler is able to determine such (as seen by the change in alignment).

However, if we insert an AddrSpaceCast, then the compiler is no longer able to determine the alignment Example 2 . The issue is that ValueTracking doesn’t support AddrSpaceCasts.

In the example, we are casting between the address spaces 256 and 0 (which are integral for x86). As we can see in X86ISelLowering.cpp::LowerADDRSPACECAST, AddrSpaceCasts between these two address spaces is semantically equivalent to a sign extend. In order to reason about the alignment, the low bits of the address must be preserved through the AddrSpaceCasts. That is the case for casts between these two address spaces.

Moreover, after looking through all the backends, I can confirm that every target that has DataLayout defined integral address spaces has integral AddrSpaceCast semantics which preserve the low bits from the original pointer. There are some instances where AddrSpaceCasts involve truncating the original pointer, however, the non truncated bits are left unmodified which is the key to this analysis. There are a handful of targets that do not have DataLayout defined address spaces, but do have addrspace handling and test coverage. The majority of these targets have defined isNoopAddrSpaceCast to be basically universally true. However, I am unable to fully conclude about integral addrspacecast semantics for the following targets: DirectX, XCore and SPIRV.

Nonetheless, I am proposing to enable tracking values through integral AddrSpaceCasts. This modifies the semantics of integral AddrSpaceCasts such that they must now preserve the non-truncated bits of the original pointer. Targets are still able to express different AddrSpaceCasts semantics through non-integral addressspaces.

I wonder if there are any conerns with this, or if anyone happens to know if DirectX, XCore and SPIRV currently uphold this behavior.

Probably in most situations the low bits are preserved… there normally isn’t really any reason to add an odd offset to an address. But how do you tell which bits are “low” without some target-specific query? I don’t think you can safely assume that ValueTracking won’t figure out the value of any non-low bits.

Oh, I see, you’re assuming all bits are “low”. More specifically:

  • For addrspace casts which extend the value, you treat all the source the bits as “low”
  • For addrspace casts which truncate, you treat all the untruncated bits as “low”
  • For addrspace casts where the source and destination are the same width, you treat all bits as “low”, i.e. you assume such casts are no-ops.

That seems pretty aggressive. For example, the x86 backend supports addrspace(256)… although it doesn’t currently support addrspace casts involving it.

Thanks for the comment, and thanks again – this is the correct definition of “low”.

It may be aggressive, but it does seem to be observed for all targets.

For example, for x86 I see that AddrSpaceCasts have the following semantics:

  if (SrcAS == X86AS::PTR32_UPTR && DstVT == MVT::i64) {
    Op = DAG.getNode(ISD::ZERO_EXTEND, dl, DstVT, Src);
  } else if (DstVT == MVT::i64) {
    Op = DAG.getNode(ISD::SIGN_EXTEND, dl, DstVT, Src);
  } else if (DstVT == MVT::i32) {
    Op = DAG.getNode(ISD::TRUNCATE, dl, DstVT, Src);
  } else {
    report_fatal_error("Bad address space in addrspacecast");
  }

(sorry for the code dump, github doesn’t like Target/X86/X86ISelLowering.cpp)

So if the SrcAS is X86AS::GS (address space 256) and the DstVT == MVT::i64 (as with the address space 0), then it will be lowered into a sign_extend. However, X86AS::GS pointers are also 64 bits so it is just a noop.

I’m not quite sure what you mean that X86 doesn’t support AddrSpaceCasts involving AS(256) – llc seems to have no issue: X86 AddrSpaceCast Lowering

X86 doesn’t support AddrSpaceCasts involving AS(256)

For example:

define float @f(ptr addrspace(256) %x) {
  %xx = addrspacecast ptr addrspace(256) %x to ptr
  %l1 = load float, ptr %xx, align 4
  ret float %l1
}
define float @f2(ptr addrspace(256) %x) {
  %l1 = load float, ptr addrspace(256) %x, align 4
  ret float %l1
}
f:
        movss   xmm0, dword ptr [rdi]
        ret
f2:
        movss   xmm0, dword ptr gs:[rdi]
        ret

The two functions are loading from different addresses… which implies the addrspace cast is getting miscompiled.


Anyway, the more general point is that it’s reasonable to have an address-space that involves some non-trivial computation… and even if we don’t actually have any examples in-tree, someone is likely to request some sort of hook for the sake of out-of-tree targets.

Ah, I see what you mean. Well, the proposed analysis is still consistent with the way AddrSpaceCasts are lowered, so this doesn’t actually seem like an issue with the proposed feature (but perhaps the examples in the proposal…)

After this feature those would fall into non-integral address spaces – targets can use non-integral to bypass this tracking. This feature, then, primarily increases the conditions for an address space to be integral.

I’m under the impression that addrspacecast is target specific with essentially unspecified semantics. I’m not convinced we should use whether it has an integer tag or not to control assumptions made on the representation.

However lots of these probably are int2ptr bitcast with a following truncation or (sign/zero?) extension. Maybe this is one for extending the target info interface so transforms can query the target for whether a given cast is integer-like or not?

As always with these things I’m nervous about adding reasoning ability to pointer provenance style deductions - we might want to disable the insight if some aliasing style compiler flags are in effect.

The feature would make amdgpu IR shorter which would be delightful.