Allowing arbitrary pointer sizes in data layout

Stephen_Neuendorffer · December 2, 2021, 8:23pm

Hi all,

LLVM has long had a restriction that the sizes of pointer data types in the LLVM backend must be powers of two. While this works for many architectures, in some cases it is too restrictive. Because this is such a long-standing assumption, I’d like to get wider review on a patch here: https://reviews.llvm.org/D114141 that starts to relax this constraint. The patch allows DataLayouts to be parsed correctly with pointer types of arbitrary size. Note that this does not change the alignment requirements for pointers, only the types that are permissible and used when instruction selection lowers pointers. We have so far used this effectively on out-of-tree targets and feel it is general enough to live upstream.

Steve Neuendorffer

stephenneuendorffer · March 30, 2022, 7:49pm

Returning to this topic: there is a question about the existing getPointerSize() API, which returns the size of a pointer in bytes and is used by some architecture-independent code. Previously we implemented this to return the next larger size in bytes: i.e. for a pointer size of 20 bits, getPointerSize() would return ceil(20/8) = 3. However, it seems that some users of the getPointerSize() API (notably code in the AsmPrinter handling DWARF information) assume that getPointerSize() will return a power of two.
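To make the two candidate behaviours concrete, here is a minimal sketch of the arithmetic. The helper names bytesForBits and roundUpToPowerOfTwo are hypothetical and are not LLVM APIs; the sketch only illustrates the difference between rounding a bit width up to whole bytes and then rounding that byte count up to a power of two.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; these are not LLVM APIs.

// Smallest number of whole bytes that can hold `bits` bits,
// i.e. ceil(bits / 8).
constexpr std::uint32_t bytesForBits(std::uint32_t bits) {
  return (bits + 7) / 8;
}

// Smallest power of two that is >= value (0 maps to 1).
// Assumes value <= 2^31 so the shift cannot overflow.
constexpr std::uint32_t roundUpToPowerOfTwo(std::uint32_t value) {
  std::uint32_t result = 1;
  while (result < value)
    result <<= 1;
  return result;
}

// A 20-bit pointer needs 3 whole bytes, which rounds up to 4
// if callers require a power of two.
static_assert(bytesForBits(20) == 3, "ceil(20/8) is 3");
static_assert(roundUpToPowerOfTwo(bytesForBits(20)) == 4,
              "3 bytes rounds up to 4");
```

For pointer sizes that are already a power-of-two number of bytes (for example 32 or 64 bits), both helpers agree, so the distinction only matters for the unusual sizes discussed in this thread.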

There’s a patch for review here that ensures this: D122758, “DataLayout::getPointerSize() should always return a power of 2”

It’s unclear to me what the broader implications of this are, since the ‘size’ of a pointer could have multiple interpretations: the size in a register, the size as represented in memory, the size as represented in DWARF, etc. Input would be appreciated.

Steve

stephenneuendorffer · March 30, 2022, 10:44pm

I’m currently leaning towards removing the getPointerSize() API entirely and migrating all the internal uses to call getPointerSizeInBits(). In looking more closely at the users, it seems that some are actually using getPointerSize() in contradictory ways, or in ways that don’t really have anything to do with pointers. For instance, this code in OpenMPOpt.cpp:

    const unsigned int PointerSize = DL.getPointerSize();

    for (Instruction &I : *BB) {
      if (&I == &Before)
        break;

      if (!isa<StoreInst>(&I))
        continue;
      auto *S = cast<StoreInst>(&I);
      int64_t Offset = -1;

      auto *Dst =
          GetPointerBaseWithConstantOffset(S->getPointerOperand(), Offset, DL);

      if (Dst == &Array) {
        int64_t Idx = Offset / PointerSize;
        StoredValues[Idx] = getUnderlyingObject(S->getValueOperand());
        LastAccesses[Idx] = S;
      }
    }
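The snippet above divides a byte offset by the pointer size to get an array index, which only works if the pointer size equals the distance between consecutive array elements in memory. As a rough sketch of how this could go wrong, assume a hypothetical target with 20-bit pointers stored in 4-byte slots. Both values and the elementIndex helper are made up for illustration and do not come from a real target or from LLVM.

```cpp
#include <cstdint>

// Hypothetical layout for illustration only: a 20-bit pointer that
// occupies a 4-byte slot in memory. Neither value is from a real target.
constexpr std::uint32_t PointerSizeInBits = 20;
constexpr std::uint32_t SlotSizeInBytes = 4; // assumed in-memory stride

// Bit width rounded up to whole bytes (3 for 20 bits).
constexpr std::uint32_t RoundedByteSize = (PointerSizeInBits + 7) / 8;

// Maps a byte offset into an array of pointers to an element index,
// given the distance in bytes between consecutive elements.
constexpr std::int64_t elementIndex(std::int64_t offset,
                                    std::int64_t stride) {
  return offset / stride;
}

// Dividing by the in-memory stride gives the intended element.
static_assert(elementIndex(12, SlotSizeInBytes) == 3, "12 / 4 is 3");
// Dividing by the rounded byte size picks a different element.
static_assert(elementIndex(12, RoundedByteSize) == 4, "12 / 3 is 4");
```

Under these assumptions, byte offset 12 refers to element 3, but dividing by the 3-byte rounded size yields 4. What this kind of code seems to need is the stride of the array elements rather than the width of a pointer.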

Any objections to taking this course?
