How do I obtain the original data type of a pointer?

LLVM now always uses ptr to represent every kind of pointer (D126689 [IR] Enable opaque pointers by default), so how can I obtain the original data type of a pointer?

  • A simple case, for example:
int foo (int *a, int *b) {
  return a < b;
}
  • We get IR similar to the following, with type ptr for both pointers %a and %b:
define dso_local i32 @foo(ptr noundef readnone %a, ptr noundef readnone %b) {
  %cmp = icmp ult ptr %a, %b
  %conv = zext i1 %cmp to i32
  ret i32 %conv
}

In general you can’t, because memory isn’t typed in LLVM. Depending on where you look, you may find zero answers (as in your example), one, or many.

Some things that can give you a hint are:

  • Some pointer definitions will come with a type (e.g. an alloca or a global variable).
  • You can iterate through the uses of a pointer to see if any specify a type (and whether they’re all the same).
  • Metadata may have useful type info. TBAA for example, or debugging information.

Each of these is likely to be useful in different situations. What are you planning to do with the type when you get it?
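
The second hint (walking the uses and checking whether they agree) can be sketched with a toy model. This is only an illustration: `PointerUse`, `collectAccessTypes` and `uniqueAccessType` are hypothetical names and not LLVM classes. In real LLVM code the per-use type would come from queries such as `getLoadStoreType()` on a load or store, or `AllocaInst::getAllocatedType()` on the definition. The point is that the result can be empty, unique, or ambiguous.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for one use of a pointer value; NOT an LLVM class.
struct PointerUse {
  std::string opcode;                     // e.g. "load", "store", "icmp"
  std::optional<std::string> accessType;  // type named by the use, if any
};

// Collect the distinct types mentioned by the uses of one pointer.
std::set<std::string> collectAccessTypes(const std::vector<PointerUse> &uses) {
  std::set<std::string> types;
  for (const PointerUse &use : uses)
    if (use.accessType)
      types.insert(*use.accessType);
  return types;
}

// Return a type only when all typed uses agree on exactly one.
std::optional<std::string>
uniqueAccessType(const std::vector<PointerUse> &uses) {
  std::set<std::string> types = collectAccessTypes(uses);
  if (types.size() != 1)
    return std::nullopt;  // zero answers, or several conflicting ones
  return *types.begin();
}
```

For the `foo` example above, the only uses are the `icmp`, which names no pointee type, so a search like this comes back empty.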

For the case on Compiler Explorer, I want to combine the sub + AArch64ISD::SUBS into “whilewr”, as gcc does. However, there are 4 variants of whilewr for different pointer types, e.g.:

  • AArch64::WHILEWR_PXX_S is for a pointer to a 32-bit type, e.g. int*
  • AArch64::WHILEWR_PXX_B is for a pointer to an 8-bit type, e.g. char*
    So I need to know the pointee types of registers t2 and t4 (they are now just i64, standing for the ptr values), which would let me choose the correct version of whilewr.
(gdb) p DCI.DAG.dump()
SelectionDAG has 24 nodes:
  t0: ch,glue = EntryToken
            t6: i32,ch = CopyFromReg t0, Register:i32 %16
          t7: i64 = zero_extend t6
        t9: ch = CopyToReg t0, Register:i64 %0, t7
          t11: i64 = vscale Constant:i64<1>
        t13: ch = CopyToReg t0, Register:i64 %1, t11
      t22: ch = TokenFactor t9, t13
          t2: i64,ch = CopyFromReg t0, Register:i64 %15
          t4: i64,ch = CopyFromReg t0, Register:i64 %14
        t17: i64 = sub t2, t4
        t16: i64 = vscale Constant:i64<16>
      t31: i64,i32 = AArch64ISD::SUBS t17, t16
    t33: ch = AArch64ISD::BRCOND t22, BasicBlock:ch< 0xaaaabce9c830>, Constant:i32<2>, t31:1
  t26: ch = br t33, BasicBlock:ch<for.body.preheader5 0xaaaabce9c738>
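
The choice between the whilewr variants comes down to the element size, which the i64 values in the dump no longer carry. A minimal sketch of that selection, assuming the element size in bytes has been recovered some other way (for instance from a memory access in the loop): `whilewrForElementBytes` is a hypothetical helper, and only the `_B` and `_S` names appear in this thread, so the `_H` and `_D` names are assumed by analogy.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Map an element size in bytes to the name of the matching WHILEWR opcode.
// The _H and _D names are assumed by analogy with _B and _S.
std::optional<std::string_view> whilewrForElementBytes(std::uint64_t bytes) {
  switch (bytes) {
  case 1:
    return "AArch64::WHILEWR_PXX_B";  // 8-bit elements, e.g. char*
  case 2:
    return "AArch64::WHILEWR_PXX_H";  // 16-bit elements (assumed name)
  case 4:
    return "AArch64::WHILEWR_PXX_S";  // 32-bit elements, e.g. int*
  case 8:
    return "AArch64::WHILEWR_PXX_D";  // 64-bit elements (assumed name)
  default:
    return std::nullopt;  // no matching variant; do not combine
  }
}
```

Returning an empty optional for any other size keeps the combine conservative: if the element size is unknown or unsupported, the sub + SUBS pair is left alone.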

It sounds like you’re going to need some memory accesses in hand that you want to vectorize. Can’t they provide the pointee type?

Yes. I still don’t know how to get the pointee type at the DAG combine stage, so I am retrying this in the LoopVectorize pass.