Help understanding a change to GOT relocations of undefined VTT entries in clang 12

Something appears to have changed regarding generation of a relocation against an undefined VTT symbol between clang 11 and 12.

struct A {};
struct B : virtual A {};
struct C : virtual A {};

struct D : B , C {
    virtual void foo();
};

void bar() {
    D* d = new D;
    d->foo();
}

Minimal reproducer on Godbolt (flip between clang 11.0.0 and 12.0.0 to see the relocations change)

  • In clang 11, an R_X86_64_64 reloc is used for the VTT for D
  • In clang 12+, an R_X86_64_REX_GOTPCRELX is used for the VTT for D

I tried searching through relevant changes in that timeframe, but came up empty. I must admit I’m unfamiliar with this area of code.

Note that the example (and my situation) don’t cause a link error because I’m only concerned with relocatable object output for the file in question.

Does anyone know what may have changed (and why)?

Thanks!

I think you’re forgetting to look at the generated assembly. It looks like in clang 12, the offset to the global variable is getting folded into the load.

Thanks for taking a look. Are you referring to this difference?

11:

        call    B::B() [base object constructor]
        movabs  rax, offset VTT for D
        add     rax, 16
        mov     rcx, qword ptr [rbp - 16]       # 8-byte Reload
        add     rcx, 8
        mov     rdi, rcx
        mov     rsi, rax

12:

call    B::B() [base object constructor]
        mov     rdi, qword ptr [rbp - 16]       # 8-byte Reload
        add     rdi, 8
        mov     rsi, qword ptr [rip + VTT for D@GOTPCREL]
        add     rsi, 16

But even with the folding, 11 is using direct addressing versus 12 using the GOT, correct?

I’ve tried a few more things to eliminate the GOT indirection in my original example:

I’m still digging, but it seems like this new behavior is relying on the linker to eliminate the indirection. But that doesn’t help for object output from clang. And lld doesn’t appear to do GOT elimination when linking multiple .o’s into another relocatable object, even if one of the .o’s holds the VTT definition.

I spent today running a bisect. This is the commit that caused the change in behavior:

Specifically, that commit introduced the change to R_X86_64_GOTPCREL from R_X86_64_64. Then, later on, R_X86_64_GOTPCREL became R_X86_64_REX_GOTPCRELX (I didn’t bisect for this difference; I think it’s inconsequential to the issue).

@MaskRay Could you please comment on the discussion here so far? Is it expected that the VTT symbol would be affected by this commit?

Oh, sorry, I got confused; yes, it is using the GOT.

It looks like clang isn’t marking up VTT symbols correctly in LLVM IR; they should have the same markings as the corresponding vtable symbols, I think.

Does this fix your issue?

diff --git a/clang/lib/CodeGen/CGVTT.cpp b/clang/lib/CodeGen/CGVTT.cpp
index 564d9f3..d53a02b 100644
--- a/clang/lib/CodeGen/CGVTT.cpp
+++ b/clang/lib/CodeGen/CGVTT.cpp
@@ -122,6 +122,7 @@ llvm::GlobalVariable *CodeGenVTables::GetAddrOfVTT(const CXXRecordDecl *RD) {
   llvm::GlobalVariable *GV = CGM.CreateOrReplaceCXXRuntimeVariable(
       Name, ArrayType, llvm::GlobalValue::ExternalLinkage, Align);
   GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
+  CGM.setGVProperties(GV, RD);
   return GV;
 }

Thanks! Yes, it does! At least, it fixes my minimal reproducer when I apply that change to clang 12.

I’m going to try applying the fix on the full project, and I’ll report back.

The patch resolves the issue. I used 13.0.1 for the full validation. Thank you again for investigating.

If you need a reviewer or further testing if the patch needs to be tweaked I am happy to help. :slight_smile:

VTT does not have dso_local. After [TargetMachine] Don't imply dso_local on global variable declarations in Reloc::Static model, direct access relocation will be avoided. In -fno-pic mode, an absolute relocation is used.

  • -fdirect-access-external-data

Because we don’t set dso_local on external VTT for -fno-pic right now. If we do, -fno-direct-access-external-data can drop dso_local.

LGTM.

@efriedma-quic Just checking: are you planning to send this patch for review, or did you want me to attempt that and send it your way?

I’ll send it for review. (Probably this week?)

Sounds great, thank you.

It looks like there’s at least one more instance of this connected to coverage instrumentation (-fprofile-instr-generate). Here’s the reproducer (flip between 11 and 12 again).

11:

0000000000000003  000000150000000b R_X86_64_32S           0000000000000000 __llvm_profile_runtime + 0

12:

0000000000000003  000000150000002a R_X86_64_REX_GOTPCRELX 0000000000000000 __llvm_profile_runtime - 4

Note that, just like in the previous example, I am explicitly not using -fPIC. (The -fPIC case looks correct from 11 to 12: R_X86_64_GOTPCREL to R_X86_64_REX_GOTPCRELX.)

The llvm_profile_runtime symbol is created here within InstrProfiling::emitRuntimeHook():

  auto *Var =
      new GlobalVariable(*M, Int32Ty, false, GlobalValue::ExternalLinkage,
                         nullptr, getInstrProfRuntimeHookVarName());

We cannot unconditionally set dso_local here as in @efriedam-quic’s patch. My attempt (against 13.0.1) resolved my failure, but I’m not sure about larger implications:

@@ -1117,6 +1117,9 @@
       new GlobalVariable(*M, Int32Ty, false, GlobalValue::ExternalLinkage,
                          nullptr, getInstrProfRuntimeHookVarName());

+  if (M->getPICLevel() == PICLevel::NotPIC)
+    Var->setDSOLocal(true);
+
   // Make a function that uses it.
   auto *User = Function::Create(FunctionType::get(Int32Ty, false),
                                 GlobalValue::LinkOnceODRLinkage,

Things have changed in later releases too. I am still investigating that, but I wanted to update that there might be another instance of this issue.

I think you should be able to unconditionally mark __llvm_profile_runtime HiddenVisibility; the definition in libclang_rt.profile is marked “hidden”.