ODR checking status?

While upgrading to libc++ 16, I discovered some ODR violations in our codebase that manifested as debug info errors with LTO. Compiler Explorer demonstrates a simplified example of what I was seeing, with debug info errors such as:

fragment covers entire variable
  call void @llvm.dbg.value(metadata i64 %2, metadata !29, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !30
!29 = !DILocalVariable(name: "s", arg: 1, scope: !26, file: !27, line: 5, type: !19)
fragment is larger than or outside of variable
  call void @llvm.dbg.value(metadata i64 %3, metadata !29, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !30
!29 = !DILocalVariable(name: "s", arg: 1, scope: !26, file: !27, line: 5, type: !19)
LLVM ERROR: Broken module found, compilation aborted!

There’s a couple of problems though:

  • The error message doesn’t give a good indication of the potential cause, and had it not been for @dblaikie’s comment on ⚙ D133661 [libc++] Improve binary size when using __transaction, I likely wouldn’t have been able to figure it out.
  • It’s not comprehensive. The ODR violations I found already existed, and libc++ 16 just happened to change things in a way that exposed them. I’m sure there’s more violations still lurking, and I want to find and fix all of them.

@pcc had proposed an ODR checker several years back, but I don’t think it was committed, and even the prototype code is inaccessible now, unfortunately. More recently, @MaskRay wrote a comprehensive blog post on ODR violation checking, in which he mentioned implementing a checker, but I don’t think that’s started yet either.

In the meantime, is there a good way to detect ODR type violations with Clang/LLVM/LLD? ASAN only appears to detect global variable violations. ODR hashing is really promising, but as far as I can tell it requires building with modules, and that’s going to be a substantial amount of work for us. GitHub - adobe/orc: ORC is a tool for finding violations of C++'s One Definition Rule on the OSX toolchain. is specific to OSX, whereas I’m targeting Android.

Sorry, yeah - I don’t think there’s any good solution to this at the moment.

FWIW, the prototype code for my ODR checker is still available here and here.

2 Likes

It will be nice if @pcc has time to rebase and possibly polish the patch series. I am very happy to review it or help if needed. I am going to be out for most of the next 4 weeks and will not spend much time on reviews.llvm.org, though :slight_smile:

FWIW I think the way to go is to have different mangling for different libc++ configurations. That should fix any ODR violations within libc++ w.r.t. exceptions, and have only any ODR violations in user code if they compile with -fno-exceptions/-fexceptions combinations. This way everybody should be able to fix them, and libc++ doesn’t have to worry about it.

Forgot to mention in my original post, but this particular instance was a result of mixing TUs compiled with -std=c++14 and -std=c++17, not an exceptions/no-exceptions mix.

That’s interesting. I’m not aware of any struct that are different depending on language version. Do you know where exactly the problem occurred?

I was building with libc++'s unstable ABI, and the type in question was reverse_iterator<reverse_iterator<Foo *>>. I ran into exactly the same problem as Removing an empty base class can break ABI – Arthur O'Dwyer – Stuff mostly about C++, where the type ended up as 16 bytes in the C++ 14 TU and 8 bytes in the C++17 TU (where iterator was removed as a base class because I was building with the unstable ABI).