ODR checking status?

While upgrading to libc++ 16, I discovered some ODR violations in our codebase that manifested as debug info errors with LTO. Compiler Explorer demonstrates a simplified example of what I was seeing, with debug info errors such as:

fragment covers entire variable
  call void @llvm.dbg.value(metadata i64 %2, metadata !29, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !30
!29 = !DILocalVariable(name: "s", arg: 1, scope: !26, file: !27, line: 5, type: !19)
fragment is larger than or outside of variable
  call void @llvm.dbg.value(metadata i64 %3, metadata !29, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !30
!29 = !DILocalVariable(name: "s", arg: 1, scope: !26, file: !27, line: 5, type: !19)
LLVM ERROR: Broken module found, compilation aborted!

There’s a couple of problems though:

  • The error message doesn’t give a good indication of the potential cause, and had it not been for @dblaikie’s comment on ⚙ D133661 [libc++] Improve binary size when using __transaction, I likely wouldn’t have been able to figure it out.
  • It’s not comprehensive. The ODR violations I found already existed, and libc++ 16 just happened to change things in a way that exposed them. I’m sure there’s more violations still lurking, and I want to find and fix all of them.

@pcc had proposed an ODR checker several years back, but I don’t think it was committed, and even the prototype code is inaccessible now, unfortunately. More recently, @MaskRay wrote a comprehensive blog post on ODR violation checking, in which he mentioned implementing a checker, but I don’t think that’s started yet either.

In the meantime, is there a good way to detect ODR type violations with Clang/LLVM/LLD? ASAN only appears to detect global variable violations. ODR hashing is really promising, but as far as I can tell it requires building with modules, and that’s going to be a substantial amount of work for us. GitHub - adobe/orc: ORC is a tool for finding violations of C++'s One Definition Rule on the OSX toolchain. is specific to OSX, whereas I’m targeting Android.

Sorry, yeah - I don’t think there’s any good solution to this at the moment.

FWIW, the prototype code for my ODR checker is still available here and here.


It will be nice if @pcc has time to rebase and possibly polish the patch series. I am very happy to review it or help if needed. I am going to be out for most of the next 4 weeks and will not spend much time on reviews.llvm.org, though :slight_smile:

FWIW I think the way to go is to have different mangling for different libc++ configurations. That should fix any ODR violations within libc++ w.r.t. exceptions, and have only any ODR violations in user code if they compile with -fno-exceptions/-fexceptions combinations. This way everybody should be able to fix them, and libc++ doesn’t have to worry about it.

Forgot to mention in my original post, but this particular instance was a result of mixing TUs compiled with -std=c++14 and -std=c++17, not an exceptions/no-exceptions mix.

That’s interesting. I’m not aware of any struct that are different depending on language version. Do you know where exactly the problem occurred?

I was building with libc++'s unstable ABI, and the type in question was reverse_iterator<reverse_iterator<Foo *>>. I ran into exactly the same problem as Removing an empty base class can break ABI – Arthur O'Dwyer – Stuff mostly about C++, where the type ended up as 16 bytes in the C++ 14 TU and 8 bytes in the C++17 TU (where iterator was removed as a base class because I was building with the unstable ABI).

I’m looking into this again after we had yet another ODR-related issue. I’m going over @pcc’s patches for his ODR checker prototype. If I’m reading hook up clang, in principle · pcc/llvm-project-old@2ac70b2 · GitHub correctly, we’re only emitting ODR table entries for CXXRecord decls, i.e. structs and classes. Is that correct, and if so, would there be anything preventing us from checking function versions as well? IIUC ODR hashing can be computed for those as well. The issue we ran into was caused by mismatched inline functions across TUs, hence my interest in that.

1 Like

@pcc seems to be OOO for 4 weeks and presumably may be slow to respond.

1 Like