Difference when compiling human readable IR vs bitcode with clang frontend

We've noticed a difference in the embedded bitcode when compiling human-readable IR directly to an object file vs. first compiling the IR to BC and then to an object file, using clang -cc1.
If the original IR file contained an "llvm.compiler.used" gv, it will be preserved when compiling IR -> BC -> Obj.
When compiling IR -> Obj directly, it will be removed.

This difference does not exist for the "llvm.used" gv, however; it is always preserved.
This question seems related to the following lit test in LLVM: https://github.com/llvm-mirror/llvm/blob/master/test/Transforms/GlobalOpt/compiler-used.ll.

Is this somehow expected behaviour?

Reproduce:
Source taken from the lit test.

define void @foo() {
  ret void
}

@llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @foo to i8*)], section "llvm.metadata"
@llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (void ()* @foo to i8*)], section "llvm.metadata"

# Compile IR -> Obj directly.
clang -cc1 -triple x86_64-apple-macosx10.13.0 -emit-obj -fembed-bitcode=all -x ir test.ll -o test_ll.o

# Compile IR -> BC -> Obj.
clang -cc1 -triple x86_64-apple-macosx10.13.0 -emit-llvm-bc -fblocks -fencode-extended-block-signature -x ir test.ll -o test.bc
clang -cc1 -triple x86_64-apple-macosx10.13.0 -emit-obj -fembed-bitcode=all -x ir test.bc -o test_bc.o

# Extract and disassemble embedded bitcode from both scenarios.
segedit test_bc.o -extract __LLVM __bitcode bc_bc.bc
segedit test_ll.o -extract __LLVM __bitcode ll_bc.bc
llvm-dis bc_bc.bc
llvm-dis ll_bc.bc

# Diff both IR files to show that only bc_bc.ll contains "llvm.compiler.used"
diff bc_bc.ll ll_bc.ll

- Dennis Frett

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of Dennis Frett via llvm-dev
Sent: Friday, January 18, 2019 4:15 AM
To: llvm-dev@lists.llvm.org
Subject: [llvm-dev] Difference when compiling human readable IR vs bitcode with clang frontend

I am curious what happens if you do IR -> BC -> IR -> BC. I'd expect
the IR to more-or-less match (any differences being due to one being
hand-written and the other a disassembly), and the two BC files should
be identical.

If not (and I am guessing they aren't, which is why you see some
differences in the compiled object file), that's a bug.
--paulr

Round-tripping between IR and BC does not seem to introduce a difference.
IR → BC → IR → BC, either with the clang frontend or with llvm-as and llvm-dis, yields identical BC files.

I have only been able to reproduce this issue when emitting to an object file.

It’s good that IR <-> BC produces consistent results, but it’s not good that IR->Obj and BC->Obj produce different results.

The next step would be to look at the dumps from llc to see what is different when starting with IR versus BC. That should help track down where something gets lost.

–paulr

Hey Dennis,

Maybe I’m doing something wrong, but I cannot reproduce your issue. Starting with your example as input.ll:

$ llvm-as input.ll -o input.bc

$ clang -fembed-bitcode=all -x ir input.ll -c -o ll.o

$ clang -fembed-bitcode=all -x ir input.bc -c -o bc.o
$ md5 ll.o
MD5 (ll.o) = 7e9bd15c4dd786a4bb4aec762d4e842e

$ md5 bc.o
MD5 (bc.o) = 7e9bd15c4dd786a4bb4aec762d4e842e

I verified with otool that there’s actually an embedded bitcode section.

However, looking at the cc1 invocation, I noticed the -emit-llvm-uselists flag. Might that have something to do with the behavior you’re seeing?

Cheers,
Jonas

Hi Jonas,

The reason you are not seeing a difference is that you are not calling the clang frontend directly.
If you run the clang driver like this with -v, you will notice that it performs two clang frontend invocations: IR → BC, then BC → Obj.
So in your case, the input always goes through BC before being compiled to an object.

Hi Paul,

llc produces the same output in both cases. That is what I would expect, since the object files are equivalent in both cases except for the embedded bitcode.

- Dennis Frett