[RFC] Support embedding bitcodes in LLD with LTO

Hi everybody!

I'm Josef and I'm working at Oracle Labs on Sulong [1,2], the LLVM IR execution engine in GraalVM [3]. In addition to executing bare bitcode files, Sulong also accepts ELF files with embedded bitcode sections. Therefore, it would be great if LLD in (Full)LTO mode supported embedding bitcode sections into the resulting object file. Is that something that would be considered useful and worth contributing?

Thanks,
Josef

[1] http://lists.llvm.org/pipermail/llvm-dev/2016-January/094713.html
[2] https://github.com/oracle/graal/tree/master/sulong
[3] https://www.graalvm.org/

Hi Josef,

Let me clarify my understanding. Do you want to keep original bitcode files in the output executable when doing LTO, so that the resulting executable contains both compiled bitcode (which is in native machine instructions) and original bitcode files?

Did you try embedding bitcode files into existing ELF files using objcopy or linker option --format=binary?
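
For reference, here is a minimal sketch of the `--format=binary` route, written as a small Python helper that only builds the command line and does not run it. The file names `main.o` and `app.bc` are placeholders, the function names are hypothetical, and the `-Wl,` spelling assumes the link is driven through clang with `-fuse-ld=lld`. The linker derives the `_binary_..._start`, `_end` and `_size` symbols from the input path by replacing every non-alphanumeric character with `_`.

```python
import re
import shlex


def binary_input_symbols(path):
    """Symbol names the linker defines for a file passed with --format=binary."""
    stem = "_binary_" + re.sub(r"[^0-9A-Za-z]", "_", path)
    return [stem + "_start", stem + "_end", stem + "_size"]


def link_with_binary_input(link_argv, blob):
    """Append a raw blob to a clang link line (assumes -fuse-ld=lld is present)."""
    # --format=default switches back to normal input detection, which matters
    # because the compiler driver appends further libraries after user arguments.
    return list(link_argv) + [f"-Wl,--format=binary,{blob},--format=default"]


if __name__ == "__main__":
    cmd = link_with_binary_input(
        ["clang", "-fuse-ld=lld", "main.o", "-o", "app"], "app.bc"
    )
    print(shlex.join(cmd))
    print(binary_input_symbols("app.bc"))
```

Note that this places the blob in an ordinary data section reachable through the generated symbols, which may or may not be what a bitcode consumer expects.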

Thanks for your response!

> Hi Josef,
>
> Let me clarify my understanding. Do you want to keep original bitcode files in the output executable when doing LTO, so that the resulting executable contains both compiled bitcode (which is in native machine instructions) and original bitcode files?

Exactly! Kind of analogous to what `clang -fembed-bitcode -c` does, but for executables.

> Did you try embedding bitcode files into existing ELF files using objcopy or linker option `--format=binary`?

Yes, that is the alternative. However, having support in the linker for that would require less tweaking of existing build systems. Adding an option to CFLAGS/LDFLAGS would then be sufficient.

That feature is probably too specific to your project. Most projects that use LTO are using LTO just because it generates better code. Your project is special as your program itself can also interpret LLVM bitcode, but that’s not the case for most other programs.

I think the option that’s closest to the one you are looking for is --plugin-opt=emit-llvm. That option makes lld write its output in the bitcode file format (so lld doesn’t do LTO if the option is given and instead writes raw bitcode as its output). With that option, I don’t think it’s too hard to embed a bitcode file into your executable. Run the linker twice, with and without --plugin-opt=emit-llvm, and embed the generated bitcode file using objcopy.

Does that work for you?
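
The two-pass workaround described above could look roughly like the following Python sketch. It only assembles the three command lines and does not execute them. Assumptions that are not in the original post: the link is driven through clang with `-fuse-ld=lld`, the inputs were compiled with `-flto`, the copy tool is `llvm-objcopy`, and the bitcode goes into a section named `.llvmbc` (the name clang uses for `-fembed-bitcode` on ELF). The function name and the file names are hypothetical.

```python
import shlex


def two_pass_link_commands(link_argv, output, objcopy="llvm-objcopy"):
    """Build the commands: native LTO link, bitcode-only link, embed step."""
    bitcode = output + ".bc"
    # Pass 1: regular LTO link that produces the native executable.
    native = list(link_argv) + ["-o", output]
    # Pass 2: same inputs, but lld writes the merged bitcode instead.
    emit_bc = list(link_argv) + ["-Wl,--plugin-opt=emit-llvm", "-o", bitcode]
    # Step 3: add the bitcode file as a section of the executable (in place).
    embed = [objcopy, "--add-section", f".llvmbc={bitcode}", output]
    return [native, emit_bc, embed]


if __name__ == "__main__":
    base = ["clang", "-flto", "-fuse-ld=lld", "main.o", "util.o"]
    for cmd in two_pass_link_commands(base, "app"):
        print(shlex.join(cmd))
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` in order. The sketch makes the cost visible: every link step of the build has to be replaced by three commands.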

I agree this is specific “compared to the usual expected output of a linker”, but on the other hand it also has the potential to open up cool projects that can be built on top of this!
If this could be supported in lld without too much trouble (maintenance, code complexity, etc.), why not accept the patches?

Best,

We need to set some threshold to avoid feature creep. As I wrote, lld already has a command line flag to emit a bitcode file instead of a compiled one, and perhaps you could do a lot of the things you find interesting with that option. If you are already using the feature and find it not powerful enough for your purposes, then we can discuss whether adding a new feature is the way to go. But as a maintainer of the tool, I don’t think that asking whether an existing feature would work is unreasonable as an initial response.

I am not sure I understand what you mean by this.
Is this some sort of view of the tool as a “product” where you want close control over the exposed surface, or something like this?

I tend to see LLVM as an “open platform” more than a closed “product”: it means that, with the goal of enabling future innovations, I personally tend to look at proposed new features/patches in light of the cost/burden they add on the community maintaining the project.

> We need to set some threshold to avoid feature creep.
>
> I am not sure I understand what you mean by this.
> Is this some sort of view of the tool as a “product” where you want close control over the exposed surface, or something like this?

No. What’s wrong with asking whether an existing feature would be sufficient, with an explanation of how it might solve their problem? If the existing feature can solve their problem, then that’s fine, we don’t need a new feature. If not, or if you have another reason to want it, then we would discuss it and may want to implement it. This is how we avoid ending up with too many overlapping, similar features. If you want the new feature, can you please explain why the existing feature is not sufficient?

Thanks for sharing your thoughts!

> That feature is probably too specific to your project. Most projects that use LTO are using LTO just because it generates better code. Your project is special as your program itself can also interpret LLVM bitcode, but that's not the case for most other programs.

I see that the requirement is somewhat specific. On the other hand, the same feature is for example supported on Darwin via the __bundle section, so I'd see it more as a feature parity measure than something that is only of use to our project. (My view might be biased, though :wink:)

> I think the option that's closest to the one you are looking for is `--plugin-opt=emit-llvm`. That option makes lld write its output in the bitcode file format (so lld doesn't do LTO if the option is given and instead writes raw bitcode as its output). With that option, I don't think it's too hard to embed a bitcode file into your executable. Run the linker twice, with and without `--plugin-opt=emit-llvm`, and embed the generated bitcode file using objcopy.

>
> Does that work for you?
>

Sure it does and I fully agree that it is currently possible to get to the result. Actually, there are many different ways to accomplish it using combinations of wllvm, gllvm, llvm-link, --plugin-opt=emit-llvm, objcopy, etc. We are currently using these but my feelings about the approaches are mixed. I see two downsides.
First, they require modifications to the build scripts. We want to support a variety of different build systems and make it as easy as possible for users to compile their projects for Sulong. Running the linker twice is not easy to accomplish in general, especially if the source project is not under our control. However, adding a linker flag is simpler in most cases.
Second, the approaches add new dependencies and their portability is limited. E.g. what about objcopy on Darwin? Anyway, as mentioned above, Darwin does support embedding linked bitcode, but then we have two distinct workflows for Linux and Darwin, which I think is not very user friendly.

That all said, I understand that there is some hesitation about bringing new, non-mainstream features to a project that needs to be stable and maintainable. Anyhow, the prototype we are currently experimenting with did not require too much work. It reuses existing code that lives in clang (but is not clang-specific), which we moved to a common place in llvm. The rest of the patch is mainly option handling. There is hardly any duplication of logic. I see it more as an addition to existing functionality (i.e., to `--plugin-opt=emit-llvm`). All the pieces are already there.

Whatever the outcome of this RFC is, I want to thank you and all other contributors for providing such a stable and mature platform that allows us to make these kinds of changes easily. Thanks. :slight_smile:

any further comments on this?

thanks,
Josef

Sorry for the late reply. I’d like to get opinions from a broader audience.

  • Does anyone think that it is useful to have a linker option to embed LLVM bitcode into the final executable when doing LTO? (Currently, LLVM bitcode is discarded once LTO is done.)
  • If so, could you explain your use case?