Contributing llvm-lipo

Hey everyone!
In October/November 2018 I started implementing MachO support in llvm-objcopy, with the long-term plan of building some popular binary-level tools on top of it. That effort stopped once some boilerplate code for reading/writing MachO files had been reviewed and committed to llvm/tools/llvm-objcopy.

Later I started working on llvm-lipo (a drop-in replacement for the tool “lipo”, which manipulates “fat” binaries), but that code has never been sent for code review. The original plan was to use an approach similar to llvm-strip, where the new tool is just another “driver” for llvm-objcopy. This approach worked well for llvm-strip (the command-line interface of llvm-strip maps naturally onto the interface of llvm-objcopy) but turned out not to be flexible enough for llvm-lipo. The thing is that lipo doesn’t work “per object file”; instead, it can take multiple input files and produce a single output file, no output file at all, etc.

So below is a proposal / request for suggestions on how to move forward from here. It seems to me it might be a good time to factor out the reusable part of llvm-objcopy’s codebase and build a library from it. Initially, the library could contain Reader + Object + Writer (for all the supported formats: ELF/COFF/MachO), while all the tool-specific logic stays in place. There are several open questions: what would be a good name for it, and how should we organize the codebase?

Some options I can imagine:

A) Leave the code where it is right now, create a library from it,
put llvm-lipo.cpp into the folder llvm/tools/llvm-objcopy

B) Move the code which belongs to the library
out of the folder llvm/tools/llvm-objcopy
and create a new folder llvm/tools/llvm-lipo for the lipo-specific code.

C) Something else?

D) /* personally don’t like */ Go with the approach where llvm-lipo is a symbolic link to llvm-objcopy. This bloats the llvm-objcopy codebase somewhat,
and I don’t see many benefits to this path other than some space savings.

Any thoughts/suggestions/comments would be appreciated.

Kind regards,
Alex Shaposhnikov

A) It sounds like you’re pretty sure that llvm-objcopy-style code could be correctly reused here. Assuming that’s the case, this is probably not the ideal option.

B) This is something we want to do anyway, and it sounds like the ideal solution, assuming llvm-objcopy-style code is compatible.

C) I have no ideas here

D) This seems strictly less ideal than B, but assuming this tool would still fit into llvm-objcopy’s general library format, it would be better than A. It sounds like CopyConfig and all that logic is the killer here. We could generalize all of that even further, but… yeah, that sounds like a losing battle and less ideal than B.

An important point is that, to proceed, we only need to move the MachO code into a library; we can follow up with the ELF code later, as we planned to anyway. I think that this 1) goes in the right direction and 2) has virtually no overhead.

Hi Jake,
many thanks,
yeah, I have very similar feelings / thoughts.

After some thinking, it seems to me that the problem I brought up is, in fact,
more relevant to tools that really need a robust mutable model of an object file (like objcopy, strip, install_name_tool, etc.),
but the particular case of “lipo” might be simpler; I need to double-check that and will take a closer look again.
What I mean is: lipo manipulates “fat” binaries by extracting, removing, and replacing slices
(a slice is the object file for a particular architecture), but the slices themselves are left unmodified, so things can be simpler.
I’d like to think a bit more about it to make sure I’m not missing anything important here.
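To illustrate why slice-level operations can be simpler than objcopy-style mutation, here is a minimal sketch of extracting a slice from a fat binary. It assumes the classic 32-bit fat layout from Apple’s <mach-o/fat.h> (a big-endian fat_header with magic 0xcafebabe, followed by 20-byte fat_arch records) and ignores 64-bit fat headers and alignment checks. The names FatArch, parseFatHeader and extractSlice are hypothetical, not llvm-objcopy or libObject API. The point is only that the header is parsed and the slice bytes are copied verbatim; nothing inside the slice is modeled or rewritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Assumed: classic 32-bit fat format; all header fields are big-endian.
constexpr uint32_t kFatMagic = 0xcafebabe;

// Hypothetical mirror of one fat_arch record (20 bytes on disk).
struct FatArch {
  int32_t cpuType;
  int32_t cpuSubType;
  uint32_t offset; // where the slice starts in the fat file
  uint32_t size;   // slice length in bytes
  uint32_t align;  // alignment as a power of two
};

// Read a big-endian 32-bit value at Pos, with a bounds check.
static uint32_t readBE32(const std::vector<uint8_t> &Buf, std::size_t Pos) {
  if (Pos + 4 > Buf.size())
    throw std::out_of_range("truncated fat header");
  return (uint32_t(Buf[Pos]) << 24) | (uint32_t(Buf[Pos + 1]) << 16) |
         (uint32_t(Buf[Pos + 2]) << 8) | uint32_t(Buf[Pos + 3]);
}

// Parse only the fat header; the slices themselves are never inspected.
std::vector<FatArch> parseFatHeader(const std::vector<uint8_t> &Buf) {
  if (readBE32(Buf, 0) != kFatMagic)
    throw std::invalid_argument("not a 32-bit fat binary");
  uint32_t NArch = readBE32(Buf, 4);
  std::vector<FatArch> Archs;
  for (uint32_t I = 0; I < NArch; ++I) {
    std::size_t Base = 8 + std::size_t(I) * 20; // 8-byte header, 20-byte records
    Archs.push_back({int32_t(readBE32(Buf, Base)),
                     int32_t(readBE32(Buf, Base + 4)),
                     readBE32(Buf, Base + 8), readBE32(Buf, Base + 12),
                     readBE32(Buf, Base + 16)});
  }
  return Archs;
}

// "Extract" = copy the matching slice's bytes unmodified.
std::optional<std::vector<uint8_t>>
extractSlice(const std::vector<uint8_t> &Buf, int32_t CpuType) {
  for (const FatArch &A : parseFatHeader(Buf)) {
    if (A.cpuType != CpuType)
      continue;
    if (uint64_t(A.offset) + A.size > Buf.size())
      throw std::out_of_range("slice extends past end of file");
    return std::vector<uint8_t>(Buf.begin() + A.offset,
                                Buf.begin() + A.offset + A.size);
  }
  return std::nullopt; // no slice for this CPU type
}
```

Removing or replacing a slice would follow the same pattern: rewrite the header and offsets, and copy the remaining slices byte-for-byte.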

But anyway, I think it’s very useful to discuss this problem; many thanks for your comments!

Kind regards,

I think that pretty much hits the nail on the head. The llvm-objcopy code is primarily for when you need to perform mutations. A common mistake (well, a mistake in my opinion) that I see is people wanting a one-size-fits-all solution when one doesn’t exist (or perhaps rather, ELFTypes is as close as we have). As long as we conclude that this kind of mutative model fits well here, we should proceed. That’s why I left all those cryptic “if this really fits into the llvm-objcopy model” messages.

Every case is different,
but yes, as I said, I would like to take a closer look at the problem again;
it might be the case that we don’t need this complexity in this particular case,
but I want to double-check.

But yeah, in general I agree with you!

Hi all,

Before we get too stuck into llvm-objcopy library discussions, I’d like to point out that Alex Brachet-Mialot (CC’ed) has been accepted this year as a GSoC student with his project to librarify llvm-objcopy. Hopefully he’ll introduce himself and his proposal a bit more in the coming days. Jordan and I are mentoring him and another GSoC student (Seiya Nuta, who is planning on working on the Mach-O side of things too). I’d like Alex to have a chance to put forward his thoughts on this, and I’m sure we can all help him with the design decisions.

I’ll have more to say on this a bit later, I’m sure, but I just wanted to bring this to your attention before things progress too far without him!

Hi James,
that’s perfectly fine,
thanks for letting me know!
Regarding llvm-objcopy (librarifying & MachO): sounds great, please keep me in the loop.
Regarding llvm-lipo: I’m looking at it right now (I just want some time to think it through).
If things turn out to be fairly independent of llvm-objcopy,
I will probably send an update / proposal for review soon (~1-2 days); if everything goes well, features can then be added to the tool incrementally.
Is that OK with you, or am I missing anything?

Kind regards,

Right, having read through all your comments, I think I agree with Jake. It sounds like this isn’t really the right fit for llvm-objcopy as such, although there might be a small subset that could be reused. I’m happy to wait and see your proposal.