Porting libxar to LLVM

Hi,

I want to float the idea of rewriting this external dependency as part
of LLVM. For context, Apple's `-bitcode_bundle` flag packages bitcode
within the `__LLVM` segment of the larger executable. The bitcode
files are packaged by serializing them into a XAR (XML Archive) file,
which then gets copied into `__LLVM`. llvm-objdump and LLD use libxar
to read and write these XARs.
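For readers unfamiliar with the format, here is a rough sketch of what reading a XAR from memory involves. It is a minimal illustration, not libxar's API or any existing LLVM code. `XarHeader` and `parseXarHeader` are hypothetical names. The header layout is our understanding of the xar file format and should be checked against xar's own headers: a big-endian magic `xar!`, header size, version, compressed and uncompressed TOC lengths, and checksum algorithm, 28 bytes in total. The sketch does not handle the zlib-compressed XML table of contents or the heap that follow the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Fixed-size header assumed to start every XAR archive (all fields big-endian).
struct XarHeader {
  uint32_t Magic;
  uint16_t Size;                  // header size in bytes
  uint16_t Version;
  uint64_t TocLengthCompressed;   // bytes of compressed XML TOC after header
  uint64_t TocLengthUncompressed;
  uint32_t CksumAlg;              // assumed: 0 = none, 1 = SHA-1, 2 = MD5
};

constexpr uint32_t XarMagic = 0x78617221; // "xar!"
constexpr size_t XarHeaderSize = 28;

// Reads an N-byte big-endian unsigned integer starting at P.
static uint64_t readBE(const uint8_t *P, size_t N) {
  uint64_t V = 0;
  for (size_t I = 0; I < N; ++I)
    V = (V << 8) | P[I];
  return V;
}

// Parses the header from bytes already in memory (e.g. the contents of the
// __LLVM segment). Returns std::nullopt if the buffer is too short or the
// magic/size fields are not what we expect.
std::optional<XarHeader> parseXarHeader(const std::vector<uint8_t> &Buf) {
  if (Buf.size() < XarHeaderSize)
    return std::nullopt;
  const uint8_t *P = Buf.data();
  XarHeader H;
  H.Magic = static_cast<uint32_t>(readBE(P, 4));
  H.Size = static_cast<uint16_t>(readBE(P + 4, 2));
  H.Version = static_cast<uint16_t>(readBE(P + 6, 2));
  H.TocLengthCompressed = readBE(P + 8, 8);
  H.TocLengthUncompressed = readBE(P + 16, 8);
  H.CksumAlg = static_cast<uint32_t>(readBE(P + 24, 4));
  if (H.Magic != XarMagic || H.Size < XarHeaderSize)
    return std::nullopt;
  return H;
}
```

No temporary file is involved here. The caller hands over bytes it already has, which is the property point 1 below is after.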

I think there are several good reasons for writing our own libxar:
1. libxar's interface doesn't allow for *not* writing to disk (i.e. we
cannot write to an mmap'ed region), so it incurs unnecessary I/O
overhead
2. It doesn't support Windows
3. It exposes a rather primitive C interface (using raw ints to
indicate error status) that is out of place in LLVM's C++
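Points 1 and 3 can be made more concrete with a minimal sketch of the kind of interface an in-tree writer could expose, assuming the same 28-byte header layout as above. It serializes into any caller-provided memory, such as a `std::vector` or an mmap'ed output region, instead of a file. It reports failure through a typed enum rather than a raw int. `writeXarHeader` and `XarWriteError` are hypothetical names, and real LLVM code would more likely return `llvm::Error`. Only the header is written; the TOC and heap are out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Typed error instead of libxar-style integer status codes.
enum class XarWriteError { None, BufferTooSmall };

// Stores V as an N-byte big-endian integer at Out.
static void writeBE(uint8_t *Out, uint64_t V, size_t N) {
  for (size_t I = 0; I < N; ++I)
    Out[N - 1 - I] = static_cast<uint8_t>(V >> (8 * I));
}

// Serializes a XAR header into Dest[0..DestSize). Dest can be any writable
// memory, so no temporary file on disk is needed.
[[nodiscard]] XarWriteError writeXarHeader(uint8_t *Dest, size_t DestSize,
                                           uint64_t TocCompressed,
                                           uint64_t TocUncompressed,
                                           uint32_t CksumAlg) {
  constexpr size_t HeaderSize = 28;
  if (DestSize < HeaderSize)
    return XarWriteError::BufferTooSmall;
  writeBE(Dest, 0x78617221, 4);      // magic "xar!"
  writeBE(Dest + 4, HeaderSize, 2);  // header size
  writeBE(Dest + 6, 1, 2);           // version (assumed to be 1)
  writeBE(Dest + 8, TocCompressed, 8);
  writeBE(Dest + 16, TocUncompressed, 8);
  writeBE(Dest + 24, CksumAlg, 4);   // assumed: 0 = none, 1 = SHA-1, 2 = MD5
  return XarWriteError::None;
}
```

Because the destination is just a pointer and a size, the same function would work for a heap buffer in tests and for the final output mapping in a linker.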

The goal would be to have both LLD and llvm-objdump use this
home-grown implementation of libxar, instead of the system one.

One possibility is to contribute to the existing libxar, but I don't
know if it is really maintained; Apple releases open source tarball
drops every so often, and there's https://mackyle.github.io/xar/ which
appears to be a fork of the now-defunct http://xar.googlecode.com/,
but even that hasn't seen a commit since 2014.

I may have an undergrad who's eager to take this work on, but before
they go down that path I wanted to run it by y'all. Let me know if
you have thoughts.

Cheers,
Jez

Might want to CC some of the original authors from Apple. I expect
they'd have some thoughts on this, and may know something about
libxar's maintenance status, etc.

If we do end up writing something of our own, I wonder whether it has
to be XAR or whether there's something else we should consider (maybe
the bitcode format could be reused, broadened, or the like).

+Davide -- do you know who (if anyone) at Apple maintains this?

Jez

Hello,
Silly question, but is XML the best format to use? If so, and if it is a fairly separate lib, I could try to help with rewriting it.

Best regards,
Pawel Kunio

On Fri, 30.04.2021 at 15:23, Jez via llvm-dev <llvm-dev@lists.llvm.org> wrote:


I will say XAR is a very specific format for App Store submission. Is there any reason you want to replicate the same format on other platforms? Is your goal to build and create an App Store submission from a Windows machine? That is the only reason I can think of for porting the libxar implementation. Otherwise, XAR is part of the system on macOS and can probably be an optional configuration for other platforms.

If you just want a general way to package up a bitcode bundle, there are probably better options now, and you may want to design something that is cross-platform and can easily be used for different goals.

Steven

> Is there any reason you want to replicate the same format on other platforms?

Of the 3 reasons listed, I guess running on Windows is the least
important. I think it's nice to be fully cross-platform given that
LLVM is a cross-platform project, but there's no pressing use case. I
think not having to write to disk is the biggest benefit here.

Ultimately I was thinking of this as a low-pri, nice-to-have, and
fairly self-contained feature that a beginner could use to get
familiar with parts of LLVM. (Hence the mention of the eager
undergrad; I wasn't planning to do it myself.)

Jez


Yeah, I would say let's not invest too much in this without a good use case for it. Also, considering that our new LTO interface is file-system backed rather than operating in memory, I don't think disk I/O is a big concern here either.

If you want to do it for the sake of the App Store, maybe just implement it behind a CMake check and keep it Darwin-only for now.

Steven