Which is more compact, .bc or .ll.gz? And what might be even more compact?

According to the few tests I did, .ll.gz is more compact (sizes relative to the uncompressed .bc file):

1.00 LLVM bitcode (.bc)
0.80 Gzipped LLVM bitcode (.bc.gz)
4.13 LLVM assembly (.ll)
0.68 Gzipped LLVM assembly (.ll.gz)

However, there's not much in it, considering that a stripped native binary is about 0.40 on the same scale.
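For anyone who wants to reproduce this kind of table on their own files, here is a minimal Python sketch. The function name `relative_sizes` and the idea of passing the file contents as a dict are my own assumptions for illustration, not part of any LLVM tooling; it only uses the standard `gzip` module.

```python
import gzip


def relative_sizes(blobs, baseline):
    """Return raw and gzipped sizes of each blob, relative to the raw baseline.

    `blobs` maps a label (e.g. a file name) to that file's bytes;
    `baseline` is the label whose raw size counts as 1.00.
    """
    base = len(blobs[baseline])
    table = {}
    for name, data in blobs.items():
        table[name] = len(data) / base
        table[name + ".gz"] = len(gzip.compress(data)) / base
    return table
```

To use it, read each file with `open(path, "rb").read()`, put the contents in a dict keyed by file name, and print each ratio with something like `f"{ratio:.2f} {name}"`. The numbers will of course depend on the input program.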

So, seeing as projects such as PNaCl want to send LLVM bitcode over the network, are there any proposed solutions for making LLVM bitcode more compact?

Removing or simplifying the names of local variables would be an obvious thing to do. Is there anything else that could be done without changing the bitcode format? (There's an obvious analogy with JavaScript compression techniques.) Does anyone have any idea how much it would help?

Edmund

> According to the few tests I did, .ll.gz is more compact:
>
> 1.00 LLVM bitcode (.bc)
> 0.80 Gzipped LLVM bitcode (.bc.gz)
> 4.13 LLVM assembly (.ll)
> 0.68 Gzipped LLVM assembly (.ll.gz)
>
> However, there's not much in it, considering that a stripped native
> binary is about 0.40 on the same scale.
>
> So, seeing as projects such as PNaCl want to send LLVM bitcode over
> the network, are there any proposed solutions for making LLVM bitcode
> more compact?
>
> Removing or simplifying the names of local variables would be an
> obvious thing to do.

opt -globaldce -strip -strip-dead-prototypes -deadtypeelim

-strip removes names of local vars (and more).

> Is there anything else that could be done
> without changing the bitcode format? (There's an obvious analogy with
> JavaScript compression techniques.) Does anyone have any idea how
> much it would help?

You might try some other compression techniques, .xz seems to be
popular these days.
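
A quick way to compare the two compressors on the same input is a small Python sketch like the one below. The helper name `compressed_sizes` is hypothetical; it relies only on the standard `gzip` and `lzma` modules, and the maximum compression settings are an assumption on my part rather than anything measured in this thread.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the raw, gzip and xz sizes (in bytes) of `data`."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        # FORMAT_XZ produces the same container as the `xz` command-line tool.
        "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=9)),
    }
```

Feed it the bytes of a stripped .bc or .ll file to see how each compressor does on your own code.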

Best regards,
--Edwin

Thanks for the advice.

I tried comparing LLVM bitcode, LLVM assembly, and x86 binary with all the files stripped and LZMA-compressed.

The compressed LLVM assembly was very slightly smaller than the compressed LLVM bitcode. Both were about 1.45 times the size of the compressed native binary.

That's not a very exciting ratio. However, perhaps it's interesting that the bitpacking and other ad hoc compression techniques used in LLVM bitcode seem to get in the way of standard compression algorithms.
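
One plausible reason is that general-purpose compressors look for repeated byte strings. Once values are packed at a width that is not a multiple of 8 bits, the same sequence of values can land at different bit offsets and no longer looks like the same bytes. The sketch below is a way to experiment with that idea. It is a generic fixed-width packer written for illustration, not LLVM's actual bitstream encoding (which also uses variable-width fields and abbreviations), and the names `pack_bits`, `unpack_bits` and `compare` are my own.

```python
import lzma


def pack_bits(values, width):
    """Pack non-negative integers into bytes, `width` bits each, low bits first."""
    acc = 0    # bit accumulator
    nbits = 0  # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        if v < 0 or v >> width:
            raise ValueError("value does not fit in the given width")
        acc |= v << nbits
        nbits += width
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # flush the final partial byte
    return bytes(out)


def unpack_bits(data, width, count):
    """Inverse of pack_bits: recover `count` values of `width` bits each."""
    acc = 0
    nbits = 0
    source = iter(data)
    mask = (1 << width) - 1
    values = []
    for _ in range(count):
        while nbits < width:
            acc |= next(source) << nbits
            nbits += 8
        values.append(acc & mask)
        acc >>= width
        nbits -= width
    return values


def compare(values, width):
    """Sizes of a one-byte-per-value encoding and a bit-packed one, raw and xz."""
    plain = bytes(values)  # assumes every value is below 256
    packed = pack_bits(values, width)
    return {
        "plain": len(plain),
        "packed": len(packed),
        "plain.xz": len(lzma.compress(plain)),
        "packed.xz": len(lzma.compress(packed)),
    }
```

The packed form is always smaller before compression. Whether it stays smaller after compression depends heavily on the data, so it is worth trying on value streams that resemble real instruction operands rather than trusting a toy input.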

Of course the compression techniques used in LLVM bitcode have the advantage that they allow the data to be selectively parsed.

Edmund