16-bit bytes for AsmPrinter/DWARF


I’m with a team using 16-bit bytes for an out-of-tree target. The AsmPrinter framework’s implementation of the DWARF debugging format does not distinguish well between target-sized bytes (the more common use) and 8-bit bytes. The DWARF standard itself does not seem very good in this regard either. So we have had to hack our way around this: at some call sites of EmitSymbolValue(), EmitLabelDifference(), EmitIntValue() and EmitLabelReference(), a size argument given in 8-bit bytes has had to be converted to target bytes (which is extra hacky for odd numbers of 8-bit bytes).
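To make the awkward case concrete, here is a minimal standalone sketch of that kind of conversion. It does not use any LLVM API; `TargetByteBits`, `octetsToTargetBytes` and `needsPadding` are hypothetical names chosen for illustration, and the 16-bit width is the one from this thread.

```cpp
#include <cstdint>

// Hypothetical stand-in for a target-specific constant (not an LLVM API).
// 16 matches the out-of-tree target discussed in this thread.
constexpr unsigned TargetByteBits = 16;
static_assert(TargetByteBits % 8 == 0, "sketch assumes a multiple of 8 bits");

// Convert a size in 8-bit bytes (octets) to target bytes, rounding up so
// that an odd octet count still gets enough room.
constexpr uint64_t octetsToTargetBytes(uint64_t Octets) {
  return (Octets * 8 + TargetByteBits - 1) / TargetByteBits;
}

// True when the octet count does not fill a whole number of target bytes,
// which is the case that needs padding or other special handling.
constexpr bool needsPadding(uint64_t Octets) {
  return (Octets * 8) % TargetByteBits != 0;
}
```

With 16-bit bytes, a 4-octet field maps cleanly to 2 target bytes, while a 3-octet field also needs 2 target bytes and leaves 8 bits unaccounted for.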

We’ve been thinking about what a good upstream fix would look like, and believe that perhaps converting all Size arguments in these call chains to BitSize would be the most practical way. The “cost” would be the multiplications necessary at call-sites. Would this be a good suggestion, and would anybody like to view and accept our patches for this?
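As a rough illustration of the BitSize idea, the sketch below uses a hypothetical `SketchEmitter` rather than the real streamer classes. Its `emitIntValue` only mimics the shape of such an interface, and the little-endian octet output is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr unsigned OctetBits = 8;
// Hypothetical target-specific constant (not an LLVM API).
constexpr unsigned TargetByteBits = 16;

// Hypothetical stand-in for a streamer: records what was emitted as a
// little-endian sequence of octets.
struct SketchEmitter {
  std::vector<uint8_t> Octets;

  // The size is given in bits, so the callee never has to guess whether
  // the caller meant octets or target bytes.
  void emitIntValue(uint64_t Value, unsigned BitSize) {
    assert(BitSize <= 64 && BitSize % OctetBits == 0);
    for (unsigned Bit = 0; Bit < BitSize; Bit += OctetBits)
      Octets.push_back(static_cast<uint8_t>(Value >> Bit));
  }
};

// Call sites pay for one multiplication each and state their unit explicitly.
inline std::vector<uint8_t> emitExample() {
  SketchEmitter E;
  E.emitIntValue(0x1234, 1 * TargetByteBits); // one target byte
  E.emitIntValue(0xAB, 1 * OctetBits);        // a field defined as one octet
  E.emitIntValue(0x010203, 3 * OctetBits);    // odd octet count, no rounding
  return E.Octets;
}
```

The point of the design is that an odd number of octets no longer needs rounding at the call site; the bit count carries the exact size through the call chain.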



LLVM itself is pretty unfriendly to non-8-bit bytes. How are you solving this everywhere else?


That’s true. We have a number of other patches too: various helper functions, a target-specific constant instead of the magic number 8, and so on.

Ideally, we’d like to upstream it all and would be fine with putting in the work required if the community is positive about it. As an out-of-tree target, we can’t supply tests that would ensure it keeps working, but we’d notice problems and continually upstream fixes.

Recently, we supplied most of our patches in this area to the DCPU16 project.


Hi Jesper,

We have been trying to solve this problem ourselves for a couple of
out-of-tree backends, and our solution has involved calculating
everything in LLVM in "char" size instead of hard-coded 8-bit (so this
would be 16-bit in this case), and then for AsmPrinter we convert from
chars to octets for the purpose of DWARF generation. I think what
you're describing sounds like a similar approach to this problem.
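
A minimal sketch of the chars-to-octets step, assuming a 16-bit char.
The names `CharBits` and `charsToOctets` are hypothetical and are not
taken from any of the backends mentioned in this thread.

```cpp
#include <cstdint>

// Assumed width of one target "char" in bits (hypothetical constant).
constexpr unsigned CharBits = 16;
static_assert(CharBits % 8 == 0, "sketch assumes a multiple of 8 bits");

// Sizes are tracked in chars internally and scaled to octets only at
// the point where DWARF data is produced.
constexpr uint64_t charsToOctets(uint64_t Chars) {
  return Chars * (CharBits / 8);
}
```

In this direction the conversion is always exact, since every char is a
whole number of octets.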

In the long term we hope to push this upstream, as it seems this is a
problem people re-solve again and again in various out-of-tree forks,
and it makes sense to have a good solution in-tree which would make
everyone's lives easier. This is one of the things we're hoping to fix
with the AAP backend (we're working on a 16-bit addressability mode).
My colleague Ed Jones is giving a talk about this tomorrow at FOSDEM
(FOSDEM 2017 - Adding 16-bit Character Support in LLVM), and we hope to
start a longer discussion about this either at FOSDEM or at EuroLLVM,
and solve this problem for good.


Hi Simon,

It's encouraging that there are others in the same situation that have
been developing similar solutions. I agree this really makes the case
for having a solution in-tree. I just watched your FOSDEM video and
indeed we seem to have very similar solutions with DataLayout and so
on, although we have been sticking with the "byte" vocabulary. (As the
standard does not peg it to 8 bits, I think there'd be some
disadvantages to honoring a faulty assumption.)

As you guys seem to be preparing to move on this by introducing an
in-tree target, I guess we at Ericsson should rest for now. We'll be
happy to assist in any way, though, for instance by reviewing code.
Please let us know if there's anything we can do and let's keep in touch!

Thanks, Jesper