writeNopData and non-instructions in .text

Hi all,
All ELF platforms at least, and likely all others too, allow something
like the following:

    .text
    .asciz "arbitrary long string"
    .p2align 3

Depending on the size of the string, MCAsmBackend::writeNopData is
called to pad the text. For x86 and other backends with byte-sized
instructions, this is no problem. Some backends like ARM and PPC
flush explicitly to 16-bit/32-bit boundaries. There is an interesting
question on whether the non-instructions should be leading or trailing
-- I think the behavior in ARM and PPC is wrong in this regard. The R600
backend seems to be just broken by not writing anything. This leaves
SPARC and Mips. Both currently just return false if the padding is not a
multiple of 32 bits, and the caller just reports an error.
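
To make the failure case concrete, here is a minimal sketch of the size arithmetic for the example above. The helper names paddingFor and ascizSize are made up for illustration and are not LLVM APIs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: bytes of padding needed to bring `offset` up to
// the next multiple of 2^log2Align (0 if it is already aligned).
std::uint64_t paddingFor(std::uint64_t offset, unsigned log2Align) {
    const std::uint64_t align = std::uint64_t{1} << log2Align;
    return (align - (offset & (align - 1))) & (align - 1);
}

// Hypothetical helper: .asciz emits the string bytes plus a terminating NUL.
std::uint64_t ascizSize(const char *s) {
    return std::strlen(s) + 1;
}
```

For the string "arbitrary long string", ascizSize gives 22 bytes, so `.p2align 3` asks for 2 bytes of padding. That is not a multiple of 4, which is exactly the case a backend with fixed 32-bit instructions cannot fill with whole nops.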

All this makes me wonder:
(1) Why do we allow the backend to fail at all? Shouldn't the
"pad-with-0" or so behavior be the default?
(2) What is the expected order? Pad to instruction size first or last?

Joerg

> Depending on the size of the string, MCAsmBackend::writeNopData is
> called to pad the text. For x86 and other backends with byte-sized
> instructions, this is no problem. Some backends like ARM and PPC
> flush explicitly to 16-bit/32-bit boundaries. There is an interesting
> question on whether the non-instructions should be leading or trailing
> -- I think the behavior in ARM and PPC is wrong in this regard.

I am pretty sure I implemented the PowerPC behavior in r191426.

I would be in favor of the following:
1. If the start is aligned *and* the length is aligned, use nops.
2. If the start is aligned but the length is not aligned, insert as many
nops as possible but pad out with zeros.
3. Otherwise (if the start is misaligned), use *just* zeros.
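
The three rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: a fixed 4-byte instruction size, the PowerPC nop (`ori 0,0,0`, 0x60000000) written in big-endian byte order as the example encoding, and a made-up function name padProposed that is not part of MCAsmBackend.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 4-byte instructions, PowerPC nop in big-endian byte order.
constexpr std::uint64_t kPpcInsnSize = 4;
constexpr std::uint8_t kPpcNop[4] = {0x60, 0x00, 0x00, 0x00};

// Hypothetical helper: produce `count` padding bytes for a gap that
// begins at section offset `start`.
std::vector<std::uint8_t> padProposed(std::uint64_t start, std::uint64_t count) {
    std::vector<std::uint8_t> out;
    out.reserve(static_cast<std::size_t>(count));
    if (start % kPpcInsnSize == 0) {
        // Rules 1 and 2: the start is aligned, so emit as many whole nops as fit.
        for (std::uint64_t n = count / kPpcInsnSize; n > 0; --n)
            out.insert(out.end(), kPpcNop, kPpcNop + 4);
    }
    // Rule 2 remainder, or rule 3 (misaligned start): fill the rest with zeros.
    out.resize(static_cast<std::size_t>(count), std::uint8_t{0});
    return out;
}
```

With an aligned start and 6 bytes to fill, this yields one nop followed by two zero bytes; with a misaligned start it yields zeros only.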

> The R600

aligned place to pad. As such, it seems the correct behavior is to assume
that end will be aligned, so that [end - k * instruction size, end) should be
padded with nops and [start, end - k * instruction size) should be
padded with plain nulls?
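
A rough sketch of that ordering, for comparison: leading zeros first, then k whole nops ending at the aligned end. It assumes 4-byte instructions and uses the PowerPC nop encoding (0x60000000, big-endian) purely as an example; padZerosThenNops is a made-up name, and k is taken to be the largest number of whole instructions that fit in the gap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 4-byte instructions, PowerPC nop in big-endian byte order.
constexpr std::uint64_t kWordSize = 4;
constexpr std::uint8_t kNopWord[4] = {0x60, 0x00, 0x00, 0x00};

// Hypothetical helper: fill `count` bytes, assuming the END of the gap is
// instruction-aligned. [start, end - k * size) gets zeros, the rest nops.
std::vector<std::uint8_t> padZerosThenNops(std::uint64_t count) {
    const std::uint64_t k = count / kWordSize;            // whole nops that fit
    const std::uint64_t leading = count - k * kWordSize;  // unaligned prefix
    std::vector<std::uint8_t> out(static_cast<std::size_t>(leading), std::uint8_t{0});
    for (std::uint64_t i = 0; i < k; ++i)
        out.insert(out.end(), kNopWord, kNopWord + 4);
    return out;
}
```

For a 6-byte gap this produces two zero bytes followed by one nop, the mirror image of the nops-first ordering.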

Joerg

> All this makes me wonder:
> (1) Why do we allow the backend to fail at all? Shouldn't the
> "pad-with-0" or so behavior be the default?

Probably, yes. I can’t think of a counterargument, anyway.

> (2) What is the expected order? Pad to instruction size first or last?

The X86 implementation specifics here were chosen simply to match the cctools as(1) implementation, as that made doing things like binary diffs of the output easier when first bringing up the integrated assembler. Now that we’re long past that, if there’s something more compelling we should do instead, let’s do that.

For example, it’s been requested from time to time that we pad (between functions) with UD2 instead on x86. That seems a reasonable thing to consider, though it would have to be measured carefully for impact on branch prediction, return address prediction, etc. Even in theoretically unreachable code, there can be interesting interactions in general with blobs of anything in between functions, and I don’t know how (or whether) UD2 interacts with any of that. This would also require distinguishing padding inside a function, for things like basic block alignment, from padding in between functions and in other places that are supposed to be unreachable for execution.