Another struct-return question

For this C code:

     typedef struct s2 {
       char s2C1 , s2C2;
     } s2td;

clang generates:

     %struct.s2 = type { i8, i8 }

which, I assume, lets llvm decide on the actual layout of this type.

For the return statement in:

     struct s2 fs2 ( char fs2p1 ) {
       struct s2 ls2;
       ls2.s2C1 = 'B';
       ls2.s2C2 = fs2p1;
       return ls2;
     }

I see this IR:

     %struct.s2 = type { i8, i8 }

     define i16 @fs2(i8 signext %fs2p1) #0 {
     entry:
       %retval = alloca %struct.s2, align 1 ; [#uses=2 type=%struct.s2*]
       ...
       %3 = bitcast %struct.s2* %retval to i16*, !dbg !46 ; [#uses=1 type=i16*] [debug line = 25:0]
       %4 = load i16* %3, align 1, !dbg !46 ; [#uses=1 type=i16] [debug line = 25:0]
       ret i16 %4, !dbg !46 ; [debug line = 25:0]
     }

which returns the struct in a scalar i16.
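
For readers more at home in C than in IR, the bitcast-and-load sequence corresponds roughly to copying the two bytes of the struct into a 16-bit integer. This is only a sketch, assuming the struct is exactly two bytes with no padding; `pack_s2` and `unpack_s2` are hypothetical helper names, not something Clang emits.

```c
#include <stdint.h>
#include <string.h>

typedef struct s2 {
  char s2C1, s2C2;
} s2td;

/* Assumption of this sketch: the struct is exactly 2 bytes, no padding. */
_Static_assert(sizeof(s2td) == sizeof(uint16_t), "sketch assumes a 2-byte struct");

/* Hypothetical helper: view the two struct bytes as one 16-bit scalar,
   mirroring the bitcast + load of i16 in the IR. */
static uint16_t pack_s2(s2td v) {
  uint16_t bits;
  memcpy(&bits, &v, sizeof bits);
  return bits;
}

/* Inverse direction: recover the struct from the scalar. */
static s2td unpack_s2(uint16_t bits) {
  s2td v;
  memcpy(&v, &bits, sizeof v);
  return v;
}
```

Which of the two chars ends up in the low byte of the scalar depends on the target's byte order, so only the round trip is portable.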

I have three questions about this.

1) Larger structs are returned differently, via memcpy. Do
     these methods of returning struct values show through in
     the ultimately generated machine code? It seems hard to
     imagine that, of the many different target code generators
     in llvm, there would not be at least some with standardized
     ABIs that differ in such respects. Does llvm make target-
     dependent transformations for different targets to match
     their ABIs? Or do I have to do that at the level of generating
     llvm IR?

2) To correctly return the value using a bitcast, as in the example, the
     front end has to independently and correctly duplicate the layout that
     llvm will produce. This seems both very fragile and difficult to
     diagnose when it fails. My front end already does record layout, but
     I had previously decided, after a discussion on this list, that it was
     better to let llvm do it. Any advice on the best way here?

3) I am also a little worried about the implications of returning an
     i16, with alignment of 1. Won't this create trouble somewhere, or
     at least lose some benefit of returning as a scalar?

1) Larger structs are returned differently, via memcpy. Do
    these methods of returning struct values show through in
    the ultimately generated machine code? It seems hard to
    imagine that, of the many different target code generators
    in llvm, there would not be at least some with standardized
    ABIs that differ in such respects. Does llvm make target-
    dependent transformations for different targets to match
    their ABIs? Or do I have to do that at the level of generating
    llvm IR?

LLVM handles the low-level ABI details like what registers to use for
arguments, but frontends unfortunately need to handle lots of ABI issues
around struct passing. LLVM isn't really responsible for transforming IR to
make it match any particular ABI.

2) To correctly return the value using a bitcast, as in the example, the
    front end has to independently and correctly duplicate the layout that
    llvm will produce. This seems both very fragile and difficult to
    diagnose when it fails. My front end already does record layout, but
    I had previously decided, after a discussion on this list, that it was
    better to let llvm do it. Any advice on the best way here?

Personally, I wouldn't recommend letting LLVM do struct layout. I would
recommend creating high-level LLVM struct types, but the frontend should
use packed struct types to precisely control the layout. In that way, the
frontend can still make assumptions about the exact layout in memory. Make
sense?

In this particular case, probably all you need to know is the size of the
struct, and notice that it is small. I would try to find the Sys V ABI docs
to get the threshold or check the Clang source code.
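
As a rough illustration of such a size-based decision, the sketch below picks a return convention from the struct size alone. The 16-byte limit reflects the x86-64 System V rule that aggregates larger than two eightbytes are returned in memory. The real classification also looks at field types, and other targets use other limits, so the threshold is an assumption to check against the ABI document. The enum and `classify_struct_return` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical frontend-side categories for how a struct is returned. */
enum ret_kind {
  RET_DIRECT_INT,   /* coerce to one integer, e.g. i16 for a 2-byte struct */
  RET_DIRECT_PAIR,  /* coerce to two register-sized pieces */
  RET_INDIRECT      /* caller passes a hidden pointer (sret-style) */
};

/* Assumption: x86-64 System V, non-empty struct, integer-only fields.
   Floating-point fields and other targets need a fuller classification. */
static enum ret_kind classify_struct_return(size_t size_in_bytes) {
  if (size_in_bytes <= 8)
    return RET_DIRECT_INT;
  if (size_in_bytes <= 16)
    return RET_DIRECT_PAIR;
  return RET_INDIRECT;
}
```

With this, the two-char struct from the original question lands in the first category, which matches the i16 return that Clang produced.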

3) I am also a little worried about the implications of returning an
    i16, with alignment of 1. Won't this create trouble somewhere, or
    at least lose some benefit of returning as a scalar?

First, the optimizer will typically remove the alloca and the load. Second,
the low alignment on the alloca and load looks like a bug in Clang.

So, if I build the llvm struct type with the packed attribute, will it just put every field in the
next available bit? I see that the two ways of accessing fields, GEP and insertvalue/extractvalue,
both identify the field with a field sequence number, so I would have to be sure I could control
the way llvm laid the struct out.

Yes, when you set the packed attribute, each field is laid out on the next
available byte. I believe non-byte sized integers are rounded up in size to
the next byte. To handle padding gaps, the frontend needs to manually
insert padding fields, and maintain a mapping from frontend field to LLVM
field number. Padding is typically an [N x i8] array where N is the
appropriate size to bring you to the byte boundary of the next field. The
advantage is that you can be 100% sure that LLVM and your frontend agree on
the layout of the struct, while unpacked structs will have different
layouts on different targets.
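
The padding-plus-mapping bookkeeping can be sketched as follows. This is a hypothetical helper, not part of any LLVM API: `fe_field` and `build_packed_type` are made-up names, fields are assumed to be sorted by byte offset, and only sizes of 1, 2, 4 and 8 bytes are emitted as integers. Any other size is emitted as a byte array, which is a conservative choice for this sketch.

```c
#include <stdio.h>

/* One frontend field: byte offset and byte size inside the record. */
struct fe_field { unsigned offset, size; };

/* Write the text of a packed LLVM struct type for n fields sorted by offset.
   Gaps become [N x i8] padding elements, and llvm_index[i] receives the LLVM
   element number of frontend field i. Returns 0 on success, -1 on overlap,
   an empty field, or a buffer that is too small. */
static int build_packed_type(const struct fe_field *f, unsigned n,
                             unsigned *llvm_index, char *out, size_t cap) {
  unsigned pos = 0;   /* next free byte in the LLVM struct */
  unsigned elem = 0;  /* next LLVM element number */
  size_t len;
  int w = snprintf(out, cap, "<{");
  if (w < 0 || (size_t)w >= cap) return -1;
  len = (size_t)w;

  for (unsigned i = 0; i < n; i++) {
    unsigned s = f[i].size;
    if (f[i].offset < pos || s == 0) return -1;
    if (f[i].offset > pos) {  /* gap before this field: explicit padding */
      w = snprintf(out + len, cap - len, "%s [%u x i8]",
                   elem ? "," : "", f[i].offset - pos);
      if (w < 0 || (size_t)w >= cap - len) return -1;
      len += (size_t)w;
      elem++;
    }
    if (s == 1 || s == 2 || s == 4 || s == 8)
      w = snprintf(out + len, cap - len, "%s i%u", elem ? "," : "", 8 * s);
    else
      w = snprintf(out + len, cap - len, "%s [%u x i8]", elem ? "," : "", s);
    if (w < 0 || (size_t)w >= cap - len) return -1;
    len += (size_t)w;
    llvm_index[i] = elem++;   /* frontend field i -> LLVM element number */
    pos = f[i].offset + s;
  }
  w = snprintf(out + len, cap - len, " }>");
  if (w < 0 || (size_t)w >= cap - len) return -1;
  return 0;
}
```

For fields at byte offsets 0, 4 and 8 with sizes 1, 4 and 3, this yields `<{ i8, [3 x i8], i32, [3 x i8] }>` and the mapping 0, 2, 3. The second frontend field is LLVM element 2 because a padding element sits in front of it.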

The fact that your frontend seems to want to think about things in terms of
bits suggests that your next question will be about bitfields. For
bitfields, Clang will emit a large integer for all the bits and emit a wide
load with extra masking code to extract the relevant bits. All the
downstream optimizations are designed to handle this as input, so we should
generate good code.
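
In C terms, the wide-load-and-mask pattern looks roughly like the sketch below. The function names are hypothetical, and numbering bits from the least significant end is an assumption of this example. What Clang actually emits depends on the target and on the storage unit it picks.

```c
#include <stdint.h>

/* Build a mask of `width` low bits; width may be 1..64. */
static uint64_t low_mask(unsigned width) {
  return (width >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << width) - 1);
}

/* Extract `width` bits starting at bit `lsb` (0 = least significant) from a
   storage unit that was loaded as one wide integer. Assumes lsb < 64. */
static uint64_t extract_bits(uint64_t storage, unsigned lsb, unsigned width) {
  return (storage >> lsb) & low_mask(width);
}

/* Write `value` into the same bit range, leaving all other bits untouched.
   The caller then stores the result back with one wide store. */
static uint64_t insert_bits(uint64_t storage, unsigned lsb, unsigned width,
                            uint64_t value) {
  uint64_t mask = low_mask(width);
  return (storage & ~(mask << lsb)) | ((value & mask) << lsb);
}
```

A frontend that tracks fields by bit offset and bit size can lower a field read to one load plus `extract_bits`, and a field write to load, `insert_bits`, store.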

Ah, bytes, not bits. But I can use that for all fields that start and end on byte boundaries.

FWIW, the reason I care so much is that my front end IR operators that access fields
use bit offsets within the struct and bit sizes, rather than anything that would
identify a field. But I can de-lower (spell check hates me for that) this to field
numbers if there is agreement on the layout.

Yes, that would have been my next question. I like the answer, as I was worried
about losing optimizations if I generated shift-and-mask code, which I will use
for non-whole-byte fields.

So the next question is, what about fields that occupy only whole bytes, but
are not 2^n bytes or aren't aligned to their size? Should I treat these as
bitfields and produce shift-and-mask operations to access them?

Thanks for the advice.

Rodney Bates
rodney.m.bates@acm.org

So the next question is, what about fields that occupy only whole bytes, but
are not 2^n bytes or aren't aligned to their size? Should I treat these as
bitfields and produce shift-and-mask operations to access them?

I would represent this with unaligned accesses. You can set the alignment
on loads and stores generated for field access down to what the struct
layout guarantees.

I just checked, and the Sparc backend will split up such loads into
individual byte accesses.
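
A C analogue of the two cases, as a sketch with hypothetical function names: the `memcpy` form corresponds to a load marked with alignment 1, which the backend may lower to a single instruction or to byte accesses. The second function spells out the byte-by-byte form for a 3-byte field. Little-endian field order is an assumption of this example.

```c
#include <stdint.h>
#include <string.h>

/* Read a 4-byte field at an arbitrary byte offset. memcpy makes no alignment
   assumption, much like a load with align 1. */
static uint32_t load_u32_unaligned(const unsigned char *base, size_t offset) {
  uint32_t v;
  memcpy(&v, base + offset, sizeof v);
  return v;
}

/* Read a 3-byte field one byte at a time, assuming little-endian order.
   This is the shape a strict-alignment target ends up with anyway. */
static uint32_t load_u24_le(const unsigned char *base, size_t offset) {
  const unsigned char *p = base + offset;
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```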

Thanks for the advice.

No problem. :)