PR400 - alignment for LD/ST

Is anyone actively working on this currently? It’s marked as unassigned in BZ.

In next few days I'll get to this. At the moment I'm enabling Packed Structure support in llvm-gcc (based on patches developed by other llvm developers) and adding support for packed bit fields. As part of this work, alignment for LD/ST is required to access packed structure fields which are not aligned naturally.

Is anyone actively working on this currently? It's marked as
unassigned in BZ.

I don't think so.

Reid.

It would be really nice if llvm-as had some extra syntax to let other
compilers specify the exact layout of a struct. Specifically, I'd like
to specify what the byte offsets are for each field without having to
insert phony fields. Would this extra functionality fit into your work
at all?

Not completely. I think, LLVM Packed Structure will help make it
little bit easy. See
  http://llvm.org/docs/LangRef.html#t_pstruct
if it helps.

Christopher,

Hi Mark,

It would be really nice if llvm-as had some extra syntax to let other
compilers specify the exact layout of a struct. Specifically, I'd like
to specify what the byte offsets are for each field without having to
insert phony fields. Would this extra functionality fit into your work
at all?

I suggest you open a Bugzilla PR on this as "enhancement" and detail
exactly what your proposal is. There's lots of ways to skin this cat. I
would actually like to see the structure definition generalized so that
we accomplish the following:

1. We don't need "packed struct" at all (because its redundant with
"struct")
2. Structure definitions are expanded to not only specify the size of
each field
   (by its type) but also its byte width (for padding) and alignment.
3. Without specific padding/alignment specifications, LLVM defaults to
what is
   normal for the ABI on the given target.
4. Some kind of thing for specifying packed "portions" of a structure,
e.g.:
    struct { packed { i2, i3, i5, i6 }, i16 }
   which is just an i32 with identifiable fields. The "packed" grouping
just
   turns off the normal padding/alignment. However, as a whole the
"packed"
   grouping is aligned/padded.

Feel free to include these ideas in the PR if they make sense to you. We
can have a discussion in the PR about the best way to accomplish these
goals. I'm sure Chris and others have various comments on this.

Reid.

I’m not really sure exactly what’s left to do to fully implement what I need. I had assumed that this work would allow me to determine if a LD/ST was based on an aligned pointer or not, but this seems now like the tip of the iceberg.

I’m hoping that I can declare aligned types and retain that information when I pass a pointer to one of these types to function, or declare a pointer to an aligned type as a local variable. It seems that implementing this not only requires the alignment attribute on loads and stores (in the LLVM bc) but propagation of this information from the frontend.

Is this correct?

Yes, but the front-end already has this information. I believe that the only missing piece (for over-aligned data) is being able to represent that info on the load/store instructions.

-Chris

AFAIU, tasks are:

  1. Update LLVM LD/ST instructions to maintain alignment information
  2. Update FE to provide alignment info for LD/ST instructions if it is determined that memory reference is not aligned naturally.
  3. Update target specific code gens to respect alignment information associated with LD/ST

I’m not really sure exactly what’s left to do to fully implement what I need. I had assumed that this work would allow me to determine if a LD/ST was based on an aligned pointer or not, but this seems now like the tip of the iceberg.

I’m hoping that I can declare aligned types and retain that information when I pass a pointer to one of these types to function, or declare a pointer to an aligned type as a local variable. It seems that implementing this not only requires the alignment attribute on loads and stores (in the LLVM bc) but propagation of this information from the frontend.

AFAIU, tasks are:

  1. Update LLVM LD/ST instructions to maintain alignment information

Is this referring to the language itself, i.e. the bytecode/assembly format and associated readers/writers? If so this is probably the portion that it’s most likely I could help with.

Yes.

For the assembly format I assume that it would be a simple optional comma delimited alignment parameter at the end of the instruction, such as:

%tmp5 = load i32* %tmp4, align 4
store i32 %val, i32* %tmp3, align 4

Has the format for the extension of the load/store bytecode been determined? This would need to be applied to both normal and volatile load/store instructions, and I see three options.

  1. Add the ability to encode optional arguments on any load/store and volatile variants. This is general, but I can see why it could be bad given how frequent load/stores tend to be. Perhaps this is why pseudo ops were chosen for the volatile versions?

  2. Create pseudo opcodes for the permutations of volatility and alignment. This will consume 4 more opcodes, but it would not bloat the bytecode when natural alignment is used.

  3. Create pseudo ops for ‘attributed’ load/store instructions. These instructions contain a set of optional attributes encoded in some way. The current volatile load/store instructions could be subsumed by these instructions.

I’m leaning away from 1, but unsure whether 2 or 3 is better.

Thanks

Hi Christopher,

For the assembly format I assume that it would be a simple optional
comma delimited alignment parameter at the end of the instruction,
such as:

%tmp5 = load i32* %tmp4, align 4
store i32 %val, i32* %tmp3, align 4

Yes, correct.

Has the format for the extension of the load/store bytecode been
determined?

Nothing decided yet.

This would need to be applied to both normal and volatile load/store
instructions, and I see three options.

1. Add the ability to encode optional arguments on any load/store and
volatile variants. This is general, but I can see why it could be bad
given how frequent load/stores tend to be. Perhaps this is why pseudo
ops were chosen for the volatile versions?

2. Create pseudo opcodes for the permutations of volatility and
alignment. This will consume 4 more opcodes, but it would not bloat
the bytecode when natural alignment is used.

3. Create pseudo ops for 'attributed' load/store instructions. These
instructions contain a set of optional attributes encoded in some way.
The current volatile load/store instructions could be subsumed by
these instructions.

I'm leaning away from 1, but unsure whether 2 or 3 is better.

3 is best. opcodes are at a premium (six bits) and consuming 8 or more
of them for load/store won't work (there isn't enough left). #2 could
work but it penalizes the (hopefully more common) natural alignment
case. So, I would just conver the Store+Volatile and Load+Volatile
opcodes into Store+Attributes and Load+Attributes. Then the volatile and
alignment attributes can be added for those (relatively uncommon) cases.

Thanks for looking into this Christopher!

Reid.

Sounds good.

Also I wanted to clear something up about the meaning of the alignment attribute. My thinking is that this indicates over/under alignment with respect to natural alignment for the type. This is to say in the Load/Store instruction classes the default alignment value will be zero, meaning natural alignment for the given type. If the alignment value is nonzero then the alignment value should be used over the natural or preferred alignment.

Yes, exactly. To put this another way: if the specified alignment is less than the natural alignment for the type, then the codegen needs to be careful (e.g. use special unaligned loads). If it is more aligned, then this is a guarantee to clients about the load, who can choose to use it or not. If the specified alignment matches the natural alignment, then it can be dropped (set to zero).

-Chris

I have changes in places to implement the bytecode/assembly/instruction changes and propagate them through to SDNode creation in the back end. Woot.

When a load or store instruction of alignment 0 (natural) is visited in SelectionDAGLowering, should it create an SDNode of alignment 0 or should it use the TargetData to set the SDNode’s alignment to the preferred alignment for the load/store’s type for that particular target?

Here’s a related question. It seems that there might be a benefit in knowing about two alignment values for a load/store. The alignment of the load/store itself, but potentially also the alignment of the base pointer used for the load/store. Having an alignment attribute on pointer types would solve both these issues, but having a single alignment attribute on loads/stores doesn’t. This would lead me to propose having an alignment attribute for getelementptr. Thoughts?

I have changes in places to implement the bytecode/assembly/instruction changes and propagate them through to SDNode creation in the back end. Woot.

Great!

When a load or store instruction of alignment 0 (natural) is visited in SelectionDAGLowering, should it create an SDNode of alignment 0 or should it use the TargetData to set the SDNode's alignment to the preferred alignment for the load/store's type for that particular target?

I think it should use the target's abi alignment (not preferred alignment). This way, the codegen doesn't have to deal with alignment zero.

Congrats!

-Chris

I may be misunderstanding, if so, please correct me. However, I think you're trying to bring addressing modes into this. Many targets support "reg+immediate" addressing modes... I assume you're trying to get an alignment value for the 'reg' part, without the "immed" part?

This approach would have a couple of problems. Instead, if you wanted a more general alignment model, I'd suggest going for representing alignments as "offset from alignment".

In this model, you represent each alignment value as a pair <align,offs>, where offs is always less than align. This allows you to say that "this load is 2 bytes away from a 16-byte aligned pointer" for example.

-Chris