PR400 - alignment for LD/ST

Here’s a related question. It seems that there might be a benefit in knowing
about two alignment values for a load/store. The alignment of the load/store
itself, but potentially also the alignment of the base pointer used for the
load/store. Having an alignment attribute on pointer types would solve both
these issues, but having a single alignment attribute on loads/stores
doesn’t. This would lead me to propose having an alignment attribute for
getelementptr. Thoughts?

I may be misunderstanding, if so, please correct me. However, I think
you’re trying to bring addressing modes into this. Many targets support
“reg+immediate” addressing modes… I assume you’re trying to get an
alignment value for the ‘reg’ part, without the “immed” part?

You are correct.

This approach would have a couple of problems. Instead, if you wanted a
more general alignment model, I’d suggest going for representing
alignments as “offset from alignment”.

In this model, you represent each alignment value as a pair <align,offs>,
where offs is always less than align. This allows you to say that “this
load is 2 bytes away from a 16-byte aligned pointer” for example.
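
To make the <align,offs> idea concrete, here is a minimal sketch in Python. The class name `OffsetAlignment` and the method `access_alignment` are hypothetical names made up for this illustration, not anything in LLVM, and the sketch assumes align is a power of two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetAlignment:
    """Hypothetical <align, offs> pair: the address is exactly `offs`
    bytes past an `align`-byte boundary."""
    align: int
    offs: int = 0

    def __post_init__(self):
        # Assumption: alignments are positive powers of two.
        if self.align <= 0 or self.align & (self.align - 1):
            raise ValueError("align must be a positive power of two")
        # The thread's invariant: offs is always less than align.
        if not 0 <= self.offs < self.align:
            raise ValueError("offs must satisfy 0 <= offs < align")

    def access_alignment(self) -> int:
        """Largest power of two guaranteed to divide the accessed address."""
        if self.offs == 0:
            return self.align
        # Lowest set bit of offs; it divides align because offs < align.
        return self.offs & -self.offs


# "This load is 2 bytes away from a 16-byte aligned pointer."
print(OffsetAlignment(16, 2).access_alignment())
```

Under these assumptions the pair keeps more information than a single number: <16,2> and <16,6> both give a 2-byte aligned access, but the pair still records how far each one is from the 16-byte boundary.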

Shouldn’t it be “this load is a multiple of 2 bytes away from a 16-byte aligned pointer”? Isn’t that more general?

The case I’m dealing with directly pertains to indexing arrays that are themselves aligned. This information allows loads/stores (say in an unrolled loop) to be coalesced, as they can be determined to reference the same memory.

This would mean that the alignment of the load itself will be align if no offset is provided and min(align, offs) if an offset alignment is provided. Do you think it will be difficult for the front-end to provide these two alignments?
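
As a sanity check of the min(align, offs) rule under the "multiple of offs bytes away" reading, here is a small sketch. The function names are hypothetical, and it assumes both values are powers of two; the brute-force helper just enumerates a few sample addresses and is only meant as a check.

```python
from functools import reduce
from math import gcd


def stride_alignment(align: int, stride: int) -> int:
    """Alignment guaranteed for an address that is some multiple of
    `stride` bytes past an `align`-byte boundary (powers of two assumed)."""
    return min(align, stride)


def observed_alignment(align: int, stride: int, bases: int = 8, steps: int = 8) -> int:
    """Brute-force check: largest power of two dividing every sampled address."""
    addrs = [b * align + k * stride
             for b in range(1, bases)
             for k in range(steps)]
    g = reduce(gcd, addrs)
    return g & -g  # lowest set bit of the common divisor


# A multiple of 2 bytes away from a 16-byte aligned pointer.
print(stride_alignment(16, 2), observed_alignment(16, 2))
```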

Also, I’ve noticed that some transformations on Loads/Stores don’t preserve either volatility or alignment information.

In this model, you represent each alignment value as a pair <align,offs>,
where offs is always less than align. This allows you to say that "this
load is 2 bytes away from a 16-byte aligned pointer" for example.

Shouldn't it be "this load is a multiple of 2 bytes away from a 16-byte aligned pointer"? Isn't that more general?

It is both more general and less precise :)

The case I'm dealing with directly pertains to indexing arrays that are themselves aligned. This information allows loads/stores (say in an unrolled loop) to be coalesced, as they can be determined to reference the same memory.

Ok.

This would mean that the alignment of the load itself will be align if no offset is provided and min(align, offs) if an offset alignment is provided. Do you think it will be difficult for the front-end to provide these two alignments?

The front-end won't be able to give you information about a loop of code that traverses an array, but subsequent optimization/analysis passes can.

I think that either form of information would be easy to get, but I don't know what the tradeoffs are (loss of generality or loss of precision). Devang, do you have any thoughts on this or an idea of how it would impact a vectorizer?

Also, I've noticed that some transformations on Loads/Stores don't preserve either volatility or alignment information.

I'm not surprised about the alignment piece (it hasn't been filled in yet), but not preserving volatility is definitely a bug.

-Chris

When you say "load is a multiple of 4 bytes away from 8-byte aligned data"
it is not clear whether it is 16-byte aligned or not. However, "load is 4 bytes
away from 8-byte aligned data" is clear - it is aligned at 12-byte and not
16-byte.

However, that means that for loops this becomes "load is N bytes away
from 8-byte aligned data", where N depends on the IV.
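
To illustrate the loop case, here is a rough sketch in which N changes with the IV. It assumes a loop over A[i] with i starting at 0, fixed-size elements, and an array base that is `align`-byte aligned; the function names are hypothetical.

```python
def per_iteration_pairs(align: int, elem_size: int, trip_count: int):
    """<align, offs> pair for A[i] on each iteration i = 0..trip_count-1,
    assuming A itself is align-byte aligned."""
    return [(align, (i * elem_size) % align) for i in range(trip_count)]


def static_alignment(pairs):
    """Single alignment value that holds on every iteration."""
    result = None
    for align, offs in pairs:
        # Exact-offset rule: align when offs is 0, else lowest set bit of offs.
        a = align if offs == 0 else offs & -offs
        result = a if result is None else min(result, a)
    return result


# 4-byte elements in an 8-byte aligned array: the offset alternates 0, 4.
pairs = per_iteration_pairs(8, 4, 4)
print(pairs)
print(static_alignment(pairs))
```

In this sketch the per-iteration pairs are precise, but a single annotation on the load can only claim the minimum over all iterations.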

So in the loop case, analysis of the IV is necessary to determine the actual alignment?
Would the form for the loops require that the offset be tracked as a multiple of the IV stride?
Would the IV stride analysis normally be performed in the front end (and thus end up in the BC)?

If this is the case then I’d take Chris’s suggestion of the <align, offs> pair, as it is clear in the static offset case and the actual alignment could be deduced, along with an <align, stride> pair for the IV case. The previous meaning of a plain alignment would then be equivalent to <align, stride> where align == stride.

However, that means that for loops this becomes “load is N bytes away
from 8-byte aligned data”, where N depends on the IV.

I said this, but I am not sure it is OK to say

%tmp = load i32* %tmp1, align N

So in the loop case, analysis of the IV is necessary to determine the actual alignment?

Only if the array index is based on the IV.

Would the form for the loops require that the offset be tracked as a multiple of the IV stride?

How would you represent

A[I + 4], where I is the IV?
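
One possible way to look at this question, purely as an illustration and not something proposed in this thread, is to combine the two pairs into a hypothetical <align, stride, offs> triple. The sketch below assumes I starts at 0 and advances by `iv_step` each iteration, that elements have a fixed size, and that A is `align`-byte aligned; all names are made up for the example.

```python
from math import gcd


def indexed_access(align: int, elem_size: int, iv_step: int, const_index: int):
    """Hypothetical <align, stride, offs> triple for A[I + const_index]."""
    stride = iv_step * elem_size                # bytes advanced per iteration
    offs = (const_index * elem_size) % align    # fixed part, reduced mod align
    return align, stride, offs


def guaranteed_alignment(align: int, stride: int, offs: int) -> int:
    """Largest power of two dividing the address on every iteration."""
    g = gcd(gcd(align, stride), offs)
    return g & -g


# A[I + 4] with 4-byte elements, A 16-byte aligned, I stepping by 1.
triple = indexed_access(16, 4, 1, 4)
print(triple, guaranteed_alignment(*triple))
```

With these assumptions the constant index folds into offs and the IV contributes only the stride, so an unrolled loop that steps I by 4 would keep the full 16-byte alignment for A[I + 4].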

Would the IV stride analysis normally be performed in the front end (and thus end up in the BC)?

Unlikely. An induction variable analysis pass is better suited to provide this info.

Ok. This got a little over my head, but what I take away is:

  1. I’ll submit a patch for just the align parameter on loads and stores. The align parameter indicates the final alignment of the load or store (nothing to do with base+imm or base+offset addressing).

  2. The more sophisticated alignment information for vectorization of loads/stores will likely have to be implemented as an analysis pass and wouldn’t need to be represented in the BC or assembly.

Ok I think this makes sense. We can always make the information more rich later down the line. Thanks!

-Chris