Alignment of pointee

Hi all,

Is there a way to express in the IR that a pointer's value is a multiple of, say, 32 bytes, i.e. that the data the pointer points to is 32-byte aligned? I do not mean the natural alignment determined by the object's size. I have a double* pointer and would like to explicitly specify a larger alignment than its type implies.

I want to pass this pointer as a function argument, so that aligned (vector) loads get generated later.

Here is pseudo code for what I am trying to accomplish:

define void @foo( double* noalias %arg0 )
{
    // switching to C style
    for( int outer=0 ; outer < end ; ++outer ) {
        for( int inner=0 ; inner < 4 ; ++inner ) {
            arg0[ outer*4 + inner ] += arg0[ outer*4 + inner ];
        }
    }
}

The loop vectorizer does its job on the 'inner' loop and generates vector loads/adds/stores for this code. However, the vector loads/stores are not as well aligned as they could be, which results in a lot of extra code being produced in codegen (lots of permutations).

After vectorization the code looks similar to

define void @foo( double* noalias %arg0 )
{
    // switching to C style
   for( int outer=0 ; outer < end ; ++outer ) {

vector.body: ; preds = %vector.body, %L5
   %index = phi i64 [ 0, %L5 ], [ %index.next, %vector.body ]
   %42 = add i64 %7, %index
   %43 = getelementptr double* %arg1, i64 %42
   %44 = bitcast double* %43 to <4 x double>*
   %wide.load = load <4 x double>* %44, align 8

   %132 = fadd <4 x double> %wide.load, %wide.load54

   %364 = getelementptr double* %arg0, i64 %93
   %365 = bitcast double* %364 to <4 x double>*
   store <4 x double> %329, <4 x double>* %365, align 8
   }
}

One can see that if the pointee of %arg0 were initially 32-byte aligned, then, since the vectorizer operates on a loop with a fixed trip count of 4 and the size of double is 8 bytes, the vector loads and stores could be ideally aligned to 32 bytes (which on my target architecture would result in vector loads without additional permutations).

Is it somehow possible to achieve this? I am generating the IR with the builder, i.e. I am not coming from C or clang.

Thank you,
Frank

If you are generating the loads and stores, you could just set the alignment to whatever you want, i.e. 32 bytes in your case.

I have wondered about this in the general case, when you simply want to have alignment information on the pointer, and not on loads/stores. My idea was to invent a builtin, something like "assert_aligned", that does nothing other than manifest the alignment by the fact of its existence. For example:
   %argp = call i8* llvm.assert.aligned(%arg0, 32)
would state that the pointer %argp is aligned to 32 bytes, and the value of it is the same as %arg0 at the place of the "call".

That was a while ago and maybe there are other ways of doing it now.

-Krzysztof

If you are generating the loads and stores, you could just set the alignment to whatever you want, i.e. 32 bytes in your case.

I can't. Take a look again at the first piece of code. The loads occur in the 'inner' loop. Only for the first iteration the alignment of 32 bytes is true, not for iteration 2, 3 and 4. So, the alignment information cannot enter at the point of loading. Thus, the idea of attaching the information right at the pointer's definition, i.e. as the argument.
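
To make the arithmetic concrete, here is a minimal C sketch. It assumes that arg0 itself is 32-byte aligned and that a double is 8 bytes; the helper names byte_offset and is_aligned32 are made up for illustration and are not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of arg0[outer*4 + inner] from arg0, assuming 8-byte doubles. */
static uint64_t byte_offset(uint64_t outer, uint64_t inner) {
    assert(inner < 4);              /* the inner trip count is fixed at 4 */
    return (outer * 4 + inner) * 8;
}

/* Nonzero if that access is 32-byte aligned, assuming arg0 itself is. */
static int is_aligned32(uint64_t outer, uint64_t inner) {
    return byte_offset(outer, inner) % 32 == 0;
}
```

For every value of outer, only inner == 0 gives an offset that is a multiple of 32. That is where a 4-wide vector load would start, but the scalar loads for inner = 1, 2, 3 cannot carry a 32-byte alignment, so the information has to come from the pointer.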

I have wondered about it in a general case, when you simply want to have an alignment information on the pointer, and not on loads/stores. My idea was to invent a builtin, something like "assert_aligned", that does nothing, other than manifest the alignment by the fact of its existence. For example:
  %argp = call i8* llvm.assert.aligned(%arg0, 32)
would state that the pointer %argp is aligned to 32 bytes, and the value of it is the same as %arg0 at the place of the "call".

That was a while ago and maybe there are other ways of doing it now.

Should be doable this way, although I am not sure whether an assertion or an annotation would be cleaner.

There should already be a solution.

That would be quite helpful, especially alignment information for pointers passed into a function. There is "alignment" information for pointer arguments, but it only applied to "byval" pointers, at least back when I was actively interested in it.

I'm wondering if there already exists a common approach to this problem. Some kind of metadata perhaps?

-Krzysztof

From: "Frank Winter" <fwinter@jlab.org>
To: llvmdev@cs.uiuc.edu
Sent: Tuesday, March 25, 2014 9:23:59 AM
Subject: Re: [LLVMdev] Alignment of pointee

> If you are generating the loads and stores, you could just set the
> alignment to whatever you want, i.e. 32 bytes in your case.

I can't. Take a look again at the first piece of code. The loads occur
in the 'inner' loop. Only for the first iteration the alignment of 32
bytes is true, not for iteration 2, 3 and 4. So, the alignment
information cannot enter at the point of loading. Thus, the idea of
attaching the information right at the pointer's definition, i.e. as
the argument.

I had a patchset that I had posted for review some months (a year?) ago to implement __builtin_assume_aligned. This ended up side-tracked in a discussion re: generalized invariants. We should, however, certainly revisit this soon.
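
For readers who have not seen the builtin, here is a minimal C sketch of how it could be used on the loop from the original question. It assumes a compiler that provides __builtin_assume_aligned, as GCC does, and a C11 library with aligned_alloc. The helper make_input is hypothetical and only exists to produce a suitably aligned buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same loop as the pseudo code in the original question. */
static void foo(double *restrict arg0, int end) {
    /* A promise to the compiler, not a runtime check: behavior is
       undefined if arg0 is not actually 32-byte aligned. */
    double *p = __builtin_assume_aligned(arg0, 32);
    for (int outer = 0; outer < end; ++outer)
        for (int inner = 0; inner < 4; ++inner)
            p[outer * 4 + inner] += p[outer * 4 + inner];
}

/* Hypothetical helper: 4*end doubles on a 32-byte boundary, all set to 1.0. */
static double *make_input(int end) {
    /* The size is 32*end bytes, a multiple of the alignment as C11 requires. */
    double *buf = aligned_alloc(32, (size_t)end * 4 * sizeof(double));
    assert(buf != NULL && (uintptr_t)buf % 32 == 0);
    for (int i = 0; i < end * 4; ++i)
        buf[i] = 1.0;
    return buf;
}
```

The alignment is stated once, on the pointer, rather than on each load or store, which is the property asked for earlier in the thread. Whether the vectorizer then emits aligned vector accesses depends on the compiler and target.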

-Hal

Related to this subject is the __attribute__((aligned(X))) that can be set on a type in C/C++. It is used when generating the loads / stores / memcpy / ... but the attribute itself is not retained on the type. In many cases, keeping it could help various analyses or transforms provide more accurate results when such a type is used. The __builtin_assume_aligned could be a way to solve this.

Cheers,

From: "Arnaud Allard de Grandmaison" <arnaud.adegm@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Frank Winter" <fwinter@jlab.org>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, March 25, 2014 12:21:53 PM
Subject: Re: [LLVMdev] Alignment of pointee

Related to this subject is the __attribute__((aligned(X))) that can be
set on a type in C/C++. It is being used when generating the load /
stores / memcpy / ... but is lost with respect to the type's
attribute. In many a case this could help various analysis or
transforms to provide more accurate results when such a type is
used. The __builtin_assume_aligned could be a way to solve this.

Also, FWIW, at least one user of mine is a big fan of Intel's new align_value attribute (which is, at least, discussed here: http://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization), because it plays well with TMP and other related C++ programming techniques (better than __builtin_assume_aligned).

-Hal

What's the status of providing alignment information for a pointer? Can I build an aligned pointer, e.g. a float* whose pointee is 32-byte aligned, and use it as a function's parameter? This would be very helpful for optimization passes to get the alignment info on loads/stores right, which in turn is crucial for performance (and on some architectures necessary to generate legal vectorized code).

Best wishes,
Frank

Hi Frank,

Two things:

1. For function parameters, you can now provide the align attribute for pointer types to specify the pointer alignment (see the language reference).

2. The llvm.assume intrinsic will provide this feature. The necessary patches are undergoing review. I expect them to land within the next month.

Thanks again,
Hal