Alignment analysis

Hi,

I have been looking for a way to propagate alignment information to
load/store instructions. I see that instcombine does some alignment
propagation based on the value tracking helper computeKnownBits; however,
because of a compile-time heuristic, it won't recurse much when the value
comes from a phi
(https://github.com/llvm/llvm-project/blob/d2e8fb331835fcc565929720781a5fd64e66fc17/llvm/lib/Analysis/ValueTracking.cpp#L1550).

Is there a way to do more aggressive alignment propagation? I'm
generating code that goes through the NVPTX target, and because of the
missing alignment information the backend has to break up vector
loads into scalar ones, which causes performance problems. (I also tried
AlignmentFromAssumptionsPass, but it doesn't help.)

I assume this may be a problem for other backends as well. How is it handled?

Here is a simple example showing how limited the propagation is:
https://godbolt.org/z/ej1vWb7ax

In the code below, `p[offset]` should be aligned, since the offset comes
from a phi(0, offset << 8). However, the alignment analysis doesn't
detect it:

float *getAlignedPtr() __attribute__((assume_aligned (32)));
void anchor();

float loadaligned(int offset, bool c) {
  float* p = getAlignedPtr();
  int offset1 = offset << 8;
  if(c) {
    anchor();
    offset = offset1;
  } else {
    offset = 0;
  }
  // p[offset1] and p[0] are marked as aligned but not p[offset]
  return p[offset] + p[offset1] + p[0];
}
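
To spell out why `p[offset]` should be aligned: `p` is 32-byte aligned, and
both phi inputs (0 and offset << 8) are multiples of 256 elements, i.e. 1024
bytes if we assume a 4-byte float. The alignment of base + offset is the
smaller of the base alignment and the largest power of two dividing the byte
offset. A minimal sketch of that arithmetic follows; `alignmentAfterOffset`
is a hypothetical helper written for this illustration, not an LLVM function:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only (not part of LLVM).
// Returns the alignment that can be guaranteed for (base + offsetBytes)
// when base is known to be aligned to baseAlign (a power of two).
std::uint64_t alignmentAfterOffset(std::uint64_t baseAlign,
                                   std::uint64_t offsetBytes) {
  if (offsetBytes == 0)
    return baseAlign; // adding zero keeps the base alignment
  // Isolate the lowest set bit: the largest power of two dividing the offset.
  std::uint64_t offsetAlign = offsetBytes & (~offsetBytes + 1);
  return offsetAlign < baseAlign ? offsetAlign : baseAlign;
}

// Byte offset of element (k << 8), assuming 4-byte floats.
std::uint64_t byteOffsetOfShiftedIndex(std::uint64_t k) {
  return (k << 8) * 4;
}
```

With these assumptions, `alignmentAfterOffset(32, byteOffsetOfShiftedIndex(k))`
stays at 32 for any non-negative k, and so does the offset-0 arm of the phi.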

Hi Thomas,

the `sext` of offset confuses the alignment computation.
If you make offset and offset1 `long int` it seems to work
fine: https://godbolt.org/z/zYdro6YPz

I haven't dug in deeper but this might help. It is probably worth
filing a bug/issue.

~ Johannes

Hi Johannes,

It is not really the `sext` that confuses the analysis. The
problem is that computeKnownBits will only go one level up past
the phi, so if there is any instruction between the `shl` and the
phi, the analysis fails.

Here is the same example with long int and an extra `add` in between:
https://godbolt.org/z/r8asbaWWc

This is done on purpose; the recursion is capped here:

// Recurse, but cap the recursion to one level, because we don't
// want to waste time spinning around in loops.
computeKnownBits(IncValue, Known2, MaxAnalysisRecursionDepth - 1, RecQ);

https://github.com/llvm/llvm-project/blob/d2e8fb331835fcc565929720781a5fd64e66fc17/llvm/lib/Analysis/ValueTracking.cpp#L1548
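
To make the effect of the cap concrete, here is a toy model of the behaviour
described above. It is only a sketch under stated assumptions, not LLVM's
code: it tracks just the number of known-zero low bits, the `Node` type,
`knownTrailingZeros`, and the depth limit `kMaxDepth` are all invented for
this illustration, and the `+ 256` stands in for whatever `add` sits between
the `shl` and the phi.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: tracks how many low bits of a value are known to be zero.
constexpr unsigned kBitWidth = 64;
constexpr unsigned kMaxDepth = 6; // depth limit assumed for this sketch

struct Node {
  enum Kind { Unknown, Const, Shl, Add, Phi };
  Kind kind;
  std::uint64_t value;           // Const: the constant; Shl: the shift amount
  std::vector<const Node *> ops; // operands (incoming values for a Phi)
};

unsigned countTrailingZeros(std::uint64_t v) {
  if (v == 0)
    return kBitWidth; // every bit of the constant zero is a known zero
  unsigned n = 0;
  while ((v & 1) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

unsigned knownTrailingZeros(const Node &n, unsigned depth) {
  if (n.kind == Node::Const)
    return countTrailingZeros(n.value); // constants need no recursion budget
  if (depth >= kMaxDepth)
    return 0; // out of budget: nothing is known
  switch (n.kind) {
  case Node::Shl: // shifting left by a constant adds known-zero low bits
    return std::min(kBitWidth, knownTrailingZeros(*n.ops[0], depth + 1) +
                                   static_cast<unsigned>(n.value));
  case Node::Add: // a sum keeps the low zero bits common to both operands
    return std::min(knownTrailingZeros(*n.ops[0], depth + 1),
                    knownTrailingZeros(*n.ops[1], depth + 1));
  case Node::Phi: {
    unsigned zeros = kBitWidth;
    // The cap: each incoming value is analyzed at the last allowed depth,
    // so its own non-constant operands are treated as unknown.
    for (const Node *in : n.ops)
      zeros = std::min(zeros, knownTrailingZeros(*in, kMaxDepth - 1));
    return zeros;
  }
  default:
    return 0;
  }
}

// phi(0, x << 8): the shl is directly below the phi.
unsigned phiOfShl() {
  Node x{Node::Unknown, 0, {}};
  Node zero{Node::Const, 0, {}};
  Node shl{Node::Shl, 8, {&x}};
  Node phi{Node::Phi, 0, {&zero, &shl}};
  return knownTrailingZeros(phi, 0);
}

// phi(0, (x << 8) + 256): one extra instruction between the shl and the phi.
unsigned phiOfShlPlusAdd() {
  Node x{Node::Unknown, 0, {}};
  Node zero{Node::Const, 0, {}};
  Node c256{Node::Const, 256, {}};
  Node shl{Node::Shl, 8, {&x}};
  Node add{Node::Add, 0, {&shl, &c256}};
  Node phi{Node::Phi, 0, {&zero, &add}};
  return knownTrailingZeros(phi, 0);
}
```

In this model `phiOfShl()` reports 8 known-zero low bits, while
`phiOfShlPlusAdd()` reports 0, even though the same `add` analyzed on its own
from depth 0 would yield 8. That is the shape of the failure in the godbolt
example: the information exists, but the phi never looks far enough to find it.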

Good point.

Cut-offs are always tricky. One could certainly make
this a command line option though.

~ Johannes