Restrict qualifier on class members

Hi,

I’m trying to abstract some special pointers with a class, like in the example program below:

1 #define __remote __attribute__((address_space(1)))
2 #include <stdint.h>
3
4 __remote int* A;
5 __remote int* B;
6
7 class RemotePtr {
8 private:
9   __remote int* __restrict a;
10
11 public:
12   RemotePtr(__remote int* a) : a(a) {}
13
14   __remote int& at(int n) {
15     return a[n];
16   }
17 };
18
19 int main(int argc, char** argv) {
20   RemotePtr a(A);
21   RemotePtr b(B);
22
23   #pragma unroll 4
24   for(int i=0; i<4; ++i) {
25     a.at(i) += b.at(i);
26   }
27
28   return 0;
29 }

It’s given that pointer a, in each object of the class RemotePtr, is the only pointer that can access the array it points to. So I tried the __remote int* __restrict a; construct (line 9) to tell Clang this. It doesn’t seem to work: I see no noalias in the generated IR. Specifically, I want lines 23-26 optimized assuming no aliasing between A and B. Is there any reason why Clang shouldn’t annotate the memory accesses in lines 23-26 with noalias, taking line 9 into account?

The higher level problem is this: is there a way to compile lines 23-26 assuming no aliasing between A and B, by just doing something in the RemotePtr class (so that main is clear of ugly code)? If that’s not possible, is there a way to tell Clang that lines 23-26 should assume no aliasing at all, by some pragma?

Thank you,
Bandhav

Hi Bandhav,

Jeroen Dobbelaere (CC’ed) is currently working on support for restrict qualified local variables and struct members.

The patches exist but are not merged yet. If you want to give it a try apply .

Initially I could only think of this solution for your problem:

Michael (CC’ed) might know another annotation to get llvm.access metadata for the loop, which should do what you intend.

Cheers,

Johannes

Unfortunately https://llvm.org/docs/LangRef.html#llvm-loop-parallel-accesses-metadata
is not a solution here. A loop-parallel access does not imply
non-aliasing. The obvious case is when only reading from a location,
but even when a location is written to I'd be careful to deduce that
they do not alias since it might be a "benign data race" or the value
never used. Additionally, LLVM's loop unroller is known to not handle
noalias metadata correctly as it just copies it.

There has been a discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141587.html
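
To make the distinction concrete, here is a minimal sketch (the function names are hypothetical, chosen only for illustration). Every iteration touches index i alone, so the loop has no loop-carried dependence and could be marked parallel, yet the pointers are free to refer to the same array:

```cpp
#include <cstddef>

// Each iteration reads lhs[i] and rhs[i] and writes out[i]; no iteration
// depends on another. Nothing here says that out, lhs and rhs are distinct.
inline void add_elementwise(int* out, const int* lhs, const int* rhs,
                            std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = lhs[i] + rhs[i];
  }
}

// All three pointers alias the same array, and the loop is still parallel:
// iteration i only reads and writes data[i].
inline void double_in_place(int* data, std::size_t n) {
  add_elementwise(data, data, data, n);
}
```

So a compiler that concluded "parallel loop, therefore noalias" would be assuming something this loop never promised.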

Michael

Hi Bandhav,

I was originally going to cover this in my now defunct EuroLLVM talk but… we had this exact same problem on Unity’s HPC# Burst compiler - how to track no-aliasing on structs. We were constrained in that we had to make it work with LLVM versions all the way back to shipped LLVM 6, so what we did was:

  • Added module-level metadata that tracked whether a given struct member field was no-alias.
  • Added our own alias analysis, using createExternalAAWrapperPass to register it in the pass pipeline.

This allowed us to have zero modifications to LLVM and still do something useful with aliasing. The one ‘issue’ with it is that if you have a stack-allocated struct that is SROA’ed, you lose the info that it was a struct; likewise, if you are in a private/internal linkage function that takes the struct as an argument, the opt passes can modify the function signature and lose the struct too. We had to do some mitigations here to get perfect aliasing on our use cases.
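
As a rough illustration of the idea only, here is a toy model in plain C++. It uses no LLVM API and is not Burst's implementation; every name in it (NoAliasFields, Origin, alias) is hypothetical. It keeps a table of fields marked no-alias and answers alias queries from where each pointer was loaded. A null origin stands for the case where the struct information has been lost, for example after SROA:

```cpp
#include <set>
#include <string>
#include <utility>

namespace toy {

// (struct name, field index) identifies one member field.
using FieldId = std::pair<std::string, unsigned>;

// Stand-in for the module-level metadata: the set of fields marked no-alias.
class NoAliasFields {
  std::set<FieldId> fields_;

public:
  void mark(const std::string& struct_name, unsigned field_index) {
    fields_.emplace(struct_name, field_index);
  }
  bool contains(const FieldId& f) const { return fields_.count(f) != 0; }
};

// Where a pointer was loaded from: one field of one struct instance.
struct Origin {
  FieldId field;
  int instance;  // hypothetical id that distinguishes struct objects
};

enum class AliasResult { MayAlias, NoAlias };

// Toy rule: two pointers loaded from different slots do not alias if at
// least one slot is a no-alias field. An unknown origin (nullptr) always
// gives the conservative answer.
inline AliasResult alias(const NoAliasFields& table, const Origin* a,
                         const Origin* b) {
  if (a == nullptr || b == nullptr) return AliasResult::MayAlias;
  const bool same_slot = a->field == b->field && a->instance == b->instance;
  if (same_slot) return AliasResult::MayAlias;
  if (table.contains(a->field) || table.contains(b->field)) {
    return AliasResult::NoAlias;
  }
  return AliasResult::MayAlias;
}

}  // namespace toy
```

The conservative fallback on a missing origin is the toy counterpart of the SROA and signature-rewriting cases mentioned above: once the link to the struct field is gone, the analysis can no longer say anything useful.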

Hope this helps,
-Neil.

Hi Jeroen,

That’s great! I was trying to use the patch. What’s the latest version of the project that it can be applied to?

Hi Neil,

That seems like what I can do as well! Do you happen to have some examples lying around? Maybe a pointer to the planned presentation, if that’s okay?

Thank you,
Bandhav

Hi Bandhav,

as mentioned in the summary of https://reviews.llvm.org/D69542 :

The base version is b2a37cfe2bda0bc8c4d2e981922b5ac59c429bdc (June 12, 2020)

Greetings,

Jeroen Dobbelaere

Hi Jeroen,

Sorry, I missed that. I tried the patch, and this program:

#include <stdint.h>

#define __remote __attribute__((address_space(1)))

__remote int* A;
__remote int* B;

void vec_add(__remote int* __restrict a,
             __remote int* __restrict b,
             int n) {
  #pragma unroll 4
  for(int i=0; i<n; ++i) {
    a[i] += b[i];
  }
}

int main(int argc, char** argv) {
  __remote int* __restrict a = A;
  __remote int* __restrict b = B;

  #pragma unroll 4
  for(int i=0; i<4; ++i) {
    a[i] += b[i];
  }

  return 0;
}

vec_add gives the following schedule:

*** Final schedule for %bb.8 ***
SU(0): %33:gpr = LW %56:gpr, -8 :: (load 4 from %ir.scevgep8, !tbaa !14, !noalias !13, addrspace 1)
SU(1): %34:gpr = LW %55:gpr, -8 :: (load 4 from %ir.scevgep14, !tbaa !14, !noalias !13, addrspace 1)
SU(4): %36:gpr = LW %56:gpr, -4 :: (load 4 from %ir.scevgep10, !tbaa !14, !noalias !13, addrspace 1)
SU(5): %37:gpr = LW %55:gpr, -4 :: (load 4 from %ir.scevgep16, !tbaa !14, !noalias !13, addrspace 1)
SU(8): %39:gpr = LW %56:gpr, 0 :: (load 4 from %ir.lsr.iv6, !tbaa !14, !noalias !13, addrspace 1)
SU(9): %40:gpr = LW %55:gpr, 0 :: (load 4 from %ir.lsr.iv12, !tbaa !14, !noalias !13, addrspace 1)
SU(12): %42:gpr = LW %56:gpr, 4 :: (load 4 from %ir.scevgep9, !tbaa !14, !noalias !13, addrspace 1)
SU(13): %43:gpr = LW %55:gpr, 4 :: (load 4 from %ir.scevgep15, !tbaa !14, !noalias !13, addrspace 1)
SU(2): %35:gpr = nsw ADD %34:gpr, %33:gpr
SU(3): SW %35:gpr, %55:gpr, -8 :: (store 4 into %ir.scevgep14, !tbaa !14, !noalias !13, addrspace 1)
SU(6): %38:gpr = nsw ADD %37:gpr, %36:gpr
SU(7): SW %38:gpr, %55:gpr, -4 :: (store 4 into %ir.scevgep16, !tbaa !14, !noalias !13, addrspace 1)
SU(10): %41:gpr = nsw ADD %40:gpr, %39:gpr
SU(11): SW %41:gpr, %55:gpr, 0 :: (store 4 into %ir.lsr.iv12, !tbaa !14, !noalias !13, addrspace 1)
SU(14): %44:gpr = nsw ADD %43:gpr, %42:gpr
SU(15): SW %44:gpr, %55:gpr, 4 :: (store 4 into %ir.scevgep15, !tbaa !14, !noalias !13, addrspace 1)
SU(16): %57:gpr = nuw nsw ADDI %57:gpr, 4
SU(17): %56:gpr = ADDI %56:gpr, 16
SU(18): %55:gpr = ADDI %55:gpr, 16

And main gives the following schedule:

*** Final schedule for %bb.0 ***
SU(0): %2:gpr = LUI target-flags(riscv-hi) @A
SU(2): %4:gpr = LUI target-flags(riscv-hi) @B
SU(3): %5:gpr = LW %4:gpr, target-flags(riscv-lo) @B :: (dereferenceable load 4 from @B, !tbaa !9, !noalias !22)
SU(1): %3:gpr = LW %2:gpr, target-flags(riscv-lo) @A :: (dereferenceable load 4 from @A, !tbaa !9, !noalias !22)
SU(4): %6:gpr = LW %5:gpr, 0 :: (load 4 from %ir.3, !tbaa !14, !noalias !22, addrspace 1)
SU(5): %7:gpr = LW %3:gpr, 0 :: (load 4 from %ir.1, !tbaa !14, !noalias !22, addrspace 1)
SU(6): %8:gpr = nsw ADD %7:gpr, %6:gpr
SU(7): SW %8:gpr, %3:gpr, 0 :: (store 4 into %ir.1, !tbaa !14, !noalias !22, addrspace 1)
SU(8): %9:gpr = LW %5:gpr, 4 :: (load 4 from %ir.arrayidx.1, !tbaa !14, !noalias !22, addrspace 1)
SU(9): %10:gpr = LW %3:gpr, 4 :: (load 4 from %ir.arrayidx1.1, !tbaa !14, !noalias !22, addrspace 1)
SU(10): %11:gpr = nsw ADD %10:gpr, %9:gpr
SU(11): SW %11:gpr, %3:gpr, 4 :: (store 4 into %ir.arrayidx1.1, !tbaa !14, !noalias !22, addrspace 1)
SU(12): %12:gpr = LW %5:gpr, 8 :: (load 4 from %ir.arrayidx.2, !tbaa !14, !noalias !22, addrspace 1)
SU(13): %13:gpr = LW %3:gpr, 8 :: (load 4 from %ir.arrayidx1.2, !tbaa !14, !noalias !22, addrspace 1)
SU(14): %14:gpr = nsw ADD %13:gpr, %12:gpr
SU(15): SW %14:gpr, %3:gpr, 8 :: (store 4 into %ir.arrayidx1.2, !tbaa !14, !noalias !22, addrspace 1)
SU(16): %15:gpr = LW %5:gpr, 12 :: (load 4 from %ir.arrayidx.3, !tbaa !14, !noalias !22, addrspace 1)
SU(17): %16:gpr = LW %3:gpr, 12 :: (load 4 from %ir.arrayidx1.3, !tbaa !14, !noalias !22, addrspace 1)
SU(18): %17:gpr = nsw ADD %16:gpr, %15:gpr
SU(20): $x10 = COPY $x0
SU(19): SW %17:gpr, %3:gpr, 12 :: (store 4 into %ir.arrayidx1.3, !tbaa !14, !noalias !22, addrspace 1)

This is great! Memory accesses are marked noalias. I wanted the memory accesses annotated as noalias to remove loop-carried dependencies, so that I can reorder them for efficient scheduling. But when I look at the scheduling DAG, the two functions behave differently.

For vec_add I see something like this (note BotQ.A: the scheduler can choose any of those, so there is no loop-carried dependence):

  • Latency limited.
    ** ScheduleDAGMILive::schedule picking next node
    Queue BotQ.P:
    Queue BotQ.A: 16 15 11 7 3
    Cand SU(16) ORDER
    Pick Bot ORDER

For main, at best I see something like this:
** ScheduleDAGMILive::schedule picking next node
Cycle: 45 BotQ.A
Queue BotQ.P:
Queue BotQ.A: 12 13
Cand SU(12) ORDER
Cand SU(13) ORDER

In theory, the schedules for vec_add and main should be the same, right? Is there anything else I should do to make __restrict remove the loop-carried dependence in main?

Attaching IR and scheduler log for reference…

tmp.log (78.1 KB)

tmp.ll (10.7 KB)

Hi Bandhav,

I did notice in the previous example, that the vectorizer used the noalias information, but that it also got stripped during the vectorization.

That is certainly one of the places where the noalias handling can be improved.

It would be interesting to see if the necessary information gets through to the MIR level when vectorization is disabled.

That should be visible with the ‘ptr_provenance’ field.

Greetings,

Jeroen Dobbelaere

Hi Jeroen,

Does that mean that in this case, even though the frontend correctly interprets __restrict on struct fields, the backend is only considering noalias metadata on function arguments?

Thanks,
Bandhav

Hi Bandhav,

When an optimization pass modifies loads/stores in a way that loses the provenance (as the vectorizer sometimes does), the ‘noalias function arguments’ might indeed provide a fallback.
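
A minimal sketch of what relying on that fallback could look like from the class side. The names add_into, Ptr and add_from are hypothetical, and the address_space attribute is left out so the sketch compiles with any compiler that accepts __restrict. The loop is moved into a helper whose parameters are __restrict-qualified, which Clang lowers to noalias function arguments. Whether that information survives inlining and later passes depends on the LLVM version, so treat this as something to try and not as a guaranteed fix:

```cpp
#include <cstddef>

namespace sketch {

// The __restrict qualifiers on the parameters are what becomes `noalias`
// on the function arguments in the IR. The caller promises that dst and
// src do not overlap.
inline void add_into(int* __restrict dst, const int* __restrict src,
                     std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] += src[i];
  }
}

class Ptr {
  int* p_;

public:
  explicit Ptr(int* p) : p_(p) {}

  int& at(std::size_t n) { return p_[n]; }

  // Hypothetical bulk operation: keeps the caller free of raw pointers
  // while routing the loop through the restrict-qualified helper.
  void add_from(const Ptr& other, std::size_t n) {
    add_into(p_, other.p_, n);
  }
};

}  // namespace sketch
```

With this, the loop in main would shrink to a single call such as a.add_from(b, 4), and the no-aliasing promise lives inside the class.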

I checked that when vectorization is disabled, the ptr_provenance information remains available.

(using: -fno-slp-vectorize -fno-vectorize -fno-tree-vectorize )

When the ptr_provenance information is available, the scheduler will make use of it. You can use the following diff to ensure that the info is also printed at the MIR level.

diff --git a/llvm/lib/CodeGen/MachineOperand.cpp b/llvm/lib/CodeGen/MachineOperand.cpp
index 5e4d5edb9ce6..fc871e756e03 100644
--- a/llvm/lib/CodeGen/MachineOperand.cpp
+++ b/llvm/lib/CodeGen/MachineOperand.cpp
@@ -1166,6 +1166,10 @@ void MachineMemOperand::print(raw_ostream &OS, ModuleSlotTracker &MST,
     OS << ", !noalias ";
     AAInfo.NoAlias->printAsOperand(OS, MST);
   }
+  if (AAInfo.NoAliasProvenance) {
+    OS << ", !ptr_provenance ";
+    AAInfo.NoAliasProvenance->printAsOperand(OS, true, MST);
+  }
   if (getRanges()) {
     OS << ", !range ";

Greetings,

Jeroen Dobbelaere
