Hi Duncan & Tobi,
Thanks a lot for your interest, and for pointing out the differences in GIMPLE that I missed.
Attached is a simplified test case. Is it good?
Tobi, regarding runtime alias analysis: in KernelGen we already do it, together with runtime value substitution. For example:
<------------------ kernelgen_main_loop_17: compile started --------------------->
Integer args substituted:
offset = 32, ptrValue = 47248855040
offset = 40, ptrValue = 47246749696
offset = 48, ptrValue = 47247802368
offset = 16, value = 64
offset = 20, value = 64
offset = 24, value = 64
MemoryAccess to pointer: float* inttoptr (i64 47246749696 to float*)
{ Stmt__12_cloned[i0, i1, i2] → MemRef_nttoptr (i64 47246749696 to float*)[4096i0 + 64i1 + i2] }
allocSize: 4 storeSize: 4
replacedBy: { Stmt__12_cloned[i0, i1, i2] → NULL[o0] : o0 >= 47246749696 + 16384i0 + 256i1 + 4i2 and o0 <= 47246749699 + 16384i0 + 256i1 + 4i2 }
MemoryAccess to pointer: float* inttoptr (i64 47247802368 to float*)
{ Stmt__12_cloned_[i0, i1, i2] → MemRef_nttoptr (i64 47247802368 to float*)[4096i0 + 64i1 + i2] }
allocSize: 4 storeSize: 4
replacedBy: { Stmt__12_cloned_[i0, i1, i2] → NULL[o0] : o0 >= 47247802368 + 16384i0 + 256i1 + 4i2 and o0 <= 47247802371 + 16384i0 + 256i1 + 4i2 }
MemoryAccess to pointer: float* inttoptr (i64 47248855040 to float*)
{ Stmt__12_cloned_[i0, i1, i2] → MemRef_nttoptr (i64 47248855040 to float*)[4096i0 + 64i1 + i2] }
allocSize: 4 storeSize: 4
replacedBy: { Stmt__12_cloned_[i0, i1, i2] → NULL[o0] : o0 >= 47248855040 + 16384i0 + 256i1 + 4i2 and o0 <= 47248855043 + 16384i0 + 256i1 + 4i2 }
Number of good nested parallel loops: 3
Average size of loops: 64 64 64
<------------------------------ Scop: end ----------------------------------->
<------------------------------ Scop: start --------------------------------->
<------------------- Cloog AST of Scop ------------------->
for (c2=0;c2<=63;c2++) {
for (c4=0;c4<=63;c4++) {
for (c6=0;c6<=63;c6++) {
Stmt__12_cloned_(c2,c4,c6);
}
}
}
<--------------------------------------------------------->
Context:
{ : }
Statements {
Stmt__12_cloned_
Domain :=
{ Stmt__12_cloned_[i0, i1, i2] : i0 >= 0 and i0 <= 63 and i1 >= 0 and i1 <= 63 and i2 >= 0 and i2 <= 63 };
Scattering :=
{ Stmt__12_cloned_[i0, i1, i2] → scattering[0, i0, 0, i1, 0, i2, 0] };
ReadAccess :=
{ Stmt__12_cloned_[i0, i1, i2] → NULL[o0] : o0 >= 47246749696 + 16384i0 + 256i1 + 4i2 and o0 <= 47246749699 + 16384i0 + 256i1 + 4i2 };
ReadAccess :=
{ Stmt__12_cloned_[i0, i1, i2] → NULL[o0] : o0 >= 47247802368 + 16384i0 + 256i1 + 4i2 and o0 <= 47247802371 + 16384i0 + 256i1 + 4i2 };
WriteAccess :=
{ Stmt__12_cloned_[i0, i1, i2] → NULL[o0] : o0 >= 47248855040 + 16384i0 + 256i1 + 4i2 and o0 <= 47248855043 + 16384i0 + 256i1 + 4i2 };
}
<------------------------------ Scop: end ----------------------------------->
<------------------------------ Scop: dependences --------------------------->
Write after read dependences:
{ }
Read after write dependences:
{ }
Write after write dependences:
{ }
loop is parallel
loop is parallel
loop is parallel
<------------------------------ Scop: dependences end ----------------------->
1 polly-detect - Number of regions that a valid part of Scop
<------------------ __kernelgen_main_loop_17: compile completed ------------------->
It works pretty well in many situations, but in this particular case it does not help. Those problematic “Fortran scalar values referred to by pointers” (FSVRPs) can only be substituted (replaced by their actual values) after proper memory analysis. According to the current design, memory analysis operates on SCoPs, but Polly is already unable to detect a SCoP for the whole group of nested loops because of those FSVRPs. So it is a chicken-and-egg problem.
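For reference, the check that succeeds in the log above comes down to comparing the byte ranges touched by each access once the pointer values and loop extents are known. Below is a minimal Python sketch of that idea. The base addresses, the 64x64x64 extent and the 4-byte element size are copied from the log; the helper names (byte_range, overlaps) are made up for illustration and are not KernelGen or Polly code. It is also only a conservative bounding-interval test, whereas the real dependence analysis works on the exact polyhedral access relations.

```python
# Sketch: conservative runtime alias check over byte ranges.
# Values are taken from the log; helper names are hypothetical.

ELEM_SIZE = 4            # storeSize of a float, per the log
EXTENT = (64, 64, 64)    # loop trip counts substituted at runtime


def byte_range(base, extent=EXTENT, elem_size=ELEM_SIZE):
    """Half-open byte interval [lo, hi) covered by the access
    base + elem_size * (n1*n2*i0 + n2*i1 + i2) over the whole loop nest.
    For the 64x64x64 case this is base + 16384*i0 + 256*i1 + 4*i2."""
    n0, n1, n2 = extent
    last_index = (n1 * n2) * (n0 - 1) + n2 * (n1 - 1) + (n2 - 1)
    return base, base + elem_size * (last_index + 1)


def overlaps(a, b):
    """True if two half-open intervals share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


# Two arrays that are read and one that is written, as in the SCoP dump.
reads = [byte_range(47246749696), byte_range(47247802368)]
write = byte_range(47248855040)

# Only write-vs-read pairs matter here; read-vs-read overlap is harmless.
may_alias = any(overlaps(write, r) for r in reads)
print("may alias:", may_alias)
```

With the three substituted pointers the intervals are disjoint, which is consistent with the empty dependence sets in the log. The point of the paragraph above is that this kind of check needs the SCoP (and hence the access functions) to exist first, which is exactly what the FSVRPs prevent.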
2013/1/2 Tobias Grosser <tobias@grosser.es>
aliasing_f.f90 (3.25 KB)