Hello,
I would like to propose and discuss representation of objects inside the physical storage to aid the generation of more optimal TBAA in Flang.
This came up during my attempt to generate distinct TBAA sub-trees for variables inside COMMON blocks: [flang] Create TBAA subtree for COMMON block variables. by vzakhari · Pull Request #153918 · llvm/llvm-project · GitHub
For the following Fortran code:
common /blk/ a, b, c
a = b + c
I would like to create TBAA tags using different sub-tree roots to make it clear that a, b, c cannot alias. The TBAA tree structure for the particular function containing a = b + c would look like this:
"global data"
|
|- "blk_"
|
|- "blk_/a" <- TBAA tag for a uses this root
|- "blk_/b" <- TBAA tag for b uses this root
|- "blk_/c" <- TBAA tag for c uses this root
There is a problem, though, with variables participating in EQUIVALENCE statements, e.g.:
module data2
real :: glob1, glob2, glob3
equivalence (glob1, glob2)
common /glob1c/ glob1, glob3
end module data2
subroutine test2
use data2
glob1 = glob2 + glob3
end subroutine test2
glob1 and glob2 alias and reside at the same 0 offset within glob1c storage, and they are repsented with different fir.declare operations at the point Flang assigns TBAA tags:
fir.global common @glob1c_(dense<0> : vector<8xi8>) {alignment = 4 : i64} : !fir.array<8xi8>
func.func @_QPtest2() {
%1 = fir.address_of(@glob1c_) : !fir.ref<!fir.array<8xi8>>
%2 = fir.convert %1 : (!fir.ref<!fir.array<8xi8>>) -> !fir.ref<!fir.array<?xi8>>
%3 = fir.coordinate_of %2, %c0 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
%4 = fir.convert %3 : (!fir.ref<i8>) -> !fir.ptr<f32>
%5 = fir.declare %4 {uniq_name = "_QMdata2Eglob1"} : (!fir.ptr<f32>) -> !fir.ptr<f32>
%6 = fir.declare %4 {uniq_name = "_QMdata2Eglob2"} : (!fir.ptr<f32>) -> !fir.ptr<f32>
%7 = fir.coordinate_of %2, %c4 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
%8 = fir.convert %7 : (!fir.ref<i8>) -> !fir.ref<f32>
%9 = fir.declare %8 {uniq_name = "_QMdata2Eglob3"} : (!fir.ref<f32>) -> !fir.ref<f32>
Flang’s AddAliasTags pass discovers that the source of all the loads and stores is the global glob1c_, but there is little information available about the overlapping (if any) between the members of the physical storage. Flang Lowering deliberately using !fir.ptr type for the fir.declare operations related to the member variables that participate in any EQUIVALENCE. I used this fact to generate TBAA tags like this:
"global data"
|
|- "glob1c_" <- TBAA tags for glob1 and glob2 use this root
|
|- "glob1c_/glob3" <- TBAA tag for glob3 uses this root
This works, but it is suboptimal, because neither glob1 nor glob2 alias with glob3, whereas TBAA does not convey this information. The actual TBAA representation I want to have is the following:
"global data"
|
|- "glob1c_"
|
|- "glob1c_/glob3" <- TBAA tag for glob3 uses this root
|- "glob1c_/<TBD>" <- TBAA tag for glob1 and glob2 use this root
Basically, I need to create subsets of the members within a single COMMON block, where:
- In each subset, a variable overlaps with at least one another variable in the subset.
- Two variables from different subsets never overlap.
This will allow me to create different TBAA sub-tree for each subset.
I do not think we can generate better TBAA for variables withing the subsets. At least Clang is not generating any useful TBAA for C unions, where the same [partial ]overlapping of the members happen.
I think recognizing the fir.coordinate arithmetic in the AddAliasTags to discover the overlapping members is not the right approach. Flang FE has all the information about the offsets of the members within the storage, so I think we should propagate it up to AddAliasTags pass. Other passes may also benefit from it, e.g. AddDebugInfoPass is processing fir.coordinate arithmetic currently and it is only looking at particular sequence of operations, which is not very reliable.
I am suggesting adding a AnyReferenceLike operand and an IntegerAttr to [hl]fir.declare that will represent a reference to the beginning of the physical storage and the byte offset within the storage where the variable starts. The pair is optional, and its absence means that the variable’s location is known to reside at offset 0 in memory as defined by [hl]fir.declare’s memref.
Example for the above case:
%1 = fir.address_of(@glob1c_) : !fir.ref<!fir.array<8xi8>>
%2 = fir.convert %1 : (!fir.ref<!fir.array<8xi8>>) -> !fir.ref<!fir.array<?xi8>>
%3 = fir.coordinate_of %2, %c0 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
%4 = fir.convert %3 : (!fir.ref<i8>) -> !fir.ptr<f32>
%5 = fir.declare %4 (storage %1+0) {uniq_name = "_QMdata2Eglob1"} : (!fir.ptr<f32>) -> !fir.ptr<f32>
%6 = fir.declare %4 (storage %1+0) {uniq_name = "_QMdata2Eglob2"} : (!fir.ptr<f32>) -> !fir.ptr<f32>
%7 = fir.coordinate_of %2, %c4 : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
%8 = fir.convert %7 : (!fir.ref<i8>) -> !fir.ref<f32>
%9 = fir.declare %8 (storage %1+4) {uniq_name = "_QMdata2Eglob3"} : (!fir.ref<f32>) -> !fir.ref<f32>
Another example for local EQUIVALENCE variables:
subroutine test()
real :: local1(10), local2(10)
equivalence(local1(2), local2(5))
end subroutine test
HLFIR:
%1 = fir.alloca !fir.array<52xi8> {uniq_name = "_QFtestElocal1"}
%c12 = arith.constant 12 : index
%2 = fir.coordinate_of %1, %c12 : (!fir.ref<!fir.array<52xi8>>, index) -> !fir.ref<i8>
%3 = fir.convert %2 : (!fir.ref<i8>) -> !fir.ptr<!fir.array<10xf32>>
%c10 = arith.constant 10 : index
%4 = fir.shape %c10 : (index) -> !fir.shape<1>
%5:2 = hlfir.declare %3(%4) (storage %1+12) {uniq_name = "_QFtestElocal1"} : (!fir.ptr<!fir.array<10xf32>>, !fir.shape<1>) -> (!fir.ptr<!fir.array<10xf32>>, !fir.ptr<!fir.array<10xf32>>)
%c0 = arith.constant 0 : index
%6 = fir.coordinate_of %1, %c0 : (!fir.ref<!fir.array<52xi8>>, index) -> !fir.ref<i8>
%7 = fir.convert %6 : (!fir.ref<i8>) -> !fir.ptr<!fir.array<10xf32>>
%c10_0 = arith.constant 10 : index
%8 = fir.shape %c10_0 : (index) -> !fir.shape<1>
%9:2 = hlfir.declare %7(%8) (storage %1+0) {uniq_name = "_QFtestElocal2"} : (!fir.ptr<!fir.array<10xf32>>, !fir.shape<1>) -> (!fir.ptr<!fir.array<10xf32>>, !fir.ptr<!fir.array<10xf32>>)
Having this information, I can construct the subsets of overlapping COMMON members (the size of the members is always static).
Please let me know if you see any issues with that representation or if you can think of a different representation that might be more useful in other parts of Flang.
Thank you!