AliasAnalysis does not look through a memcpy

Hi,

I'm trying to get AA results for two pointers, but it seems that AA
cannot look through a memcpy. For example:

    define dso_local spir_func void @fun() {
    entry:
      ; Store an address of `var'
      %var = alloca i32, align 4
      store i32 42, i32* %var, align 4
      %var.addr = alloca i32*, align 8
      store i32* %var, i32** %var.addr, align 8

      ; Memcpy `var.addr' to `var.addr.tmp'
      %var.addr.tmp = alloca i32*, align 8
      %0 = bitcast i32** %var.addr.tmp to i8*
      %1 = bitcast i32** %var.addr to i8*
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %0, i8* align 8 %1, i64 8, i1 false)

      ; Load a copy of `var'
      %var.tmp = load i32*, i32** %var.addr.tmp
      %should.be.42 = load i32, i32* %var.tmp
      ret void
    }

    ; Function Attrs: argmemonly nounwind
    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1
    attributes #1 = { argmemonly nounwind }

I run it with opt, and get the following:

    $ opt -basicaa -print-alias-sets memcpy.ll -disable-output
    Alias sets for function 'fun':
    Alias Set Tracker: 3 alias sets for 6 pointer values.
      AliasSet[0x5b5df0, 2] may alias, Mod/Ref Pointers: (i32* %var, LocationSize::precise(4)), (i32* %var.tmp, LocationSize::precise(4))
      AliasSet[0x5b5e90, 2] must alias, Mod/Ref Pointers: (i32** %var.addr, LocationSize::precise(8)), (i8* %1, LocationSize::precise(8))
      AliasSet[0x5b7390, 2] must alias, Mod/Ref Pointers: (i8* %0, LocationSize::precise(8)), (i32** %var.addr.tmp, LocationSize::precise(8))

So AA says that %var and %var.tmp "may alias", but I'd expect to get a
"must alias". What can be done to determine that %var and %var.tmp are
actually the same pointer in this case?
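
For readers who find the IR hard to follow, here is a rough C analogue of what `@fun` does. It is reconstructed from the IR, not the source the IR was generated from, and the function name `load_through_copy` is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Rough C analogue of @fun: copy a pointer through memory with memcpy,
 * then load through the copy. Returns the value read via the copy. */
int load_through_copy(void)
{
    int var = 42;          /* %var */
    int *var_addr = &var;  /* %var.addr holds the address of var */
    int *var_addr_tmp;     /* %var.addr.tmp */

    /* Byte-wise copy of the pointer itself (8 bytes in the IR above). */
    memcpy(&var_addr_tmp, &var_addr, sizeof var_addr);

    /* The copied pointer is the same address. This is the fact the
     * question wants alias analysis to prove. */
    assert(var_addr_tmp == &var);

    return *var_addr_tmp;  /* %should.be.42 */
}
```

The internal assertion corresponds to the "must alias" answer expected for `%var` and `%var.tmp`.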

Can anyone suggest how to get the expected results from AA?

InstCombiner and other LLVM passes can probably remove the memcpy and make
the IR a lot easier for AA, but I'd like to get the analysis working for
unoptimized IR as well.

Hi, Andrew,

To be clear, you'd like BasicAA to look back, not just through the
memcpy, but also through the loads and stores of the addresses? BasicAA
currently doesn't do any of that under the presumption that, once SROA,
InstCombine, etc. do promotion and simplification, AA will be able to
understand the rest. Can you say more about the use case?

Thanks,

Hal

Hi, Andrew,

To be clear, you'd like BasicAA to look back, not just through the
memcpy, but also through the loads and stores of the addresses?

Yes, exactly. I was mostly concerned about memcpy, but you're right,
loads/stores of addresses seem to have the same issue.

BasicAA currently doesn't do any of that under the presumption that,
once SROA, InstCombine, etc. do promotion and simplification, AA will
be able to understand the rest.

That is what I thought. Thanks for the confirmation.

Can you say more about the use case?

OpenCL C has a notion of a Generic Address Space (GAS), which allows you to
cast a pointer from any named address space to a GAS pointer. You can then
use this GAS pointer in place of a named-AS pointer. The compiler is
responsible for inferring the original address space of a GAS pointer at
the point where it is actually used (for a load or store); if it cannot,
that is a compilation error.

So I'm trying to follow a pointer from a point of origin
(i.e. addrspacecast private ptr -> generic ptr) and look through a
sequence of bitcasts, GEPs, function calls and pointer copies through
memory. AliasAnalysis and MemDepAnalysis give me all the required
information to do intraprocedural analysis, except for the memcpy case
that I raised in this thread.

Since a failure to infer an address space means a compilation error, I'm
trying to get this analysis working for debug builds, where we do not
run InstCombine and other aggressive optimizations.
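
To make the idea of following a pointer "through memory" concrete, here is a minimal sketch of such a tracer over a toy straight-line IR. Everything in it (the opcode set, `trace`, `must_alias`, the value numbering) is hypothetical and is not an LLVM API. It assumes a single basic block, whole-slot copies, and no calls or GEPs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy straight-line IR; values are small integers.
 *   OP_ALLOCA  dst     dst is a fresh stack slot
 *   OP_BITCAST dst, a  dst is the same address as a
 *   OP_STORE   a, b    store pointer a into the slot b points to
 *   OP_MEMCPY  a, b    copy the slot contents from *a (source) to *b
 *   OP_LOAD    dst, a  dst = pointer loaded from the slot a points to */
enum opcode { OP_ALLOCA, OP_BITCAST, OP_STORE, OP_MEMCPY, OP_LOAD };
struct insn { enum opcode op; int dst, a, b; };

#define MAX_VALUES 32
#define UNKNOWN (-1)

/* root[v]: the alloca that value v is known to equal, or UNKNOWN.
 * held[s]: the alloca whose address is stored in slot s, or UNKNOWN. */
struct state { int root[MAX_VALUES]; int held[MAX_VALUES]; };

static int slot_get(const struct state *st, int v)
{
    int s = st->root[v];
    return s == UNKNOWN ? UNKNOWN : st->held[s];
}

static void slot_set(struct state *st, int v, int pointee)
{
    int s = st->root[v];
    if (s != UNKNOWN) { st->held[s] = pointee; return; }
    /* Write through an untracked pointer: conservatively forget all slots. */
    for (int i = 0; i < MAX_VALUES; i++) st->held[i] = UNKNOWN;
}

void trace(struct state *st, const struct insn *code, size_t n)
{
    for (int i = 0; i < MAX_VALUES; i++) st->root[i] = st->held[i] = UNKNOWN;
    for (size_t i = 0; i < n; i++) {
        const struct insn *in = &code[i];
        assert(in->dst < MAX_VALUES && in->a < MAX_VALUES && in->b < MAX_VALUES);
        switch (in->op) {
        case OP_ALLOCA:  st->root[in->dst] = in->dst; break;
        case OP_BITCAST: st->root[in->dst] = st->root[in->a]; break;
        case OP_STORE:   slot_set(st, in->b, st->root[in->a]); break;
        case OP_MEMCPY:  slot_set(st, in->b, slot_get(st, in->a)); break;
        case OP_LOAD:    st->root[in->dst] = slot_get(st, in->a); break;
        }
    }
}

int must_alias(const struct state *st, int x, int y)
{
    return st->root[x] != UNKNOWN && st->root[x] == st->root[y];
}

/* The store / memcpy / load sequence from the original question. */
enum { VAR, VAR_ADDR, VAR_ADDR_TMP, CAST_DST, CAST_SRC, VAR_TMP };

int example_must_alias(int x, int y)
{
    static const struct insn code[] = {
        { .op = OP_ALLOCA,  .dst = VAR },
        { .op = OP_ALLOCA,  .dst = VAR_ADDR },
        { .op = OP_STORE,   .a = VAR, .b = VAR_ADDR },
        { .op = OP_ALLOCA,  .dst = VAR_ADDR_TMP },
        { .op = OP_BITCAST, .dst = CAST_DST, .a = VAR_ADDR_TMP },
        { .op = OP_BITCAST, .dst = CAST_SRC, .a = VAR_ADDR },
        { .op = OP_MEMCPY,  .a = CAST_SRC, .b = CAST_DST },
        { .op = OP_LOAD,    .dst = VAR_TMP, .a = VAR_ADDR_TMP },
    };
    struct state st;
    trace(&st, code, sizeof code / sizeof code[0]);
    return must_alias(&st, x, y);
}
```

Calling `example_must_alias(VAR_TMP, VAR)` asks the question from the first message. A real analysis would also have to handle control flow, partial overlaps and escaping pointers, which is where the compile-time cost raised in this thread comes from.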

We've not generally seen a need to put this kind of logic into BasicAA,
and even if we were motivated to do so, we'd need to be very careful
about compile-time impact.

That seems scary :-) -- Can this inference not be done syntactically in
Clang?

-Hal

Finkel, Hal J. writes:

We've not generally seen a need to put this kind of logic into BasicAA,
and even if we were motivated to do so, we'd need to be very careful
about compile-time impact.

Understood. I assume that the best way to ensure that is to add this
logic into a new AA.

That seems scary :-) -- Can this inference not be done syntactically in
Clang?

From a frontend perspective, a GAS pointer is just a pointer:

  void foo(int *p, int *q) { *p = 42; *q = 43; };

Until Clang reaches a call site, it has no idea about the real address
spaces of `p' and `q'. By the time we do reach a call site, `foo()' may
already have been CodeGen'ed, so we can't really change anything.

Is this supposed to work like template instantiation? Are you guaranteed
to only get one (unique) set of address spaces for the function arguments?

We can change the order that functions are emitted in Clang if necessary.

-Hal

Finkel, Hal J. writes:

Is this supposed to work like template instantiation? Are you guaranteed
to only get one (unique) set of address spaces for the function
arguments?

Yes, just like a C++ template: if `foo' is called with different sets
of address spaces, the compiler has to create a separate function
instantiation for each set.

We can change the order that functions are emitted in Clang if necessary.

I didn't think this was actually configurable. I'd really appreciate
it if you could give me a pointer on how to do this.

Hi, Andrew,

I'd like to fork this part of the thread and move it to cfe-dev. My best
advice is to handle this in Clang, not LLVM, and I've cc'd Richard and
John for their advice. More inline...

I think that you should handle this in Clang using TreeTransform, in a
sense, just like C++ template instantiation. See
lib/Sema/TreeTransform.h, and there are a number of examples in lib/Sema
of transforms using this infrastructure. Using TreeTransform you would
create variants of each function with the right address spaces, based on
usage, and then emit them all during CodeGen. Because you'd do this
prior to code generation, you don't need to worry about the emission
ordering.

-Hal

Hi Andrew,

Can you please provide a reference to the relevant part of the OpenCL specification describing this feature? This sounds like an extremely surprising and problematic language design choice, and I’d like to make sure we’re not misinterpreting the specification.

(Some specific things that are unclear here: Where can GAS pointers be used? Can I put them in a struct? Can I make an array of them? Are all array elements required to point to the same address space? Are they mutable? Can I assign pointers from multiple different address spaces to the same GAS pointer? Must functions taking GAS pointers be defined in the same translation unit as the call? Can different GAS parameters resolve to different address spaces? Can you take the address of a function taking GAS pointers?)

Andrew, please correct me if I'm wrong... It looks like the answer to all of Richard's questions is: yes. This doesn't look like template instantiation, so I retract my recommendation in that regard. See

  https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#the-generic-address-space

which explicitly says that you get to do this:

    kernel void bar(global int *g, local int *l)
    {
        int *var;

        if (...)
            var = g;
        else
            var = l;
        *var = 42;
        ...
    }

where the address space associated with a particular variable can be control-dependent. Also, it can change over time:

    global int *gp;
    local int *lp;
    private int *pp;

    int *p;
    p = gp; // legal
    p = lp; // legal
    p = pp; // legal

If you can't represent these directly (e.g., your global address space is also your generic address space) then you might need a fat-pointer representation, which you optimize by propagating AS info wherever possible.

-Hal
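
As a rough illustration of the fat-pointer idea, the sketch below tags each generic pointer with its address space and dispatches on the tag at run time. The three arrays stand in for separate memories, and all names (`generic_ptr`, `make_ptr`, `generic_store`, `store_42`) are hypothetical and not taken from any OpenCL implementation.

```c
#include <assert.h>

/* Stand-ins for three disjoint memories. On real hardware each might be
 * reached by a different kind of load/store instruction. */
enum addr_space { AS_PRIVATE, AS_GLOBAL, AS_LOCAL };

#define MEM_WORDS 16
static int private_mem[MEM_WORDS];
static int global_mem[MEM_WORDS];
static int local_mem[MEM_WORDS];

/* A "fat" generic pointer: an offset plus a tag naming its address space. */
struct generic_ptr { enum addr_space as; unsigned offset; };

struct generic_ptr make_ptr(enum addr_space as, unsigned offset)
{
    struct generic_ptr p = { as, offset };
    return p;
}

/* Run-time resolution: pick the memory from the tag. */
static int *resolve(struct generic_ptr p)
{
    assert(p.offset < MEM_WORDS);
    switch (p.as) {
    case AS_PRIVATE: return &private_mem[p.offset];
    case AS_GLOBAL:  return &global_mem[p.offset];
    case AS_LOCAL:   return &local_mem[p.offset];
    }
    return 0; /* not reached for valid tags */
}

void generic_store(struct generic_ptr p, int value) { *resolve(p) = value; }
int generic_load(struct generic_ptr p) { return *resolve(p); }

/* Mirrors the `bar` kernel above: the memory that gets written depends on
 * control flow, so the tag is only known at run time. */
void store_42(int use_global, unsigned g_off, unsigned l_off)
{
    struct generic_ptr var;
    if (use_global)
        var = make_ptr(AS_GLOBAL, g_off);
    else
        var = make_ptr(AS_LOCAL, l_off);
    generic_store(var, 42);
}
```

When the tag is a compile-time constant at a given use, the `switch` in `resolve` folds away. That is the optimization by AS propagation that the message describes.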

Actually, the generic address space is an OpenCL 2.x feature and does not
require address space inference, since the hardware is expected to be able
to address any kind of address space (except the constant one, but that is
a borderline case). It could even be done with run-time resolution,
probably less efficiently.

But of course in some cases it might be useful to remove some uses of the
generic address space with compile-time address space inference and
replace them with more concrete address spaces.

An important use case could be compiling OpenCL 2.x programs to run on
OpenCL 1.x-level hardware without generic address space support, or
perhaps because instructions using the generic address space are less
efficient on some OpenCL 2.x hardware, or...

Andrew, is this the kind of usage you are working on?

Sounds to me like this doesn’t require frontend changes, then. (Well, mostly: if we need to use fat pointers to encode the address space, then the frontend might need changes to encode that in the data layout and to increase the size of pointers and such, but that sounds like the extent of the frontend’s responsibilities.) A pass to pick a concrete address space for a pointer based on usage would belong in the middle-end.

I think it depends a lot on the language rules. If the language rules
are set up so that we can easily propagate qualifiers from the arguments
without a complex analysis, well, okay, TreeTransform makes sense. But I
think it's much more likely that this would have to be a data-flow-sensitive,
best-effort analysis that simply fails in arbitrary ways if the optimizer
isn't able to fully eliminate a use of the generic address space.

To enable a TreeTransform-based implementation, we'd have to be able to infer
a concrete address space immediately for every place where the GAS would
otherwise be used in the function, and when those places are e.g. the types
of local variables, that inference must prove to be consistent with other uses
of the variable/whatever as we propagate type information forward.

I don't really know how that inference would work; it sounds incredibly
complicated. But then, admittedly, so does a data-flow-sensitive rewrite.

John.

Oh, and now I've found the existing discussion where we've come to the
conclusion that a first-class GAS is acceptable. Great!

John.

Ronan KERYELL writes:

    > Actually, the generic address space is an OpenCL 2.x feature and does not
    > require address space inference, since the hardware is expected to be
    > able to address any kind of address space (except the constant one, but
    > that is a borderline case). It could even be done with run-time
    > resolution, probably less efficiently.

    > But of course in some cases it might be useful to remove some uses of the
    > generic address space through compile-time address space inference and
    > replace them with more concrete address spaces.

    > An important use case could be to compile OpenCL 2.x programs to run on
    > OpenCL 1.x-level hardware, without generic address space support, or
    > perhaps because instructions using the generic address space are less
    > efficient on some OpenCL 2.x hardware, or...

    > Is this the kind of usage you are working on, Andrew?

You're right, my definition of GAS was too strict. The OpenCL
specification does allow dynamic GAS resolution, but that requires
backend (or hardware) support.

For static GAS resolution my use cases are:
1) backends (e.g. based on OpenCL 1.x) which do not support GAS natively.
2) performance and code size improvements that can be achieved by static
   GAS resolution.

For (2) I'm probably fine with running this analysis after
optimizations, but support for debug (no-opt) builds can be important
for (1).

Finkel, Hal J. writes:

    > Hi Andrew,

    > Can you please provide a reference to the relevant part of the OpenCL
    > specification describing this feature? This sounds like an extremely
    > surprising and problematic language design choice, and I'd like to
    > make sure we're not misinterpreting the specification.

Sorry, GAS compile-time resolution (inference) is not mandatory, as I
had believed: an OpenCL 2.0 compiler is allowed to fall back to dynamic
resolution (generating instructions that work with any AS).

It is still beneficial to infer as much as possible at compile time,
though.

As Hal already mentioned, GAS is described here:
  https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#the-generic-address-space

    > (Some specific things that are unclear here:
    > Where can GAS pointers be used?

Anywhere a named pointer can be used. Only casts from GAS to a
named AS are disallowed.

    > Can I put them in a struct? Can I make an array of them?

Yes.

    > Are all array elements required to point to the same address space?

The specification does not explicitly forbid this, so the answer should
be no: elements of an array can point to different address spaces.

    > Are they mutable? Can I assign pointers from multiple different
    > address spaces to the same GAS pointer?

Yes.

    > Must functions taking GAS pointers be defined in the same translation
    > unit as the call?

No, they can be defined in different translation units.

    > Can different GAS parameters resolve to different address spaces?

Yes.

    > Can you take the address of a function taking GAS pointers?)

Luckily, function pointers are not allowed in OpenCL at all.

    > Andrew, please correct me if I'm wrong... It looks like the answer to
    > all of Richard's questions is: yes.

    > This doesn't look like template instantiation. I retract my
    > recommendation in that regard.

Sorry, I was not clear about this. The specification is not clear about
this either, but from my understanding, this should work like templates
when a GAS pointer is a function argument: if a function is called twice
with arguments of different AS, we have to duplicate the function.
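
To make the template analogy concrete, here is a minimal C++ sketch. It is plain C++, not OpenCL, and it does not claim to show how any compiler actually implements GAS. The names `AddrSpace`, `Ptr`, `storeOpcode` and `store42` are hypothetical and exist only for this illustration; the returned strings are placeholders for address-space-specific store instructions.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the named OpenCL address spaces (illustration only).
enum class AddrSpace { Global, Local, Private };

// A pointer whose address space is part of its type, much like a named
// address-space qualifier is part of an OpenCL C pointer type.
template <AddrSpace AS>
struct Ptr {
    int *raw;
};

// Placeholder for "which store instruction would be selected".
const char *storeOpcode(AddrSpace as) {
    switch (as) {
    case AddrSpace::Global:  return "store.global";
    case AddrSpace::Local:   return "store.local";
    case AddrSpace::Private: return "store.private";
    }
    return "store.generic"; // not reached for valid enumerators
}

// One source-level function; the compiler emits one copy per address
// space it is called with, which mirrors duplicating a function that
// takes a GAS pointer argument.
template <AddrSpace AS>
std::string store42(Ptr<AS> p) {
    *p.raw = 42;
    return storeOpcode(AS);
}
```

Calling `store42` with a `Ptr<AddrSpace::Global>` and again with a `Ptr<AddrSpace::Local>` yields two instantiations, each with a statically known address space. This only covers the case where every GAS pointer is a function argument with one address space per call site.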

    >   https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#the-generic-address-space

    > which explicitly says that you get to do this:

    > kernel void bar(global int *g, local int *l)
    > {
    >     int *var;
    >
    >     if (...)
    >         var = g;
    >     else
    >         var = l;
    >     *var = 42;
    >     ...
    > }

    > where the address space associated with a particular variable can be
    > control-dependent. Also, it can change over time:

    > global int *gp;
    > local int *lp;
    > private int *pp;
    >
    > int *p;
    > p = gp; // legal
    > p = lp; // legal
    > p = pp; // legal

    > If you can't represent these directly (e.g., your global address space
    > is also your generic address space) then you might need a fat-pointer
    > representation which you optimize, where possible, by propagating AS
    > info.
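
As a rough illustration of the fat-pointer idea, the sketch below models a generic pointer as an address plus an address-space tag and resolves the tag at run time. It is plain C++ under invented names (`FatPtr`, `StoreCounts`, `storeGeneric`, and a C++ rendition of `bar`); the counters merely stand in for address-space-specific store instructions and do not correspond to any real backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address-space tag carried alongside the address.
enum class AddrSpace : std::uint8_t { Global, Local, Private };

// Hypothetical fat pointer: the address plus the space it came from.
struct FatPtr {
    AddrSpace space;
    int *addr;
};

// Counts how often each address-space-specific store path is taken.
struct StoreCounts {
    int global = 0;
    int local = 0;
    int priv = 0;
};

// Dynamic resolution: inspect the tag, then take the matching store path.
void storeGeneric(FatPtr p, int value, StoreCounts &counts) {
    switch (p.space) {
    case AddrSpace::Global:  ++counts.global; break;
    case AddrSpace::Local:   ++counts.local;  break;
    case AddrSpace::Private: ++counts.priv;   break;
    }
    *p.addr = value;
}

// C++ rendition of the control-dependent example: the address space of
// `var` is only known once `cond` has been evaluated.
void bar(int *g, int *l, bool cond, StoreCounts &counts) {
    FatPtr var = cond ? FatPtr{AddrSpace::Global, g}
                      : FatPtr{AddrSpace::Local, l};
    storeGeneric(var, 42, counts);
}
```

When an analysis can prove which tag reaches a given store, the `switch` can be replaced by the single matching path; otherwise the dynamic dispatch stays as the fallback.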

So you suggest optimizing address spaces in LLVM IR and, if the
optimization fails somewhere, generating generic code in a device
backend. Is this accurate?

This is probably the best approach, and from my understanding, this is
how it works in many OpenCL 2.0 implementations.

What I'm trying to figure out is how to write an analysis that can
statically infer all address spaces in "average" code.

So it looks like I cannot modify Clang to make the IR friendlier for this
analysis, and I need to run some optimizations to infer an address space
in more complicated cases.
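
One way to picture such a best-effort analysis is a small lattice propagation over pointer assignments. The sketch below is a deliberately simplified, flow-insensitive toy in plain C++, not an LLVM pass; `AS`, `join`, `Assign` and `infer` are hypothetical names. A variable that only ever receives one named address space resolves to it, and a variable that receives several resolves to `Generic`, which is where a backend would need the dynamic fallback.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Unknown must stay first: std::map value-initializes missing entries to it.
enum class AS { Unknown, Global, Local, Private, Generic };

// Lattice join: Unknown is bottom, Generic is top (conflicting spaces).
AS join(AS a, AS b) {
    if (a == AS::Unknown) return b;
    if (b == AS::Unknown) return a;
    return a == b ? a : AS::Generic;
}

// A pointer assignment "dst = src" between named variables.
struct Assign {
    std::string dst;
    std::string src;
};

// Propagate address spaces along assignments until nothing changes.
// `state` seeds the variables whose address space is declared.
std::map<std::string, AS> infer(std::map<std::string, AS> state,
                                const std::vector<Assign> &assigns) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Assign &a : assigns) {
            AS merged = join(state[a.dst], state[a.src]);
            if (merged != state[a.dst]) {
                state[a.dst] = merged;
                changed = true;
            }
        }
    }
    return state;
}
```

With `g` seeded as `Global` and `l` as `Local`, the assignments `var = g` and `var = l` from the `bar` example leave `var` as `Generic`, while a chain such as `p = g; q = p` resolves `q` to `Global`. A flow-sensitive variant would be needed to resolve a variable that is reassigned over time, as in the `p = gp; p = lp; p = pp` example.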