OpenCL address space and mangling

Hello Mon Ping,

I apologize for the length of this mail, but I hope to explain as clearly as I can the points I think need to be discussed.

Sorry for being late to this conversation. It doesn’t look consistent to me. Address
space numbers are not language constructs. The language constructs are global
and local. Coming out of clang, I think it is more natural for the AS mangling
and the type to match. In C++, clang will generate different names for
structures which can be identical and use those names consistently to mangle
the function, e.g.,
   %struct.foo = type { i32, i32 }
   define void @_Z4testR3foo(%struct.foo* %foo)

I view the address spaces coming out of clang as a logical representation of
how the target represents memory. How a particular LLVM target maps them to
physical memory is target dependent. A backend may map all the address spaces
to the same physical memory or to different ones. Because of this, I don’t think
it makes sense to distinguish between the two in clang for a particular target.

I agree: the fact that OpenCL address spaces are handled like other address spaces is a technical aspect. I don't see a strict limitation on how address spaces are mangled in a common way (the numbers can be decided by convention in clang, defined by targets, or whatever), but the mangling should still preserve the differences that are present in the source language.

I want to point out what is, IMO, an important aspect:
"Pointer types may have an optional address space attribute defining the numbered address space where the pointed-to object resides. The default address space is number zero. The semantics of non-zero address spaces are target-specific." (LLVM Language Reference Manual — LLVM 18.0.0git documentation)

From this description I understand that address spaces in the IR are physical address spaces. Because of this I consider it wrong to use this property as is to represent logical address spaces inside the IR. Doing that would imply that each backend must be aware of a language specific mapping: currently this is not the case and IMO it would be a bad idea.

But information derived from the source language is still useful for optimization, both in the IR and later in the backend: the logical distinction between address spaces is still useful and IMO should be represented in the IR. Having both logical and physical address space information (it's not important to know whether "AS1" means global or local, it's enough to know that 1 is different from 2) would allow better alias analysis, also for those targets that physically have one unique address space. I consider that this can be solved independently from the mangling problem.

As an answer to both questions, I suggested introducing another map in order to preserve the distinction between address spaces also for those targets that do not have physically distinct address spaces, like X86, and through this to solve the problem related to the mangler.

As previously discussed, this is not the only viable solution: the mapping of logical address spaces to physical address spaces can be delayed until instruction selection. This would allow the frontend to lower this information in a target independent manner, delegating the mapping task to a late IR pass (this task would be language/target dependent, so basically whoever builds the pass pipeline must schedule this language dependent task, which requires target information). Here too it may be useful to preserve the logical information about address spaces.
This kind of solution is feasible, but it simply does not seem to be the way chosen in clang to solve the problem.

My proposal was the one with the minimal impact on the codebase, while trying to maintain the flexibility needed to build an OpenCL toolchain compatible with the past.

Could you explain to me what you are proposing? How should the mangler be fixed? How are address spaces lowered in the IR? Is this lowering target dependent or not? Is the mangling also target dependent?

Thanks in advance.

Best regards,
-Michele

Hi Michele,

IMO, the description only indicates that an address space is completely target dependent. For the current x86 target, address spaces > 255 are used for a non-standard address for the stack protector, while every other address space overlaps and maps to the same region in memory. A target can define it differently or make some address spaces illegal, but it is up to the target.

When generating code for a particular target, clang needs to decide how to map global, local, etc. for that target. Currently, for X86, it decides to use different address spaces to distinguish overloads, knowing that in the target the address spaces will physically overlap. This keeps the two sides consistent when mangling based on the LLVM IR address space and keeps the overloaded functions distinguishable for this particular target. This choice, as you noted, makes the mapping target dependent. If a target wants to map everything to the same address space and does not want overloading of its functions because there is no distinction, it can make that choice at this level.

My objection to the logical map is that by tying the CL address space names to an address space numbering, it looks very target dependent, and if the logical address space and the LLVM IR address space don’t match, it looks inconsistent. In that case, I think we should do what we are currently doing. Instead of a logical map, if we want to preserve the language constructs in a target independent manner, we should use the language construct names for overloading, as that is language dependent and independent of AS numbers, which are LLVM IR concepts; I believe Eli indicated this as well. If we want to preserve compatibility for some target, we can make it target dependent whether it uses the current address space mapping or the language mapping. I don’t know how Eli or the other code owners feel about having that compatibility mode, which will be useful for people who want to preserve the old behavior. Opinions?

Thanks,
  — Mon Ping

Hello Mon Ping,

So why is the addrspace map for X86 still the trivial one IF the assumption in the backend is that whatever number I choose below 255 is the same as 0? Maybe for X86 defining a non-trivial map is a correct fix, but that is not true in general!

What if a hypothetical backend enforced that ONLY address space zero exists? Why should I not be able to produce a correct mangling for OpenCL overloaded functions that refer to different logical address spaces?

The idea of having something target independent seems to have been considered bad in the previous messages. IMO the use of numbers can be unpleasant and implementation dependent, but I haven't seen a standardized mangling for OpenCL C.

My point is that for *every* target the mangler should produce different names, even if the address space translation map is the trivial one.
How the address space information is propagated in the IR and the mangling are IMO orthogonal problems, so the inconsistency you point out cannot exist by definition.

What I noticed is that the mangler now produces names that are wrong with respect to its purpose (X86 is only the test case).

Thanks for your reply.

Regards,
-Michele

Up.

Regards,
-Michele

Hi Michele,

Sorry for the delayed response.

Yes, you are right that in general it is not true. As we both agree, LLVM address spaces are completely target dependent. The question is why someone would want to produce different overloaded functions when the LLVM backend supports only one address space. It can’t be for code generation, since the code will be the same. A backend may want them all to be mangled the same, since that would reduce the number of CL builtin functions it needs to support.

There are cases where clients may want distinct LLVM IR address spaces together with the matching mangling. One case is if someone uses the address space for alias analysis. Another case is a platform with a set of devices, some with physical address spaces, that wants to keep the mangled names consistent across the platform. In both cases the reason for wanting this is target dependent.

I’m not convinced that the mangler should produce different names for every target even if the address space translation map is the trivial one. Can you please explain the use case that you want to support again?

If we have both a logical map and an LLVM address space map, I think it is confusing that the address space in the mangled name differs from the physical LLVM map. It is like having a type name for mangling that has no relationship with the type name in the LLVM IR or the language it is coming from. If we need to support CL address space mangling that differs from the LLVM IR address space, I think it would be better to be target independent and force the mangling to be based on the language (global, local, etc.), which it sounds like you were not opposed to. As noted above, there are cases where we want them to match.

— Mon Ping

My use case is OpenCL. In this language the abstraction of address spaces is
explicit, so however I implement this abstraction I need different names for
overloaded functions that differ only in the address space qualifiers of their
pointers, since mangling is a technique to avoid name collisions between
functions with the same name but different signatures.
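
To make the collision concrete, here is a minimal C++ sketch. It is not clang's code: LangAS, AddrSpaceMap, TrivialMap, DistinctMap and mangleTest are names invented for this illustration, and the "U<length>AS<number>" spelling is only assumed here as an Itanium-style vendor qualifier. It mangles a hypothetical `void test(<AS> int *)` from the target address space number alone.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical source-level address spaces (invented for this sketch).
enum class LangAS : std::size_t { Private = 0, Global, Local, Constant, Count };

// One target address space number per source-level address space.
using AddrSpaceMap =
    std::array<unsigned, static_cast<std::size_t>(LangAS::Count)>;

constexpr AddrSpaceMap TrivialMap = {0, 0, 0, 0};   // everything maps to AS 0
constexpr AddrSpaceMap DistinctMap = {0, 1, 2, 3};  // one number per space

// Build the qualifier from the *target* number only; AS 0 gets no qualifier.
std::string qualifierFromTargetMap(LangAS AS, const AddrSpaceMap &Map) {
  std::size_t Index = static_cast<std::size_t>(AS);
  assert(Index < Map.size() && "not a real address space");
  unsigned Number = Map[Index];
  if (Number == 0)
    return "";
  std::string Name = "AS" + std::to_string(Number);
  return "U" + std::to_string(Name.size()) + Name;
}

// Mangle `void test(<AS> int *)` in an Itanium-like style.
std::string mangleTest(LangAS AS, const AddrSpaceMap &Map) {
  return "_Z4testP" + qualifierFromTargetMap(AS, Map) + "i";
}
```

With DistinctMap the global and local overloads get the names `_Z4testPU3AS1i` and `_Z4testPU3AS2i`. With TrivialMap both collapse to `_Z4testPi`, which is exactly the name collision described above.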

The mangling is just a frontend matter: it resolves name collisions and preserves
aspects of the source language. For OpenCL I should not care whether physical
address spaces exist. Whoever implements these OpenCL functions (they are the
OpenCL builtins) for a target that already has a target library with similar
functions would only need to call them.

The pure solution would be the one proposed by Eli: I don't have any objection
to this solution.
The mangler now has a bug, so it must be fixed. The pure solution implicitly
breaks binary compatibility. If we do not have a problem with this (so we
consider it a matter for the users to solve, e.g. with a forced update
of libraries) the right patch is to have a target independent mangling for OpenCL.

We would still have problems if we consider SPIR: its specification has a fixed
mangling scheme (the one produced by the current mangler). In this case we have
two choices: we change the SPIR mangling, or we allow targets to override the
target independent mangling for OpenCL with the one based on the
TargetAddrSpaceMap.

*Based on all this, I would make the mangling proposed by Eli the default, except
for targets that explicitly require a mangling scheme based on the target address
space map (e.g. the SPIR target).*
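
The policy proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: TargetInfoSketch, its UseAddrSpaceMapMangling switch, and the "CLglobal" / "CLlocal" / "CLconstant" spellings are hypothetical names chosen for the example, and the SPIR-like numbers in the usage note are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical source-level address spaces (invented for this sketch).
enum class LangAS : std::size_t { Private = 0, Global, Local, Constant };

// Hypothetical per-target description.
struct TargetInfoSketch {
  bool UseAddrSpaceMapMangling;  // target asks for number-based mangling
  unsigned AddrSpaceMap[4];      // indexed by LangAS
};

// Itanium-style vendor qualifier: 'U' <length> <name>.
std::string vendorQualifier(const std::string &Name) {
  return "U" + std::to_string(Name.size()) + Name;
}

std::string mangleAddrSpace(LangAS AS, const TargetInfoSketch &TI) {
  std::size_t Index = static_cast<std::size_t>(AS);
  assert(Index < 4 && "not a real address space");
  if (TI.UseAddrSpaceMapMangling) {
    // Compatibility mode: mangle from the target address space number.
    unsigned Number = TI.AddrSpaceMap[Index];
    return Number == 0 ? "" : vendorQualifier("AS" + std::to_string(Number));
  }
  // Default mode: mangle from the language construct, ignoring the map.
  switch (AS) {
  case LangAS::Private:  return "";
  case LangAS::Global:   return vendorQualifier("CLglobal");
  case LangAS::Local:    return vendorQualifier("CLlocal");
  case LangAS::Constant: return vendorQualifier("CLconstant");
  }
  return "";
}
```

For example, a generic CPU described as `{false, {0, 0, 0, 0}}` still distinguishes global from local in the mangled name, while a SPIR-like target described as `{true, {0, 1, 3, 2}}` keeps its number-based names.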

Thanks.

Regards,

-Michele

----- A little digression ------

How OpenCL address spaces are lowered in the IR is another, orthogonal
problem.

For targets like PTX or R600 we have real, distinct address spaces, so there it
is fine to use those.

On X86 we can use target address spaces in the range [0-255], as the backend
assumes that they are all equivalent to address space 0.
But this is not the general case for CPU targets. So on a generic CPU it is
correct to map all the OpenCL address spaces to the default target address space
(0).

For alias analysis purposes, I started a (quite long) discussion on the LLVMDev
mailing list (
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064620.html ) to find a
way to represent logical address spaces in the IR, so as to be able to
distinguish between two memory locations that are physically in the same address
space but logically in two different (maybe also disjoint) address spaces.

A reasonable solution is here (
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064807.html ): use
TBAA-like metadata to describe the relationships between source level address
spaces (a tree structure to represent the inclusion relationship and the
"constant" property) and attach it to load and store instructions (as done for
TBAA). From this it is possible to introduce in LLVM a new AliasAnalysis for
address spaces that uses this information to decide aliasing when the physical
address space is the same; when target address spaces are used, the query goes
to the target, which knows these address spaces (so it can answer whether two
physical address spaces are disjoint or not).

In this way, in Clang the target address space map keeps its current semantics,
and address space metadata is emitted in the case of OpenCL and attached to
load, store, etc. instructions.
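
As a rough illustration of the tree idea in this digression, the following C++ sketch models address spaces as nodes with a parent link (inclusion) and a "constant" flag. It is not LLVM's metadata format or its AliasAnalysis interface: ASNode, includes, mayAlias, storeMayClobberLoad and the small hierarchy at the end are all hypothetical and exist only to show how such a query could be answered.

```cpp
#include <cassert>

// One node per source-level address space; Parent expresses inclusion.
struct ASNode {
  const char *Name;
  const ASNode *Parent;  // nullptr for the root
  bool IsConstant;       // memory in this space is never written
};

// True if Ancestor is Node itself or one of its ancestors.
bool includes(const ASNode *Ancestor, const ASNode *Node) {
  for (; Node != nullptr; Node = Node->Parent)
    if (Node == Ancestor)
      return true;
  return false;
}

// Two accesses may alias only if one address space includes the other.
bool mayAlias(const ASNode *A, const ASNode *B) {
  assert(A && B && "address space nodes must not be null");
  return includes(A, B) || includes(B, A);
}

// A store can change what a load observes only if the spaces may alias
// and the loaded space is not marked constant.
bool storeMayClobberLoad(const ASNode *StoreAS, const ASNode *LoadAS) {
  return !LoadAS->IsConstant && mayAlias(StoreAS, LoadAS);
}

// Hypothetical hierarchy: a root that includes three disjoint children.
const ASNode Root = {"any", nullptr, false};
const ASNode Global = {"global", &Root, false};
const ASNode Local = {"local", &Root, false};
const ASNode Constant = {"constant", &Root, true};
```

In this hierarchy `mayAlias(&Global, &Local)` is false even when both map to the same physical address space, while an access through the root may alias either of them.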

I think there's another reason for desiring a target independent mangling: a system may contain several OpenCL devices and the actual implementation of address spaces (in particular whether they're "front-end annotations only" or actually denote physically different regions of memory) may depend on the OpenCL device (with its associated backend). (In the conventional OpenCL usage it may not matter since one could postpone the address space resolution to later in the process; once you've got to process already produced SPIR I think it does.)

Cheers,
Dave

Attached is a proposal to implement target independent mangling, with the
option for targets to force the use of target address space based mangling.

Regards,
-Michele

mangling-rev5.patch (4.68 KB)

This patch looks fine to me.

  — Mon Ping

If this patch seems generally fine, I would appreciate it if someone could commit
it for me, because I don't have commit access.

Thanks in advance.

Best Regards,
-Michele

Reup and updated version of the patch!

Thanks in advance.

Regards,
-Michele

mangling.patch (4.68 KB)

Hi,

while I think most people agree with the direction things are going, there still
look to be some fiddly details. As one instance, when I run this on a standard
OSS LLVM build I get a new test failure in test/CodeGenOpenCL/local.cl. Since
behaviour is being made more sophisticated, it seems it would be good to add
some tests that verify the new behaviour so we can detect any modifications that
change it. But the patch looks to be progressing.

Cheers,
Dave

Hi David,

I've fixed the test and added another test specifically for checking the
mangling. To simplify testing I've added a command line option (similar to
-ffake-address-space-map).

Attached is the new version of the patch.

Thanks in advance.

Best Regards,
Michele

mangling-rev6.diff (11.5 KB)

Hi Michele,

This patch LGTM. Assuming you'd still like someone to commit it on your behalf, I'll leave it for a day in case
anyone else has any comments or issues, but can commit it for you end of tomorrow if nothing comes up.

Thanks for working on this,

Cheers,
Dave

Yes, if you can commit it I would appreciate it :-). I agree with your plan.

Thanks again.

Regards,
-Michele

Committed r190684.