Hello Mon Ping,
I apologize for the length of this mail, but I hope to explain as clearly as I can the points I think need to be discussed.
> Sorry for being late to this conversation. It doesn't look consistent to me. Address
> space numbers are not language constructs. The language constructs are global
> and local. Coming out of clang, I think it is more natural for the AS mangling
> and the type to match. In C++, clang will generate different names for
> structures which can be identical and uses those names consistently to mangle
> the function, e.g.,
>
>     %struct.foo = type { i32, i32 }
>     define void @_Z4testR3foo(%struct.foo* %foo)
>
> I view the address spaces coming out of clang as representing how the target
> represents memory logically. How a particular LLVM target maps them to physical
> memory is target dependent. A backend may map all the address spaces to the same
> physical memory or to different ones. Because of this, I don't think it makes sense
> to distinguish between the two in clang for a particular target.
I agree that handling OpenCL address spaces like any other address space is a technical detail. For a common approach I don't see a strict constraint on how address spaces are mangled (they can be numbers fixed by convention in clang, defined by targets, or anything else), but the mangling should still preserve the distinctions that are present in the source language.
I want to point out what I consider an important aspect:
"Pointer types may have an optional address space attribute defining the numbered address space where the pointed-to object resides. The default address space is number zero. The semantics of non-zero address spaces are target-specific." (LLVM Language Reference Manual)
From this description I understand that address spaces in the IR are physical address spaces. Because of this, I consider it wrong to use this property as-is to represent logical address spaces inside the IR. Doing so would require every backend to be aware of language-specific mappings: currently this is not the case, and IMO it would be a bad idea.
But information derived from the source language is still useful for optimization, both in the IR and later in the backend: the logical distinction between address spaces matters, and IMO it should be represented in the IR. Having both logical and physical address space information (it's not important to know whether "AS1" means global or local; it's enough to know that 1 is different from 2) would allow better alias analysis even for targets that physically have a single address space. I think this can be solved independently of the mangling problem.
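A minimal sketch of the alias-analysis point, assuming that the source language guarantees distinct named address spaces never overlap (as with OpenCL's __global and __local). The names PointerAS, AliasResult and aliasByAddressSpace are hypothetical, not LLVM's API; the sketch only shows why the logical number is enough even when every physical number is 0.

```cpp
#include <cassert>

// Hypothetical per-pointer information: the physical address space used in the
// IR type and the logical address space derived from the source language.
struct PointerAS {
  unsigned physical;
  unsigned logical;
};

enum class AliasResult { NoAlias, MayAlias };

// Address-space based alias query. It assumes distinct logical address spaces
// are disjoint, so only the logical numbers are compared; on a target with a
// single physical address space the physical numbers alone never prove NoAlias.
AliasResult aliasByAddressSpace(const PointerAS &a, const PointerAS &b) {
  if (a.logical != b.logical)
    return AliasResult::NoAlias;
  return AliasResult::MayAlias;  // same logical space: defer to other analyses
}
```

On an X86-like target a __global pointer {0, 1} and a __local pointer {0, 2} share physical space 0, yet the query still returns NoAlias.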
To address both questions, I suggested introducing another map in order to preserve the distinction between address spaces even for targets that do not have physically distinct address spaces, like X86, and to use it to solve the problem with the mangler.
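To make the "another map" idea concrete, here is a minimal sketch. All names (LangAS, ASMap, FlatTargetMap, LogicalMap, mangleIntPtrFunction) are hypothetical and not clang's API; the only assumption about real behaviour is that clang's Itanium mangler encodes a non-zero address space n as the vendor qualifier U3ASn, and the mangling is simplified to a single `void f(int *)` signature.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical source-level (logical) address spaces of an OpenCL-like language.
enum LangAS : unsigned { Private = 0, Global = 1, Local = 2, Constant = 3 };

// Hypothetical table type: maps each LangAS to the number used for mangling.
using ASMap = std::array<unsigned, 4>;  // indexed by LangAS

// Physical map of a target with one unique address space (e.g. X86).
const ASMap FlatTargetMap = {{0, 0, 0, 0}};
// The additional logical map: every source-level address space stays distinct.
const ASMap LogicalMap = {{0, 1, 2, 3}};

// Simplified Itanium-style mangling of `void name(int *)`: a non-zero address
// space n becomes the vendor qualifier "U3ASn" on the pointee type.
std::string mangleIntPtrFunction(const std::string &name, LangAS as,
                                 const ASMap &map) {
  std::string qualifier;
  const unsigned n = map[as];
  if (n != 0) {
    const std::string vendor = "AS" + std::to_string(n);
    qualifier = "U" + std::to_string(vendor.size()) + vendor;
  }
  return "_Z" + std::to_string(name.size()) + name + "P" + qualifier + "i";
}
```

Mangling through the flat target map, f(__global int *) and f(__local int *) both become _Z1fPi and collide; through the logical map they stay distinct as _Z1fPU3AS1i and _Z1fPU3AS2i, while the IR types can still use the target's numbers.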
As previously discussed, this is not the only viable solution: the mapping of logical to physical address spaces can be delayed until instruction selection. This would allow the frontend to lower this information in a target-independent manner, delegating the mapping to a late IR pass (this pass would be language and target dependent, so whoever builds the pass pipeline must schedule this language-dependent task, which requires target information). Even in this case it may be useful to preserve the logical address space information.
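The late-lowering alternative can be sketched as follows, assuming a toy representation in place of real IR: the frontend emits only logical numbers, and a language/target-specific table supplied by the pipeline builder fills in the physical ones. PointerValue, LogicalToPhysical and lowerAddressSpaces are hypothetical names; a real implementation would be an LLVM pass rewriting pointer types, which this does not attempt.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an IR pointer value.
struct PointerValue {
  std::string name;
  unsigned logicalAS;   // emitted by the frontend, target independent
  unsigned physicalAS;  // filled in by the late lowering step
};

// Hypothetical language/target-specific table provided by whoever builds the
// pass pipeline (it needs both language and target knowledge).
using LogicalToPhysical = std::map<unsigned, unsigned>;

// Late lowering step run before instruction selection: assign each value the
// target's physical address space, keeping the logical number for analyses.
// std::map::at throws std::out_of_range if the table lacks an entry.
void lowerAddressSpaces(std::vector<PointerValue> &values,
                        const LogicalToPhysical &table) {
  for (PointerValue &v : values)
    v.physicalAS = table.at(v.logicalAS);
}
```

With a flat table such as {{0, 0}, {1, 0}, {2, 0}, {3, 0}} every value ends up in physical space 0, but the logical numbers survive, which is the information the alias-analysis argument above relies on.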
This kind of solution is feasible, but it simply does not seem to be the approach chosen in clang.
My proposal was the one with the minimal impact on the codebase, while trying to keep the flexibility needed to build OpenCL toolchains that are backward compatible.
Could you explain what you are proposing? How should the mangler be fixed? How are address spaces lowered in the IR? Is this lowering target dependent or not? Is the mangling also target dependent?
Thanks in advance.
Best regards,
-Michele