Hello,
We would like to propose a way for improving the diagnostics for
address_space by being able to pass strings as an argument to it
instead of just an integer. This was initially proposed before
(http://lists.llvm.org/pipermail/cfe-dev/2018-August/058702.html) but
did not focus on it at the time.
Reasoning:
Clang's __attribute__((address_space(...))) feature uses an arbitrary
integer as the discriminator of address spaces. This is effectively a
global name space of integers in an API. The meaningless integers are
used in diagnostics, which is not very informative. Moreover, the
maintenance of the integer assignments is error-prone. It would be
better if this could be a string rather than an integer. It's easy
enough to pick string prefixes that distinguish the use in one part of
an API from others in separately-maintained APIs, just as with symbol
name prefixes in C APIs.
Goals:
- Allow for address_space to accept strings so various APIs do not
need to share the same global name space of integers
(address_space("API1") != address_space("API2"))
- Print the string argument instead of an integer if address_space is
provided with a string for more understandable error messages.
(Possible) Implementation:
Off the top of my head, it seems that a large chunk of this would be
changing how address spaces are managed. Currently, it seems that all
address_spaces are limited to just integers with a few spaces reserved
for OpenCL and CUDA under the LangAS enum. I'm thinking that LangAS
itself could be a base class with 2 different subclasses that take
either integers or strings and when storing them in a qualifier, part
of the address space mask could be dedicated to indicating if this
LangAS is a string or integer. The only thing I can't think of is how
printing the AS would work in a diagnostic since, based off the
existing TypePrinter/Qualifiers::print() method, I do not think I
would be able to store a reference back to some mapping of addr spaces
to strings since it seems that the Qualifiers class is meant to be 32
bits long and passed by value.
I was going to come up with a proof of concept in the meantime, but
wanted to ask sooner to see if anyone had ideas on this or ideas on
how this could be implemented. Any feedback is welcome and I don't
mind answering questions.
Thanks,
Leonard
Stash a lookup table from integers to strings in Context and dynamically allocate integers for new strings. You can then keep integers in most of the code, writing/displaying strings for the integers with an entry in the table when writing to files or displaying.
Jacob Lifshay
@Jacob Lifshay
In the case of the TypePrinter though, is there a way to access
ASTContext? Part of my dilemma is that for diagnostics that simply
print the type or the qualifiers, there doesn't seem to be an
established way to access Context so I wouldn't be able to access this
mapping. I'm thinking either this will require some refactoring of how
types are printed to expose ASTContext or perhaps Qualifiers could be
changed somehow such that the 23 bits allocated for the address space
could instead point to some representation of an address_space that in
turn could represent either a string or integer.
- Leonard
I don’t know about clang, but it should work for llvm.
Just to understand better on the language side do you suggest to keep integer value for an attribute too? I think for the legacy or speed reason some users might still prefer integers.
In Clang AST address spaces are stored in Qualifiers as integer values. So you would need some conversion and a map from string to integers to be stored. ASTContext could be a good place to store the map since it’s accessible during all the different phases - parsing, sema, codegen.
Btw, are you suggesting this only for Clang? Other than emitting the address space map (as metadata perhaps?) in IR, are there any other changes you plan?
Cheers,
Anastasia
I don’t (at the moment) have an opinion on the implementation, but to the broader principle of improving the representation of these address spaces, this sounds like it would scratch an itch of mine. In XLA nvptx backend, passing around these integer address spaces has lead to a few near-bugs, and when reading nvptx IR, it would be super-nice if I could somehow be reminded of what address space 5 is.
-Justin
@Anastasia
Just to understand better on the language side do you suggest to keep integer value for an attribute too? I think for the legacy or speed reason some users might still prefer integers.
Yup. I should've clarified on this, I intend to also keep the existing
integer parameters. It's just that no string address space corresponds
to any integer address space.
In Clang AST address spaces are stored in Qualifiers as integer values. So you would need some conversion and a map from string to integers to be stored. ASTContext could be a good place to store the map since it's accessible during all the different phases - parsing, sema, codegen.
I see that this map could be stored in ASTContext. Currently, I think
it's more a matter of accessing this map in the TypePrinter. I think
the printer is accessed mostly through various getAsString() methods
of a QualType but these do not access ASTContext or Sema in any way. I
think this would require refactoring a bunch of refactoring for how
types can be printed to expose ASTContext to the TypePrinter.
Alternatively, since the address space is only stored in the
Qualifiers class as part of a mask, instead Qualifiers could have some
reference to some abstract representation of an address space which
could represent either an integer or string. But these are just some
of my ideas and would like to know if there are better ways of
approaching this.
Btw, are you suggesting this only for Clang? Other than emitting the address space map (as metadata perhaps?) in IR, are there any other changes you plan?
Yup, this is only meant to be a clang feature. Should've clarified on
this also. We have no intentions on changing llvm or dumping this map
as metadata.
- Leonard
What integer are you emitting in LLVM for a given string?
I assume we need a deterministic and immutable mapping if you don’t propagate the string to LLVM.
I don't know where the integers are emitted for address spaces during
codegen, but I figure what we could do is before IR codegen, keep a
set of all integer spaces used. During codegen, the integer we emit
will be a random number up to the largest possible address space, that
is not in this initial set, for each string in our address space
string container. In this case, the integer values would be lazily
evaluated.
That strategy does not seem stable with respect to LTO for instance: the address space “Foo” in foo.c would be different from the address space “Foo” in bar.c from the LLVM point of view.
Hmm. I don’t know the internals of how lto works, but do you think it would be feasible for lto to differentiate between the 2 "Foo"s if we also dumped the string map as part of metadata?
Without understanding LTO in particular, you can see problem as the ability to encode an identifier for a namespace valid across multiple files. Having a string map in the LLVM module would achieve this since you can always remap from integer to the “canonical” ID which is the string.
Leonard Chan via cfe-dev <cfe-dev@lists.llvm.org> writes:
I don't know where the integers are emitted for address spaces during
codegen, but I figure what we could do is before IR codegen, keep a
set of all integer spaces used. During codegen, the integer we emit
will be a random number up to the largest possible address space, that
is not in this initial set, for each string in our address space
string container. In this case, the integer values would be lazily
evaluated.
I get nervous any time "random" is used with respect to compiler output.
We need things to be deterministic to be able to debug problems.
-David
I think this is a wider problem to solve. Adding explicit semantic to address spaces is something we would indeed like to see in LLVM IR. Neil Hickey (in CC) was touching this in his LLVM Dev Meeting talk: https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk8
I see that this map could be stored in ASTContext. Currently, I think
it’s more a matter of accessing this map in the TypePrinter. I think
the printer is accessed mostly through various getAsString() methods
of a QualType but these do not access ASTContext or Sema in any way. I
think this would require refactoring a bunch of refactoring for how
types can be printed to expose ASTContext to the TypePrinter.
It seems for printing some Decls the ASTContext is being queried from there, but we are not storing it in the types indeed.
Alternatively, since the address space is only stored in the
Qualifiers class as part of a mask, instead Qualifiers could have some
reference to some abstract representation of an address space which
could represent either an integer or string. But these are just some
of my ideas and would like to know if there are better ways of
approaching this.
This could work too as soon as you find some way to unique address space strings that are the same.
I would still suggest to create some sort of a map of Int->String and avoid modifying too much how qualifiers are stored and represented. You can perhaps store this as static field in Qualifiers. You could then add some method similar to getAddressSpaceAttributePrintValue() that can be used to obtain the string representation.