address_space pragma

We would like to propose a new pragma that changes the error message
produced when using pointers with different address spaces. For
example, given the following:

  1 #define SPACE1 __attribute__((address_space(1)))
  2 #define SPACE2 __attribute__((address_space(2)))
  3
  4 void func(char SPACE1 *a){}
  5
  6 int main() {
  7   char SPACE2 *a;
  8   func(a);
  9 }

The following error is produced:

error: passing '__attribute__((address_space(2))) char *' to parameter
of type '__attribute__((address_space(1))) char *' changes address
space of pointer

We would like to propose a pragma that would allow for binding an
address_space to a string, and a list of other attributes used with
address_space, such that the error will instead print the string
passed instead of the whole attribute. That is, this would be dumped
instead:

error: passing 'SPACE2 char *' to parameter of type 'SPACE1 char *'
changes address space of pointer

from the following code

  1 #pragma clang address_space "SPACE1"
  2 #pragma clang address_space "SPACE2"
  3 #define SPACE1 __attribute__((address_space("SPACE1")))
  4 #define SPACE2 __attribute__((address_space("SPACE2")))
  5
  6 void func(char SPACE1 *a){}
  7
  8 int main() {
  9   char SPACE2 *a;
 10   func(a);
 11 }

The proposed usage would be along the lines:
`#pragma clang address_space <string> <variable-attributes>`

where the address_space pragma accepts a string as the first argument
to indicate what gets printed in the error and a list of other
attributes that one would want to pair with address_space.

This would also be useful with codebases that use Sparse
(http://sparse.wiki.kernel.org/), a static analyzer used on the Linux
kernel that uses special annotations to convey information about
certain data types. One such usage is with `__user` which is defined
as `__attribute__((noderef, address_space(1)))`, where pointers
declared with `__user` complain if they are dereferenced or it’s used
for address spaces that aren’t address_space(1).

In the case of tools that use Sparse, instead of printing
`__attribute__((noderef, address_space(1))) void *` for the __user
macro, `__user void *` would be printed instead when using `#pragma
clang address_space "__user" noderef`.

The changes in this patch would include (though I could naively be
missing some other stuff):
- Adding the address_space pragma which will require some internal
mapping of the string to address_space and a list of other attributes
used with it.
- Changing the address_space attribute to accept strings in addition
to integers. Accepting a string will map to whatever attributes were
used in the pragma while accepting an int defaults to the regular
address_space behavior.

For now we intend to use this primarily with address_space but are
open to feedback on other ways we can improve this or expand on the
usage for other attributes.

- Leo

Is your expectation that these address spaces will be exclusively used for diagnostics?
Should they support promotions between address spaces?

Do you have an idea of what the "variable attributes" would look like?

I am somewhat concerned about introducing a rather significant new language feature using pragmas.

John.

Is your expectation that these address spaces will be exclusively used for diagnostics?

Yes. This will serve as a way of essentially shortening an attribute
list in the diagnostic for address_space since this is what's useful
to us for now, but we are open to expanding this to fit other use
cases as long as we get this.

Should they support promotions between address spaces?

We weren't planning on this. The idea is that each new address space
created with this pragma will be a new address space disassociated
from any previous one.

Do you have an idea of what the "variable attributes" would look like?

Should've clarified on this more. This is the syntax of regular
attributes that you would normally declare in an attribute list but
space separated. So something like

#define __iomem __attribute__((address_space(2), noderef, other_attr(arg)))

would be represented as

#pragma clang address_space "__iomem" noderef other_attr(arg)
#define __iomem __attribute__((address_space("__iomem")))

Though after experimenting for a bit, I'm running into a few
unexpected errors and may change the syntax. I've been looking into
arphaman's patch for the pragma clang attribute and may borrow his
syntax.

Is your expectation that these address spaces will be exclusively used for diagnostics?

Yes. This will serve as a way of essentially shortening an attribute
list in the diagnostic for address_space since this is what's useful
to us for now, but we are open to expanding this to fit other use
cases as long as we get this.

Should they support promotions between address spaces?

We weren't planning on this. The idea is that each new address space
created with this pragma will be a new address space disassociated
from any previous one.

And these are supposed to work like actual language address spaces, right?
Which is to say, they're not interconvertible at all? I'm a little confused about how
this fights into a code-analysis framework when it's actually a major semantic
change.

Do you have an idea of what the "variable attributes" would look like?

Should've clarified on this more. This is the syntax of regular
attributes that you would normally declare in an attribute list but
space separated. So something like

#define __iomem __attribute__((address_space(2), noderef, other_attr(arg)))

would be represented as

#pragma clang address_space "__iomem" noderef other_attr(arg)
#define __iomem __attribute__((address_space("__iomem")))

Though after experimenting for a bit, I'm running into a few
unexpected errors and may change the syntax. I've been looking into
arphaman's patch for the pragma clang attribute and may borrow his
syntax.

The diagnostic goals could probably be achieved by just making a better effort to
recognize and preserve that an address_space attribute was written with a macro.

John.

>
>> Is your expectation that these address spaces will be exclusively used for diagnostics?
>
> Yes. This will serve as a way of essentially shortening an attribute
> list in the diagnostic for address_space since this is what's useful
> to us for now, but we are open to expanding this to fit other use
> cases as long as we get this.
>
>> Should they support promotions between address spaces?
>
> We weren't planning on this. The idea is that each new address space
> created with this pragma will be a new address space disassociated
> from any previous one.

And these are supposed to work like actual language address spaces, right?
Which is to say, they're not interconvertible at all? I'm a little confused about how
this fights into a code-analysis framework when it's actually a major semantic
change.

These are meant to work like regular address spaces, but instead of
accepting an integer to represent a unique space, some unique
serializable ID is given instead. That is in your code, you should be
able to pass strings instead of integers to all instances you use the
attribute address_space, assuming you declare the pragma beforehand,
and you shouldn't have to change anything else.

I could be missing something about address spaces, but my
understanding was that they were for preventing usage of pointers from
one space in another and all address spaces were unique.

>> Do you have an idea of what the "variable attributes" would look like?
>
> Should've clarified on this more. This is the syntax of regular
> attributes that you would normally declare in an attribute list but
> space separated. So something like
>
> ```
> #define __iomem __attribute__((address_space(2), noderef, other_attr(arg)))
> ```
>
> would be represented as
>
> ```
> #pragma clang address_space "__iomem" noderef other_attr(arg)
> #define __iomem __attribute__((address_space("__iomem")))
> ```
>
> Though after experimenting for a bit, I'm running into a few
> unexpected errors and may change the syntax. I've been looking into
> arphaman's patch for the pragma clang attribute and may borrow his
> syntax.

The diagnostic goals could probably be achieved by just making a better effort to
recognize and preserve that an address_space attribute was written with a macro.

I'll look into this also.

Is your expectation that these address spaces will be exclusively used for diagnostics?

Yes. This will serve as a way of essentially shortening an attribute
list in the diagnostic for address_space since this is what's useful
to us for now, but we are open to expanding this to fit other use
cases as long as we get this.

Should they support promotions between address spaces?

We weren't planning on this. The idea is that each new address space
created with this pragma will be a new address space disassociated
from any previous one.

And these are supposed to work like actual language address spaces, right?
Which is to say, they're not interconvertible at all? I'm a little confused about how
this fights into a code-analysis framework when it's actually a major semantic
change.

These are meant to work like regular address spaces, but instead of
accepting an integer to represent a unique space, some unique
serializable ID is given instead. That is in your code, you should be
able to pass strings instead of integers to all instances you use the
attribute address_space, assuming you declare the pragma beforehand,
and you shouldn't have to change anything else.

Yes, I understood that.

I could be missing something about address spaces, but my
understanding was that they were for preventing usage of pointers from
one space in another and all address spaces were unique.

Okay, so you do want them to function as actual address spaces. And I can
understand why this is useful in the kernel. I think I'm just surprised to hear this
called "static analysis"; it's much more like a language dialect that's fortunately
easy to disable with macros. For example, if the kernel wanted to use this to
give __user pointers a different representation or use different code-generation
patterns for accesses to them, it could pretty easily do that using these annotations
and the right compiler support.

So, as I see it, the goals of the feature are:

- Arbitrary code can declare an address space nominally without worrying about
  having picked a unique number. Definitely a win.

- The program can have all the benefits of a custom qualifier in terms of type-checking,
  and the compiler can be smart enough to just erase the address space when generating
  code. (Conceivably it could erase it to some other attribute space.) Also a clear win.

- The address space name will get preserved in diagnostics. This one is questionable
  to me because it's only "preserved" in a really ugly way: the programmer wrote
  `__user`, and even with this feature, the diagnostic will be talking about
  `__attribute__((address_space("user")))`. This is why I think you want to look into
  presenting the original macro if you can.

- The attribute can automatically apply the optional attributes. This part also seems
  poorly-motivated to me. `__attribute__((address_space("user")))` is still an awful
  thing to write all over the place, so in practice you're always going to want to define
  a macro that programmers will use, and of course that macro can apply other attributes.

I'll also repeat that I don't think you want do this with a pragma; I can imagine all sorts
of ways to extend this feature in the future, but jamming all that information after a #pragma
is very awkward, and you can't use the preprocessor inside a #pragma. Consider
making it an `__address_space user` declaration instead, with some ad-hoc syntax inside
to describe the various properties of the address space.

John.

>
>>
>>>
>>>> Is your expectation that these address spaces will be exclusively used for diagnostics?
>>>
>>> Yes. This will serve as a way of essentially shortening an attribute
>>> list in the diagnostic for address_space since this is what's useful
>>> to us for now, but we are open to expanding this to fit other use
>>> cases as long as we get this.
>>>
>>>> Should they support promotions between address spaces?
>>>
>>> We weren't planning on this. The idea is that each new address space
>>> created with this pragma will be a new address space disassociated
>>> from any previous one.
>>
>> And these are supposed to work like actual language address spaces, right?
>> Which is to say, they're not interconvertible at all? I'm a little confused about how
>> this fights into a code-analysis framework when it's actually a major semantic
>> change.
>
> These are meant to work like regular address spaces, but instead of
> accepting an integer to represent a unique space, some unique
> serializable ID is given instead. That is in your code, you should be
> able to pass strings instead of integers to all instances you use the
> attribute address_space, assuming you declare the pragma beforehand,
> and you shouldn't have to change anything else.

Yes, I understood that.

> I could be missing something about address spaces, but my
> understanding was that they were for preventing usage of pointers from
> one space in another and all address spaces were unique.

Okay, so you do want them to function as actual address spaces. And I can
understand why this is useful in the kernel. I think I'm just surprised to hear this
called "static analysis"; it's much more like a language dialect that's fortunately
easy to disable with macros. For example, if the kernel wanted to use this to
give __user pointers a different representation or use different code-generation
patterns for accesses to them, it could pretty easily do that using these annotations
and the right compiler support.

So, as I see it, the goals of the feature are:

- Arbitrary code can declare an address space nominally without worrying about
  having picked a unique number. Definitely a win.

- The program can have all the benefits of a custom qualifier in terms of type-checking,
  and the compiler can be smart enough to just erase the address space when generating
  code. (Conceivably it could erase it to some other attribute space.) Also a clear win.

- The address space name will get preserved in diagnostics. This one is questionable
  to me because it's only "preserved" in a really ugly way: the programmer wrote
  `__user`, and even with this feature, the diagnostic will be talking about
  `__attribute__((address_space("user")))`. This is why I think you want to look into
  presenting the original macro if you can.

Oh it wouldn't be `__user` the macro that get's presented, it would be
the string passed to address_space() that's printed. So in this
example it would be "user", not `__user`. I think it would feel
awkward if we worked around this one special case where if we have an
address_space attribute that accepted a string and was defined in a
macro, then we would print the macro, otherwise the string. I'm not
sure what overhead there is on that, but I believe the way the whole
attribute gets printed is by just recording the start and end
SourceLocation for the whole attribute, so instead we could just
return the start and end SL of the string argument.

- The attribute can automatically apply the optional attributes. This part also seems
  poorly-motivated to me. `__attribute__((address_space("user")))` is still an awful
  thing to write all over the place, so in practice you're always going to want to define
  a macro that programmers will use, and of course that macro can apply other attributes.

I'll also repeat that I don't think you want do this with a pragma; I can imagine all sorts
of ways to extend this feature in the future, but jamming all that information after a #pragma
is very awkward, and you can't use the preprocessor inside a #pragma. Consider
making it an `__address_space user` declaration instead, with some ad-hoc syntax inside
to describe the various properties of the address space.

Could you clarify on this `__address_space user` declaration? One of
the things we would like is for a given address_space to always have a
given list of attributes, but that could also be handled in the macro.

A few other error messages look for macro definitions and, if there is a macro that expands to the correct character sequence, use that macro name instead in the error message. Is there a reason that the same approach wouldn’t work here?

David

I'm also looking into this. Could you provide an example of a
diagnostic that does this?

Is your expectation that these address spaces will be exclusively used for diagnostics?

Yes. This will serve as a way of essentially shortening an attribute
list in the diagnostic for address_space since this is what's useful
to us for now, but we are open to expanding this to fit other use
cases as long as we get this.

Should they support promotions between address spaces?

We weren't planning on this. The idea is that each new address space
created with this pragma will be a new address space disassociated
from any previous one.

And these are supposed to work like actual language address spaces, right?
Which is to say, they're not interconvertible at all? I'm a little confused about how
this fights into a code-analysis framework when it's actually a major semantic
change.

These are meant to work like regular address spaces, but instead of
accepting an integer to represent a unique space, some unique
serializable ID is given instead. That is in your code, you should be
able to pass strings instead of integers to all instances you use the
attribute address_space, assuming you declare the pragma beforehand,
and you shouldn't have to change anything else.

Yes, I understood that.

I could be missing something about address spaces, but my
understanding was that they were for preventing usage of pointers from
one space in another and all address spaces were unique.

Okay, so you do want them to function as actual address spaces. And I can
understand why this is useful in the kernel. I think I'm just surprised to hear this
called "static analysis"; it's much more like a language dialect that's fortunately
easy to disable with macros. For example, if the kernel wanted to use this to
give __user pointers a different representation or use different code-generation
patterns for accesses to them, it could pretty easily do that using these annotations
and the right compiler support.

So, as I see it, the goals of the feature are:

- Arbitrary code can declare an address space nominally without worrying about
having picked a unique number. Definitely a win.

- The program can have all the benefits of a custom qualifier in terms of type-checking,
and the compiler can be smart enough to just erase the address space when generating
code. (Conceivably it could erase it to some other attribute space.) Also a clear win.

- The address space name will get preserved in diagnostics. This one is questionable
to me because it's only "preserved" in a really ugly way: the programmer wrote
`__user`, and even with this feature, the diagnostic will be talking about
`__attribute__((address_space("user")))`. This is why I think you want to look into
presenting the original macro if you can.

Oh it wouldn't be `__user` the macro that get's presented, it would be
the string passed to address_space() that's printed. So in this
example it would be "user", not `__user`.

It'll be __attribute__((address_space("user"))). I don't think there are any diagnostics that
actually extract the argument to address_space and print that; they all just print the type,
and the type printer will expand the whole attribute because it doesn't know any better.

I think it would feel
awkward if we worked around this one special case where if we have an
address_space attribute that accepted a string and was defined in a
macro, then we would print the macro, otherwise the string. I'm not
sure what overhead there is on that, but I believe the way the whole
attribute gets printed is by just recording the start and end
SourceLocation for the whole attribute, so instead we could just
return the start and end SL of the string argument.

The type printer has multiple hacks dedicated to printing out types in a more
user-friendly way. There are a lot of types where most users don't know about the
attributes that actually implement them; two examples that come to mind are vector
types and the ARC ownership qualifiers. Making an effort to pretty-print a qualifier
would not be unprecedented.

- The attribute can automatically apply the optional attributes. This part also seems
poorly-motivated to me. `__attribute__((address_space("user")))` is still an awful
thing to write all over the place, so in practice you're always going to want to define
a macro that programmers will use, and of course that macro can apply other attributes.

I'll also repeat that I don't think you want do this with a pragma; I can imagine all sorts
of ways to extend this feature in the future, but jamming all that information after a #pragma
is very awkward, and you can't use the preprocessor inside a #pragma. Consider
making it an `__address_space user` declaration instead, with some ad-hoc syntax inside
to describe the various properties of the address space.

Could you clarify on this `__address_space user` declaration? One of
the things we would like is for a given address_space to always have a
given list of attributes, but that could also be handled in the macro.

I was just sketching an alternative language design that I think works better than a pragma.
You're adding a language feature here; you have to do proper language design for it.

John.

Ok. So we will no longer use the pragma. We will instead opt for your
suggestion of printing the macro. So the new goals are now:

- Having address_space() accept strings in addition to integers to
represent unique address spaces
- Changing the type printer to print the macro instead of the whole
attribute if the attribute was passed a string instead of an int.

Ok. So we will no longer use the pragma. We will instead opt for your
suggestion of printing the macro. So the new goals are now:

- Having address_space() accept strings in addition to integers to
represent unique address spaces

Is this still important if you get the diagnostic improvement? If you were declaring address
spaces somehow, this seems fine, but I don't really want to make address_space("whatever")
implicitly declare an address space with specific semantics when it's not hard to imagine
wanting that for other purposes in the future.

- Changing the type printer to print the macro instead of the whole
attribute if the attribute was passed a string instead of an int.

I don't know that the latter needs to be restricted; it seems like a nice usability improvement
for all address-space users.

John.

>
> Ok. So we will no longer use the pragma. We will instead opt for your
> suggestion of printing the macro. So the new goals are now:
>
> - Having address_space() accept strings in addition to integers to
> represent unique address spaces

Is this still important if you get the diagnostic improvement? If you were declaring address
spaces somehow, this seems fine, but I don't really want to make address_space("whatever")
implicitly declare an address space with specific semantics when it's not hard to imagine
wanting that for other purposes in the future.

Not entirely. This was something that's still remnant of when we were
using pragma. It's something that would be nice for us now, but could
also be discussed later down the line.

> - Changing the type printer to print the macro instead of the whole
> attribute if the attribute was passed a string instead of an int.

I don't know that the latter needs to be restricted; it seems like a nice usability improvement
for all address-space users.

Ok, so in general if address_space is used in a macro, then print the macro.

Yeah. You'll need to record this spelling information in the AttributedType, since
the Type won't otherwise have a source location attached.

I'm not sure we actually make an AttributedType for address_space, but doing
so is reasonable.

We should do a quick parse of the macro's token sequence to verify that it's just
an attribute. Unfortunately, I'm not really sure how to do that.

John.

As I have it now, I store a map of source locations to macro
definition names as they are being parsed and that's where I attempt
to find the attribute. Since the printing of the address_space is
controlled by the qualifier, I added an extra field to the qualifier
instead for storing the macro name. I can get this name during the
address space attribute construction where the source location of
address_space is available to me.

Would you mind if I added you as a reviewer to this patch after I clean it up?

As I have it now, I store a map of source locations to macro
definition names as they are being parsed and that's where I attempt
to find the attribute. Since the printing of the address_space is
controlled by the qualifier, I added an extra field to the qualifier
instead for storing the macro name. I can get this name during the
address space attribute construction where the source location of
address_space is available to me.

Would you mind if I added you as a reviewer to this patch after I clean it up?

Most of this preprocessor stuff is really beyond me; I'm probably not the best
reviewer. Richard might have a better suggestion.

I *think* the macro name can be recovered from the existing recorded
information in the AST; it's just not really obvious. (Adding a method for this
would be great!) Again, I'm not an expert, but I think the idea is:
  - You need to find the right level of macro expansion, keeping in mind that the
    reference to the attribute macro might itself be written in a macro.
  - You need to get the ExpansionInfo for that expansion.
  - You need to verify that it's a macro body expansion and that it's not a function
    macro expansion.
  - At that point, getExpansionLocStart() should be the location of the token for the
    macro, and you can just ask the preprocessor for the spelling of that token.

John.

Hi,

I'm a bit late to the party, but all this address space stuff sounds interesting to me as we are heavy users of address space functionality downstream.

If I understand correctly, you want to include the spelling of the AS in the type. Would this be restricted purely to spellings as macros, or would it also work for keywords? There are a bunch of hard coded strings in the type printer to print out the specific names of ASes for OpenCL, and we have also added our ASes to that hack. If it were possible to encode this straight in the type itself based on keyword/identifier instead of special-casing in the printer, that would probably be cleaner.

Feel free to add me as reviewer if you make a patch.

/ Bevin

As I have it now, I store a map of source locations to macro
definition names as they are being parsed and that’s where I attempt
to find the attribute. Since the printing of the address_space is
controlled by the qualifier, I added an extra field to the qualifier
instead for storing the macro name. I can get this name during the
address space attribute construction where the source location of
address_space is available to me.

Would you mind if I added you as a reviewer to this patch after I clean it up?

Most of this preprocessor stuff is really beyond me; I’m probably not the best
reviewer. Richard might have a better suggestion.

I think the macro name can be recovered from the existing recorded
information in the AST; it’s just not really obvious. (Adding a method for this
would be great!) Again, I’m not an expert, but I think the idea is:

  • You need to find the right level of macro expansion, keeping in mind that the
    reference to the attribute macro might itself be written in a macro.
  • You need to get the ExpansionInfo for that expansion.
  • You need to verify that it’s a macro body expansion and that it’s not a function
    macro expansion.
  • At that point, getExpansionLocStart() should be the location of the token for the
    macro, and you can just ask the preprocessor for the spelling of that token.

Yes, that’s basically right. Lexer::getImmediateMacroName does most of this dance for you, but you’d still need to work out the right level of macro expansion for yourself. (You probably want to step out through expansion ranges until you reach one whose start is attribute token and whose end is the closing ) token, or something like that.) It would be nice to give this treatment to all cases where a macro exactly covers all the tokens in a type attribute.

Depending on how you want this to work, there’s another option: Preprocessor::getLastMacroWithSpelling can be used to find the macro defined most recently before a given source location that expands to a given sequence of tokens. Unlike getImmediateMacroName, getLastMacroWithSpelling should only be used when producing a diagnostic, because it needs to look through all defined macros (including all macros imported from modules) which is not really efficient enough to be appropriate for a compilation that doesn’t produce diagnostics.

I currently have an implementation that uses
Preprocessor::getLastMacroWithSpelling to skim through the macro
definition tokens for finding __attribute__ and address_space, but
based off the unittests for Lexer::getImmediateMacroName, that looks
like it could also work.

The main issue I'm running into is that both of these require a
SourceLocation, which doesn't seem to be available when printing the
Qualifier in the typeprinter. So the only way I can actually use the
macro in the typeprinter is by saving either the SourceLocation or
macro name in the qualifier, which in turn is created through calls to
ASTContext::getAddrSpaceQualType.

My initial guess for tackling this is changing getAddrSpaceQualType to
accept an optional SourceLocation, but that function is used in many
places where a SourceLocation may not be available.

I currently have an implementation that uses
Preprocessor::getLastMacroWithSpelling to skim through the macro
definition tokens for finding attribute and address_space, but
based off the unittests for Lexer::getImmediateMacroName, that looks
like it could also work.

The main issue I’m running into is that both of these require a
SourceLocation, which doesn’t seem to be available when printing the
Qualifier in the typeprinter. So the only way I can actually use the
macro in the typeprinter is by saving either the SourceLocation or
macro name in the qualifier, which in turn is created through calls to
ASTContext::getAddrSpaceQualType.

My initial guess for tackling this is changing getAddrSpaceQualType to
accept an optional SourceLocation, but that function is used in many
places where a SourceLocation may not be available.

I think that’s the wrong place to handle this. Instead, I would suggest you look at the places where we form an AttributedType / AttributedTypeLoc, which should be a lot rarer, and should always have a source location. Then you can save the macro name information on the AttributedType itself (say, as an IdentifierInfo*).