parsing code snippets

Hi @clang,

is there any way to parse code snippets with respect to, e.g., a function context?
Here is the practical problem:

void foo(float* a)
{
#pragma scout vectorize aligned(a)
   for (int i = 0; i < 100; ++i)
     a[i] = 0.0;
}

The argument for "aligned" is an expression that is expected to be well-formed in the function context. The idea is to parse the argument, compute the MemRegion it refers to, and eventually align all accesses to SubRegions of that MemRegion (the MemRegion part is already done).
As long as the argument is only an identifier, retrieving the expression is easy. But of course I want to support other expressions too (including member and array subscript expressions).
I'm afraid there is no way to achieve this with clang's own means, because there is no way to access an ASTContext from a PragmaHandler.
The other way would be to create a Sema object, set its state "somehow" to the particular function DeclContext and the input stream to the argument string, and then call ParseExpression. But Sema does not seem to be prepared for such a task either.
However, before I write my own stripped-down parser, I'd better ask here. Maybe I've overlooked something.
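
To make the "stripped-down parser" option concrete, here is a minimal sketch of what it might look like. It assumes the argument is limited to an identifier followed by member accesses and array subscripts. The names AccessStep and parseAccessPath are hypothetical and not part of clang, and the sketch does no name lookup or type checking, which is exactly the part that would require the function context.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One step of an access path such as "s.data[i]" (hypothetical type).
struct AccessStep {
  enum Kind { Base, Member, Arrow, Subscript } kind;
  std::string text;  // identifier, or the raw index text for Subscript
};

static bool isIdentStart(char c) {
  return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}
static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}
static void skipSpace(const std::string& s, std::size_t& pos) {
  while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
    ++pos;
}
// Consumes the literal `lit` if it is next in the input.
static bool eat(const std::string& s, std::size_t& pos, const std::string& lit) {
  skipSpace(s, pos);
  if (s.compare(pos, lit.size(), lit) != 0) return false;
  pos += lit.size();
  return true;
}
// Reads an identifier; with allowLeadingDigit it also accepts a run of digits.
static bool word(const std::string& s, std::size_t& pos, bool allowLeadingDigit,
                 std::string& out) {
  skipSpace(s, pos);
  std::size_t start = pos;
  if (pos < s.size() &&
      (allowLeadingDigit ? isIdentChar(s[pos]) : isIdentStart(s[pos]))) {
    ++pos;
    while (pos < s.size() && isIdentChar(s[pos])) ++pos;
  }
  out = s.substr(start, pos - start);
  return !out.empty();
}

// Parses: identifier ( '.' identifier | '->' identifier | '[' index ']' )*
// Returns false on a syntax error; `path` is only meaningful on success.
bool parseAccessPath(const std::string& src, std::vector<AccessStep>& path) {
  path.clear();
  std::size_t pos = 0;
  std::string text;
  if (!word(src, pos, false, text)) return false;
  path.push_back({AccessStep::Base, text});
  for (;;) {
    skipSpace(src, pos);
    if (pos == src.size()) return true;
    if (eat(src, pos, ".")) {
      if (!word(src, pos, false, text)) return false;
      path.push_back({AccessStep::Member, text});
    } else if (eat(src, pos, "->")) {
      if (!word(src, pos, false, text)) return false;
      path.push_back({AccessStep::Arrow, text});
    } else if (eat(src, pos, "[")) {
      if (!word(src, pos, true, text) || !eat(src, pos, "]")) return false;
      path.push_back({AccessStep::Subscript, text});
    } else {
      return false;  // anything else is outside this tiny grammar
    }
  }
}
```

For "s.data[i]" this yields three steps (Base "s", Member "data", Subscript "i"). Resolving those names to declarations is what the real parser and Sema would have to do.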

Best regards
Olaf Krzikalla

Why not turn the "#pragma scout vectorize" token sequence into a special token (tok::pragma_scout_vectorize), then have the parser recognize that token (in whatever contexts it makes sense, e.g., a statement context for this case) and handle the actual #pragma parsing?
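
A rough sketch of that idea, using mock types rather than clang's real Token and Parser classes: the preprocessor side only wraps the pragma's tokens in a special start token and an end marker, and the parser side consumes them when it meets the special token in a statement context. All names here (TokKind, lexPragmaBody, parseVectorizeDirective) are made up for illustration, and the argument is restricted to a single identifier where the real parser would call its expression parsing.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Mock token kinds; PragmaScoutVectorize plays the role of the special token.
enum class TokKind { Identifier, LParen, RParen, PragmaScoutVectorize, PragmaEnd, Other };

struct Token {
  TokKind kind;
  std::string text;
};

// "Preprocessor" side: wrap the words following "#pragma scout vectorize"
// into a token sequence that starts with the special token.
std::vector<Token> lexPragmaBody(const std::vector<std::string>& words) {
  std::vector<Token> out;
  out.push_back({TokKind::PragmaScoutVectorize, "#pragma scout vectorize"});
  for (const std::string& w : words) {
    TokKind k = TokKind::Other;
    if (w == "(") k = TokKind::LParen;
    else if (w == ")") k = TokKind::RParen;
    else if (!w.empty() &&
             (std::isalpha(static_cast<unsigned char>(w[0])) || w[0] == '_'))
      k = TokKind::Identifier;
    out.push_back({k, w});
  }
  out.push_back({TokKind::PragmaEnd, ""});
  return out;
}

struct VectorizeDirective {
  std::vector<std::string> alignedArgs;
};

// "Parser" side: called in statement context when toks[pos] is the special
// token. Accepts zero or more clauses of the form: aligned ( identifier )
bool parseVectorizeDirective(const std::vector<Token>& toks, std::size_t& pos,
                             VectorizeDirective& out) {
  if (pos >= toks.size() || toks[pos].kind != TokKind::PragmaScoutVectorize)
    return false;
  ++pos;
  while (pos < toks.size() && toks[pos].kind != TokKind::PragmaEnd) {
    if (pos + 3 >= toks.size()) return false;
    if (toks[pos].kind != TokKind::Identifier || toks[pos].text != "aligned" ||
        toks[pos + 1].kind != TokKind::LParen ||
        toks[pos + 2].kind != TokKind::Identifier ||
        toks[pos + 3].kind != TokKind::RParen)
      return false;
    out.alignedArgs.push_back(toks[pos + 2].text);
    pos += 4;
  }
  if (pos >= toks.size()) return false;  // missing end marker
  ++pos;                                 // consume the end marker
  return true;
}
```

The point of the design is that the argument is parsed by the parser itself, where the function context is available, instead of inside the pragma handler.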

  - Doug

Hi,

Why not turn the "#pragma scout vectorize" token sequence into a special token (tok::pragma_scout_vectorize), then have the parser recognize that token (in whatever contexts it makes sense, e.g., a statement context for this case) and handle the actual #pragma parsing?

Hmm, that still means heavy changes in the parser itself, doesn't it? At least I haven't found any callback or customization machinery there.
Another thing I have to consider is that I want to switch to C++0x attributes as soon as possible. That is, in the future the code shall look like this:

    [[scout::vectorize, scout::aligned(a)]]
    for (int i = 0; i < 100; ++i)
      a[i] = 0.0;

My current approach is a dirty hack, which has worked so far, however:
I made Parser::ParseExpression and Parser::ConsumeAnyToken public (the latter is necessary in order to move Parser::Tok away from the "pragma"),
and I store a global pointer to the parser in clang::ParseAST, which I later retrieve from my pragma handler. Of course this is proof-of-concept only. However, judging from N2418, sec. 10:

attribute-argument: assignment-expression | type-id

clang will need such a feature in the future anyway in order to support custom attributes (I hope that such a feature is planned).
My questions are:
1. Would it give you any headaches if the Parser has two public functions, ParseExpression and ParseTypeName (which internally might do some preparation work before calling their private counterparts)?
2. How can we expose the current parser object to custom handlers (attribute or pragma)?

Best regards
Olaf Krzikalla

1. Would it give you any headaches if the Parser has two public functions,
ParseExpression and ParseTypeName (which internally might do some
preparation work before calling their private counterparts)?

To put this into the open-source tree, I'd want some evidence that this is the right way to support the kind of extensibility you're looking for. I think there's probably a better way.

2. How can we expose the current parser object to custom handlers
(attribute or pragma)?

I don't have a good answer. For attributes, the idea was that we would have a way to describe the grammar of the attribute (with a *very* restrictive syntax), and have tblgen generate the attribute parsers from that. But, as far as I know, nobody is working on this at the moment.

  - Doug

To put this into the open-source tree, I'd want some evidence that this is the right way to support the kind of extensibility you're looking for. I think there's probably a better way.

Yep, I'm unsure about the proper approach either...

2. How can we expose the current parser object to custom handlers
(attribute or pragma)?

I don't have a good answer. For attributes, the idea was that we would have a way to describe the grammar of the attribute (with a *very* restrictive syntax), and have tblgen generate the attribute parsers from that. But, as far as I know, nobody is working on this at the moment.

... but maybe we can tie and handle both issues together.
I wouldn't worry about handling expressions inside pragmas (which actually belong to the preprocessor!) if I had the freedom to handle them in attribute lists. And if nobody is working on that issue and I need it, then... well, I will be the one (I hesitated mainly because Sean Hunt had actually worked on that issue, and there is already some handling of C++0x attributes at various locations in clang).

I'm sure attributes were initially intended, and still are, for use with expressions (N2418 clearly mentions expressions, and
according to the draft N3242 the grammar for arguments is broadened even further).
That said, IMHO we need
1. a callback mechanism for attributes similar to the PragmaHandler (I would restrict customization to scoped attributes and use the scope name similarly to the pragma namespace).
2. a map from Stmts to Attrs in the ASTContext similar to DeclAttrs.
3. an attr::Kind "UserDefined" and an appropriate Attr subclass.
4. means in the attribute handler in order to conveniently parse attribute arguments.

I think that we can reach an agreement about approaches for pt.1-3 rather quickly.
For pt. 4 my first idea would be an attribute handler with some (protected) functions that support the parsing of argument lists.
Or we pass the parser as an argument to the handler function (as is done with the preprocessor in the pragma handler), but then we have to publish the parser interface.
Of course the other way is a somewhat declarative interface. However, I don't know how to design it to be open and extensible. Is there any preliminary work in that direction?
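
To illustrate pt. 1, 3 and 4 together, here is a small sketch of the callback variant, in which handlers are registered per attribute scope and receive a narrow parser interface as an argument. None of these types exist in clang. ArgumentParser, UserDefinedAttr, AttrHandlerRegistry and handleScoutAttr are hypothetical names, and the "parser" merely hands out pre-split argument strings in place of real expression parsing.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical narrow parser interface passed to attribute handlers.
// A real one would wrap the parser's expression and type-id parsing.
struct ArgumentParser {
  std::vector<std::string> args;  // pre-split argument texts (mock input)
  std::size_t next = 0;
  bool parseExpression(std::string& out) {
    if (next >= args.size()) return false;
    out = args[next++];
    return true;
  }
};

// Stand-in for an Attr subclass of kind "UserDefined".
struct UserDefinedAttr {
  std::string scope;
  std::string name;
  std::vector<std::string> arguments;
};

using AttrHandler =
    std::function<bool(const std::string& name, ArgumentParser&, UserDefinedAttr&)>;

// Maps an attribute scope (e.g. "scout") to its handler, much like pragma
// handlers are looked up by pragma namespace.
class AttrHandlerRegistry {
public:
  void registerScope(const std::string& scope, AttrHandler handler) {
    handlers[scope] = std::move(handler);
  }
  bool handle(const std::string& scope, const std::string& name,
              ArgumentParser& parser, UserDefinedAttr& out) const {
    std::map<std::string, AttrHandler>::const_iterator it = handlers.find(scope);
    if (it == handlers.end()) return false;  // no handler for this scope
    out.scope = scope;
    out.name = name;
    return it->second(name, parser, out);
  }

private:
  std::map<std::string, AttrHandler> handlers;
};

// Example handler for [[scout::vectorize, scout::aligned(a)]].
bool handleScoutAttr(const std::string& name, ArgumentParser& parser,
                     UserDefinedAttr& attr) {
  if (name == "vectorize") return true;  // takes no arguments
  if (name == "aligned") {
    std::string expr;
    if (!parser.parseExpression(expr)) return false;
    attr.arguments.push_back(expr);
    return true;
  }
  return false;  // unknown attribute in the scout scope
}
```

Passing only this small interface, and not the whole Parser, is one way to avoid publishing the full parser interface.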

Best regards
Olaf Krzikalla

The fine print: If I work on attributes, I want a bit in LangOptions in order to enable attributes beyond C++0x :)

I wouldn't worry about handling expressions inside pragmas (which actually belong to the preprocessor!) if I had the freedom to handle them in attribute lists. And if nobody is working on that issue and I need it, then... well, I will be the one (I hesitated mainly because Sean Hunt had actually worked on that issue, and there is already some handling of C++0x attributes at various locations in clang).

I don't know if Sean intends to continue this work. Hopefully, he'll chime in one way or the other.

I'm sure attributes were initially intended, and still are, for use with expressions (N2418 clearly mentions expressions, and
according to the draft N3242 the grammar for arguments is broadened even further).

Yes.

That said, IMHO we need
1. a callback mechanism for attributes similar to the PragmaHandler (I would restrict customization to scoped attributes and use the scope name similarly to the pragma namespace).

This is actually different from the way I was planning to go. The intent for attributes was to describe the grammar in the .td file, rather than calling out to a separately-registered handler. I'm open to both options, of course.

2. a map from Stmts to Attrs in the ASTContext similar to DeclAttrs.

Right, we need this and Sema code to check the attributes added to statements and expressions.

3. an attr::Kind "UserDefined" and an appropriate Attr subclass.
4. means in the attribute handler in order to conveniently parse attribute arguments.

Interesting. Do we need dynamic registration of handlers? That's a big step beyond just making it easy to add new attributes.

In the long run, we definitely want that, I think. This would allow special library-specific compiler plugins (e.g. a Qt meta-object compiler implemented as a plugin) to offer and process attributes containing meta-information.

Sebastian

First, I second Sebastian. If you provide a plugin interface then you definitely want to support dynamic handling of attributes.

Second, if you think about the general picture, then dynamic handling actually boils down to two customizable steps: parsing and semantic checking of arguments.
As for the parsing of arguments: IMHO there are three types of arguments: no argument, expressions and type-ids (which one did I forget? - ah, yes, a special one: VersionTuple). For arguments of these types, simple declarations in one form or another should be sufficient. A fourth type would be "user-defined". In that case a handler function similar to PragmaHandler::HandlePragma is called with the Preprocessor as an argument, enabling users to parse the argument on their own.
Now to the semantic checking: I don't know if we should try to make the semantic checking declarative at all. It is of course possible to check the proper number of arguments, but whether an expression has to be an ICE or have the proper type (see e.g. IBOutletCollection) should be checked in handler functions.

OK. I haven't talked about the current implementation yet. In short: I don't like it. Not at all. In fact it is the reason for my delayed response. I have dug into the code and tried to understand what's going on. It took a while. All the semantic checking is in SemaDeclAttr but IMHO needs to be better tied to the attributes (there is also an appropriate comment in SemaDeclAttr.cpp). Then the Attr factory is the AttributeList. But it looks to me like it mirrors the attribute kinds generated from Attr.td.

Couldn't we stuff all this together? So that for each attribute we get a subclass of Attr holding the data (like now) and in addition a factory object which knows about the syntax and semantics of the attribute arguments. Then we can use tblgen to create the Attr taxonomy and the factories in one step. And of course we would provide an option for ASTConsumer to register its own Attr factories.
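
As a rough illustration of such a factory object: the sketch below pairs a declarative argument signature (the argument kinds listed above) with an optional semantic check supplied as a handler function. Everything in it is hypothetical, including the "unroll" attribute used as an example; in the real design the signatures would presumably be generated from Attr.td and the arguments would be Expr or type nodes, not strings.

```cpp
#include <cctype>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Argument kinds as discussed: "no argument" is an empty signature.
enum class ArgKind { Expression, TypeId, VersionTuple, UserDefined };

// Stand-in for an Attr subclass holding the parsed data.
struct Attr {
  std::string name;
  std::vector<std::string> args;
};

// Knows the syntax (signature) and the semantics (check) of one attribute.
struct AttrFactory {
  std::vector<ArgKind> signature;          // declarative part
  std::function<bool(const Attr&)> check;  // handler part, may be empty

  // Returns the new attribute, or null if the argument count or the
  // semantic check fails.
  std::unique_ptr<Attr> create(const std::string& name,
                               const std::vector<std::string>& args) const {
    if (args.size() != signature.size()) return nullptr;
    std::unique_ptr<Attr> attr(new Attr{name, args});
    if (check && !check(*attr)) return nullptr;
    return attr;
  }
};

// Crude stand-in for "the expression has to be an ICE".
bool isIntegerLiteral(const std::string& s) {
  if (s.empty()) return false;
  for (char c : s)
    if (!std::isdigit(static_cast<unsigned char>(c))) return false;
  return true;
}

// Factory for a hypothetical attribute "unroll(N)" with one expression
// argument that must be an integer literal.
AttrFactory makeUnrollFactory() {
  AttrFactory f;
  f.signature.push_back(ArgKind::Expression);
  f.check = [](const Attr& a) { return isIntegerLiteral(a.args[0]); };
  return f;
}
```

The argument count is checked declaratively from the signature, while anything beyond that goes through the check function, which is the split suggested above.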

Best regards
Olaf Krzikalla

First, I second Sebastian. If you provide a plugin interface then you definitely want to support dynamic handling of attributes.

Second, if you think about the general picture, then dynamic handling actually boils down to two customizable steps: parsing and semantic checking of arguments.
As for the parsing of arguments: IMHO there are three types of arguments: no argument, expressions and type-ids (which one did I forget? - ah, yes, a special one: VersionTuple). For arguments of these types, simple declarations in one form or another should be sufficient. A fourth type would be "user-defined". In that case a handler function similar to PragmaHandler::HandlePragma is called with the Preprocessor as an argument, enabling users to parse the argument on their own.
Now to the semantic checking: I don't know if we should try to make the semantic checking declarative at all. It is of course possible to check the proper number of arguments, but whether an expression has to be an ICE or have the proper type (see e.g. IBOutletCollection) should be checked in handler functions.

We'd like to be able to express some simple constraints declaratively (e.g., "only a function attribute"), but in general, I agree with you: we need some way to specify an arbitrary expression or handler function to determine whether the semantic constraints of the attribute are met.

OK. I haven't talked about the current implementation yet. In short: I don't like it. Not at all. In fact it is the reason for my delayed response. I have dug into the code and tried to understand what's going on. It took a while. All the semantic checking is in SemaDeclAttr but IMHO needs to be better tied to the attributes (there is also an appropriate comment in SemaDeclAttr.cpp). Then the Attr factory is the AttributeList. But it looks to me like it mirrors the attribute kinds generated from Attr.td.

Right, this stuff is a mess. Sean's GSoC project last year cleaned up some of the mess---Attr.td is now used to generate the attribute classes and their serialization/deserialization---but didn't get as far as fixing parsing or semantic analysis. The current scheme should be completely replaced with something generated via tblgen.

Couldn't we stuff all this together? So that for each attribute we get a subclass of Attr holding the data (like now) and in addition a factory object which knows about the syntax and semantics of the attribute arguments. Then we can use tblgen to create the Attr taxonomy and the factories in one step. And of course we would provide an option for ASTConsumer to register its own Attr factories.

Yes, and I see these as two separate steps: (1) using tblgen to generate the parsing and semantic analysis bits for all of the attributes, and (2) extending the plugin architecture to support the addition of attributes.

  - Doug