Weird behavior while parsing nested (and non) pragmas

Hi there,
I have a question about a weird behavior I am observing when parsing C programs that contain nested pragmas.

The bottom line is that I need to extend some of the functions in Sema in order to associate the pragmas with the correct statement. Consider the following code:

1 int main() {
2 #pragma omp parallel
3 {
4 #pragma omp barrier
5 #pragma omp master
6 ;
7 }
8 }

My overloaded version of ActOnCompoundStmt(SourceLocation lbrac, SourceLocation rbrac, ...) is called every time a block is consumed. When I print the locations of the left and right brackets, I get something odd:

1) ActOnCompoundStmt() -> left bracket 5:4, right bracket 7:2 (line:column)
2) ActOnCompoundStmt() -> left bracket 2:22, right bracket 8:1

Naturally, 1 refers to the inner compound statement and 2 to the main body. As you can see, the location of the left bracket is not correct. Is this intended behavior, or am I doing something wrong? For completeness, I should mention that to parse the pragmas I manually call ConsumeToken() of the Parser class. Could this be the problem?

cheers, Simone

Yes. ConsumeToken returns the SourceLocation of the token just consumed. Now, ParseCompoundStatement asserts that the current token is an opening brace and then calls ParseCompoundStatementBody (PCSB). PCSB doesn't contain an assert. It simply assumes that the first token is the opening brace and consumes it, storing the returned source location, which is then passed to ActOnCompoundStmt as the location of the lbrace.

Now, if you do your own thing there, consume the lbrace, and leave some other random token in the stream for PCSB to consume, obviously the source location you get would be that of that random thing.
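To make the mechanism concrete, here is a toy model of a parser with a one-token cache. All names are hypothetical stand-ins, not Clang's actual classes or signatures. The body parser takes whatever token is current to be the '{', so if other code has already consumed the real brace, the location of the next token is reported instead.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Toy model (hypothetical names, not Clang's classes) of a parser with a
// one-token lookahead cache, showing where the lbrace location comes from.
struct Loc { int line, col; };
enum class Kind { LBrace, RBrace, Semi, Eof };
struct Token { Kind kind; Loc loc; };

struct ToyParser {
  std::deque<Token> rest;  // tokens not yet looked at
  Token Tok;               // current token (the one-token cache)

  explicit ToyParser(std::deque<Token> toks) : rest(std::move(toks)) { Tok = pull(); }

  Token pull() {
    if (rest.empty()) return {Kind::Eof, {0, 0}};
    Token t = rest.front();
    rest.pop_front();
    return t;
  }

  // Returns the location of the token just consumed, then advances Tok.
  Loc ConsumeToken() {
    Loc consumed = Tok.loc;
    Tok = pull();
    return consumed;
  }

  // No check here: whatever token is current is taken to be the '{'.
  Loc ParseCompoundStatementBody() {
    Loc lbrace = ConsumeToken();
    while (Tok.kind != Kind::RBrace && Tok.kind != Kind::Eof) ConsumeToken();
    return lbrace;  // the value that would be passed on as the lbrace location
  }

  // The outer entry point does check that Tok really is '{'.
  Loc ParseCompoundStatement() {
    assert(Tok.kind == Kind::LBrace && "expected '{'");
    return ParseCompoundStatementBody();
  }
};

// Hypothetical misuse: custom code eats the real '{' and then calls the body
// parser directly, so the next token's location is reported as the lbrace.
Loc demoWrongLBrace() {
  ToyParser p(std::deque<Token>{{Kind::LBrace, {3, 1}},
                                {Kind::Semi, {5, 4}},
                                {Kind::RBrace, {7, 2}}});
  p.ConsumeToken();                       // the real '{' at 3:1 is gone
  return p.ParseCompoundStatementBody();  // reports 5:4, the ';'
}
```

In the real parser the wrong location would then flow into ActOnCompoundStmt; the toy simply returns it. Going through ParseCompoundStatement on an untouched stream reports 3:1 as expected.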

Sebastian

I see.

OK, I understand. But how can I insert a token into the token stream?

Thanks for the help, Simone

You could just manipulate Tok as stored in Parser, but it would be better IMO to just capture the source location yourself and pass it on.
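A minimal sketch of the second suggestion, again using a toy model whose names (ToyParser, parseBlockAfterPragma, a body parser that takes the brace location as a parameter) are hypothetical and not Clang's API: the code that consumes the '{' keeps the location ConsumeToken returns and passes it on explicitly, instead of relying on whatever token happens to be current.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Hypothetical toy types; not Clang's API.
struct Loc { int line, col; };
enum class Kind { LBrace, RBrace, Semi, Eof };
struct Token { Kind kind; Loc loc; };

struct ToyParser {
  std::deque<Token> rest;  // tokens not yet looked at
  Token Tok;               // one-token cache

  explicit ToyParser(std::deque<Token> toks) : rest(std::move(toks)) { Tok = pull(); }

  Token pull() {
    if (rest.empty()) return {Kind::Eof, {0, 0}};
    Token t = rest.front();
    rest.pop_front();
    return t;
  }

  // Returns the location of the token just consumed, then advances Tok.
  Loc ConsumeToken() {
    Loc consumed = Tok.loc;
    Tok = pull();
    return consumed;
  }

  // Body parser that receives the '{' location from its caller instead of
  // assuming the current token is the brace.
  Loc ParseCompoundStatementBody(Loc lbrace) {
    while (Tok.kind != Kind::RBrace && Tok.kind != Kind::Eof) ConsumeToken();
    return lbrace;  // what would be handed on as the lbrace location
  }
};

// Hypothetical pragma-aware path: consume the '{' ourselves, remember where
// it was, and pass that location on.
Loc parseBlockAfterPragma(ToyParser &p) {
  assert(p.Tok.kind == Kind::LBrace && "expected '{'");
  Loc lbrace = p.ConsumeToken();  // ConsumeToken returns the brace's location
  // ... custom pragma handling could consume further tokens here ...
  return p.ParseCompoundStatementBody(lbrace);
}
```

The design point is that the location is captured at the single place where the brace is actually consumed, so later token shuffling cannot change it.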

After reading the existing code more carefully, I still don't understand how you got to where you are without tripping over an assertion.

Sebastian

:) That's comforting.

I found a way to insert a new token into the stream by using EnterTokenStream() from the Preprocessor class.
But there is still a problem, for example with the following code:

1 int main() {
2 {
3 #pragma omp barrier
4 #pragma omp master
5 ;
6 }
7 }

If I insert a 'random' token into the stream, the parser gives an error saying: "Expecting an expression". So it looks like the '{' has already been consumed before the #pragmas are handled. That actually makes sense, but it doesn't explain why the SourceLocation of the '{' is wrong. It is also true that if I insert a token into the stream, for example a semicolon (which forms an empty statement and so keeps the parser happy), the location of the left bracket becomes 3:4 (the location I forced for the ';' token).

This could solve my problem, but I don't like having to insert a ';' every time I handle a pragma.

For completeness: my pragma handling is based on Clang 2.7, but I don't expect the behavior to be different in the current SVN version.

Any help is appreciated. Cheers, Simone

I actually managed to solve the problem by overwriting the SourceLocation of the left bracket passed to ActOnCompoundStmt(). This works for me, but it is just a workaround for a strange behavior that looks to me like a bug in the Clang parser.

cheers, Simone

The parser keeps a one-token cache (in Tok), so when you're pushing a new token back into the stream, you need to account for that.
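The one-token cache can be illustrated with the same kind of toy model (every name here is hypothetical, not Clang's API). Because the parser has already read the next token into Tok, a token pushed onto the front of the not-yet-read stream is seen only after Tok. To make a new token the very next one the parser sees, the cached Tok has to be put back first. Treating the naive variant as analogous to injecting tokens below the parser is an assumption made for illustration.

```cpp
#include <cassert>
#include <deque>
#include <utility>
#include <vector>

// Toy model; every name here is hypothetical, not Clang's API.
struct Loc { int line, col; };
enum class Kind { LBrace, RBrace, Semi, Eof };
struct Token { Kind kind; Loc loc; };

struct ToyParser {
  std::deque<Token> rest;  // tokens below the parser, not yet read
  Token Tok;               // one-token cache: already read, not yet consumed

  explicit ToyParser(std::deque<Token> toks) : rest(std::move(toks)) { Tok = pull(); }

  Token pull() {
    if (rest.empty()) return {Kind::Eof, {0, 0}};
    Token t = rest.front();
    rest.pop_front();
    return t;
  }

  // Returns the location of the token just consumed, then advances Tok.
  Loc ConsumeToken() {
    Loc consumed = Tok.loc;
    Tok = pull();
    return consumed;
  }

  // Ignores the cache: t is queued behind the token already sitting in Tok.
  void EnterTokenNaive(Token t) { rest.push_front(t); }

  // Accounts for the cache: put Tok back into the stream and make t current.
  void EnterTokenBeforeCurrent(Token t) {
    rest.push_front(Tok);
    Tok = t;
  }
};

// Consumes everything and records the line of each token, in order.
std::vector<int> drainLines(ToyParser &p) {
  std::vector<int> lines;
  while (p.Tok.kind != Kind::Eof) lines.push_back(p.ConsumeToken().line);
  return lines;
}
```

With a stream of '{' (line 2) and '}' (line 4) plus an injected ';' marked as line 9, the naive variant yields lines 2, 9, 4, while the cache-aware variant yields 9, 2, 4.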

  - Doug