Stmt.getLocEnd()

Hi everybody,

I am now using LibTooling with the RefactoringTool to add some comments in some C codes.

I am now trying to add comments after a given statement (the body of a for to say the truth);
in order to detect the location of the end of a statement I use "Stmt.getLocEnd()", but I seems that its output is often inaccurate.

Here are two examples, if I insert "/* HERE */" at the end location:

  for(i=0; i<N/2; i++) res +=mand(i,N/4/* HERE */);
  for(i=0; i<mand(N,N); i++) res /* HERE */++;

I would expect the following output:

  for(i=0; i<N/2; i++) res +=mand(i,N/4);/* HERE */
  for(i=0; i<mand(N,N); i++) res ++;/* HERE */

It looks like a bug. Am I right or did I miss something ?

I found a previous recent discussion about this matter (http://clang-developers.42468.n3.nabble.com/Statements-that-end-with-td4028311.html), but It looks like nothing has been done.

- Antoine

Hi everybody,

I am now using LibTooling with the RefactoringTool to add some comments in
some C codes.

I am now trying to add comments after a given statement (the body of a for
to say the truth);
in order to detect the location of the end of a statement I use
"Stmt.getLocEnd()", but I seems that its output is often inaccurate.

Here are two examples, if I insert "/* HERE */" at the end location:

        for(i=0; i<N/2; i++) res +=mand(i,N/4/* HERE */);
        for(i=0; i<mand(N,N); i++) res /* HERE */++;

I would expect the following output:

        for(i=0; i<N/2; i++) res +=mand(i,N/4);/* HERE */
        for(i=0; i<mand(N,N); i++) res ++;/* HERE */

It looks like a bug. Am I right or did I miss something ?

I found a previous recent discussion about this matter (
http://clang-developers.42468.n3.nabble.com/Statements-that-end-with-td4028311.html),
but It looks like nothing has been done.

Yes, this is a known issue, and as far as I'm aware nothing has been done.
We're usually working around this by looking for the next ';' token.

I guess it's the good old "patches welcome" answer - or wait until it
becomes important enough for somebody else to solve.

Cheers,
/Manuel

The other issue here is that (by design) getLocEnd() doesn’t actually return the SourceLocation just past the end of the statement, but the SourceLocation of the start of the last token in the statement. If you actually want the SourceLocation following the statement, you’ll have to use Lexer::getLocForEndOfToken.

Jordan

Thank you both of you for your answer. The last token thing makes perfect sense !
Let me try that one by monday and come back to you.

  • Antoine

I tried using "Lexer::getLoxForEndOfToken()", but that still wouldn't give the desired behaviour.

For instance, for

   for(i=0; i<mand(N,N); i++) res /* HERE */++;

It would now return the following location:

   for(i=0; i<mand(N,N); i++) res +/* HERE */+;

The token is only the first "+", not the whole "++".

I used the quick-and-dirty trick of Manuel, searching for the next ";", and that works. Still I feel insecure about this solution.

Any idea to follow Jordan's proposal ? (so that I would get the location after the "++" and not only the first "+")

Regards,

- Antoine

That sounds wrong; the token is definitely "++", not "+". What code
are you using to find that it's a single "+"?

-- James

My bad, I was using the function the wrong way.

But I noticed that I couldn't go through a semicolon using "Lexer::getLocForEndOfToken" if there is a space before the semicolon.

For instance, let's consider this code:

  for(i=0; i<mand(N,N); i++) res ++ ;

Initially, the SourceLocation retreieved with "getLocEnd()" is before the "++":

  for(i=0; i<mand(N,N); i++) res /*HERE*/++ ;

If I call "Lexer::getLocForEndOfToken", it will point to after "++": nice:

  for(i=0; i<mand(N,N); i++) res ++/*HERE*/ ;

Then if I call again getLocForEndOfToken, the result will point to the exact same location (I need to call "getLocWithOffset")

  for(i=0; i<mand(N,N); i++) res ++/*STILL HERE*/ ;

In the case I don't have any space before the semicolon, the return value of the second call to getLocForEndOfToken will point to after it:

  for(i=0; i<mand(N,N); i++) res ++;/*HERE*/
  (no space before the ";")

Is that the expected behaviour ? I find it pretty annoying in my very situation because I have no choice but looping with "SourceLocation::getLocWithOffset" until I find a ";".

Regards,

- Antoine

Hm. It's expected behavior because an Expr can be nested inside other Exprs, in which case you only want the beginning and end of the Expr to include the expression itself. Consider "a + b * c;" The semicolon is not part of "b * c"; you could argue it's part of "a + b * c", but that's not consistent. On the other hand, we otherwise don't track the location of the semicolon anywhere.

I can see this being an actual deficiency we want to fix; if you're interested in pursuing this, please file a bug report at http://llvm.org/bugs/. Meanwhile, Lexer::findLocationAfterToken will probably be a more resilient way to find the semicolon.

Jordan

Yes, the behaviour is not consistent to my point of view. I'll file a bug report.
Anyway, I'm using "findLocationAfterToken" as you suggested and it works like a charm: thank you for your help !

- Antoine

I guess the semi should be part of the infamously missing expression statement in the Ast :slight_smile:

Cheers,
/Manuel