confusing getLocEnd() behavior


I am working on a source-to-source transformation tool and want to get
the original source code for a statement token by token. I am using
the statement's getLocStart/getLocEnd, SourceLocation's getLocWithOffset,
SourceManager's getCharacterData and Lexer::MeasureTokenLength. My
code worked fine until I ran into some DeclStmts:

1) struct A { int a; } s;
2) struct A { int a; };
3) union A { int a; };
4) union { int a; };

For 1) getLocEnd() works fine.
For 2) my code doesn't work because getLocEnd() is smaller than
getLocStart(): the end location's getRawEncoding() returns 0. I found a
workaround for this case by calling getLocEnd() on the DeclStmt's getSingleDecl().
Case 3) has the same problem as 2), and the same workaround works fine.
Case 4) has the same problem as 2), but this time, after applying my
workaround, the resulting SourceLocation is the same as getLocStart(),
that is, it points to the token "union".

So I guess the questions are:
* Is this expected behavior? If yes, then what exactly does getLocEnd() return?
* How could I get the end location for case 4)?
* Is there a better way to get a statement as token strings?
* As an alternative to the previous question, can I somehow find the total
character length of a statement in the original source code?
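
For readers following the last question: Clang source ranges are token ranges, so the end location marks the first character of the last token rather than the character after the statement. The sketch below only illustrates the arithmetic for turning such a range into a character length. It works on a plain string with a toy token measurer, and the names measureToken, charLength and sourceText are hypothetical helpers, not Clang API. Inside Clang, Lexer::MeasureTokenLength (used above) or Lexer::getLocForEndOfToken would play the role of measureToken.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for a lexer's "measure token length": an identifier or number
// is a maximal run of alphanumerics/underscores; anything else is one char.
// Hypothetical helper, NOT Clang's Lexer::MeasureTokenLength.
std::size_t measureToken(const std::string &src, std::size_t offset) {
  assert(offset < src.size());
  auto isWord = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  };
  if (!isWord(src[offset]))
    return 1;
  std::size_t end = offset;
  while (end < src.size() && isWord(src[end]))
    ++end;
  return end - offset;
}

// A token range names the first character of the first token and the first
// character of the last token. The character length of the covered text is
// the distance between the two plus the length of the last token.
std::size_t charLength(const std::string &src, std::size_t beginOffset,
                       std::size_t lastTokenOffset) {
  assert(beginOffset <= lastTokenOffset);
  return lastTokenOffset - beginOffset + measureToken(src, lastTokenOffset);
}

// Extracts the covered text exactly as written in the source buffer.
std::string sourceText(const std::string &src, std::size_t beginOffset,
                       std::size_t lastTokenOffset) {
  return src.substr(beginOffset, charLength(src, beginOffset, lastTokenOffset));
}
```

With offsets counted from 0, a range that starts at a statement's first token and ends at its closing bracket yields the whole expression text. A range whose end location is invalid or precedes the start must be rejected before calling these helpers.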


Could you post a bit more context (a minimal example) of what precisely you are looking at? I could not reproduce the error for cases 1-4. If I put them outside of a method, they don’t lead to a DeclStmt, but to a CXXRecordDecl. If I put them into a method, e.g.:

void f() {
union { int a; };
}

I get the DeclStmt but it seems to have the right code range (column 2 to 19).

In general, the code location handling is quite inconsistent and we are currently looking into how to improve this.


Here is a quick example for Visual Studio 2010.

Contents of a.cpp:

void foo()
{
   union { int a; };
}

When compiled with libs from clang 3.1, I have the following output:

top-level-decl: __builtin_va_list
top-level-decl: foo
(CompoundStmt 0x5ff8f8
  (DeclStmt 0x5ff8e8
    0x5ff690 "<anonymous union at a.cpp:3:4> =
      (CXXConstructExpr 0x5ff888 'union <anonymous at a.cpp:3:4>''void (void) throw()')"))
a.cpp:2:1 <=> a.cpp:4:1
(DeclStmt 0x5ff8e8
  0x5ff690 "<anonymous union at a.cpp:3:4> =
    (CXXConstructExpr 0x5ff888 'union <anonymous at a.cpp:3:4>''void (void) throw()')")
a.cpp:3:4 <=> <invalid loc>

As you can see, the location for the last DeclStmt is "a.cpp:3:4 <=> <invalid loc>",
that is, getLocEnd() did not return what I was expecting.


I cannot really reproduce that. With clang 3.2 on a Linux machine, I have:

void foo()
{
  union { int a; };
}

clang -cc1 -ast-dump

typedef __int128 __int128_t;
typedef unsigned __int128 __uint128_t;
typedef __va_list_tag __builtin_va_list[1];
void foo() (CompoundStmt 0x51ec658 <, line:4:1>
(DeclStmt 0x51ec640 <line:3:3, col:19>
0x51ec280 " =
(CXXConstructExpr 0x51ec5b8 <col:3> 'union ''void (void) throw()')"))

The DeclStmt appears to have the right source range <line:3:3, col:19>. However, you do not seem to print that range in your dump. Can you check that it is not in there? And maybe retry with clang 3.2 based libs?

The latest version from trunk did the trick, thanks. However, even with
this newer version I ran into another problem:

void foo()
{
  int* buf = new int[10];
  buf = new int[10];
}

AST dump from clang:

void foo() (CompoundStmt 0x4302b8 <test.cpp:2:1, line:5:1>
  (DeclStmt 0x430208 <line:3:3, col:25>
    0x42fd00 "int *buf =
      (CXXNewExpr 0x4301c8 <col:14, col:18> 'int *'
        (IntegerLiteral 0x42fd30 <col:22> 'int' 10))")
  (BinaryOperator 0x4302a0 <line:4:3, col:13> 'int *' lvalue '='
    (DeclRefExpr 0x430218 <col:3> 'int *' lvalue Var 0x42fd00 'buf' 'int *')
    (CXXNewExpr 0x430260 <col:9, col:13> 'int *'
      (IntegerLiteral 0x430230 <col:17> 'int' 10))))

The DeclStmt has the correct end location, but the BinaryOperator's end
location (and the CXXNewExpr's as well) points to the 'int' token, not to
the last closing bracket or the semicolon. I suppose it shouldn't be that way.
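
To make the column numbers in the dump easier to check by hand, here is a small self-contained sketch that maps a 1-based column on a source line to the token starting there. It uses a simplified notion of a token (a run of alphanumerics or underscores, otherwise a single character), and tokenAtColumn is a hypothetical helper, not part of Clang.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the token that starts at the given 1-based column of a single
// source line. Simplified tokenization: a word is a maximal run of
// alphanumerics/underscores; any other character is a one-character token.
// Hypothetical helper for reading AST dumps, NOT a Clang function.
std::string tokenAtColumn(const std::string &line, std::size_t column) {
  assert(column >= 1 && column <= line.size());
  auto isWord = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  };
  std::size_t begin = column - 1;  // dump columns are 1-based
  std::size_t end = begin + 1;
  if (isWord(line[begin])) {
    while (end < line.size() && isWord(line[end]))
      ++end;
  }
  return line.substr(begin, end - begin);
}
```

Applied to the line `  buf = new int[10];`, column 13 (the reported end of the BinaryOperator) is the `int` token, while the closing bracket sits at column 19 and the semicolon at column 20.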


I have just experimented with this a bit. I would consider this a bug, but I am not yet sure what the right fix is. I’ll look into it some more…

Forgot to update; here is the bug report: I also added a rewriter
example that illustrates the problem.