I was using llvm/clang 3.9 for development, and something did not work as I intended.
Given code like the following, when I run ./clang -cc1 -ast-dump test.cpp,
int main(int argc, char ** argv)
{
  int mul = 2;
  mul = 4 * mul;
  return 0;
}
it prints an AST like…
`-FunctionDecl 0xd128d48 </home/joo/test.cpp:2:1, line:7:1> line:2:5 main 'int (int, char **)'
  |-ParmVarDecl 0xd128bd0 <col:10, col:14> col:14 argc 'int'
  |-ParmVarDecl 0xd128c70 <col:20, col:28> col:28 argv 'char **'
  `-CompoundStmt 0xd128ff0 <line:3:1, line:7:1>
    |-DeclStmt 0xd128ec8 <line:4:3, col:14>
    | `-VarDecl 0xd128e48 <col:3, col:13> col:7 used mul 'int' cinit
    |   `-IntegerLiteral 0xd128ea8 <col:13> 'int' 2
    |-BinaryOperator 0xd128f90 <line:5:3, col:13> 'int' lvalue '='
    | |-DeclRefExpr 0xd128ee0 <col:3> 'int' lvalue Var 0xd128e48 'mul' 'int'
    | `-BinaryOperator 0xd128f68 <col:9, col:13> 'int' '*'
    |   |-IntegerLiteral 0xd128f08 <col:9> 'int' 4
    |   `-ImplicitCastExpr 0xd128f50 <col:13> 'int'
    |     `-DeclRefExpr 0xd128f28 <col:13> 'int' lvalue Var 0xd128e48 'mul' 'int'
    `-ReturnStmt 0xd128fd8 <line:6:3, col:10>
      `-IntegerLiteral 0xd128fb8 <col:10> 'int' 0
Please note where the end locations of the BinaryOperator and the ReturnStmt point.
Neither points to the end of its statement; they point to the beginning of the second 'mul' and the beginning of '0', respectively. I checked 6.0 as well, and the end locations are the same there. So I assume I am misunderstanding how Stmt::getLocEnd() is meant to be used. May I ask what the proper way to retrieve the end location of a statement is?
Hi, thanks for the reply!
AFAICS there seem to be statements which do not match that case. Does it mean that I have to handle every special case in my code (for example, the 'int mul = 2;' DeclStmt in the example seems to have an end location which actually points to the end)? Or can I assume that Lexer::getLocForEndOfToken will be a unified way to handle every statement? I'm looking for a way to look up the SourceRange of a statement whether or not it contains a ';' (such as a compound statement or a for loop).
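For readers following along: the columns in the dump are consistent with Clang ranges being token ranges, where the end location is the start of the last token rather than the character after it. The sketch below is a self-contained toy, not Clang code: `endOfTokenColumn` is a hypothetical helper that measures one token on a single line of text, which is roughly the job Lexer::getLocForEndOfToken does against the real source buffer. The simplified token rule (identifier/number, or one punctuation character) is an assumption made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for measuring a token (NOT Clang's lexer): given one source
// line and the 1-based column where a token starts, return the 1-based
// column just past that token. Identifiers and numbers are consumed
// greedily; anything else is treated as a single-character token.
std::size_t endOfTokenColumn(const std::string &line, std::size_t startCol) {
  std::size_t i = startCol - 1;  // convert to a 0-based index
  if (i >= line.size())
    return startCol;  // nothing to measure at this column
  auto isWordChar = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  };
  if (isWordChar(line[i])) {
    while (i < line.size() && isWordChar(line[i]))
      ++i;
  } else {
    ++i;  // punctuation such as ';' or '}' is one character wide here
  }
  return i + 1;  // back to a 1-based column
}
```

With line 5 of the example, `endOfTokenColumn("  mul = 4 * mul;", 13)` steps over the second `mul` and lands on column 16, which is where the `;` sits. The semicolon itself is still not part of the expression's range, which is what the rest of the thread is about.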
> Hi, thanks for the reply!
> AFAICS there seem to be statements which do not match that case. Does it
> mean that I have to handle every special case (for example, the 'int mul = 2;'
> DeclStmt in the example seems to have an end location which actually points
> to the end) in my code?
Unfortunately, yes, if you care where the semicolons are, you need to be
aware that some Stmt subclasses include the semicolon (at least DeclStmt
and possibly others), some have an implied semicolon (at least all the Expr
subclasses and possibly others), and some don't have a trailing semicolon
at all (CompoundStmt and C++ try/catch have none; IfStmt etc. don't have one
"of their own" but might have one as part of their substatement).
If you were interested in fixing this, we have been considering the
addition of an ExprStmt node to represent an expression-statement (a
statement of the form "expression;"), which would hold the location of the
trailing semicolon of the expression. We'd need to analyse the memory cost
of adding such AST nodes, but if it's reasonably small, we'd accept a patch
to add that.
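As a rough illustration of the three cases listed in the reply above (semicolon included in the range, implied after it, or absent), here is a self-contained toy sketch. Nothing in it is Clang API: `SemiKind` and `endOffsetWithSemicolon` are hypothetical names, the token rule is simplified, and comments or macros between the last token and the `;` are not handled. Statements like IfStmt, whose semicolon belongs to a substatement, are deliberately left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical classification, following the three cases described above.
enum class SemiKind {
  Included,  // the range already ends on the ';' (e.g. DeclStmt)
  Implied,   // a ';' follows the range (e.g. an expression used as a statement)
  None       // no trailing ';' at all (e.g. CompoundStmt)
};

// Given the source text, the 0-based offset where the statement's last token
// starts, and the statement's SemiKind, return the offset one past the
// statement's final character, including the ';' when there is one.
std::size_t endOffsetWithSemicolon(const std::string &text,
                                   std::size_t lastTokOffset, SemiKind kind) {
  auto isWordChar = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  };
  std::size_t i = lastTokOffset;
  if (i >= text.size())
    return text.size();
  // Step over the last token: an identifier/number, or one punctuation char.
  if (isWordChar(text[i])) {
    while (i < text.size() && isWordChar(text[i]))
      ++i;
  } else {
    ++i;
  }
  if (kind != SemiKind::Implied)
    return i;  // Included: the token was the ';'. None: nothing to add.
  // Implied: skip whitespace and take the ';' if it is the next character.
  std::size_t j = i;
  while (j < text.size() && std::isspace(static_cast<unsigned char>(text[j])))
    ++j;
  return (j < text.size() && text[j] == ';') ? j + 1 : i;
}
```

For the assignment in the example, calling this with the text `mul = 4 * mul;`, offset 10 (the second `mul`), and `SemiKind::Implied` yields 14, i.e. one past the semicolon. The caller still has to know which kind each statement is, which is exactly the per-subclass handling the reply says cannot be avoided today.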
> Or can I assume that Lexer::getLocForEndOfToken will be a unified way to