How to add a function to an AST?

Hi, everyone:
      I want to add a function or a statement to an AST. Is there a simple way to do this? Should I use Sema, or call FunctionDecl's Create method? Thanks.

xtxwy

Generally speaking, the easy way is to add it in the code and let clang
parse the code (seriously, that's what works best, and has a couple of
other upsides - you can also do that programmatically).

Or to turn it around, the other question would be: why do you want to add a
function to the AST instead of adding it to the code? :slight_smile:

Cheers,
/Manuel

Hello Manuel,

I’m dealing with the same issue as maxs.

> Generally speaking, the easy way is to add it in the code and let clang
> parse the code (seriously, that's what works best, and has a couple of
> other upsides - you can also do that programmatically).

Do you prefer this workflow?

1. Add new lines to the buffer.
2. Reparse the buffer and generate a completely new tree.

That seems like a really heavy workload just for adding something new.
What are the other upsides? Sorry, I’m new to the Clang API world :(.

> Or to turn it around, the other question would be: why do you want to add
> a function to the AST instead of adding it to the code? :slight_smile:

Easy :slight_smile: I can add any line of code via the Rewriter class; however, it is
hard to guarantee that my insertion is valid. For instance, nothing stops me
from adding the function void foo(int a, int b int c)…
Besides syntax errors, semantic errors can happen. For instance, a function
with the same name and the same parameter list may already exist in the scope…

On the other hand, inserting directly into the tree means that the new code
can be validated.
Once the AST guarantees that everything is fine, we only have to call the
pretty-print function.

Okay, another example:

Let's say we would like to change the number of parameters of a FunctionDecl
by deleting one of them.
We assume that there is at least one parameter in the parameter list.

Text editing requires a lot of effort.
If the function has one parameter, we can use the start and end location of
the parameter. No big deal.
If the function has several parameters, it is more complicated: we have to
delete the parameter as before and also figure out the position of the
corresponding comma.
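For what it's worth, the text-editing route can be fairly small once each parameter's character offsets are known. The sketch below is a toy model, not Clang code. `ParamSpan`, `rangeToErase` and `eraseParam` are hypothetical names, and the offsets are assumed to be given as plain integers. The parameter count decides where the single separating comma sits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Half-open [Begin, End) character offsets of one parameter's text.
// Hypothetical input: in a real tool these would be derived from each
// ParmVarDecl's source range (Clang ranges end at the start of the last
// token, so they would need converting first).
struct ParamSpan {
  std::size_t Begin;
  std::size_t End;
};

// Returns the half-open range to erase so that parameter `Index` disappears
// together with exactly one separating comma. Knowing the parameter count
// tells us whether that comma follows or precedes the parameter.
std::pair<std::size_t, std::size_t>
rangeToErase(const std::vector<ParamSpan> &Params, std::size_t Index) {
  const ParamSpan &P = Params[Index];
  if (Params.size() == 1)
    return {P.Begin, P.End};                   // sole parameter: no comma
  if (Index + 1 < Params.size())
    return {P.Begin, Params[Index + 1].Begin}; // up to the next parameter
  return {Params[Index - 1].End, P.End};       // last: from the previous one
}

// Hypothetical helper: applies the erase to a copy of the source text.
std::string eraseParam(std::string Src, const std::vector<ParamSpan> &Params,
                       std::size_t Index) {
  const auto Range = rangeToErase(Params, Index);
  Src.erase(Range.first, Range.second - Range.first);
  return Src;
}
```

Take `void foo(int a, int b, int c);` with parameter spans `{9, 14}`, `{16, 21}` and `{23, 28}`. Erasing index 1 gives `void foo(int a, int c);`, and erasing index 2 gives `void foo(int a, int b);`. Erasing up to the next parameter's start also removes any whitespace or comments in between. Whether that is acceptable depends on the tool.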

Doing this by editing the AST is much simpler. Delete the parameter in
the AST and call the pretty printer, right? Comma handling is the pretty
printer's job, right?

I’m looking forward to your answer. It seems that you are an AST pro.

Cheers!

Joshua :slight_smile:

> Do you prefer this workflow?
>
> 1. Add new lines to the buffer.
> 2. Reparse the buffer and generate a completely new tree.
>
> That seems like a really heavy workload just for adding something new.
> What are the other upsides? Sorry, I’m new to the Clang API world :(.

It's not actually a "heavy" workflow - clang has some libraries that make
that workflow take amazingly little code. Have a look at clang-modernize,
which uses exactly this procedure to migrate code to C++11
(and beyond :wink:).

> Doing this by editing the AST is much simpler. Delete the parameter in
> the AST and call the pretty printer, right? Comma handling is the pretty
> printer's job, right?

Unfortunately this is not true.
1. inserting a node into the AST will not automatically "validate" it - afaik
clang relies on the semantic analysis in Sema to make sure it only builds
correct ASTs; you can still add nodes to the AST if you're really careful
2. the AST has undocumented invariants, so you might run into problems
later that you don't even know of now
3. clang doesn't come with a pretty-printer that actually outputs "real
code" (at least if I'm not completely missing something) - the pretty
printers I know are made for debugging or diagnostics
4. rewriting text is easy - in fact, we've written a whole library as part
of clang (libTooling) to make it as easy as possible; the tools you find in
clang's extra tools repository are written on top of that architecture
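Text rewriting stays manageable partly because every edit can be expressed against offsets in the *original* file. The sketch below is a toy model of that idea. It is similar in spirit to `clang::tooling::Replacement`, but it is not that class or any libTooling API. `Edit` and `applyEdits` are hypothetical names, and edits are assumed not to overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A simplified stand-in for one text edit: replace `Length` characters that
// start at `Offset` in the ORIGINAL text with `Text`. Similar in spirit to
// clang::tooling::Replacement, but this is not that class.
struct Edit {
  std::size_t Offset;
  std::size_t Length;
  std::string Text;
};

// Applies a batch of non-overlapping edits. Because every offset refers to the
// original text, applying them from the highest offset down keeps the offsets
// of the not-yet-applied edits valid.
std::string applyEdits(std::string Src, std::vector<Edit> Edits) {
  std::sort(Edits.begin(), Edits.end(),
            [](const Edit &A, const Edit &B) { return A.Offset > B.Offset; });
  for (const Edit &E : Edits)
    Src.replace(E.Offset, E.Length, E.Text);
  return Src;
}
```

Consider `void foo(int a, int b, int c);` and three edits: `{16, 7, ""}` drops `int b, `, `{5, 3, "bar"}` renames the function, and `{30, 0, "\nvoid baz();"}` appends a new declaration. Together they produce `void bar(int a, int c);` followed by `void baz();`. The edited buffer can then be handed back to clang and reparsed, which is the workflow suggested earlier in the thread, so that invalid insertions get diagnosed.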

Cheers,
/Manuel

for production use. :slight_smile: Try "clang -cc1 -ast-print".

-Eli

Hello Eli, Manuel,

thanks for the pointer to clang-modernize.

> 4. rewriting text is easy - in fact, we've written a whole library as part of clang (libTooling) to make it as easy as possible; the tools you find in clang's extra tools repository are written on top of that architecture

Okay, I see that the Rewriter must be a really powerful tool.

Back to my second example, deleting a function parameter from a FunctionDecl:
is there any way to get the source location of the comma between two parameters?
I'd be grateful for any hint :slight_smile: I couldn't find anything like ParmTypeLoc
or getParmLoc()...

Cheers!

Joshua

You are probably interested in ParmVarDecl.

If you need the type (with its source locations), you can use:

    Param->getTypeSourceInfo()->getTypeLoc()  // Param is a ParmVarDecl *

A fairly recent transform in clang-modernize rewrites the type of some
parameters; you can take a look. The interesting part for you is
located in clang-modernize/PassByValue/PassByValueActions.cpp, I think.

I don't know if you can get the comma directly but I'm sure you can find
a way to delete a parameter anyway.

Joshua T <llvm.mailing@gmail.com> writes:

> I don't know if you can get the comma directly but I'm sure you can find
> a way to delete a parameter anyway.

Usually we just use the lexer to move forward from the end of the
parameter to the ","... (you know how many parameters there are, so you
know whether there is going to be a comma or not)

Okay, thank you guys!
It looks like I was able to solve my problem!

> Usually we just use the lexer to move forward from the end of the parameter to the ","... (you know how many parameters there are, so you know whether there is going to be a comma or not)

Finally, a question for Manuel :slight_smile:
Sorry for bothering you, but exploring and understanding clang techniques is exciting!

Why do you use the lexer?
Something like this could make it hard to find the correct comma:

void foo (int i /* comment , , , , */, /* , , , , */ int j /* ,*/, /* , */ int k);

Cheers!

Joshua

The next token after "int i" there is the right comma (and similarly
for "int j"). In other words: don't worry, this isn't hard :slight_smile:

-- James

And if the Lexer is in keep-comment mode (inKeepCommentMode() returns true), you can also get comments as tokens.
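The sketch below illustrates why comments are no obstacle. Because a lexer skips comments, the first token after `int i` in the tricky declaration above is the real comma. This is a toy lexer that only models the behaviour and is not `clang::Lexer`. `Tok`, `TokKind` and `nextToken` are hypothetical names. The `KeepComments` flag loosely models the lexer mode in which comments are returned as tokens.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// A toy token: kind plus half-open [Begin, End) offsets into the source.
// This only models what clang::Lexer does; it is not the Clang API.
enum class TokKind { Comma, Comment, Other, Eof };

struct Tok {
  TokKind Kind;
  std::size_t Begin;
  std::size_t End;
};

// Returns the next token starting at or after `Pos`. Whitespace is always
// skipped. Comments are skipped too, unless `KeepComments` is true, in which
// case they are returned as Comment tokens.
Tok nextToken(const std::string &Src, std::size_t Pos, bool KeepComments) {
  while (Pos < Src.size()) {
    if (std::isspace(static_cast<unsigned char>(Src[Pos]))) {
      ++Pos;
      continue;
    }
    if (Src.compare(Pos, 2, "/*") == 0) { // block comment: find its "*/"
      const std::size_t Close = Src.find("*/", Pos + 2);
      const std::size_t End = Close == std::string::npos ? Src.size() : Close + 2;
      if (KeepComments)
        return {TokKind::Comment, Pos, End};
      Pos = End;
      continue;
    }
    if (Src.compare(Pos, 2, "//") == 0) { // line comment: runs to newline
      std::size_t End = Src.find('\n', Pos);
      if (End == std::string::npos)
        End = Src.size();
      if (KeepComments)
        return {TokKind::Comment, Pos, End};
      Pos = End;
      continue;
    }
    if (Src[Pos] == ',')
      return {TokKind::Comma, Pos, Pos + 1};
    // Simplification: everything else is a single-character "Other" token.
    return {TokKind::Other, Pos, Pos + 1};
  }
  return {TokKind::Eof, Src.size(), Src.size()};
}
```

Call `nextToken(Src, Src.find("int i") + 5, false)` on Joshua's declaration. It returns the comma that directly follows the first `*/`, and every comma inside the comments is ignored. With `KeepComments` set to `true`, the same call returns the `/* comment ... */` token first instead.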