Why doesn't LibClang's cursor faithfully reflect the node information of the source file's AST?

GwokHiujin · September 26, 2023, 6:44am

Hi, I’m currently working on a project that batch-processes .i files generated from .c and .cpp files. As we all know, a .i file contains many more lines than its source file because of what the preprocessor does, but when we use -ast-dump to print the AST of a .i file, each node still records the original line number information. For example:

The image above is a screenshot of an AST dump exported from a .i file. You can see that the AST node at line 3140 faithfully records a function declaration with the correct line range, 219-355.

As I understand it (which may be incorrect), the LibClang cursor should traverse down through the AST nodes, so I also think that the Location information it returns via functions such as clang_getExpansionLocation() and clang_getSpellingLocation() should be consistent with what is written in the AST Node.

Everything works well for .c and .cpp files. However, when I use LibClang to process .i files and print the cursor’s Location, it returns the line number from the .i file (for example, 177749 in the 3rd image) instead of the line number from the source file as recorded in the AST node.

The clang_getPresumedLocation() function I found in the CXSourceLocation.h file solved the problem for some simpler .i files by reporting the positions recorded in the # ... line markers. However, not every line in a .i file directly follows one of these line markers, so this function no longer looks like a solution to my problem.

Oddly enough, I once used the CSA Checker in another project to process .i files directly, and it didn’t create this problem, but printed the original line numbers in the explodedGraph nodes.

So what I’d like to know is: Are cursors in LibClang one-to-one with AST nodes? Is there any way for the user to directly access the information in the AST nodes?

AaronBallman · September 26, 2023, 3:47pm

The cursors in libclang are thin wrappers around the AST nodes. There are private APIs in CXCursor.h like getCursorVariableRef() that can convert from a CXCursor to an AST node, but due to being private APIs, they may not be suitable for your needs.

I would recommend using presumed locations. The line markers tell the compiler “pretend line number N starts here”, so the fact that code doesn’t immediately follow the line markers isn’t an issue. e.g.,

# 123 "dummy.c"
int var; // Presumed location is dummy.c:123

void func(void); // Presumed location is dummy.c:125

# 456 "derp.c"
int other; // Presumed location is derp.c:456

If you’re finding the presumed locations you get back are wrong, then I think we’d need to see a more concrete example of what the .i file contents are and what presumed locations you’re getting back that appear to be incorrect.
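
A minimal sketch of the line-marker rule described above, written as plain C with no libclang dependency. The names presumed_location and PresumedLoc are hypothetical and exist only for this illustration; they are not libclang APIs, and real code would call clang_getPresumedLocation() instead. The sketch assumes markers of the form # N "file" at the start of a line and ignores any trailing flags.

```c
#include <stdio.h>
#include <string.h>

/* File name and line number that a line marker assigns to a physical line. */
typedef struct {
    char filename[256];
    unsigned line;
} PresumedLoc;

/* Hypothetical helper, not a libclang API. Computes the presumed location of
   physical line `target` (1-based) in preprocessed text. Returns 1 on success,
   0 if `target` is out of range or is itself a line marker. */
int presumed_location(const char *text, unsigned target, PresumedLoc *out)
{
    char file[256] = "<input>"; /* assumed name before any marker is seen */
    unsigned physical = 0;      /* current line number in the .i text */
    unsigned next = 1;          /* presumed number of the current code line */
    const char *p = text;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t raw = eol ? (size_t)(eol - p) : strlen(p);
        size_t len = raw < 511 ? raw : 511;
        char buf[512], name[256];
        unsigned n;
        int is_marker;

        memcpy(buf, p, len);
        buf[len] = '\0';
        physical++;

        /* A marker looks like: # 123 "dummy.c" (optional flags may follow). */
        is_marker = sscanf(buf, "# %u \"%255[^\"]\"", &n, name) == 2;
        if (is_marker) {
            next = n;            /* "pretend line n starts on the next line" */
            strcpy(file, name);
        }
        if (physical == target) {
            if (is_marker)
                return 0;
            strcpy(out->filename, file);
            out->line = next;
            return 1;
        }
        if (!is_marker)
            next++;              /* blank lines and code lines both count */

        p += raw;
        if (*p == '\n')
            p++;
    }
    return 0;
}
```

Applied to the seven-line snippet above, physical line 4 (void func(void);) maps to dummy.c:125 and physical line 7 (int other;) maps to derp.c:456, even though neither line directly follows its marker.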

GwokHiujin · September 29, 2023, 2:56pm

Thank you for your response! I continued to verify and found that the PresumedLocation itself is fine, and that the cause of my problem is in the AST.

If it doesn’t bother you, I’d like to follow up with you on this question: I found that if there is a function declared like this in the .cpp file, the AST exported from the .cpp file is able to successfully interpret it as a function declaration node:

However, after I converted the .cpp file to a .i file, the corresponding function could no longer be found in the AST, and the datatype doesn’t seem to be highlighted in the source file either.

I’m curious what caused this.

AaronBallman · September 29, 2023, 3:23pm

My guess is that the issue is that the declaration is invalid, and thus something is dropping it, either when preprocessing to a file or when trying to take the preprocessed code and form a new AST from it.

GwokHiujin · September 29, 2023, 3:36pm

Thanks for the answer. I found that the problem with the .i file seems to lie in not correctly resolving types such as compute::kernel to a valid int, and I will look for other ways to fix it. Thanks again for your help.

GwokHiujin · October 25, 2023, 6:39am

By the way, I’ve been very curious as to why the cursors in libclang’s design don’t strictly traverse abstract syntax tree nodes, or rather, why the node data structure isn’t exposed to users.

Is it because of the execution cost of the traversal algorithms or something else? And do other syntax tree traversal tools in Clang Frontend, such as RecursiveASTVisitor, have such a feature?

AaronBallman · October 25, 2023, 11:50am

It is traversing the AST: https://github.com/llvm/llvm-project/blob/6d66440c50e3047b5ce152830d4ccae381c7c2bf/clang/tools/libclang/CursorVisitor.h#L70 (the base classes are ones we commonly use when traversing the AST).

We don’t expose the AST nodes directly through this interface because the libclang interface is intended to be stable while our C++ interface is constantly changing. This allows folks to use libclang without having to constantly modify their code as trunk evolves.

GwokHiujin · October 25, 2023, 11:54am

Thanks for replying!
