Source range of a C function with #defined return type

Hi,

I have a question regarding the source range of a FunctionDecl.
I am writing a matcher for function declarations and use the FunctionDecl class’ getSourceRange() method to get the filename and the begin and end locations of each declaration. For the most part, it works fine. But when a function's return type is a macro #defined in a system header, getSourceRange() does not give the expected filename and begin source location. For example, func_A below has this issue, but func_B does not.
I have also checked with ast-dump and confirmed this is the case in clang 14, 16 and 17.
Is this the intended behavior, or is there an issue here? Thanks.

#include <stdbool.h>
// #define bool _Bool
// #define true 1
// #define false 0

bool func_A() {
  return true;
}

_Bool func_B() {
  return true;
}
$ clang -Xclang -ast-dump -fsyntax-only x.c
...
-FunctionDecl 0x564150ae9bc0 </usr/lib/llvm-16/lib/clang/16/include/stdbool.h:20:14, x.c:8:1> line:6:6 func_A 'bool ()'
| `-CompoundStmt 0x564150ae9cf8 <col:15, line:8:1>
|   `-ReturnStmt 0x564150ae9ce8 <line:7:3, /usr/lib/llvm-16/lib/clang/16/include/stdbool.h:21:14>
|     `-ImplicitCastExpr 0x564150ae9cd0 <col:14> 'bool' <IntegralToBoolean>
|       `-IntegerLiteral 0x564150ae9cb0 <col:14> 'int' 1
`-FunctionDecl 0x564150ae9d30 <x.c:10:1, line:12:1> line:10:7 func_B 'bool ()'
  `-CompoundStmt 0x564150ae9e20 <col:16, line:12:1>
    `-ReturnStmt 0x564150ae9e10 <line:11:3, /usr/lib/llvm-16/lib/clang/16/include/stdbool.h:21:14>
      `-ImplicitCastExpr 0x564150ae9df8 <col:14> 'bool' <IntegralToBoolean>
        `-IntegerLiteral 0x564150ae9dd8 <col:14> 'int' 1

Let’s consider a further simplified example:

#define VOID void  // line 1

VOID func() {}     // line 3

The AST dump for this is:

`-FunctionDecl 0x557ac5ada090 <test.c:1:14, line:3:14> col:6 func 'void ()'
  `-CompoundStmt 0x557ac5ada180 <col:13, col:14>

Indeed, the begin location of the FunctionDecl is shown as being on line 1, even though the FunctionDecl is written entirely on line 3.

We can think of the process of producing an AST from source code as consisting of two stages: preprocessing, which produces a stream of expanded tokens, and then the actual parsing which produces an AST from the expanded tokens.

In this file, the expanded tokens are void func() {}, and it’s from these tokens that the FunctionDecl AST node is created.

When you ask for the source range of the FunctionDecl AST node, the answer basically comes from looking up the first and last tokens that make up the node. In this case, those tokens are void and }, and it’s their locations that are printed.

Now, helpfully, clang does a meticulous job of keeping track of source locations in the presence of macros. When you ask for the location of the void token that makes up the return type of func, its location is neither on line 1 of the file nor on line 3. It’s something called a macro location, basically representing “inside the expansion of the macro VOID on line 3”.

Macro locations are associated with two different file locations:

  • where in the file the macro is expanded – this is called the expansion location
  • where in the file the tokens that make up the expansion come from – this is called the spelling location

In this case, the expansion location is on line 3 (where the use of the macro, VOID, is written), and the spelling location is on line 1 (where the token void that forms the expansion is written in the macro definition).
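
To make the two locations concrete, here is a toy C++ model of the example above. It is only a sketch and not how clang is implemented: Pos, RawToken, ExpandedToken, expand and exampleTokens are names made up for this illustration, and it assumes object-like macros whose body is a single token, which is all the VOID example needs.

```cpp
#include <map>
#include <string>
#include <vector>

// A position in the file (1-based line and column).
struct Pos {
  int line;
  int col;
};

// A token exactly as it is written in the file.
struct RawToken {
  std::string text;
  Pos pos;
};

// A token after macro expansion. It remembers two positions.
struct ExpandedToken {
  std::string text;
  Pos spelling;   // where the characters of the token are written
  Pos expansion;  // where the macro use (or the plain token) appears
};

// Toy expander (hypothetical): replaces each use of a known macro with the
// single token from its definition and records both positions.
std::vector<ExpandedToken> expand(
    const std::vector<RawToken> &written,
    const std::map<std::string, RawToken> &macros) {
  std::vector<ExpandedToken> out;
  for (const RawToken &tok : written) {
    auto it = macros.find(tok.text);
    if (it != macros.end()) {
      // Token comes from a macro: spelled in the #define, expanded at the use.
      out.push_back(ExpandedToken{it->second.text, it->second.pos, tok.pos});
    } else {
      // Ordinary token: both positions are the same.
      out.push_back(ExpandedToken{tok.text, tok.pos, tok.pos});
    }
  }
  return out;
}

// The simplified example: "#define VOID void" on line 1 (the token void
// starts at column 14) and "VOID func() {}" on line 3.
std::vector<ExpandedToken> exampleTokens() {
  std::map<std::string, RawToken> macros;
  macros.emplace("VOID", RawToken{"void", Pos{1, 14}});
  std::vector<RawToken> line3 = {
      RawToken{"VOID", Pos{3, 1}}, RawToken{"func", Pos{3, 6}},
      RawToken{"(", Pos{3, 10}},   RawToken{")", Pos{3, 11}},
      RawToken{"{", Pos{3, 13}},   RawToken{"}", Pos{3, 14}}};
  return expand(line3, macros);
}
```

In this model the first token returned by exampleTokens() is void with spelling position 1:14 and expansion position 3:1, while the last token, }, has 3:14 for both. Reading the range from the spelling positions gives 1:14 to 3:14, which is the pair shown in the dump above.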

When code prints a macro location, it has to choose whether to print the spelling location or the expansion location. The implementation of -ast-dump chooses to print the spelling location. This is why the printed location is the one on line 1. (And in your original code, it’s inside stdbool.h because that’s where the definition of the bool macro is located.)

However, if you’re writing your own code that calls getSourceRange(), you can choose to work with the expansion location if you prefer. You can get it by calling SM.getExpansionLoc() on the original SourceLocation (e.g. Range.getBegin()), where SM is the SourceManager returned by ASTContext::getSourceManager().


Thanks very much for the detailed explanation.
It clears up my confusion.