Problem
In the AST and in AST dumps, objects are identified by their memory addresses. These addresses are not deterministic across runs, which makes it impossible to identify the same object when debugging or comparing AST dumps.
Solution
The problem has a simple solution: introduce a 32-bit global id (object id) that is stored within each object we want to identify. In my local implementation I added the global id to the clang::Expr, clang::Decl and clang::Type classes as a simple field. In general it would make sense to add it to every clang class supported by clang::DynTypedNode.
The drawback is, of course, higher memory consumption: a 4-byte overhead per object. The main benefit of the global id is for debugging. For that reason, I suggest enabling the global id field only when asserts are enabled (NDEBUG not set). The only exception is the clang::Type classes, where it would make sense to enable it by default because there are only a limited number of Type instances and it has some additional benefits for the AST dump (see example).
The global ids can then be used in the AST dump in addition to, or as a replacement for, the object addresses.
A simple implementation could look like this:
// Header
class GlobalId {
  unsigned Id;
  static unsigned NextId;
public:
  GlobalId();
  unsigned getId() const { return Id; }
};
// Example to add it to clang::Type
class alignas(8) Type : public ExtQualsTypeCommonBase {
  ...
public:
  const GlobalId GID;
  ...
};
// cpp file
// The definition must use the same type as the declaration in the header.
unsigned GlobalId::NextId = 1;
GlobalId::GlobalId() {
  Id = NextId++;
}
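The suggestion to enable the field only when asserts are enabled could be sketched as follows. This is a minimal, self-contained illustration and not the actual patch: Node is a hypothetical stand-in for a class such as clang::Expr, and SKETCH_HAS_GLOBAL_ID and getGlobalId() are names invented for this example.

```cpp
#include <cassert>

// Hypothetical switch (not an existing clang macro): keep the id only in
// builds where asserts are enabled, i.e. NDEBUG is not defined.
#ifndef NDEBUG
#define SKETCH_HAS_GLOBAL_ID 1
#else
#define SKETCH_HAS_GLOBAL_ID 0
#endif

class GlobalId {
  unsigned Id;
  static unsigned NextId;

public:
  GlobalId() : Id(NextId++) {}
  unsigned getId() const { return Id; }
};

// Ids start at 1 so that 0 can mean "no id available".
unsigned GlobalId::NextId = 1;

// Hypothetical stand-in for a node class such as clang::Expr.
class Node {
#if SKETCH_HAS_GLOBAL_ID
  const GlobalId GID; // one unsigned, present only when asserts are enabled
#endif

public:
  // Returns the creation-order id, or 0 when ids are compiled out.
  unsigned getGlobalId() const {
#if SKETCH_HAS_GLOBAL_ID
    return GID.getId();
#else
    return 0;
#endif
  }
};
```

Because the field changes the class layout, all translation units have to agree on the setting. LLVM has the LLVM_ENABLE_ABI_BREAKING_CHECKS macro for layout-changing debug members; whether the global id should be tied to that macro instead of NDEBUG is an open question.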
Benefits
Having a unique global id stored within the objects has the following benefits:
- Every object has a deterministic id for identification.
- The AST dump can be made deterministic and therefore comparable between two runs.
- The global id is directly available during debugging by inspecting the value of the field, e.g. clang::Type::GID::Id.
- It makes it possible to find out where in the code an object with a certain id is created: simply add a conditional breakpoint within GlobalId::GlobalId().
- It also makes it possible to find out where an object is modified (if it is modified by a function), e.g. by setting a conditional breakpoint for a specific id within clang::Decl::setImplicit with a condition such as this->GID.Id == 123.
- It helps to identify and fix non-deterministic execution during AST construction. Identify: the ids within the AST dumps of two runs differ. Fix: set a conditional breakpoint on object creation for the first id that differs.
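As an alternative to a debugger-side condition, the check for a specific id can be compiled into the constructor so that a plain breakpoint suffices. The following is a sketch under assumptions and not part of my branch: WatchedId, WatchHits, watch(), watchHits() and onWatchedIdCreated() are hypothetical names.

```cpp
#include <cassert>

class GlobalId {
  unsigned Id;
  static unsigned NextId;
  static unsigned WatchedId; // hypothetical: id to stop at, 0 = none
  static unsigned WatchHits; // hypothetical: how often the hook fired

  // Set a plain (unconditional) breakpoint here; the call stack then shows
  // where the watched object is being created.
  static void onWatchedIdCreated() { ++WatchHits; }

public:
  GlobalId() : Id(NextId++) {
    if (Id == WatchedId)
      onWatchedIdCreated();
  }
  unsigned getId() const { return Id; }

  // Choose the id to watch, e.g. the first id that differs between two dumps.
  static void watch(unsigned IdToWatch) { WatchedId = IdToWatch; }
  static unsigned watchHits() { return WatchHits; }
};

// Ids start at 1, so the default WatchedId of 0 never matches.
unsigned GlobalId::NextId = 1;
unsigned GlobalId::WatchedId = 0;
unsigned GlobalId::WatchHits = 0;
```

Calling GlobalId::watch(123) early in the run and breaking on onWatchedIdCreated() stops exactly once, at the creation of object 123.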
Example
For the following C++ code
template <class Ty>
Ty test(Ty x, Ty y) {
return x+y;
}
int x = test(10, 10);
the current AST output looks like
|-FunctionTemplateDecl 0x55d22de3d9c0 <<source>:2:1, line:5:1> line:3:4 test
| |-TemplateTypeParmDecl 0x55d22de3d6c0 <line:2:11, col:17> col:17 referenced class depth 0 index 0 Ty
| |-FunctionDecl 0x55d22de3d918 <line:3:1, line:5:1> line:3:4 test 'Ty (Ty, Ty)'
| | |-ParmVarDecl 0x55d22de3d790 <col:9, col:12> col:12 referenced x 'Ty'
| | |-ParmVarDecl 0x55d22de3d808 <col:15, col:18> col:18 referenced y 'Ty'
| | `-CompoundStmt 0x55d22de3db58 <col:21, line:5:1>
| | `-ReturnStmt 0x55d22de3db48 <line:4:5, col:14>
| | `-BinaryOperator 0x55d22de3db28 <col:12, col:14> '<dependent type>' '+'
| | |-DeclRefExpr 0x55d22de3dae8 <col:12> 'Ty' lvalue ParmVar 0x55d22de3d790 'x' 'Ty'
| | `-DeclRefExpr 0x55d22de3db08 <col:14> 'Ty' lvalue ParmVar 0x55d22de3d808 'y' 'Ty'
| `-FunctionDecl 0x55d22de3dea8 <line:3:1, line:5:1> line:3:4 used test 'int (int, int)'
| |-TemplateArgument type 'int'
| | `-BuiltinType 0x55d22ddf4af0 'int'
| |-ParmVarDecl 0x55d22de3dd18 <col:9, col:12> col:12 used x 'int':'int'
| |-ParmVarDecl 0x55d22de3dd90 <col:15, col:18> col:18 used y 'int':'int'
| `-CompoundStmt 0x55d22de3e178 <col:21, line:5:1>
| `-ReturnStmt 0x55d22de3e168 <line:4:5, col:14>
| `-BinaryOperator 0x55d22de3e148 <col:12, col:14> 'int':'int' '+'
| |-ImplicitCastExpr 0x55d22de3e118 <col:12> 'int':'int' <LValueToRValue>
| | `-DeclRefExpr 0x55d22de3e0d8 <col:12> 'int':'int' lvalue ParmVar 0x55d22de3dd18 'x' 'int':'int'
| `-ImplicitCastExpr 0x55d22de3e130 <col:14> 'int':'int' <LValueToRValue>
| `-DeclRefExpr 0x55d22de3e0f8 <col:14> 'int':'int' lvalue ParmVar 0x55d22de3dd90 'y' 'int':'int'
`-VarDecl 0x55d22de3db88 <line:7:1, col:20> col:5 x 'int' cinit
`-CallExpr 0x55d22de3e050 <col:9, col:20> 'int':'int'
|-ImplicitCastExpr 0x55d22de3e038 <col:9> 'int (*)(int, int)' <FunctionToPointerDecay>
| `-DeclRefExpr 0x55d22de3dfb8 <col:9> 'int (int, int)' lvalue Function 0x55d22de3dea8 'test' 'int (int, int)' (FunctionTemplate 0x55d22de3d9c0 'test')
|-IntegerLiteral 0x55d22de3dc38 <col:14> 'int' 10
`-IntegerLiteral 0x55d22de3dc58 <col:18> 'int' 10
In the AST dump of my branch, the global ids are shown with a # prefix:
FunctionTemplateDecl #196 0x1c73640e148 lc 0x1c734a58048 <source:2:1, line:5:1> line:3:4 test
|-TemplateTypeParmDecl #188 0x1c7363d59b8 lc 0x1c73640e0a0 <line:2:11, col:17> col:17 referenced class depth 0 index 0 Ty
|-FunctionDecl #195 0x1c73640e0a0 lc 0x1c734a58048 <line:3:1, line:5:1> line:3:4
| |-ParmVarDecl #191 0x1c73640df00 lc 0x1c73640e0a0 <col:9, col:12> col:12 referenced x 'Ty'#190
| |-ParmVarDecl #192 0x1c73640df80 lc 0x1c73640e0a0 <col:15, col:18> col:18 referenced y 'Ty'#190
| `-CompoundStmt 0x1c73640e308 <col:21, line:5:1>
| `-ReturnStmt 0x1c73640e2f8 <line:4:5, col:14>
| `-BinaryOperator #200 0x1c73640e2d0 <col:12, col:14> '<dependent type>'#50 '+'
| |-DeclRefExpr #198 0x1c73640e280 <col:12> 'Ty'#190 lvalue ParmVar 0x1c73640df00 #191 'x' 'Ty'#190
| `-DeclRefExpr #199 0x1c73640e2a8 <col:14> 'Ty'#190 lvalue ParmVar 0x1c73640df80 #192 'y' 'Ty'#190
`-FunctionDecl #210 0x1c73640e680 lc 0x1c734a58048 <line:3:1, line:5:1> line:3:4 used test 'int (int, int)'#209
|-TemplateArgument type 'int'#7
| `-BuiltinType #7 0x1c734a58160 'int'
|-ParmVarDecl #206 0x1c73640e4e0 lc 0x1c73640e680 <col:9, col:12> col:12 used x 'int':'int'#205
|-ParmVarDecl #207 0x1c73640e560 lc 0x1c73640e680 <col:15, col:18> col:18 used y 'int':'int'#205
`-CompoundStmt 0x1c73640e988 <col:21, line:5:1>
`-ReturnStmt 0x1c73640e978 <line:4:5, col:14>
`-BinaryOperator #220 0x1c73640e950 <col:12, col:14> 'int'#7 '+'
|-ImplicitCastExpr #218 0x1c73640e910 <col:12> 'int':'int'#205 <LValueToRValue>
| `-DeclRefExpr #216 0x1c73640e8c0 <col:12> 'int':'int'#205 lvalue ParmVar 0x1c73640e4e0 #206 'x' 'int':'int'#205
`-ImplicitCastExpr #219 0x1c73640e930 <col:14> 'int':'int'#205 <LValueToRValue>
`-DeclRefExpr #217 0x1c73640e8e8 <col:14> 'int':'int'#205 lvalue ParmVar 0x1c73640e560 #207 'y' 'int':'int'#205
VarDecl #201 0x1c73640e338 lc 0x1c734a58048 <source:7:1, col:20> col:5 x 'int'#7 cinit
`-CallExpr #215 0x1c73640e840 <col:9, col:20> 'int':'int'#205
|-ImplicitCastExpr #214 0x1c73640e820 <col:9> 'int (*)(int, int)'#213 <FunctionToPointerDecay>
| `-DeclRefExpr #211 0x1c73640e790 <col:9> 'int (int, int)'#209 lvalue Function 0x1c73640e680 #210 'test' 'int (int, int)'#209 (FunctionTemplate 0x1c73640e148 #196 'test')
|-IntegerLiteral #203 0x1c73640e3f8 <col:14> 'int'#7 10
`-IntegerLiteral #204 0x1c73640e428 <col:18> 'int'#7 10
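As long as the addresses are still printed next to the ids, two dumps can be compared after filtering the addresses out. A minimal sketch of such a filter, assuming every address appears as a space followed by 0x and hex digits, as in the dump above; stripAddresses is a hypothetical helper and not part of clang:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of clang): removes every " 0x<hex>" token
// from one line of an AST dump so that only the global ids remain.
std::string stripAddresses(const std::string &Line) {
  static const std::regex Address(" 0x[0-9a-fA-F]+");
  return std::regex_replace(Line, Address, "");
}
```

Applying this to each line of two dumps before diffing leaves only the ids, source locations and types to compare. Markers that precede an address, such as lc, are left in place.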
In addition to the regular AST dump, my branch also outputs all clang::Types stored within clang::ASTContext. This makes it possible to see the structure behind complex types like 'int (int, int)'#209. I think this could be activated by default for -ast-dump or via an additional switch.
BuiltinType #2 0x1c734a580c0 'void'
BuiltinType #3 0x1c734a580e0 'bool'
BuiltinType #4 0x1c734a58100 'char'
BuiltinType #5 0x1c734a58120 'signed char'
BuiltinType #6 0x1c734a58140 'short'
BuiltinType #7 0x1c734a58160 'int'
...
RecordType #180 0x1c7363d5600 'std::__va_list'
`-CXXRecord 0x1c7363d5568 #179 '__va_list'
TemplateTypeParmType #189 0x1c7363d5a10 'type-parameter-0-0' dependent depth 0 index 0
TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
`-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
FunctionProtoType #193 0x1c73640dff0 'type-parameter-0-0 (type-parameter-0-0, type-parameter-0-0)' dependent cdecl
|-TemplateTypeParmType #189 0x1c7363d5a10 'type-parameter-0-0' dependent depth 0 index 0
|-TemplateTypeParmType #189 0x1c7363d5a10 'type-parameter-0-0' dependent depth 0 index 0
`-TemplateTypeParmType #189 0x1c7363d5a10 'type-parameter-0-0' dependent depth 0 index 0
FunctionProtoType #194 0x1c73640e030 'Ty (Ty, Ty)' (canonical FunctionProtoType #193 0x1c73640dff0) dependent cdecl
|-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
|-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
`-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
`-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
SubstTemplateTypeParmType #205 0x1c73640e4a0 'int' (canonical BuiltinType #7 0x1c734a58160) sugar
|-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
`-BuiltinType #7 0x1c734a58160 'int'
FunctionProtoType #208 0x1c73640e5d0 'int (int, int)' cdecl
|-BuiltinType #7 0x1c734a58160 'int'
|-BuiltinType #7 0x1c734a58160 'int'
`-BuiltinType #7 0x1c734a58160 'int'
FunctionProtoType #209 0x1c73640e610 'int (int, int)' (canonical FunctionProtoType #208 0x1c73640e5d0) cdecl
|-SubstTemplateTypeParmType #205 0x1c73640e4a0 'int' (canonical BuiltinType #7 0x1c734a58160) sugar
| |-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| | `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
| `-BuiltinType #7 0x1c734a58160 'int'
|-SubstTemplateTypeParmType #205 0x1c73640e4a0 'int' (canonical BuiltinType #7 0x1c734a58160) sugar
| |-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| | `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
| `-BuiltinType #7 0x1c734a58160 'int'
`-SubstTemplateTypeParmType #205 0x1c73640e4a0 'int' (canonical BuiltinType #7 0x1c734a58160) sugar
|-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
`-BuiltinType #7 0x1c734a58160 'int'
PointerType #212 0x1c73640e7c0 'int (*)(int, int)'
`-FunctionProtoType #208 0x1c73640e5d0 'int (int, int)' cdecl
|-BuiltinType #7 0x1c734a58160 'int'
|-BuiltinType #7 0x1c734a58160 'int'
`-BuiltinType #7 0x1c734a58160 'int'
PointerType #213 0x1c73640e7f0 'int (*)(int, int)' (canonical PointerType #212 0x1c73640e7c0)
`-FunctionProtoType #209 0x1c73640e610 'int (int, int)' (canonical FunctionProtoType #208 0x1c73640e5d0) cdecl
|-SubstTemplateTypeParmType #205 0x1c73640e4a0 'int' (canonical BuiltinType #7 0x1c734a58160) sugar
| |-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| | `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
| `-BuiltinType #7 0x1c734a58160 'int'
|-SubstTemplateTypeParmType #205 0x1c73640e4a0 'int' (canonical BuiltinType #7 0x1c734a58160) sugar
| |-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| | `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
| `-BuiltinType #7 0x1c734a58160 'int'
`-SubstTemplateTypeParmType #205 0x1c73640e4a0 'int' (canonical BuiltinType #7 0x1c734a58160) sugar
|-TemplateTypeParmType #190 0x1c7363d5a40 'Ty' (canonical TemplateTypeParmType #189 0x1c7363d5a10) dependent depth 0 index 0
| `-TemplateTypeParm 0x1c7363d59b8 #188 'Ty'
`-BuiltinType #7 0x1c734a58160 'int'
Request for comment
Do you think the global id is valuable for clang?
Should I put in the effort to turn it into one or more merge requests?