Hi,
For this example, I'd like to know why/when we care if all these
VarDecls refer to the same variable? Since each VarDecl is "extern",
it's the linker's job to bind the actual variable definition (which
isn't in this particular translation unit).
Another example would be a client that wanted to rename 'x'. How
would it know all of the declarations for the same variable? There
are lots of cases where clients, looking at the AST for a translation
unit, want to know which declarations refer to the same thing.
How would that client work? As far as I understand, each translation unit has its own AST, so this client needs some kind of linker that merges information from several ASTs. For example:
a.c:
int a = 4;
/* use a somewhere below in file*/
b.c:
extern int a;
/* use a somewhere below in file */
c.c:
extern int a;
/* use a somewhere below in file*/
If the tool should rename `a` to `importantCache` (or whatever), it has to know which of a.c, b.c, and c.c belong to the same project and only rename the `a`s in there. More information in a single translation unit doesn't help here; some kind of linker is required.
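A minimal sketch of the kind of linker-like merging step this would need. Everything here is hypothetical: `ToyDecl`, `linkDecls`, and `filesToRename` are illustration-only names, not clang types or APIs, and the model ignores scope and internal linkage (`static`) entirely.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one file-scope declaration; not a clang type.
struct ToyDecl {
    std::string file;   // translation unit the declaration came from
    std::string name;   // declared identifier
    bool isDefinition;  // true for `int a = 4;`, false for `extern int a;`
};

// Merge the declarations of every translation unit in ONE project,
// keyed by name, roughly as a linker resolves external symbols.
std::map<std::string, std::vector<ToyDecl>>
linkDecls(const std::vector<std::vector<ToyDecl>> &translationUnits) {
    std::map<std::string, std::vector<ToyDecl>> symbols;
    for (const auto &tu : translationUnits) {
        for (const auto &decl : tu) {
            assert(!decl.name.empty() && "toy model only handles named decls");
            symbols[decl.name].push_back(decl);
        }
    }
    return symbols;
}

// Files a rename of `name` would have to touch, in the order they were merged.
std::vector<std::string>
filesToRename(const std::map<std::string, std::vector<ToyDecl>> &symbols,
              const std::string &name) {
    std::vector<std::string> files;
    auto it = symbols.find(name);
    if (it == symbols.end())
        return files;  // unknown symbol: nothing to rename
    for (const auto &decl : it->second)
        files.push_back(decl.file);
    return files;
}
```

The caller decides which translation units are passed to `linkDecls`, which is exactly the "which files belong to the same project" knowledge that no single AST can supply.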
Nico
Absolutely. Building a cross reference map of some kind is easy to do. For example, Sema has the following maps that track all declarations for a given method (within a particular translation unit):
/// Instance/Factory Method Pools - allows efficient lookup when typechecking
/// messages to “id”. We need to maintain a list, since selectors can have
/// differing signatures across classes. In Cocoa, this happens to be
/// extremely uncommon (only 1% of selectors are “overloaded”).
llvm::DenseMap<Selector, ObjCMethodList> InstanceMethodPool;
llvm::DenseMap<Selector, ObjCMethodList> FactoryMethodPool;
If maps like this are of general interest, we might consider adding an API to Sema to extract some of the knowledge prior to its death? As an alternative, we could also post-process the ASTs and build whatever maps are needed. My gut says it will be a combination.
I think the point we are in violent agreement about is the need for “AST middleware” that sits between various clang AST clients and Sema. Some of this middleware may come directly from Sema (with a bit of refactoring).
snaroff
To rename across files we would need a higher-level API that can provide an "image" of the entire program/library. That would require some kind of symbol lookup. As a prerequisite, it seems to me that we would first need the facilities to do this within a single translation unit.
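One way the single-translation-unit facility could look, as a rough sketch only: each declaration links back to the previous declaration of the same variable, and the first declaration in the chain serves as the variable's identity. `ToyVarDecl`, `firstDecl`, and `sameVariable` are hypothetical names for illustration, not clang's `VarDecl` or any existing API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a variable declaration node; not clang's VarDecl.
struct ToyVarDecl {
    std::string name;
    int line;                    // source line, for illustration only
    const ToyVarDecl *previous;  // earlier declaration of the same variable, or nullptr
};

// Walk the chain back to the first declaration, which acts as the
// identity of the variable.
const ToyVarDecl *firstDecl(const ToyVarDecl *decl) {
    while (decl->previous)
        decl = decl->previous;
    return decl;
}

// Collect every declaration in the translation unit that denotes the
// same variable as `target`, preserving the order of `allDecls`.
std::vector<const ToyVarDecl *>
sameVariable(const std::vector<const ToyVarDecl *> &allDecls,
             const ToyVarDecl *target) {
    std::vector<const ToyVarDecl *> result;
    const ToyVarDecl *identity = firstDecl(target);
    for (const ToyVarDecl *decl : allDecls)
        if (firstDecl(decl) == identity)
            result.push_back(decl);
    return result;
}
```

Comparing by chain identity rather than by name keeps an unrelated `x` (for example, one declared in an inner scope) out of the rename set.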
Another example would be a client that wanted to rename ‘x’. How would it know all of the declarations for the same variable? There are lots of cases where clients, looking at the AST for a translation unit, want to know which declarations refer to the same thing.
Absolutely. Building a cross reference map of some kind is easy to do. For example, Sema has the following maps that track all declarations for a given method (within a particular translation unit):
/// Instance/Factory Method Pools - allows efficient lookup when typechecking
/// messages to “id”. We need to maintain a list, since selectors can have
/// differing signatures across classes. In Cocoa, this happens to be
/// extremely uncommon (only 1% of selectors are “overloaded”).
llvm::DenseMap<Selector, ObjCMethodList> InstanceMethodPool;
llvm::DenseMap<Selector, ObjCMethodList> FactoryMethodPool;
If maps like this are of general interest, we might consider adding an API to Sema to extract some of the knowledge prior to its death? As an alternative, we could also post-process the ASTs and build whatever maps are needed. My gut says it will be a combination.
It seems to me that Sema is already building much of this information and then just throwing it away. If we refactor those data structures to be outside of Sema, Sema could populate them for its own use and any client that wishes to retain them afterwards could do so. I agree that it will probably end up requiring a combination of extracting information from Sema and then (lazily) doing some post-processing.
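A rough sketch of that ownership arrangement, under stated assumptions: the table lives outside the analyzer, the analyzer populates and consults it, and a client that holds a reference keeps it alive after the analyzer is gone. `ToyDeclIndex`, `ToyAnalyzer`, and `analyzeAndKeepIndex` are hypothetical stand-ins, not Sema or any clang interface, and shared ownership via `std::shared_ptr` is just one possible design.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cross-reference table that lives outside the analyzer.
struct ToyDeclIndex {
    std::map<std::string, std::vector<int>> declLines;  // name -> declaration lines
};

// Hypothetical stand-in for Sema: fills the shared index and uses it itself.
class ToyAnalyzer {
public:
    explicit ToyAnalyzer(std::shared_ptr<ToyDeclIndex> index)
        : index_(std::move(index)) {
        assert(index_ && "analyzer needs an index to populate");
    }

    // Record a declaration as it is processed.
    void actOnDeclaration(const std::string &name, int line) {
        index_->declLines[name].push_back(line);
    }

    // The analyzer's own use of the table during analysis.
    bool isDeclared(const std::string &name) const {
        return index_->declLines.count(name) != 0;
    }

private:
    std::shared_ptr<ToyDeclIndex> index_;
};

// Run the analyzer, let it die, and hand the populated index to the client.
std::shared_ptr<ToyDeclIndex> analyzeAndKeepIndex() {
    auto index = std::make_shared<ToyDeclIndex>();
    {
        ToyAnalyzer analyzer(index);
        analyzer.actOnDeclaration("x", 1);
        analyzer.actOnDeclaration("x", 5);
        analyzer.actOnDeclaration("y", 7);
    }  // analyzer is destroyed here; the index is not
    return index;
}
```

A client that does not care about the table simply drops its reference, so nothing is retained unless someone asks for it.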
I think the point we are in violent agreement about is the need for “AST middleware” that sits between various clang AST clients and Sema. Some of this middleware may come directly from Sema (with a bit of refactoring).
Exactly my feeling as well.