motivations for approaches to macros in Index/IndexingAction.cpp ?

[reposting after accidentally sending to the old cfe address]

Hi all,

I've been playing with the indexstore API and was a little surprised
to find that preprocessor macro definitions are apparently not
considered when generating the hash of a record
(https://github.com/apple/swift-clang/blob/stable/lib/Index/IndexRecordHasher.h).

Was this intentional? And if so, what was the motivation?

I ask because it seems like things are very nearly set up to
incorporate macro definitions/references in an IndexStore record (and
to produce both "symbols" and "occurrences" for macro definitions &
expansions by default when -index-store-path is given to the Clang
driver).

Eric Liu introduced some support for indexing macros
(https://github.com/apple/swift-clang/commit/8576df59f2fe52ae305fdb91b36cbed8946981a1)
but it doesn't seem to intersect with IndexStore "records", and I'm
not sure why...

I guess the idea was that people would only use this via libclang?
(But in that case my question becomes, "why wouldn't you want to
capture this as a regular part of a build?")

Currently, entities from the AST are captured by IndexASTConsumer, an
instance of which is set up by
IndexRecordActionBase::createIndexASTConsumer(), and it seems like
this is the point where there is an opportunity to call
clang::Preprocessor::addPPCallbacks() with a class that overrides:

PPCallbacks::MacroExpands()
PPCallbacks::MacroDefined()

... which would then forward macro definition/expansion events to the
IndexDataRecorder.

If anyone who has worked in this area could comment, that would be
most appreciated!

--James

Hi James,

The current implementation of this hashing function ignores some cases. We're in the process of upstreaming the whole index-while-building feature to LLVM project and decided to fix this by temporarily using trivial implementation - just hashing the full bitcode representation of index record. Eventually we need a new implementation that is both correct and performant - so far the idea was to explore the possibility of using the existing ODR violation hasher for this purpose.

HTH,

Jan

Ok, so... I guess "correct and performant" means "yes, we do want to
incorporate macro definitions & expansions in the future", in which
case I'm looking forward to it! (:

separate topic, under "performance": it seems like it would be good if
declarations & definitions of entities with external linkage were not
represented as "occurrences", because currently, if I want to get the
locations of all such decls, it seems like I need to visit all
"occurrences", which seems like a lot of extra work if the user ends
up visiting only two translation units. (And currently, doing this for
an index-store for the clang & llvm source code takes about 11.5
seconds on my macbook.)

In other words, upon loading the index-store, I need to know the
locations of all external decls & defs for the whole program (because
the user should be able to jump to any of those from the outset, from
any TU), but it seems like loading for all *other* occurrences could
be done more lazily.

I don't know where this topic fits in the pipeline of design
discussions for upstreaming index-while-building tho.

--James

+ Argyrios and Ben who might comment on current use of index-store.
+ clangd-dev in case anyone is interested.

+ Argyrios and Ben who might comment on current use of index-store.
+ clangd-dev in case anyone is interested.

>
>>
>> Hi James,
>>
>> The current implementation of this hashing function ignores some cases. We're in the process of upstreaming the whole index-while-building feature to LLVM project and decided to fix this by temporarily using trivial implementation - just hashing the full bitcode representation of index record. Eventually we need a new implementation that is both correct and performant - so far the idea was to explore the possibility of using the existing ODR violation hasher for this purpose.
>
> Ok, so... I guess "correct and performant" means "yes, we do want to
> incorporate macro definitions & expansions in the future", in which
> case I'm looking forward to it! (:
>
> separate topic, under "performance": it seems like it would be good if
> declarations & definitions of entities with external linkage were not
> represented as "occurrences", because currently, if I want to get the
> locations of all such decls, it seems like I need to visit all
> "occurrences", which seems like a lot of extra work if the user ends
> up visiting only two translation units. (And currently, doing this for
> an index-store for the clang & llvm source code takes about 11.5
> seconds on my macbook.)
>
> In other words, upon loading the index-store, I need to know the
> locations of all external decls & defs for the whole program (because
> the user should be able to jump to any of those from the outset, from
> any TU), but it seems like loading for all *other* occurrences could
> be done more lazily.

On second thought, I guess you want to be able to jump to any
namespace-scope decl from any file (or no file), i.e. you also want to
load all decls with internal linkage on startup. (That would still be
a significantly lighter load than "all occurrences" though.)

  • Argyrios and Ben who might comment on current use of index-store.
  • clangd-dev in case anyone is interested.

Hi James,

The current implementation of this hashing function ignores some cases. We’re in the process of upstreaming the whole index-while-building feature to LLVM project and decided to fix this by temporarily using trivial implementation - just hashing the full bitcode representation of index record. Eventually we need a new implementation that is both correct and performant - so far the idea was to explore the possibility of using the existing ODR violation hasher for this purpose.

Ok, so… I guess “correct and performant” means “yes, we do want to
incorporate macro definitions & expansions in the future”, in which
case I’m looking forward to it! (:

Not including macros is only because before enabling them we wanted to get performance metrics to see what impact they will have.

separate topic, under “performance”: it seems like it would be good if
declarations & definitions of entities with external linkage were not
represented as “occurrences”, because currently, if I want to get the
locations of all such decls, it seems like I need to visit all
“occurrences”, which seems like a lot of extra work if the user ends
up visiting only two translation units. (And currently, doing this for
an index-store for the clang & llvm source code takes about 11.5
seconds on my macbook.)

In other words, upon loading the index-store, I need to know the
locations of all external decls & defs for the whole program (because
the user should be able to jump to any of those from the outset, from
any TU), but it seems like loading for all other occurrences could
be done more lazily.

On second thought, I guess you want to be able to jump to any
namespace-scope decl from any file (or no file), i.e. you also want to
load all decls with internal linkage on startup. (That would still be
a significantly lighter load than “all occurrences” though.)

I’d recommend to take a look at using https://github.com/apple/indexstore-db for queries based on the index-store. That one is even more lazily than you suggest in that during loading it only reads the section that contains the Symbol info + roles, and avoids reading the occurrences, and only reads rest of the record if it wants to find the source locations of specific occurrences (e.g. we know this record contains the ‘definition’ occurrence of this USR, so read the record to find out its source location).