My suggestion, but others should jump in here too, is that you split the patch
up into two separate aims.
The first aim would be the reduction of SourceLocation information.
The second aim would be caching of previous macro expansions (from what I
understand of your patch, I think this is what you are attempting?).
I think this would be a benefit since it may be easier to get the patches
working if you work on the two parts independently. Plus you indicated that
the first still has a sizeable benefit. And I think the second will give you
problems (detailed below).
I'm afraid I am not able to help you much on the interactions of the
SourceManager, Lexer and Parser. There are others on this mailing list who
would be able to help you much better in this regard.
Regarding reduction of SourceLocation information, these are my thoughts.
Please note that I haven't assessed the practicality of this, and I'm writing
as I think so it may not even be coherent! Caveats aside, you talked
previously of having this as a compiler switch. I think this is a good idea
since the SourceLocation information is needed for debugging macro expansions.
I would suggest implementing this switch as a depth counter so that
SourceLocation information is only once a macro expansion reaches a certain
depth the SourceLocation information will be elided. In a sense, the
important information of a macro expansion is the SourceLocations of the
original tokens going into the macro and the SourceLocation of the final
scratch buffer. At this point, I'm wondering whether it is useful to have
SourceLocation information for concatenated tokens... Maybe someone else can
give there thoughts here.
The other thing you can think about here is that of actually releasing a
scratch buffer no longer in use. I think clang doesn't do this currently
(probably because of keeping all the SourceLocation information), but if you
looked at having a reference count on a scratch buffer then maybe it will be
possible to release it once its last use is complete. This has the benefit of
also freeing memory... and may also improve the speed of the compiler. Again,
others should give their thoughts here. Personally, I would be very
interested to see if this was a possible line of improvement since with
extensive macro expansions, the memory consumption of clang is huge.
In fact, thinking about this, this may be the first place to start -- it is
sort of linked to eliding SourceLocation information, but it has the advantage
that you are freeing the whole scratch buffer since it has been incorporated
into another higher-level scratch buffer, and so is no longer referenced
anywhere and its disappearance shouldn't (hopefully!) cause so many problems
in clang diagnostics, for example, which even so would need to be adapted to
skip these elided macro expansion steps.
Finally, regarding the caching of previous macro expansions, you need to be
very careful because macro definitions can change, yes even inside a macro
expansion too (think pragma pop_macro). For that matter, you need to think
about how to handle any _Pragma expansion since these can cause side effects.
This is why I would leave this side of your patch until last. It will be the
hardest to test and guarantee correctness.
Keep up the good work! :o)
Cheers,
Andy