Pre-processed "used only" mode - any suggestions on approaches?


For analyzer development, we often rely on preprocessed files to get reproducible error reports.
However, on large projects those have several disadvantages:

1. They get HUGE (3MB+ is not uncommon), and creduce may take days to reduce them.
2. Including all the headers may pull in OS-specific builtins, which makes the report harder to reproduce on other platforms.

Most of the code in the headers is not used.
What I would really like to have is a “preprocessed used” mode,
which only includes the code in the headers that is reachable from the code in the main source file
(where “reachable” means “the function can be transitively called” or “the type can be transitively used”).

Some notes:

- I don’t particularly care about function pointers and such; I think forming those for functions defined in the headers is rare.
- I would want to include/exclude only top-level objects: typedefs, classes, functions.

Has anyone done anything similar?

I imagine it would not be too hard to write an AST-based analysis which would mark regions as live
(by computing a fixpoint of used code starting from the code in the .cpp file), and then use a rewriter to remove the “dead” code,
but I was wondering if there’s a simpler approach.
Perhaps reusing parts of creduce infrastructure?
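
To make the fixpoint idea concrete, here is a minimal sketch of just the liveness step, written against a hypothetical in-memory model rather than the real Clang AST. It assumes that some earlier pass (for example an AST visitor) has already produced a map from each top-level declaration to the declarations it references directly; the names `DeclGraph`, `computeLive`, `computeDead` and `selfCheck`, and the declaration names in the example, are all made up for illustration. Function pointers and anything below top-level granularity are ignored, as in the notes above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: every top-level declaration (typedef, class, function)
// is identified by name and maps to the declarations it references directly.
using DeclGraph = std::map<std::string, std::vector<std::string>>;

// Worklist fixpoint: start from the declarations in the main source file and
// keep adding whatever they reference until nothing new shows up.
std::set<std::string> computeLive(const DeclGraph &uses,
                                  const std::vector<std::string> &roots) {
  std::set<std::string> live;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string decl = worklist.back();
    worklist.pop_back();
    if (!live.insert(decl).second)
      continue; // already marked live
    auto it = uses.find(decl);
    if (it == uses.end())
      continue; // nothing recorded for this name (e.g. a builtin)
    for (const std::string &dep : it->second)
      worklist.push_back(dep);
  }
  return live;
}

// Declarations a rewriter would strip: everything known but not live.
std::vector<std::string> computeDead(const DeclGraph &uses,
                                     const std::set<std::string> &live) {
  std::vector<std::string> dead;
  for (const auto &entry : uses)
    if (live.count(entry.first) == 0)
      dead.push_back(entry.first);
  return dead;
}

// Small self-check on a made-up layout: "main" lives in the .cpp file,
// everything else is assumed to come from headers.
void selfCheck() {
  DeclGraph uses = {
      {"main", {"parse"}},
      {"parse", {"Buffer"}},
      {"Buffer", {"size_type"}},
      {"size_type", {}},
      {"unused_fn", {"Buffer"}}, // uses a live type, but nobody calls it
      {"UnusedType", {}},
  };
  std::set<std::string> live = computeLive(uses, {"main"});
  assert(live.count("Buffer") == 1);
  assert(live.count("unused_fn") == 0);
  assert(computeDead(uses, live).size() == 2);
}
```

In this sketch the expensive and error-prone part is deliberately left out: building the graph from a real translation unit and mapping the dead declarations back to source ranges that can be removed safely.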