Diagnostic IDs, parsing speed and how to generate lookup tables

Hi clangers,

while doing some parsing speed investigation I noted that a lot of diagnostics stuff is in the very hot path for parsing time. One thing that fell out of that is Benjamin’s recent patch that speeds up getting the diagnostic info. But, there’s more to be gotten - with a simple cache for the diagnotic classes, I was able to get another 1.5% parsing speedup (benchmarked over all google code).

Unfortunately the patch in its current state is not thread safe; the obvious solution to that problem would be to generate the diagnostic class table statically instead of writing to a cache at runtime.

This turns out to be surprisingly hard though - the diagnostic id start values are defined in an enum, while the actual diagnostics come from the .td files.

A first idea was that we might define the start values inside the .td files, and create the enum from that, instead of the other way around.

That led to the question posed by Dmitri on irc why there are start ranges in the first place - we could instead tablegen all diagnostics at once and let the tablegen take care of generating the appropriate enum values for the start of certain ranges.

So if anybody has better ideas for how to solve the lookup problem, or knows why the diagnostic ids have fixed start values, I’d be very interested to learn more about it :slight_smile:

Thanks!
/Manuel

I think it is just historical reasons. Having tblgen produce the enums makes perfect sense.

-Chris

The purpose of the fixed starting values is just compile-time efficiency: we’d like to be able to add/remove diagnostics (generally in one part of clang) without requiring a full recompile. As long as your scheme still allows this in most cases — maybe tblgen gets re-run for all diagnostics whenever any of them change, but tblgen rounds up to the next multiple of 50 between ranges so that adding a diagnostic in one range doesn’t usually force a full recompile — I think we’re fine.

John.

Right. Basically move the magic constants in DiagnosticIDs.h:

enum {
DIAG_START_DRIVER = 300,
DIAG_START_FRONTEND = DIAG_START_DRIVER + 100,
DIAG_START_SERIALIZATION = DIAG_START_FRONTEND + 100,
DIAG_START_LEX = DIAG_START_SERIALIZATION + 120,
DIAG_START_PARSE = DIAG_START_LEX + 300,
DIAG_START_AST = DIAG_START_PARSE + 400,
DIAG_START_COMMENT = DIAG_START_AST + 100,
DIAG_START_SEMA = DIAG_START_COMMENT + 100,
DIAG_START_ANALYSIS = DIAG_START_SEMA + 3000,
DIAG_UPPER_LIMIT = DIAG_START_ANALYSIS + 100
};

from the header file into tblgen.

-Chris

Of the Diagnostic*Kinds.td files the only one that gets updated fairly
regularly (every 1-2 days) is DiagnosticSemaKinds.td. Can't we just put it
last, and save the hassle of putting holes into the generated lookup
tables? (The next highest frequent change is ~4-5 times per month; in that
time frame a full rebuild is in order anyway).

Cheers,
/Manuel

It’s not really about the compile times for people periodically pulling the tree; changes on trunk regularly cause full or nearly-full rebuilds anyway. It’s that someone working on a patch to the driver/frontend/parser/lexer/whatever should be able to add and tweak their diagnostics without forcing a full rebuild every time.

Also, under the current system, the hassle of these holes to typical clang developers is basically zero, and even that will pretty much disappear completely if we get tblgen to manage them automatically.

John.