hi folks!
While working on the implementation of [RFC] Add support for controlling diagnostics severities at file-level granularity through command line, I ended up re-using llvm::SpecialCaseList, and it felt like things could be improved here a little to ease maintenance and re-use going forward.
Hence I’d like to perform some NFC refactorings in this area and wanted to see if people have concerns before attempting that.
I believe the major refactoring point is separating the logic that parses a SpecialCaseList input file from the logic that performs matching based on the parsed result.
The format is generic, and is currently re-used by the following components:
- XRay, XRay Instrumentation — LLVM 20.0.0git documentation
- ProfileList, Clang Compiler User’s Manual — Clang 20.0.0git documentation
- Sanitizers, Sanitizer special case list — Clang 20.0.0git documentation
- Diagnostic suppression mappings, https://github.com/llvm/llvm-project/pull/112517
All of these use cases parse the special-case-list files using the same logic and then customize bits and pieces of the matching logic (some need line numbers, some don’t, some have very different matching criteria than others, some want to verify certain information in the parsed format).
As a result, all of these implementations inherit from llvm::SpecialCaseList just to use its parser, and then tweak its matching logic to accommodate their use cases, turning the base implementation into a mess that’s really hard to reason about.
Having a shared parser and letting people use some standard matching logic separately should make the code here simpler.
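To make the proposed split a bit more concrete, here is a rough sketch of a parser that only produces data, with matching written as a separate client function. It uses only the standard library, and every name in it (`ParsedEntry`, `ParsedSection`, `parseSpecialCaseList`, `firstMatchLine`) is hypothetical and not part of the existing LLVM API. It also assumes exact string comparison and does no whitespace trimming or error reporting, which a real implementation would need, along with glob matching.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; these are not the existing LLVM API.
struct ParsedEntry {
  std::string Prefix;  // e.g. "src"
  std::string Pattern; // e.g. "my_file.cc"
  unsigned LineNo;     // 1-based line number in the input
};

struct ParsedSection {
  std::string Name; // "*" for entries that appear before any [section] header
  unsigned LineNo;
  std::vector<ParsedEntry> Entries;
};

// Parsing only: no matching policy is baked in here.
std::vector<ParsedSection> parseSpecialCaseList(const std::string &Text) {
  std::vector<ParsedSection> Sections;
  std::istringstream In(Text);
  std::string Line;
  unsigned LineNo = 0;
  while (std::getline(In, Line)) {
    ++LineNo;
    if (Line.empty() || Line[0] == '#')
      continue; // skip blank lines and comments
    if (Line.front() == '[' && Line.back() == ']') {
      Sections.push_back({Line.substr(1, Line.size() - 2), LineNo, {}});
      continue;
    }
    std::string::size_type Colon = Line.find(':');
    if (Colon == std::string::npos)
      continue; // a real parser would report an error here
    if (Sections.empty())
      Sections.push_back({"*", LineNo, {}});
    assert(!Sections.empty() && "every entry must belong to a section");
    Sections.back().Entries.push_back(
        {Line.substr(0, Colon), Line.substr(Colon + 1), LineNo});
  }
  return Sections;
}

// One possible client: returns the line number of the first entry that
// matches exactly, in file order, or 0 if nothing matches. Other clients
// could ignore line numbers or apply entirely different criteria.
unsigned firstMatchLine(const std::vector<ParsedSection> &Sections,
                        const std::string &Prefix, const std::string &Query) {
  for (const ParsedSection &S : Sections)
    for (const ParsedEntry &E : S.Entries)
      if (E.Prefix == Prefix && E.Pattern == Query)
        return E.LineNo;
  return 0;
}
```

With this shape, a client that needs line numbers reads them from the parsed entries, and a client that doesn't simply ignores them, so nobody has to subclass the parser to change matching.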
The second bit I’d like to change is replacing the usage of StringMaps in the matching logic with BumpPtrAllocators.
Various entities like llvm-project/llvm/include/llvm/Support/SpecialCaseList.h at main · llvm/llvm-project · GitHub use a StringMap solely to keep strings alive. Afterwards, all the usages of this StringMap actually iterate over all the entries.
Hence I’d like to use a BumpPtrAllocator for keeping strings alive and a std::vector to store & iterate over all the entries.
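As a rough illustration of that storage layout, the sketch below keeps string bytes in a tiny bump-style arena and stores the entries in a std::vector that is iterated in insertion order. `StringArena` is only a standard-library stand-in for llvm::BumpPtrAllocator so that the example compiles on its own, and `Entry` and `EntryList` are hypothetical names, not the planned classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// Stand-in for llvm::BumpPtrAllocator: hands out memory from large slabs and
// frees everything at once when the arena is destroyed.
class StringArena {
  static constexpr std::size_t SlabSize = 4096;
  std::vector<std::unique_ptr<char[]>> Slabs;
  std::size_t Capacity = 0; // size of the current slab
  std::size_t Used = 0;     // bytes already handed out from the current slab

public:
  // Copies S into arena-owned memory and returns a view of the copy.
  std::string_view save(std::string_view S) {
    if (S.empty())
      return {};
    if (S.size() > Capacity - Used) {
      // Start a new slab; oversized strings get a slab of their own size.
      Capacity = S.size() > SlabSize ? S.size() : SlabSize;
      Slabs.push_back(std::make_unique<char[]>(Capacity));
      Used = 0;
    }
    assert(Used + S.size() <= Capacity);
    char *Dest = Slabs.back().get() + Used;
    std::memcpy(Dest, S.data(), S.size());
    Used += S.size();
    return std::string_view(Dest, S.size());
  }
};

struct Entry {
  std::string_view Pattern; // points into the arena
  unsigned LineNo;
};

class EntryList {
  StringArena Strings;        // keeps the pattern bytes alive
  std::vector<Entry> Entries; // preserves the order entries were added in

public:
  void add(std::string_view Pattern, unsigned LineNo) {
    Entries.push_back({Strings.save(Pattern), LineNo});
  }
  const std::vector<Entry> &entries() const { return Entries; }
};
```

Since the slabs are heap blocks owned through `unique_ptr`, growing the vector of slabs never moves the string bytes, so the stored views stay valid for the lifetime of the `EntryList`.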
This will cause a slight behavior change. Currently sections with the same name are “merged”, e.g.:
[foo]
src:my_file.cc
[bar]
src:your_file.cc
[foo]
src:your_file.cc
will create only a single Section for foo on line 1.
First of all, Sanitizer special case list — Clang 20.0.0git documentation doesn’t mention anything about declaring the same section multiple times, so I believe all bets are off here. Moreover, the new implementation will store these as two separate Sections, one foo on line 1 and another foo on line 5, which seems better.
In terms of matching behavior, we might see changes as well. Previously, matching against the example file above could first go through all the entries that belong to the foo section, hence your_file.cc would match foo. Now it can match against bar instead.
But because we were actually using StringMaps all over the place, when multiple entities matched a query, it wasn’t guaranteed which one would match. Hence, again, I think using std::vector in these places is actually going to make matching behavior more “reasonable” by making sure we’re matching entries in the order the user provided them.
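The difference can be sketched with the example file above. The code below is only an illustration with hypothetical names (`Section`, `mergeByName`, `firstMatch`) and exact string comparison. Note that `mergeByName` emulates just one possible outcome of the current behavior (merged sections visited in first-appearance order), since the actual StringMap iteration order isn't guaranteed.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Section {
  std::string Name;
  unsigned LineNo; // line of the [section] header
  std::vector<std::string> SrcPatterns;
};

// The example file from the post, stored in the order the user wrote it.
std::vector<Section> exampleFile() {
  return {{"foo", 1, {"my_file.cc"}},
          {"bar", 3, {"your_file.cc"}},
          {"foo", 5, {"your_file.cc"}}};
}

// Emulates merging sections that share a name into the first occurrence.
std::vector<Section> mergeByName(const std::vector<Section> &In) {
  std::vector<Section> Out;
  for (const Section &S : In) {
    bool Merged = false;
    for (Section &O : Out) {
      if (O.Name != S.Name)
        continue;
      O.SrcPatterns.insert(O.SrcPatterns.end(), S.SrcPatterns.begin(),
                           S.SrcPatterns.end());
      Merged = true;
      break;
    }
    if (!Merged)
      Out.push_back(S);
  }
  return Out;
}

// Returns the first section, in stored order, with a pattern equal to File,
// or nullptr if no section matches.
const Section *firstMatch(const std::vector<Section> &Sections,
                          const std::string &File) {
  for (const Section &S : Sections)
    for (const std::string &P : S.SrcPatterns)
      if (P == File)
        return &S;
  return nullptr;
}

// Checks the two outcomes described in the text for your_file.cc.
void checkExample() {
  std::vector<Section> Ordered = exampleFile();
  // Order-preserving storage: bar comes before the second foo.
  assert(firstMatch(Ordered, "your_file.cc")->Name == "bar");
  // Merged storage: the entry from the second foo is folded into the first.
  assert(firstMatch(mergeByName(Ordered), "your_file.cc")->Name == "foo");
}
```

With order-preserving storage the result depends only on what the user wrote and in which order, which is the property argued for above.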
If no one has concerns about these two points, I’d like to start implementing them as soon as https://github.com/llvm/llvm-project/pull/112517 lands.