I implemented a map of Record fields using StringMap. The map is created when the backend calls getAllDerivedDefinitions(), on the assumption that the backend will then access field values from the returned vector of Records.
This resulted in a noticeable slowdown in the backends, on the order of 10%. The time spent constructing the maps appears to outweigh any savings when accessing the field values.
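To make the trade-off concrete, here is a minimal sketch of the two strategies in plain C++. It assumes a simplified record type and uses std::unordered_map as a stand-in for llvm::StringMap. `FakeRecord`, `buildIndexes`, `lookupLinear` and `lookupIndexed` are hypothetical names for illustration, not the actual patch or the llvm::Record API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TableGen record: a name plus an ordered list of
// (field name, value) pairs. This is NOT llvm::Record; values are plain ints.
struct FakeRecord {
  std::string Name;
  std::vector<std::pair<std::string, int>> Fields;
};

// Baseline: no setup cost, linear scan over the fields on every lookup.
const int *lookupLinear(const FakeRecord &R, const std::string &Field) {
  for (const auto &F : R.Fields)
    if (F.first == Field)
      return &F.second;
  return nullptr;
}

// One hash map per record, from field name to a pointer at the field's value.
// std::unordered_map stands in for llvm::StringMap here. The pointers refer
// into the records, so the records must outlive the indexes.
using FieldIndex = std::unordered_map<std::string, const int *>;

// Eager approach: index every field of every record up front, whether or not
// the backend ever reads it. Each insertion allocates a hash node and copies
// the key string, so the cost scales with the total number of fields.
std::vector<FieldIndex> buildIndexes(const std::vector<FakeRecord> &Records) {
  std::vector<FieldIndex> Indexes;
  Indexes.reserve(Records.size());
  for (const auto &R : Records) {
    FieldIndex Idx;
    Idx.reserve(R.Fields.size());
    for (const auto &F : R.Fields)
      Idx.emplace(F.first, &F.second);
    // Field names within a record are assumed to be unique.
    assert(Idx.size() == R.Fields.size() && "duplicate field name");
    Indexes.push_back(std::move(Idx));
  }
  return Indexes;
}

// Once an index exists, each lookup is an average O(1) hash probe.
const int *lookupIndexed(const FieldIndex &Idx, const std::string &Field) {
  auto It = Idx.find(Field);
  return It == Idx.end() ? nullptr : It->second;
}
```

Roughly, the eager build costs one allocation per field of every returned record, while the linear scan costs nothing up front. If a backend reads only a handful of fields per record, the build loop can easily cost more than the lookups it speeds up, which would be consistent with the slowdown described above.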
I'm going to try some other tricks, but I think this is a dead end.
Have you profiled the result? For example, on a Mac you can use the Instruments tool that comes with Xcode. On Linux you can use a variety of perf tools, and on Windows you can use the Visual Studio profiler.
I wonder if you’re getting zero hits on the cache?
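One cheap way to check the hit rate before reaching for a profiler is to count hits and misses directly. This is a sketch in plain C++ with std::unordered_map standing in for StringMap; `CountingCache`, `CacheStats` and the `int` value type are hypothetical, not code from the patch or from LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical hit/miss counters; not part of any LLVM API.
struct CacheStats {
  std::size_t Hits = 0;
  std::size_t Misses = 0;
  double hitRate() const {
    std::size_t Total = Hits + Misses;
    return Total == 0 ? 0.0
                      : static_cast<double>(Hits) / static_cast<double>(Total);
  }
};

// A memoizing cache keyed by field name. On a miss it calls SlowLookup (for
// example, a linear scan of a record's fields), stores the result and counts
// the miss; on a hit it returns the stored value and counts the hit.
template <typename SlowLookup>
class CountingCache {
public:
  explicit CountingCache(SlowLookup SlowFn) : Slow(std::move(SlowFn)) {}

  int get(const std::string &Key) {
    auto It = Map.find(Key);
    if (It != Map.end()) {
      ++Stats.Hits;
      return It->second;
    }
    ++Stats.Misses;
    int Value = Slow(Key);
    bool Inserted = Map.emplace(Key, Value).second;
    assert(Inserted && "key was just found to be absent");
    (void)Inserted; // avoid an unused-variable warning when NDEBUG is set
    return Value;
  }

  const CacheStats &stats() const { return Stats; }

private:
  SlowLookup Slow;
  std::unordered_map<std::string, int> Map;
  CacheStats Stats;
};
```

Dumping `stats()` at the end of a backend run would show whether each map is queried more than once. If the hit count stays near zero, every map built is mostly wasted work.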
In addition to what Chris said, on Windows you can also use either ETW or VTune:
You'll also need the latest Windows SDK to view the trace with WPA (Windows Performance Analyzer): https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk/
After installing UIforETW, make sure you select "Tracing to file" in the drop-down on the right. The trace records activity across the entire system, not just a single application. The advantage is that you get the whole picture for all processes launched during, for example, a full rebuild of LLVM.
Then there's VTune Amplifier: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html
You can use it like ETW to record activity across the whole system, or, like the Visual Studio profiler, to profile a single application. VTune gives more fine-grained information than the other tools, so you'd be able to see very precisely where your bottlenecks are (most likely in the memory hierarchy, in your case). VTune also supports very fine sampling intervals, which could be valuable.
Let me know if you'd like help with this. Otherwise, Bruce Dawson has written quite a few blog posts on using WPA (Windows Performance Analyzer).