Here are some statistics for those of you who like statistics.
Before embarking on a project to speed up access to record fields in TableGen, I thought I'd collect a little data.
Number of records built from the AMDGPU .td files: 55,539
Number of fields in those records: 2,877,918
Average number of fields per record: 52
So the average field lookup sequential scan is about 25 iterations
Number of field lookups performed by DAG ISel emitter: 6,294,426
Number of iterations = 6,294,426 x 25 = 157,000,000 (approx.)
How long does each iteration take? Can it be more than 10 instructions? It's 7 instructions on the X86. So perhaps about 3 ns.? (I may be off here.)
Time saved if the field access could be cut to 0 ns.: 472,000,000 ns.
A modern CPU frequently has three cache levels with each subsequent larger cache level having a slower access. Then there is memory beyond cache, and disk beyond memory with disk having a cache for frequently used disk data. In addition to where the data may be in this hierarchy is the way the data is obtained which is in blocks of a certain size.
This link gives a rough idea of how long it takes to access various data sources.
Beyond this is how the container is used. For example, using an unordered_map into pointers to another data location where the data is actually stored provides another path for delay.
To actually know if a particular container selection and design will work better will likely require benchmarking the alternatives under realistic conditions.
Along this line I had thought that NVME SSDs were substantially faster than SATA SSDs and then saw some careful benchmarking that only showed that NVME SSDs were marginally faster which explained their marginal difference in price.