At what point should I consider changing from a StringSwitch construct to something more efficient? TableGen has a few of them. In particular, the switch for looking up a bang operator has about 40 cases, all short strings.
One trivial improvement would be to split it into two StringSwitch's based on the first letter of the operator name.
If that improves performance, then it sounds like a bug/area for improvement in StringSwitch itself, perhaps?
StringSwitch just uses a series of Case functions that compare the target against each case string. Once the target is matched, the remaining Case functions bypass the comparison. One way or the other, it goes through all the Case's. I don't think there is any way to optimize this. Does anyone know if the compiler does something clever?
StringSwitch just uses a series of Case functions that compare the target against each case string.
Fair point - your proposal is to take advantage of sorting, essentially? Knowing which subset start with certain characters or not. Perhaps there could be a version of StringSwitch that only works on sorted input (a named variant, an extra flag, etc) - asserts if the inputs aren’t sorted, and then relies on that sorting to be able to take advantage, maybe make a short trie or the like.
But yeah, that’s probably a bigger design problem you might not be able to/interested in changing right now, coming back to your original question of “when should I use StringSwitch and when should I do something else (hash map, etc)”. I don’t know what the current tipping point is there.
This has come up on the list before . The suggestion at the end of that thread was to look at the optimized code produced by the compiler .
Yeah, years ago, the compiler did the right thing through a combination of jump threading and simple redundancy elimination. StringSwitch does a first character specialization IIRC.
Excellent. Then I shall not concern myself with the efficiency of StringSwitch.
This may still need a measurement. StringSwitch takes a StringRef
(const char *Data + size_t Length). The compiler cannot arbitrarily
reference Data[i] if i >= Length. It can do specialization but it
needs to know the valid range.
I agree - something that was true years ago is not necessarily still valid today.