This is related to [RFC] Compress Intrinsic Name Table - IR & Optimizations - LLVM Discussion Forums but independent. I am implementing a string pool to cache intrinsic names (as std::string) and then vend out the backing store (string::data()) assuming it will be persistent. if I implement the pool using a DenseMap, it seems when the map is resized, the existing objects are copied (vs moved), so the earlier pointers to the backing data are now invalid. However, if it is implemented using std::unorered_map, it seems the existing objects are moved, so the vended-out pointers are still valid.
Is there an easy way to find which LLVM ADTs support this vs which do not? My understanding is that the difference being when the map is resized as a result of new entries being added or deleted, whether existing objects are moved to their new place or copied over. Or is it the case that all LLVM maps are unsafe for this kind of use and we should revert to std::map or std::unordered_map?
I don’t have a direct answer to your question, but I wanted to point out that std::unordered_map is also unsafe for this purpose because of the small string optimization. Small strings are stored directly in the std::string object instead of being heap-allocated, so even a move will invalidate pointers to small strings.
“References and pointers to either key or data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated.”
Yeah it’s not a copy vs move thing, it seems more like “iterator invalidation” kind of problem.
In general you have options to use DenseMap in such situation by (top of my head):
Storing the elements as unique_ptr<std::string> (this is useful with std::vector as well or any similar container where you need to workaround the iterator-invalidation problem by preserving pointer stability to the elements).
If you can know the size (or max size) before starting escape the pointer to the elements, you can use reserve() on the map.
You may use a separate storage for appending the strings themselves, and keep the map entries as non-owning (using DenseMap<StringRef, StorageEntry *> for example).
We should still update the programmer guide with information about iterator invalidation and these trade-offs, if it does not yet contain the information.
Thanks all. @mehdi_amini I do recall using a std::map<std::unique_ptr> to guard against this in some other case a while back and then undoing the std::unique_ptr part since it seemed unnecessary as things worked without the extra indirection, due to iterator validity of std::map.
It seems though that there is a slight difference between iterator validity across a container mutation vs validity of the object pointer. For example, would it be possible for iterators to stay valid but objects still being copied over as the container mutates (and hence references invalidated)? I guess @topperc’s mention of std::unordered_map documentation suggests that they are different: “References and pointers to either key or data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated”.
For C++, a quick Google search showed: Iterator Invalidation Rules (C++0x) (kera.name). Note the difference between std::map and std::unordered_map. Both have references unaffected, but one invalidates iterators and other does not. Note that there is no C++ container that has “iterators unaffected but references invalidated”, but could be hypothetically possible.
For a persistent cache like use case, we just want “references unaffected”, so either a std:map or std::unordered_map works.
I agree that the programmer guide and the ADT headers as well can use a comment about this. ADT headers can just comment about iterator validity, and the programmers guide could have a line or 2 summarizing @mehdi_amini response (but that may be too specific a problem to be in the programmer guide). If we can build a table like above for C++, that would be great as well.
Thanks, though for this use case, it may not be appropriate as here the key is not a string but an Intrinsic::ID and value is the string. That being said, I’ll update the issue I filed to call out object reference stability for both key and value for associative containers explicitly.