Trying to use unordered_map

Okay, I'm at a loss to understand what I'm doing wrong with the unordered_map class. Thank you for any help you can give.

I have two data structures:

  SmallVector<RecordVal, 0> Values;
  std::unordered_map<const Init *, const RecordVal *> ValueMap;

I just added the unordered_map to experiment with faster lookups for the fields in TableGen records. The key is a StringInit and the corresponding value is a RecordVal. I made it a pointer to the RecordVal since the existing SmallVector holds the actual RecordVal instance.

Here is the code that adds a field to the structures:

  void addValue(const RecordVal &RV) {
    Values.push_back(RV);
    ValueMap[RV.getNameInit()] = &RV;

The RecordVal is push_backed onto the vector and inserted into the map by assignment.

Here is the code that looks up an entry. First the new code and then the old code.

  const RecordVal *getValue(const Init *Name) const {
    auto It = ValueMap.find(Name);
    if (It != ValueMap.end()) {
      return It->second;
    }
    return nullptr;

    for (const RecordVal &Val : Values)
      if (Val.Name == Name) {
        return &Val;
      }
    return nullptr;
  }

As far as I can tell, 'return It->second' is not returning the RecordVal. However, I'm thoroughly confused because I threw in a ton of prints to check it. Using It->Second I can print the name and value of the RecordVal and they are correct. I even printed the addresses of the name and value data structures from both It-Second and Val and they are the same. But when I print the address It-Second and the address &Val, they are different.

(Indeed, I should use Visual Studio to look at this, but that is a lesson for another day.)

void addValue(const RecordVal &RV) {
   Values.push_back(RV);
   ValueMap[RV.getNameInit()] = &RV;

I think the last line should be
ValueMap[RV.getNameInit()] = &Values.back();

Are you trying to look them up in "Values"?

Actually, even this would be wrong, because vector can reallocate storage. If you want to look up the values in the vector, store indices in the map, not pointers.

Yep, as Krzysztof mentioned - pointer invalidation when inserting elements into a SmallVector (std::vector has similar guarantees here). Adding new elements can require a reallocation - moving all the objects over into the new allocation, so any pointers pointing to the original elements would be invalid. (though, yeah, before you get to that point - you already have dangling references because you’re pointing to the RV temporary, uinstead of the copy of RV in the vector)

You could use indexes as Krzysztof mentioned - other options include using non-invalidating data structures (like std::list or std::deque) or indirection (a SmallVector of unique_ptr - so that pointers remain stable/valid) (though both those solutions involve extra memory/allocation overhead) or putting the RV in the map and pointing to it from the vector (unordered_map has pointer stability through insertion and removal - though llvm’s DenseMap does not have such a guarantee). Or maybe llvm’s MapVector would abstract you from the complexities of all these options?

The RecordVals are copied into the SmallVector. Of course! I love this crowd.

Eventually I hope to eliminate the SmallVector and just use the map. But I have both while I experiment with the map to see if it speeds up TableGen. So for now I will change the map so it maps the name Init to the vector index. Then when it's time for a real implementation, I will use the never-before-known-to-me MapVector, since I do need the the fields in order for certain applications.

Thanks, folks!

MapVector is exactly what I want, except for one feature. Is there a reason why it does not have a function to return an ArrayRef to the vector, so I can iterate over it without using the class's iterators? It does have a function takeVector() that clears the map and moves the vector.

I realize that modifying the returned vector would cause problems, but is that a reason not to have the function?

I'm asking because TableGen has a function to return the vector of record values so that the caller can iterate over them. This function is used both in TableGen itself and in backends.

MapVector has begin/end that return the range of the embedded vector. You can create ArrayRef (or a copy of the vector) from that range. Would that work for you?