Bug in MapVector::erase ?

To avoid duplications, MapVector stores Values inside the vector only, while the map caches key->indices to the vector.

When a value is erased from the MapVector, it is removed from the vector, invalidating the cached vector indices stored in the map, so the map points to wrong locations and the MapVector is corrupted.

Yaron

+llvmdev +Oliver. MapVector::erase was added in r211273.

To avoid duplications, MapVector stores Values inside the vector only, while the map caches key->indices to the vector.

Do you have a reproducer for this problem? MapVector::erase removes the value from both the Vector and the Map, so it should not suffer from this problem.

Oliver

Hi Oliver,

There would be no problem if both the Map and the Vector indeed contained (Key,Value) pairs.

To save space, Map entries do not contain Value but instead an unsigned index into the Vector:

/// The values are kept in a std::vector and the
/// mapping is done with DenseMap from Keys to indexes in that vector.

After an element is erased from the Vector all indices greater than the removed index are one too large.

That’s probably the reason why Mapvector did not have erase before. See

http://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h

…and it doesn’t support removing elements.

To correct this erase needs to decrement all vector indices in the Map which are larger than the removed index.

Note that pop_back() does not have this problem as there are no indices following the last one.

Yaron

Sounds pretty clearly buggy, and against the original design of the
ADT (as pointed out by the documentation quotation). When was erase
functionality added to MapVector? Can/should it be removed (and the
use case changed to use some other container)

Making erase linear time sounds... not ideal, but possibly its
sufficient for some/current use cases. Or we could consider other
solutions to the use cases.

I see this was added in r211273 for the purposes of MCDwarf section
range handling. Perhaps instead of removing empty ranges, they could
just be omitted when emitting ranges. finalizeDwarfRanges could return
the count of non-empty ranges, and use that to decide whether to emit
any ranges, whether to emit it as ranges or just low_pc/high_pc in
MCGenDwarfInfo::Emit, and add a conditional (checking for a non-null
end symbol) in EmitGenDwarfRanges.

Further: I'd really like to see the range building/emission code
refactored into a common utility between MCDwarf and AsmPrinter DWARF
handling (similar to what I've done with the line table
emission/handling). I'm not sure if this would address the problem
too.

Heh -- it's already linear. It erases from the middle of a vector.

At the least, if erasing is kept (and fixed), someone needs to update the
docs and add a unit test.

One way of speeding this up would be to erase a set of values in bulk. I
think we could easily add a `remove_if()` function that deleted a set of
matching items in linear time, using a linear-size auxiliary array of
indices to keep state.

I'd be happy to implement this if there's a need (if you can easily design
it out, that's better).

Yeah, at a glance it doesn't seem especially painful to design out -
maybe 10 lines of code.

Great. The `remove_if()` would probably be about the same (excluding tests), so
if this is ever the "right" approach, it might still be worth it.

After attempting, this gets a bit more insidious and (perhaps
unsurprisingly) my estimate turned out to be rather optimistic.

(for the record, it gets ugly in EmitGenDwarfAranges which currently
computes the length ahead of time (so, it does things like take the
length of the ranges collection - so if we leave dead entries in there
it'd have to iterate through to count, for example - or just use a
label difference))

If you're happy to fix/test a MapVector::remove_if, that'd be swell.
It might be a while before I find the time to refactor this whole
thing into something common (& /maybe/ in the process, avoid this
particular removal quirk) between DwarfDebug and MCDwarf... :confused:

- David

Here's a patch with the fixes (totally untested). I'll commit
incrementally as I write unit tests, but you can use this locally if
you want to try out changes to the MC code in the meantime.

+chandlerc: as code owner of ADT, any problem with me adding this API
to MapVector?

mapvector.patch (1.95 KB)

Can we explicitly delete the erase method or do something else to document the fact that it is unsupported? It was added incidentally in r211350, even though it was added and removed by Doug back in r175538 / r175449.

I'm happy with it deleted or fixed (see WIP patch that fixes it w/o
tests). For now, I'll fix it, and then David (or someone else) can
migrate the code to `remove_if()`.

FWIW, when it's fixed, it doesn't have to be "unsupported" -- it's
just *slow*.

mapvector.patch (1.95 KB)

Committed a whitespace change in r213082, fixed `erase()` in r213084, and added `remove_if()` in r213090.

I'll leave it up to David how to move forward once the MCDwarf stuff
migrates to `remove_if()`.

If re-deleting `erase()` is the plan, someone should update the comments
and the docs right away to say so (I changed the text to document its
complexity, rather than lack of support).

Doubt I'll get to making any changes immediately (including switching
over to remove_if - perhaps Oscar can send a CR for that) but will
keep it all in mind :slight_smile:

Thanks again!

- Dave

It’s worth keeping erase() with the complexity comment, same as we have linear-time vector find().

All ADT have tradeoffs. We could have constant time erase with a MapList instead of MapVector but the random access iterator would be linear-time.

Yaron