uweigand accepted this revision.
uweigand added a comment.
This revision is now accepted and ready to land.
I'll defer to your expertise on that. Patch looks good to me.
I guess I'm not fully familiar with some of the C++ language details here. Would you mind explaining in more detail what the problem is specifically, and what we should be doing to avoid this issue in the future? Using something like "&*I" in the LDCleanup file wouldn't have naturally occurred to me ...
See PR26753 and "ilist/iplist are broken (maybe I'll fix them?)":
I thought I was "finished" back in November, but I'd forgotten about MachineInstrBundleIterator (MachineBasicBlock::iterator) and then didn't find time to continue until the last few weeks.
Let me summarize the big picture (adding llvm-dev since I haven't talked about this in a while):
- List iterators work by having a pointer to a list node and walking "node->next" every time operator++ is called.
- Lists need a sentinel of some sort to represent the "list.end()" iterator.
- The way to do this efficiently and safely is to have the sentinel be some un-templated "list_node_base" class that has a next and prev pointer. The next/prev pointers point to "list_node_base", and "list_iterator<T>" downcasts to "T" on dereference.
- However, ilist<T> assumes the sentinel is a full-bodied "T" and uses "T*" for the "next" pointers. In practice consumers override the sentinel traits so that the sentinel is only a "ilist_half_node" (to save space). This means that operator++ invokes UB every time you increment to the end iterator and downcast to "T*" (badness).
- MachineInstrBundleIterator::operator MachineInstr*() relies on the member of ilist_iterator<MachineInstr> being an already-downcast MachineInstr*, indirectly relying on the UB in operator++. I can't safely refactor ilist/ilist_node/etc. until code stops (indirectly) relying on this UB.
What's `&*I` all about?
- An iterator<T> does not necessarily point at a valid T.
- It could just be a sentinel object that T inherits from (in the case of lists), or an unallocated memory location (in the case of arrays).
- If you really want a T*, you should dereference iterator<T> and then take the address.
- Everyone (?) knows that it's undefined behaviour to dereference an iterator<T> that is not pointing at a valid T. `&*I` makes it clear at the call site that a pre-condition to getting a `T*` is for `I` to be a valid iterator.
- I will be making the implicit operator illegal promptly when there's no code using it. Just waiting on a few backends.
- Your code will be nicer to read -- particularly after my changes -- if you use references (T&) instead of pointers (T*) whenever variables/parameters/etc. cannot be nullptr. In many cases I've updated variables/parameters/etc. in this way, but sometimes I took the easy way out of just saying `&*I` (and sometimes that's the best option). I encourage following up with more reference-based logic!