Choosing the appropriate iterator for the job

At the moment I am in the process of getting our out-of-tree compiler up to date with respect to the v4.0 branch, and thankfully all is going well. However, one of the most common changes I have had to make has to do with iterations over machine instructions and basic blocks.

These are changes I have had to make incrementally each time we catch up with a release of LLVM, and basically the issue is that I have to replace something like:

void foo(MachineInstruction* x);

MachineBasicBlock::iterator index = …;




This pattern of having to make this edit (adding the ‘&*’) looks really scary to me, I am always nervous of expressions that look like this, and over the past three or so releases we have had to make similar changes (usually, but not always to various MI/MBB iterators).

So my guess is that we are using the wrong iterator, or an inappropriate iterator; but there are so many iterators now that I wonder if there is guidance on which iterator is most appropriate to a particular context or usage.

The main types of iteration we need are over the instructions in a basic-block, sometimes including the MI contents of bundles, and sometimes over the bundle (treating them as a single MI). And of course the ‘const’ versus editable iterations.

Does anybody have pointers to existing developer documentation advising which iterators to use, when and how? Or have advice they can give here in this forum if such documentation does not exist (I haven’t found it if it does exist)?

I think it would be really useful information in general, and since our code is originally derived from LLVM v2.7, as LLVM modernises, we often have to make tweaks to code built a long time ago to get it working again. The trick of adding ‘&*’ works, but is deeply inelegant and error-prone, and I’m pretty sure that it can be made a lot cleaner if we use the right iterator for the job.

Thanks for your advice,


I don’t have an answer, but I’ve just been through exactly the same process. A huge number of the build failures after my last merge from upstream were fixed by adding &* in front of an iterator. Like you, this made me slightly nervous (&*x and x having different types in general makes me slightly unhappy, but needing to explicitly take advantage of this fact is a lot worse). Is there a better fix, or should we be adding user-defined conversion operators for implicitly casting the iterator to the pointer type?


IIRC, a lot of these are where implicit operators were removed (in part because they triggered in places they shouldn’t and caused bugs).
Some of the others are simply because we now have api inconsistencies where some things are ilists (which have iterators that give references) and some iplists (which have iterators that give pointers)
It’d be nice to use pointer_iterator to clean those up so the API was consistent.

There’s at least one case i know of where the inconsistency is caused by a bit of design we’d have to clean up:

It’s not possible to have two iplists for the same type, one owning, one not, because ilist_traits is per-type, not per-tag.

At least in the case of MachineInstr: The '&’ can be found in a number of places in CodeGen now.
You can try to reduce the instances by streamlining your APIs to switch less between MachineBasicBlock::iterator and MachineInstr
; After all you can convert between the two at zero runtime costs with either &* or MI->getIterator(). However I would expect that some *& instances remain no matter what and that this is not really a problem.

  • Matthias

I don't think there is documentation or broader consensus/patterns throughout the CodeGen library.
As for myself: I try to use MachineInstr* as much as possible: Using range based for esp. with make_range let's you mostly avoid MachineBasicBlock::iterator. I tend to only use the iterator when I want to indicate a position inside a basic block (for inserting instructions before/behind it).

- Matthias