Odd behavior in SlotIndex::getInstrDistance()

Currently when calculating the instruction distance between two different SlotIndices using the the getInstrDistance function there is some behavior that seems like it could be somewhat lacking in documentation. The current implementation of this function looks like the following:

int getInstrDistance(SlotIndex other) const {
      return (other.listEntry()->getIndex() - listEntry()->getIndex())
        / Slot_Count;

However, this function will currently return four times the actual instruction distance as the distance between slot indices in adjacent instructions is set to 4 * Slot_Count (actual variable is SlotIndex::InstrDistance defined in SlotIndexes.h). There is some nuance as sometimes the slot indices will be renumbered and there will be a distance of SlotIndex::InstrDistance / 2 rather than just the full value, but this is still twice the value of just Slot_Count.

The seemingly obvious correct implementation to me would be to divide the index difference by SlotIndex::InstrDistance which will be correct without renumbering. So while not correct all the time, it seems like it would get pretty close most of the time given how often I’ve seen renumbered SlotIndices that got put into the list with an index offset of eight. However, it seems like something more complicated is being done here. Does anyone have any idea what is going on?

I tested changing the function to calculate the distance using SlotIndex::InstrDistance and ran some benchmarks as this function is used in the greedy register allocator to assign priorities to various live ranges, and the performance seems to range from slightly better (0.4% average reduction in loads over the microbenchmarks in the llvm test suite) to about even (0.02% average reduction in loads over Chromium performance tests), but that seems more likely to be due to just shuffling around the priorities to something that happens to work better rather than actually fixing any functionality.

Finally, the commit that introduced this patch only mentions reducing the value only to save bits, suggesting that the function is implemented correctly, but the reasoning given in the commit message doesn’t really make a lot of sense as it has the exact same return type (int) as distance().

Any info on why this is written the way it is will be greatly appreciated.

@atrick - original author of the patch and could probably shed some light on its functionality.

@atrick Any info on this? I’d at least like to be able to open a CL better documenting this function (or perhaps refactoring it).

I didn’t write this code, but I think it is reasonable as-is.

“Instruction distance” is only used as a heuristic (and a pretty unreliable one at that) to compare things like the lengths of live ranges. So in a sense it doesn’t matter too much what the units are as long as they can be compared consistently.

Dividing by Slot_Count removes the detail that each instruction has multiple slots, since we don’t care about that, and leaves you with the number of instructions assuming they were numbered as densely as possible. Actually as you pointed out we initially number them sparsely (that’s what SlotIndex::InstrDistance is for) but there is always the chance that new instructions get added to fill in the gaps in the numbering. So dividing by InstrDistance would be too much, because it would throw away some detail in the case that the instructions were densely numbered.

Of course if you’re comparing the lengths of two ranges, and one of them is sparsely numbered and the other is densely numbered, then you will not get reliable results anyway. That’s why I say it’s an unreliable heuristic. But it is still better than nothing.

Thank you very much for the response.

I’m not trying to say that functionally it is unreasonable. As a distance heuristic it definitely makes sense (although the magnitude of the difference between that and just using SlotIndex::distance doesn’t seem like it would be too great for this purpose).

It doesn’t matter exactly what units we’re using, but the current implementation of the greedy register allocator seem to be comparing different units depending upon some different circumstances handled in the greedy register allocator (mainly switching between LiveInterval::getSize(), and SlotIndex::InstrDistance). Changing to getInstrDistance all those years ago didn’t seem to induce a functional change at that point in time (no tests were changed), but swapping the units now does induce a functional change. Not totally sure why this is. Doesn’t seem to be due to any obvious change in the implementation of the containing function.

The function is correct assuming that the instructions are packed as densely as possible, but I don’t believe this will ever be the case in practice. They are by default spaced SlotIndex::InstrDistance and the SlotIndexes::renumberIndexes function still spaces them half that apart.

Definitely won’t give reliable results in a lot of cases, but the accurate alternative would be a pretty expensive function for something that is called many times per invocation.

The original commit message still doesn’t make a whole lot of sense, but elements of the priority calculation are definitely starting to make a bit more sense now.

It can happen. See SlotIndexes::insertMachineInstrInMaps. It will pick a new number for the inserted instruction, which is half way between two existing instructions. Only if it fails (i.e. if those instructions were already densely numbered) will it call renumberIndexes to renumber a bunch of instructions.

Perhaps renaming the function to something like getApproxInstrDistance and summarizing in the doc comment this discussion (even referencing it!) would be a reasonable step forward?

Refactoring change has landed in this commit.