Interest in fast BitVector?

Some time ago I developed a fast BitVector class to use in some research
here. It uses expression templates to fuse operation loops and runs
much faster than the existing BitVector for some important use-cases.
It also has the ability to efficiently report if a BitVector's contents
changed after some operation. For example:

ETBitVector A = ...
ETBitVector B = ...
ETBitVector C = ...

bool Changed = A.assign(A & B | ~C); // Bit operation loops fused

Using this resulted in 100x or more speedup over BitVector. Obviously,
the speedup is greater the more operations are chained together. For a
simple A.assign(A & B) it wouldn't be terribly faster. But the ability
to report a change efficiently can be useful even for simple operations.

I based this off an old version of BitVector. At the time it was a
drop-in replacement but it would need some updating to conform to the
current BitVector interface.

Before I do that work, is there any interest in this?

                               -David

Do you see a lot of opportunities for fusing bit vector operations in the LLVM code? Bit vectors are very convenient, so having them be fast is certainly worthwhile.

-Krzysztof

Krzysztof Parzyszek via llvm-dev <llvm-dev@lists.llvm.org> writes:

Before I do that work, is there any interest in this?

Do you see a lot of opportunities for fusing bit vector operations in
the LLVM code? Bit vectors are very convenient, so having them be
fast is certainly worthwhile.

I honestly don't know. I developed this for some very specific work,
not to fix a problem in existing code. I would say there are two
situations where this could be useful:

1. Chained bitvector operations (even two operations would be a win)
2. Desire to know if an operation changes a bitvector's value

A quick search through the code turned up this in
lib/CodeGen/SafeStackColoring.cpp:

  bool changed = true;
  while (changed) {
    changed = false;

    for (BasicBlock *BB : depth_first(&F)) {
      BlockLifetimeInfo &BlockInfo = BlockLiveness[BB];

      // Compute LiveIn by unioning together the LiveOut sets of all preds.
      BitVector LocalLiveIn;
      for (auto *PredBB : predecessors(BB)) {
        LivenessMap::const_iterator I = BlockLiveness.find(PredBB);
        // If a predecessor is unreachable, ignore it.
        if (I == BlockLiveness.end())
          continue;
        LocalLiveIn |= I->second.LiveOut;
      }

      // Compute LiveOut by subtracting out lifetimes that end in this
      // block, then adding in lifetimes that begin in this block. If
      // we have both BEGIN and END markers in the same basic block
      // then we know that the BEGIN marker comes after the END,
      // because we already handle the case where the BEGIN comes
      // before the END when collecting the markers (and building the
      // BEGIN/END vectors).
      BitVector LocalLiveOut = LocalLiveIn;
      LocalLiveOut.reset(BlockInfo.End);
      LocalLiveOut |= BlockInfo.Begin;

      // Update block LiveIn set, noting whether it has changed.
      if (LocalLiveIn.test(BlockInfo.LiveIn)) {
        changed = true;
        BlockInfo.LiveIn |= LocalLiveIn;
      }

      // Update block LiveOut set, noting whether it has changed.
      if (LocalLiveOut.test(BlockInfo.LiveOut)) {
        changed = true;
        BlockInfo.LiveOut |= LocalLiveOut;
      }
    }
  } // while changed.

At minimum, this does a loop to test if two BitVectors differ and then
another loop to union the two. ETBitVector could do this in one loop.

I could also imagine the code being refactored to gather all predecessor
bitvectors first and then using operation chaining to fuse the union
operation rather than looping over predessors and doing individual union
operations. C++17 fold expressions would make this pretty slick. I
could add an ETBitVector utility to make such refactoring easier.

                          -David