BitstreamReader::GetCurrentBitNo() counterintuitive

As the comments said, BitstreamReader::GetCurrentBitNo() should return the bit # of the bit we are reading. But the current implementation actually return 4 more bytes than the bit we are reading. It calculate the bit number by

(NextChar-FirstChar)*8 + ((32-BitsInCurWord) & 31);

But ‘NextChar’ is already after the word that we are reading. I propose to fix this by the following patch.

Index: include/llvm/Bitcode/BitstreamReader.h

As the comments said, BitstreamReader::GetCurrentBitNo() should return the bit # of the bit we are reading. But the current implementation actually return 4 more bytes than the bit we are reading. It calculate the bit number by

(NextChar-FirstChar)*8 + ((32-BitsInCurWord) & 31);

But 'NextChar' is already after the word that we are reading. I propose to fix this by the following patch.

Hi Zhongxing,

This is ok as long as it doesn't cause any regressions in the *llvm* dejagnu regression tests, thanks!

-Chris

Part of the problem is that "Deserialize.cpp" (in llvm) uses GetCurrentBitNo() in a whole bunch of places. AST serialization/deserialization is the only client of the serialization library, and there aren't that many test cases in the clang test suite for AST serialization because it just hasn't been completed yet. I have a feeling that changing the behavior of GetCurrentBitNo() will break things.

I'm actually a little surprised that GetCurrentBitNo() needs to be changed. While the AST serialization work is incomplete, it has worked for non-trivial cases. Since it uses GetCurrentBitNo() to jump around the bitcode file, I'm suspicious that anything needs to be changed.

Also, what happens if NextChar == FirstChar? Wouldn't (NextChar-FirstChar-4)*8 be a negative number?

OK.

I don’t know. Let’s leave it as it is now.

It will be negative. But in that case, BitsInCurWord will be 0, the final result is 0.

Okay. If GetCurrentBitNo() is broken then it should be fixed. I personally haven’t looked at it in detail recently, so I’m not certain what the ramifications would be if we changed it or why it seems to work correctly now (or maybe it doesn’t?).

I do think GetCurrentBitNo() is broken, because it is not returning the bit # we are reading. But it works now, partially because we have code like this:

if (WordBitNo) {

  • NextChar -= 4;
    Read(static_cast(WordBitNo));

I don’t know if there is other code doing the similar compensate work.

Hmm. This is definitely worth investigating further. At the very least there should be more comments documenting the invariants/assumptions in the code with regards to the bit location.