On DataExtractor isValidOffset* interface

Hi Ben,

Me and David were discussing ways to use DataExtractor recently and I
found one thing in the comments that seems wrong.

From include/llvm/Support/DataExtractor.h:

336 /// Test the validity of \a offset.
337 ///
338 /// @return
339 /// \b true if \a offset is a valid offset into the data in this
340 /// object, \b false otherwise.
341 bool isValidOffset(uint32_t offset) const { return Data.size() > offset; }
342
343 /// Test the availability of \a length bytes of data from \a offset.
344 ///
345 /// @return
346 /// \b true if \a offset is a valid offset and there are \a
347 /// length bytes available at that offset, \b false otherwise.
348 bool isValidOffsetForDataOfSize(uint32_t offset, uint32_t length) const {
349 return offset + length >= offset && isValidOffset(offset + length - 1);
350 }

The "if \a offset is a valid offset" part of the comment for
isValidOffsetForDataOfSize doesn't seem right, as it doesn't actually
call isValidOffset(offset). Specifically,
"isValidOffsetForDataOfSize(x, 0)" is equivalent to "isValidOffset(x -
1)" rather than "isValidOffset(x)", which is surprising (at least if
you don't look at the implementation).

The other thing we've discussed is how to check DE for "everything
we've read so far was at valid offsets". The first solution that
comes to mind is "isValidOffset(NextByte - 1)"... but I wonder if
there are any typical usage patterns that should be addressed in a
more natural way. Like "hasEverReadPastEndOfBuffer()" flag maybe?

––
Thanks,
Tim

Hi Ben,

Me and David were discussing ways to use DataExtractor recently and I
found one thing in the comments that seems wrong.

From include/llvm/Support/DataExtractor.h:
336 /// Test the validity of \a offset.
337 ///
338 /// @return
339 /// \b true if \a offset is a valid offset into the data in this
340 /// object, \b false otherwise.
341 bool isValidOffset(uint32_t offset) const { return Data.size() > offset; }
342
343 /// Test the availability of \a length bytes of data from \a offset.
344 ///
345 /// @return
346 /// \b true if \a offset is a valid offset and there are \a
347 /// length bytes available at that offset, \b false otherwise.
348 bool isValidOffsetForDataOfSize(uint32_t offset, uint32_t length) const {
349 return offset + length >= offset && isValidOffset(offset + length - 1);
350 }

The "if \a offset is a valid offset" part of the comment for
isValidOffsetForDataOfSize doesn't seem right, as it doesn't actually
call isValidOffset(offset). Specifically,
"isValidOffsetForDataOfSize(x, 0)" is equivalent to "isValidOffset(x -
1)" rather than "isValidOffset(x)", which is surprising (at least if
you don't look at the implementation).

length=0 is an edge case for this function, it's designed to validate a read of a non-zero size from a specific offset. If you have a use case you can just make it call isValidOffset(Offset) in that case. This also explains the 'surprising' -1 behavior as for a read of 1 byte you only have to check if the offset is valid.

DataExtractor was initially designed after a LLDB class of the same name, this function seems to have diverged a lot since :frowning:

The other thing we've discussed is how to check DE for "everything
we've read so far was at valid offsets". The first solution that
comes to mind is "isValidOffset(NextByte - 1)"... but I wonder if
there are any typical usage patterns that should be addressed in a
more natural way. Like "hasEverReadPastEndOfBuffer()" flag maybe?

Adding flags to DataExtractor wouldn't be nice as it's designed to have as little mutable state as possible, better add your own class on top of DE if you need that behavior.

- Ben

While equivalent, would it be easier to understand if the two functions were flipped around?

bool isValidOffset(uint32_t offset) const { return isValidOffsetForDataOfSize(offset, 1); }
bool isValidOffsetForDataOfSize(uint32_t offset, uint32_t length) const { return offset + length >= offset && Data.size() >= offset+length; }