Bitcode abbreviations for something that's not a record

Hi all!

Fuzzing llvm’s bitcode reader, I found a problem where the reader assumes that the first field in an abbreviation will not be an array or a blob (and asserts otherwise).

I don’t know if this is expected (but not documented) or not. The documentation, to me, reads like it doesn’t disallow it, but we might be assuming all abreviations start with a full record, which would make the first operand never be an array or a blob.

The bug comes from r181639 (http://llvm.org/klaus/llvm/commit/1197e38f3338b8db76f0fa38c2687c65b2bcea5c/), which took the code to read the first argument and put it outside of the loop, but didn’t take the Array/Blob verification + reading code too (It’s a bug because that commit was supposed to not have changed functionality :slight_smile: ).

This could be “fixed” with, either a report_fatal_error (if we eventually have better error handling on that code, we can make that non-fatal and report to the caller), or by hoisting the Array/Blob reading code out of the loop too (actually, write a helper function).

What should be done about this?

Thanks,

Filipe

The restriction looks reasonable: A record starts with a code. The code can be encoded as a literal or be part of the abbreviation.

There is probably not a lot of value in supporting a code embedded in the first element of an array or blob.

Ok, I’ll submit a patch to turn that into a report_fatal_error saying you can’t start an abbrev with an array or blob.

Thanks,

Filipe

In general, are we OK with suppressing AFL assertion failures by calling report_fatal_error? I don’t think we should exit(1) when encountering invalid bitcode. The caller might want to do something else. The bitcode reader should ideally be architected to fail safely.

It’s strictly better than leaving an assert/unreachable, since it’s user input, not bad API usage.
I’m not very familiar with BitcodeReader’s API, so in this case I would prefer to fix it right away, and later we could figure out how to make the API recoverable.

Although:
In this case, we can return something like -1, for now, making some callers skip over the thing or returning an error. The problem is: this is supposed to be an abbrev record. If you can’t read one of its parts, what do you do?
Otherwise, I can make readRecord return an std::error_code, since most of its callers do the same, and propagate the error, since all of its callers (AFAICT) return an error_code too.

It's strictly better than leaving an assert/unreachable, since it's user
input, not bad API usage.
I'm not very familiar with BitcodeReader's API, so in this case I would
prefer to fix it right away, and later we could figure out how to make the
API recoverable.

I agree with both of you :slight_smile:
It is an improvement, but not ideal. If the ideal is not obvious we can do
it in two steps and get the test passing for now.

Although:
In this case, we can return something like -1, for now, making some
callers skip over the thing or returning an error. The problem is: this is
supposed to be an abbrev record. If you can't read one of its parts, what
do you do?
Otherwise, I can make readRecord return an std::error_code, since most of
its callers do the same, and propagate the error, since all of its callers
(AFAICT) return an error_code too.

In the case of bit code, the error handling strategy is somewhat clear. I
think what would be needed is:

* Pass a DiagnosticHandlerFunction down to the constructor. The
BitcodeReader can then use the one from the BitstreamReader.
* Have functions that can fail diagnose the failure and return an
error_code or ErrorOr<something>

It shouldn't be hard, but is non trivial, so I would be OK with
report_fatal_error first.

Cheers,
Rafael