LLVM asserts

assert and abort should never happen, just like a null pointer or garbage pointer dereference should never happen. How are garbage pointer and asserts different?

-Chris

Chris Lattner <clattner@apple.com> writes:

It's not about recovery, it's about message reporting. Right now
the LLVM
abort messages are formatted completely differently than aborts from
other
parts of our compiler. That's not friendly to users.

This is what ErrorReporting.h is all about.

But asserts and aborts don't go through that, right? What's needed is
a way to get those assert and abort messages out to other compiler
components so they have a chance to report them nicely to the user.

assert and abort should never happen, just like a null pointer or
garbage pointer dereference should never happen. How are garbage
pointer and asserts different?

Well, an assert says the compiler is broken, but the compiler may be
just another component of a large application. You are thinking on terms
of a traditional edit-run-crash compiler and on this scenario you are
right. No so for a JIT embedded on a server, where it plays a secondary
role and it is not allowed to bring down the server because some random
piece of code uncovers a bug on the compiler or the LLVM backend.

As LLVM is really a library it should behave like one and not just abort the whole process on an assert.

Ideally we should be able to implement so set of options where an assert can cause one of a number of error mechanisms to be used from SIGABT to throwing and execption object containing line number file and the error message. This would require replacing asserts with say an llvm_assert( bool, char*) style preprocessor call that can generate the desired response, throw an error to be catched by a handler or doing a C++ throw, depending on how it is defined.

I was intending to look into this after implementing the COFF Writer. It should not be too much work to convert the asserts into an llvm_assert, iirc there are about 500 of them.

Aaron

2009/8/20 Óscar Fuentes <ofv@wanadoo.es>

So what should we use to validate input, such as constructing an
APFloat from a string? llvm_report_error? Or should I change my string
parsing function into a classmethod that returns pointer to an APFloat
or null on failure?

Nothing in ADT should be using llvm_report_error; if something is
expected to fail, it should return an error code, and otherwise it
should assert.

-Eli

This isn't Java; if we hit an assert, it's likely memory is in an
inconsistent state, so we don't want to try to recover.

-Eli

Eli Friedman <eli.friedman@gmail.com> writes:

As LLVM is really a library it should behave like one and not just abort the
whole process on an assert.

This isn't Java; if we hit an assert, it's likely memory is in an
inconsistent state, so we don't want to try to recover.

This over-used argument about "memory likely is in an inconsistent state
so-it-is-no-hope" surprises me every time I see it.

Memory is *likely* in an incosistent state, and even if it actually is,
why shouldn't we *try* to do something useful before throwing the whole
thing into a black hole? You may want to log something helpful for
diagnosing the problem (think on an IDE that implements a REPL: knowing
which code broke the compiler is a very interesting piece of
information) or even you may report the problem and try to keep going:
if the process is really trashed, it will die anyway or it will be
killed by the operator (there are scenarios where this is a very
reasonable policy).

Besides, in my experience all LLVM asserts I see are Type mismatches for
instructions and similar cases, i.e., errors about my compiler producing
wrong LLVM "code" through the API, plus some occasional bug on LLVM,
sometimes about one layer passing the wrong thing to the next layer,
sometimes the bug being the existence of the assert check itself.

And finally, as Aaron points out, a robust library reports failures, but
shouldn't kill the process.

Eli Friedman <eli.friedman@gmail.com> writes:

Besides, in my experience all LLVM asserts I see are Type mismatches for
instructions and similar cases, i.e., errors about my compiler producing
wrong LLVM "code" through the API, plus some occasional bug on LLVM,
sometimes about one layer passing the wrong thing to the next layer,
sometimes the bug being the existence of the assert check itself.

Failures like this are common in development, but should not occur in production quality code. They mean that your compiler is broken and you should fix it. The Verifier pass is one way to catch some classes of malformed IR before you toss it in to poor unsuspecting optimizers.

And finally, as Aaron points out, a robust library reports failures, but
shouldn't kill the process.

Again, an assertion failure is no different than dereferencing a dangling or garbage pointer. This is a non-recoverable error. llvm_unreachable is the same class of problem. It is semantically the same as assert(0) - an unrecoverable failure.

To say it again:
    an assert is not recoverable

Remember that assertions happen when an invariant is broken. Invariants, by definition, don't vary. Assertions are also disabled in certain build modes, and this is important because the whole idea is that they do checking in debug builds that disappear in production builds. It is really not feasible or desirable for them to cause a "recoverable error", even ignoring the fact that there is no good recovery mechanism.

I want to reiterate what Eli said, it is worth repeating:

"

So what should we use to validate input, such as constructing an
APFloat from a string? llvm_report_error? Or should I change my string
parsing function into a classmethod that returns pointer to an APFloat
or null on failure?

Nothing in ADT should be using llvm_report_error; if something is
expected to fail, it should return an error code, and otherwise it
should assert."

In other words, we should define APIs that have strict invariants on the callers. If an API naturally has a failure mode, it should report it by returning an error code. Only in situations where it is really really really unreasonable to do this should the error be reported with llvm_report_error. The cases that fall into this category are things like malformed inline asm constraints etc. It is very difficult for the front-end to fully validate all of these constraints, and so the code generator currently has to report some errors.

To be fully clear, I'm not a big fan of the current llvm_report_error approach (though it is definitely better than what we had before!). In my ideal little world, when the backend invokes llvm_report_error, it should *guarantee* that it can continue on and return back. For example, in an inline asm error, I think the code generator should report the error with llvm_report_error and then discard the invalid inline asm and continue code generation.

This approach gives clients flexibility over how to handle the problem: if they want to immediately abort they can (and this should be the default behavior). However, if the code generator is embedded into a larger app, the app should be able to install a handler, do code generation and when it completes, it could check for an error and report it however desired. Right now the only reasonable implementation of an error handler ends in exit(1) which is not optimal.

-Chris

David Greene wrote:

To be fully clear, I'm not a big fan of the current llvm_report_error
approach (though it is definitely better than what we had before!).
In my ideal little world, when the backend invokes llvm_report_error,
it should *guarantee* that it can continue on and return back. For
example, in an inline asm error, I think the code generator should
report the error with llvm_report_error and then discard the invalid
inline asm and continue code generation.

On one hand, the same thing could occur with any malformed
construct... a type mismatch in the values fed to CreateCall, for
instance, wouldn't corrupt memory; the error could be reported and the
CallInst discarded. On the other hand, of course, if the client is
expected to check for that stuff, it would slow things down to check
for it again on each CreateXXX call, so I certainly see the logic in
keeping that the way it is.

I think it would be a big improvement for the asserts in those cases
to print out what values it expected and what values it got, say by
dumping the offending Value or Type objects to the console along with
the line in Instructions.cpp where the assert happens. Would y'all
look kindly on a patch for that after the release? I think it would
speed up every client development project. I know as I develop my
compiler, I would love to have a way to print out that information if,
and only if, something is wrong with the values I'm trying to feed
CreateXXX.

Maybe those asserts could be replaced with an optional failure reporting
code?
By default it would assert, as currently, and if enabled it can report
the exact error to you.

Regarding the corrupted memory issue: we have LLVMContext now, just
because LLVM/an LLVM module is corrupted
in one Context it shouldn't prevent the use of LLVM in other contexts.
I think the cases where memory corruption outside LLVMContext triggers
an assert is rare.
Perhaps asserts could be turned into something else if the user desires
via a compile-time switch?

Thoughts?

P.S.: after 2.6 is released of course

Best regards,
--Edwin

And programs should never have bugs either.

The reality is, they do. And when we screw up, we owe it to our users
to give them as much information as possible and gracefully exit. We
can't do that right now with the current LLVM assert/abort scheme.

                                 -Dave

Hi Dave,

These asserts are designed to be disabled entirely in a release build of llvm. Why is it so important to you to get user-friendly diagnostics for them when they aren't guaranteed to be there at all, and you'll likely be getting things like segfaults instead in a release build?

Is your primary goal to be able to present a more consistent and friendly face to the user even when your application goes completely bonkers, does nasty stuff and needs to shut down? Or are you trying to use them as user-level diagnostics and continue on with broader processing?

If the former, you should probably be catching things like SIGBUS and SIGSEGV already. Those errors are equivalent in severity to the sorts of things the debug asserts are catching. I'd suggest just adding SIGABRT to that list.

If the latter, you're asking for a pretty fundamental change in the nature of things, and is why you're getting a ton of resistance.

Regards,
   -Jim

These asserts are designed to be disabled entirely in a release build
of llvm. Why is it so important to you to get user-friendly
diagnostics for them when they aren't guaranteed to be there at all,
and you'll likely be getting things like segfaults instead in a
release build?

We don't disable them in our release build because an abort message is
at least a little more friendly than a fault with no information at all.

Is your primary goal to be able to present a more consistent and
friendly face to the user even when your application goes completely
bonkers, does nasty stuff and needs to shut down? Or are you trying to
use them as user-level diagnostics and continue on with broader
processing?

The former. We certainly don't expect to continue processing.

If the former, you should probably be catching things like SIGBUS and
SIGSEGV already. Those errors are equivalent in severity to the sorts
of things the debug asserts are catching. I'd suggest just adding
SIGABRT to that list.

We catch all of these things, but what we can't capture is the messages
that go along with the asserts and aborts. They get dumped out rather
randomly from the perspective of the user since they're output outside of
the signal handling mechanism. I shouldn't think it would be too hard to
design a more robust failure mechanism that provides some hooks for 3rd
party messaging systems. Perhaps I'll work on that someday... :slight_smile:

                               -Dave