[RFC] Error handling in LLVM libraries.

Hi All,

I’ve been thinking lately about how to improve LLVM’s error model and error reporting. A lack of good error reporting in Orc and MCJIT has forced me to spend a lot of time investigating hard-to-debug errors that could easily have been identified if we provided richer error information to the client, rather than just aborting. Kevin Enderby has made similar observations about the state of libObject and the difficulty of producing good error messages for damaged object files. I expect to encounter more issues like this as I continue work on the MachO side of LLD. I see tackling the error modeling problem as a first step towards improving error handling in general: if we make it easy to model errors, it may pave the way for better error handling in many parts of our libraries.

At present in LLVM we model errors with std::error_code (and its helper, ErrorOr) and use diagnostic streams for error reporting. Neither of these seem entirely up to the job of providing a solid error-handling mechanism for library code. Diagnostic streams are great if all you want to do is report failure to the user and then terminate, but they can’t be used to distinguish between different kinds of errors, and so are unsuited to many use-cases (especially error recovery). On the other hand, std::error_code allows error kinds to be distinguished, but suffers a number of drawbacks:

  1. It carries no context: It tells you what went wrong, but not where or why, making it difficult to produce good diagnostics.
  2. It’s extremely easy to ignore or forget: instances can be silently dropped.
  3. It’s not especially debugger friendly: Most people call the error_code constructors directly for both success and failure values. Breakpoints have to be set carefully to avoid stopping when success values are constructed.

In fairness to std::error_code, it has some nice properties too:

  1. It’s extremely lightweight.
  2. It’s explicit in the API (unlike exceptions).
  3. It doesn’t require C++ RTTI (a requirement for use in LLVM).

To address these shortcomings I have prototyped a new error-handling scheme partially inspired by C++ exceptions. The aim was to preserve the performance and API visibility of std::error_code, while allowing users to define custom error classes and inheritance relationships between them. My hope is that library code could use this scheme to model errors in a meaningful way, allowing clients to inspect the error information and recover where possible, or provide a rich diagnostic when aborting.

The scheme has three major “moving parts”:

  1. A new ‘TypedError’ class that can be used as a replacement for std::error_code. E.g.

std::error_code foo();

becomes

TypedError foo();

The TypedError class serves as a lightweight wrapper for the real error information (see (2)). It also contains a ‘Checked’ flag, initially set to false, that tracks whether the error has been handled or not. If a TypedError is ever destructed without being checked (or passed on to someone else) it will call std::terminate(). TypedError cannot be silently dropped.

  2. A utility class, TypedErrorInfo, for building error class hierarchies rooted at ‘TypedErrorInfoBase’ with custom RTTI. E.g.

// Define a new error type implicitly inheriting from TypedErrorInfoBase.
class MyCustomError : public TypedErrorInfo<MyCustomError> {
public:
  // Custom error info.
};

// Define a subclass of MyCustomError.
class MyCustomSubError : public TypedErrorInfo<MyCustomSubError, MyCustomError> {
public:
  // Extends MyCustomError, adds new members.
};

  3. A set of utility functions that use the custom RTTI system to inspect and handle typed errors. For example ‘catchAllTypedErrors’ and ‘handleTypedError’ cooperate to handle error instances in a type-safe way:

TypedError foo() {
  if (SomeFailureCondition)
    return make_typed_error<MyCustomError>();
  return TypedError(); // Success.
}

TypedError Err = foo();

catchAllTypedErrors(std::move(Err),
  handleTypedError(
    [](std::unique_ptr<MyCustomError> E) {
      // Handle the error.
      return TypedError(); // ← Indicate success from handler.
    }
  )
);
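The ‘Checked’ flag behaviour described for TypedError above can be illustrated with a small stand-alone sketch. This is not the code from the attached patch: SketchError, SketchErrorInfo, take() and the OnUncheckedError hook are hypothetical names, and the rule that testing a success value counts as checking it is an assumption made for the sketch.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload; stands in for the prototype's error-info classes.
struct SketchErrorInfo {
  std::string Message;
};

// Hook invoked when an unchecked error is destroyed. It terminates by default
// and is replaceable only so the behaviour can be observed without dying.
void terminateOnUnchecked() { std::terminate(); }
void (*OnUncheckedError)() = terminateOnUnchecked;

class SketchError {
public:
  SketchError() = default; // Success value: no payload.
  explicit SketchError(std::unique_ptr<SketchErrorInfo> Info)
      : Info(std::move(Info)) {}

  // Move-only: moving hands responsibility to the destination.
  SketchError(SketchError &&Other) noexcept
      : Info(std::move(Other.Info)), Checked(Other.Checked) {
    Other.Checked = true;
  }
  SketchError(const SketchError &) = delete;
  SketchError &operator=(const SketchError &) = delete;

  ~SketchError() {
    if (!Checked)
      OnUncheckedError();
  }

  // Assumption: testing a success value counts as checking it; a failure
  // stays unchecked until it is consumed with take().
  explicit operator bool() {
    Checked = (Info == nullptr);
    return Info != nullptr;
  }

  // Consume the payload, marking the error as handled.
  std::unique_ptr<SketchErrorInfo> take() {
    assert(Info && "take() called on a success value");
    Checked = true;
    return std::move(Info);
  }

private:
  std::unique_ptr<SketchErrorInfo> Info;
  bool Checked = false;
};

// Hypothetical producer of success and failure values.
SketchError mayFail(bool Fail) {
  if (Fail)
    return SketchError(std::unique_ptr<SketchErrorInfo>(
        new SketchErrorInfo{"bad header"}));
  return SketchError();
}

// A caller that always checks: success is tested, failure is consumed.
std::string describe(SketchError Err) {
  if (!Err)
    return "ok";
  return Err.take()->Message;
}
```

A value returned by mayFail(true) that goes out of scope without take() being called reaches the hook from the destructor, which is the "cannot be silently dropped" property.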

If your initial reaction is “Too much boilerplate!” I understand, but take comfort: (1) In the overwhelmingly common case of simply returning errors, the usage is identical to std::error_code:

if (TypedError Err = foo())
return Err;

and (2) the boilerplate for catching errors is usually easily contained in a handful of utility functions, and tends not to crowd the rest of your source code. My initial experiments with this scheme involved updating many source lines, but did not add much code at all beyond the new error classes that were introduced.

I believe that this scheme addresses many of the shortcomings of std::error_code while maintaining the strengths:

  1. Context - Custom error classes enable the user to attach as much contextual information as desired.

  2. Difficult to drop - The ‘checked’ flag in TypedError ensures that it can’t be dropped: it must be explicitly “handled”, even if that only involves catching the error and doing nothing.

  3. Debugger friendly - You can set a breakpoint on any custom error class’s constructor to catch that error being created. Since the error class hierarchy is rooted you can break on TypedErrorInfoBase::TypedErrorInfoBase to catch any error being raised.

  4. Lightweight - Because TypedError instances are just a pointer and a checked-bit, move-constructing it is very cheap. We may also want to consider ignoring the ‘checked’ bit in release mode, at which point TypedError should be as cheap as std::error_code.

  5. Explicit - TypedError is represented explicitly in the APIs, the same as std::error_code.

  6. Does not require C++ RTTI - The custom RTTI system does not rely on any standard C++ RTTI features.

This scheme also has one attribute that I haven’t seen in previous error handling systems (though my experience in this area is limited): Errors are not copyable, due to ownership semantics of TypedError. I think this actually neatly captures the idea that there is a chain of responsibility for dealing with any given error. Responsibility may be transferred (e.g. by returning it to a caller), but it cannot be duplicated as it doesn’t generally make sense for multiple people to report or attempt to recover from the same error.

I’ve tested this prototype out by threading it through the object-creation APIs of libObject and using custom error classes to report errors in MachO headers. My initial experience is that this has enabled much richer error messages than are possible with std::error_code.

To enable interaction with APIs that still use std::error_code I have added a custom ECError class that wraps a std::error_code, and can be converted back to a std::error_code using the typedErrorToErrorCode function. For now, all custom error code classes should (and do, in the prototype) derive from this utility class. In my experiments, this has made it easy to thread TypedError selectively through parts of the API. Eventually my hope is that TypedError could replace std::error_code for user-facing APIs, at which point custom errors would no longer need to derive from ECError, and ECError could be relegated to a utility for interacting with other codebases that still use std::error_code.

So - I look forward to hearing your thoughts. :slight_smile:

Cheers,
Lang.

Attached files:

typed_error.patch - Adds include/llvm/Support/TypedError.h (also adds anchor() method to lib/Support/ErrorHandling.cpp).

error_demo.tgz - Stand-alone program demo’ing basic use of the TypedError API.

libobject_typed_error_demo.patch - Threads TypedError through the binary-file creation methods (createBinary, createObjectFile, etc). Proof-of-concept for how TypedError can be integrated into an existing system.

typed_error.patch (22.2 KB)

error_demo.tgz (1.25 KB)

thread_typederror_through_object_creation.patch (102 KB)

From: "Lang Hames via llvm-dev" <llvm-dev@lists.llvm.org>
To: "LLVM Developers Mailing List" <llvm-dev@lists.llvm.org>
Sent: Tuesday, February 2, 2016 7:29:45 PM
Subject: [llvm-dev] [RFC] Error handling in LLVM libraries.

Hi All,

I've been thinking lately about how to improve LLVM's error model and
error reporting. A lack of good error reporting in Orc and MCJIT has
forced me to spend a lot of time investigating hard-to-debug errors
that could easily have been identified if we provided richer error
information to the client, rather than just aborting. Kevin Enderby
has made similar observations about the state of libObject and the
difficulty of producing good error messages for damaged object
files. I expect to encounter more issues like this as I continue
work on the MachO side of LLD. I see tackling the error modeling
problem as a first step towards improving error handling in general:
if we make it easy to model errors, it may pave the way for better
error handling in many parts of our libraries.

At present in LLVM we model errors with std::error_code (and its
helper, ErrorOr) and use diagnostic streams for error reporting.
Neither of these seem entirely up to the job of providing a solid
error-handling mechanism for library code. Diagnostic streams are
great if all you want to do is report failure to the user and then
terminate, but they can't be used to distinguish between different
kinds of errors, and so are unsuited to many use-cases (especially
error recovery). On the other hand, std::error_code allows error
kinds to be distinguished, but suffers a number of drawbacks:

1. It carries no context: It tells you what went wrong, but not where
or why, making it difficult to produce good diagnostics.

Generically, I like this idea.

However, regarding context, I wonder about the best model. When we designed the diagnostic reporting interface (by which I mean the bits in include/llvm/IR/DiagnosticInfo.h and include/llvm/IR/DiagnosticPrinter.h), the ability to carry context was very important. There, however, because the objects are being passed via a callback to the user-installed handler, they can carry pointers/references to objects (Values, Functions, etc.) that will go away once the object that detected the error is destroyed. In the model you're proposing, all of the context must be contained within the error object itself (because, by the time the context is useful, an arbitrary amount of the call stack to the error-detection point has already been unwound). This greatly limits the amount of information that can be efficiently stored as context in the error object. Depending on the use cases, it might be better to pass the context to some kind of error-handler callback than to try to pack it all into a long-lived error object. Thoughts?

Thanks again,
Hal

Regarding one point in particular:

  2. Difficult to drop - The ‘checked’ flag in TypedError ensures that it can’t be dropped: it must be explicitly “handled”, even if that only involves catching the error and doing nothing.

It seems to me that “[[clang::warn_unused_result]] class TypedError” is probably sufficient for ensuring people check a status return value; I’m not sure runtime checking really brings much additional value there.

Hi James,

It seems to me that “[[clang::warn_unused_result]] class TypedError” is probably sufficient for ensuring people check a status return value; I’m not sure runtime checking really brings much additional value there.

I see the attribute as complementary. The runtime check provides a stronger guarantee: the error cannot be dropped on any path, rather than just “the result is used”. The attribute can help you catch obvious violations of this at compile time.

- Lang.

Hi Lang,

I’m glad someone is tackling this long-lived issue :slight_smile:
I’ve started to think about it recently, but didn’t get as far as you did!

Hi All,

I’ve been thinking lately about how to improve LLVM’s error model and error reporting. A lack of good error reporting in Orc and MCJIT has forced me to spend a lot of time investigating hard-to-debug errors that could easily have been identified if we provided richer error information to the client, rather than just aborting. Kevin Enderby has made similar observations about the state of libObject and the difficulty of producing good error messages for damaged object files. I expect to encounter more issues like this as I continue work on the MachO side of LLD. I see tackling the error modeling problem as a first step towards improving error handling in general: if we make it easy to model errors, it may pave the way for better error handling in many parts of our libraries.

At present in LLVM we model errors with std::error_code (and its helper, ErrorOr) and use diagnostic streams for error reporting. Neither of these seem entirely up to the job of providing a solid error-handling mechanism for library code. Diagnostic streams are great if all you want to do is report failure to the user and then terminate, but they can’t be used to distinguish between different kinds of errors

I’m not sure I understand this claim. You are supposed to be able to extend and subclass the type of diagnostics? (I remember doing it for an out-of-tree LLVM-based project).

, and so are unsuited to many use-cases (especially error recovery). On the other hand, std::error_code allows error kinds to be distinguished, but suffers a number of drawbacks:

  1. It carries no context: It tells you what went wrong, but not where or why, making it difficult to produce good diagnostics.
  2. It’s extremely easy to ignore or forget: instances can be silently dropped.
  3. It’s not especially debugger friendly: Most people call the error_code constructors directly for both success and failure values. Breakpoints have to be set carefully to avoid stopping when success values are constructed.

In fairness to std::error_code, it has some nice properties too:

  1. It’s extremely lightweight.
  2. It’s explicit in the API (unlike exceptions).
  3. It doesn’t require C++ RTTI (a requirement for use in LLVM).

To address these shortcomings I have prototyped a new error-handling scheme partially inspired by C++ exceptions. The aim was to preserve the performance and API visibility of std::error_code, while allowing users to define custom error classes and inheritance relationships between them. My hope is that library code could use this scheme to model errors in a meaningful way, allowing clients to inspect the error information and recover where possible, or provide a rich diagnostic when aborting.

The scheme has three major “moving parts”:

  1. A new ‘TypedError’ class that can be used as a replacement for std::error_code. E.g.

std::error_code foo();

becomes

TypedError foo();

The TypedError class serves as a lightweight wrapper for the real error information (see (2)). It also contains a ‘Checked’ flag, initially set to false, that tracks whether the error has been handled or not. If a TypedError is ever destructed without being checked (or passed on to someone else) it will call std::terminate(). TypedError cannot be silently dropped.

I really like the fact that not checking the error triggers an error (this is the "hard to misuse” part of API design IMO).
You don’t mention it, but I’d rather see this “checked” flag compiled out with NDEBUG.

  2. A utility class, TypedErrorInfo, for building error class hierarchies rooted at ‘TypedErrorInfoBase’ with custom RTTI. E.g.

// Define a new error type implicitly inheriting from TypedErrorInfoBase.
class MyCustomError : public TypedErrorInfo<MyCustomError> {
public:
// Custom error info.
};

// Define a subclass of MyCustomError.
class MyCustomSubError : public TypedErrorInfo<MyCustomSubError, MyCustomError> {
public:
// Extends MyCustomError, adds new members.
};

  3. A set of utility functions that use the custom RTTI system to inspect and handle typed errors. For example ‘catchAllTypedErrors’ and ‘handleTypedError’ cooperate to handle error instances in a type-safe way:

TypedError foo() {
if (SomeFailureCondition)
return make_typed_error<MyCustomError>();
}

TypedError Err = foo();

catchAllTypedErrors(std::move(Err),
handleTypedError(
[](std::unique_ptr<MyCustomError> E) {
// Handle the error.
return TypedError(); // ← Indicate success from handler.

What does success or failure mean for the handler?

}
)
);

If your initial reaction is “Too much boilerplate!” I understand, but take comfort: (1) In the overwhelmingly common case of simply returning errors, the usage is identical to std::error_code:

if (TypedError Err = foo())
return Err;

and (2) the boilerplate for catching errors is usually easily contained in a handful of utility functions, and tends not to crowd the rest of your source code. My initial experiments with this scheme involved updating many source lines, but did not add much code at all beyond the new error classes that were introduced.

I believe that this scheme addresses many of the shortcomings of std::error_code while maintaining the strengths:

  1. Context - Custom error classes enable the user to attach as much contextual information as desired.

  2. Difficult to drop - The ‘checked’ flag in TypedError ensures that it can’t be dropped: it must be explicitly “handled”, even if that only involves catching the error and doing nothing.

  3. Debugger friendly - You can set a breakpoint on any custom error class’s constructor to catch that error being created. Since the error class hierarchy is rooted you can break on TypedErrorInfoBase::TypedErrorInfoBase to catch any error being raised.

  4. Lightweight - Because TypedError instances are just a pointer and a checked-bit, move-constructing it is very cheap. We may also want to consider ignoring the ‘checked’ bit in release mode, at which point TypedError should be as cheap as std::error_code.

Oh here you mention compiling out the “checked” flag :slight_smile:

  5. Explicit - TypedError is represented explicitly in the APIs, the same as std::error_code.

  6. Does not require C++ RTTI - The custom RTTI system does not rely on any standard C++ RTTI features.

This scheme also has one attribute that I haven’t seen in previous error handling systems (though my experience in this area is limited): Errors are not copyable, due to ownership semantics of TypedError. I think this actually neatly captures the idea that there is a chain of responsibility for dealing with any given error. Responsibility may be transferred (e.g. by returning it to a caller), but it cannot be duplicated as it doesn’t generally make sense for multiple people to report or attempt to recover from the same error.

I’ve tested this prototype out by threading it through the object-creation APIs of libObject and using custom error classes to report errors in MachO headers. My initial experience is that this has enabled much richer error messages than are possible with std::error_code.

To enable interaction with APIs that still use std::error_code I have added a custom ECError class that wraps a std::error_code, and can be converted back to a std::error_code using the typedErrorToErrorCode function. For now, all custom error code classes should (and do, in the prototype) derive from this utility class. In my experiments, this has made it easy to thread TypedError selectively through parts of the API. Eventually my hope is that TypedError could replace std::error_code for user-facing APIs, at which point custom errors would no longer need to derive from ECError, and ECError could be relegated to a utility for interacting with other codebases that still use std::error_code.

So - I look forward to hearing your thoughts. :slight_smile:

Is your call to catchAllTypedErrors(…) actually like a switch on the type of the error? What about a syntax that looks like a switch?

switchErr(std::move(Err))
  .case<MyCustomError>([] () { /* … */ })
  .case<MyOtherCustomError>([] () { /* … */ })
  .default([] () { /* … */ })
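To make the switch-like idea concrete, here is a minimal stand-alone sketch of a fluent, first-match dispatcher. It is only an illustration, not part of the proposed patch: ErrBase, ErrKind, ErrSwitch, on<T>() and otherwise() are hypothetical names (case and default are C++ keywords, so real member functions would need other names), and this version matches exact types only rather than walking a class hierarchy.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical error base: each concrete type reports a unique ID.
struct ErrBase {
  virtual ~ErrBase() = default;
  virtual const void *dynamicID() const = 0;
};

// CRTP helper: the address of a per-type static serves as the type ID.
template <typename T> struct ErrKind : ErrBase {
  static const void *classID() {
    static char ID;
    return &ID;
  }
  const void *dynamicID() const override { return classID(); }
};

struct NotFound : ErrKind<NotFound> {
  std::string Path;
};
struct Malformed : ErrKind<Malformed> {};
struct Unlisted : ErrKind<Unlisted> {};

class ErrSwitch {
public:
  explicit ErrSwitch(std::unique_ptr<ErrBase> E) : E(std::move(E)) {
    assert(this->E && "expected a failure value");
  }

  // Runs Handler if the error is still unhandled and is exactly a T.
  template <typename T, typename HandlerT> ErrSwitch &on(HandlerT Handler) {
    if (E && E->dynamicID() == T::classID()) {
      Handler(static_cast<T &>(*E));
      E.reset();
    }
    return *this;
  }

  // Fallback for anything no earlier case matched.
  template <typename HandlerT> void otherwise(HandlerT Handler) {
    if (E) {
      Handler(*E);
      E.reset();
    }
  }

private:
  std::unique_ptr<ErrBase> E;
};

// Usage: the first matching case wins; otherwise() catches the rest.
std::string describe(std::unique_ptr<ErrBase> E) {
  std::string Out;
  ErrSwitch(std::move(E))
      .on<NotFound>([&](NotFound &N) { Out = "not found: " + N.Path; })
      .on<Malformed>([&](Malformed &) { Out = "malformed"; })
      .otherwise([&](ErrBase &) { Out = "unknown"; });
  return Out;
}
```

Because each on<T>() returns the switch itself, the chain reads top to bottom like a switch statement, and an error consumed by one case is invisible to the later ones.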

Hi Hal,

However, regarding context, I wonder about the best model. When we designed the diagnostic reporting interface (by which I mean the bits in include/llvm/IR/DiagnosticInfo.h and include/llvm/IR/DiagnosticPrinter.h), the ability to carry context was very important. There, however, because the objects are being passed via a callback to the user-installed handler, they can carry pointers/references to objects (Values, Functions, etc.) that will go away once the object that detected the error is destroyed. In the model you’re proposing, all of the context must be contained within the error object itself (because, by the time the context is useful, an arbitrary amount of the call stack to the error-detection point has already been unwound). This greatly limits the amount of information that can be efficiently stored as context in the error object. Depending on the use cases, it might be better to pass the context to some kind of error-handler callback than to try to pack it all into a long-lived error object. Thoughts?

I think this is one of the trickiest problems when designing an error return for C++. In garbage-collected languages you can attach all sorts of useful things and the reference from the error value will keep them alive. In C++ any non-owning reference or pointer type could be pointing into space by the time you reach the error handler (from the library’s point of view).

I think the best you can do here is document the pitfalls and provide guidelines for designing error types. Kevin and I noted the following two relevant guidelines while discussing exactly this problem:

(1) Errors should not contain non-owning references or pointers. (Preferably, errors should only contain value types)
(2) New error types should only be introduced to model errors that clients could conceivably recover from, or where different clients may want to format the error messages differently. Any error that is only useful for diagnostic purposes should probably use a class along the lines of:

class DiagnosticError … {
std::string Msg;
void log(ostream &os) { os << Msg; }
};
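As a rough illustration of guideline (1), the sketch below builds the message while the context is still alive and stores it by value, so nothing dangles once the detecting frames have unwound. StringDiagnostic and makeTruncatedSectionError are hypothetical names invented for this example, and the section-name scenario is only an assumed use case.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical diagnostic-only error: owns its message by value, so it stays
// valid after the frames that detected the problem have been unwound.
class StringDiagnostic {
public:
  explicit StringDiagnostic(std::string Msg) : Msg(std::move(Msg)) {}
  void log(std::ostream &OS) const { OS << Msg; }

private:
  std::string Msg;
};

// Build the message while the context (here, a section name and an offset)
// is still alive, copying what is needed instead of keeping a reference.
StringDiagnostic makeTruncatedSectionError(const std::string &SectionName,
                                           unsigned long Offset) {
  std::ostringstream OS;
  OS << "section '" << SectionName << "' truncated at offset " << Offset;
  return StringDiagnostic(OS.str());
}
```

The cost is paid eagerly, at the point of failure, which is the trade-off against a callback-based design that can read live objects lazily.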

Given the parallels with exceptions, I suspect many of the same design guidelines would apply here.

I’m also not averse to mixing diagnostic streams with my system where they make sense - I think it’s always a matter of choosing the right tool for the job. I just need a solution for the error recovery problem, and diagnostic streams don’t provide one. :slight_smile:

Cheers,
Lang.

Hi Mehdi,

I’m not sure I understand this claim. You are supposed to be able to extend and subclass the type of diagnostics? (I remember doing it for an out-of-tree LLVM-based project).

You can subclass diagnostics, but subclassing (on its own) only lets you change the behaviour of the diagnostic/error itself. What we need, and what this patch supplies, is a way to choose a particular handler based on the type of the error. For that you need RTTI, so this patch introduces a new RTTI scheme that I think is more suitable for error types*, since unlike LLVM’s existing RTTI system it doesn’t require you to enumerate the types up-front.

* If this RTTI system is considered generically useful it could be split out into its own utility. It’s slightly higher cost than LLVM’s system: One byte of BSS per type, and a walk from the dynamic type of the error to the root of the type-hierarchy (with possible early exit) for each type check.
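The cost model described in this footnote (one byte per type, plus a walk towards the root on each check) can be sketched as follows. This is a guess at the general shape rather than the code in the attached patch: SketchInfoBase, SketchInfo, isErrorA and the three example error classes are hypothetical names.

```cpp
#include <cassert>

// Root of the hierarchy. Each class is identified by the address of a static
// char, so types never have to be enumerated up-front.
class SketchInfoBase {
public:
  virtual ~SketchInfoBase() = default;
  static const void *classID() { return &ID; }
  // True if this object's dynamic type is, or derives from, the class
  // identified by ClassID.
  virtual bool isA(const void *ClassID) const { return ClassID == classID(); }

private:
  static char ID;
};
char SketchInfoBase::ID = 0;

// CRTP helper: ThisT is the class being defined, ParentT its base class.
template <typename ThisT, typename ParentT = SketchInfoBase>
class SketchInfo : public ParentT {
public:
  static const void *classID() { return &ID; }
  bool isA(const void *ClassID) const override {
    // Check this level (early exit), then walk towards the root.
    return ClassID == classID() || ParentT::isA(ClassID);
  }

private:
  static char ID; // One zero-initialised byte per error type.
};
template <typename ThisT, typename ParentT>
char SketchInfo<ThisT, ParentT>::ID = 0;

// Type query that relies only on the IDs above, not on C++ RTTI.
template <typename T> bool isErrorA(const SketchInfoBase &E) {
  return E.isA(T::classID());
}

// Example hierarchy: HeaderError derives from ParseError; IOError is separate.
class ParseError : public SketchInfo<ParseError> {};
class HeaderError : public SketchInfo<HeaderError, ParseError> {};
class IOError : public SketchInfo<IOError> {};
```

A check against a type near the leaves exits early, while a check against an unrelated type walks all the way to the root before failing.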

Hi Mehdi,

I’m not sure I understand this claim. You are supposed to be able to extend and subclass the type of diagnostics? (I remember doing it for an out-of-tree LLVM-based project).

You can subclass diagnostics, but subclassing (on its own) only lets you change the behaviour of the diagnostic/error itself. What we need, and what this patch supplies, is a way to choose a particular handler based on the type of the error.

If you subclass a diagnostic right now, isn’t the RTTI information available to the handler, which can then achieve the same dispatching / custom handling per type of diagnostic?
(I’m not advocating the diagnostic system, which I found less convenient to use than what you are proposing)

For that you need RTTI, so this patch introduces a new RTTI scheme that I think is more suitable for error types*, since unlike LLVM’s existing RTTI system it doesn’t require you to enumerate the types up-front.

It looks like I’m missing a piece of something, as it is not clear why this is strictly needed. I may have to actually look closer at the code itself.

* If this RTTI system is considered generically useful it could be split out into its own utility. It’s slightly higher cost than LLVM’s system: One byte of BSS per type, and a walk from the dynamic type of the error to the root of the type-hierarchy (with possible early exit) for each type check.

Sure, this case shows “success” of the handler, now what is a failure of the handler and how is it handled?

Oh, I was seeing it as “first match” as well; I was just bikeshedding the syntax, as a function call with a long flat list of lambdas as arguments didn’t seem like the best we can do at first sight.

Hi Mehdi,

If you subclass a diagnostic right now, isn’t the RTTI information available to the handler, which can then achieve the same dispatching / custom handling per type of diagnostic?

(I’m not advocating the diagnostic system, which I found less convenient to use than what you are proposing)

I have to confess I haven’t looked at the diagnostic classes closely. I’ll take a look and get back to you on this one. :slight_smile:

For that you need RTTI, so this patch introduces a new RTTI scheme that I think is more suitable for error types*, since unlike LLVM’s existing RTTI system it doesn’t require you to enumerate the types up-front.
It looks like I’m missing a piece of something, as it is not clear why this is strictly needed. I may have to actually look closer at the code itself.

TypedError bar() {
  TypedError Err = foo();
  if (auto E2 =
        catchTypedErrors(std::move(Err),
          handleTypedError([&](std::unique_ptr<MyError> M) {
            // Deal with ‘M’ somehow.
            return TypedError();
          })))
    return E2;

  // Proceed with ‘bar’ if ‘Err’ is handled.
}

The new RTTI system uses something closer to LLVM’s Pass IDs…

I should stress that all of this is totally opaque to clients: all you have to do to define your own error type is extend the TypedErrorInfo template, as below for example, and it will take care of the RTTI for you:

// A minimal new error class:
class MyError : public TypedErrorInfo<MyError> {
};

// Subclassing ‘MyError’:
class MySubError : public TypedErrorInfo<MySubError, MyError> {
};

- Lang.

I’ve had some experience dealing with rich error descriptions without exceptions before. The scheme I used was somewhat similar to what you have. Here are some items to consider.

  • How will the following code be avoided? The answer may be compile time error, runtime error, style recommendations, or maybe something else.

TypedError Err = foo();
// no checking in between
Err = foo();

  • How about this?

TypedError Err = foo();
functionWithHorribleSideEffects();
if(Err) return;

  • Do you anticipate giving these kinds of errors to out of tree projects? If so, is there any kind of binary compatibility guarantee?

  • What about errors that should come out of constructors? Or destructors?

  • If a constructor fails and doesn’t establish its invariant, what will prevent the use of that invalid object?

  • How many subclasses do you expect to make of TypedError? Less than 10? More than 100?

  • How common is it to want to handle a specific error code in a non-local way? In my experience, I either want a specific error handled locally, or a fail / not-failed from farther away. The answer to this question may influence the number of subclasses you want to make.

  • Are file, line number, and / or call stack information captured? I’ve found file and line number information to be incredibly useful from a productivity standpoint.

I agree the runtime check can catch something additional, it just doesn’t feel to me like the extra complexity has been justified.

Or, at least, I’d have imagined a much simpler and more straightforward interface would be fully sufficient. E.g., instead of all the catch/handle stuff, if you want to handle a particular class specially, how about just using an if?

TypedError err = somethingThatCanFail();
if (err) {
  if (err.isClassOf(…)) {
    whatever;
  } else
    return err;
}

Hi Mehdi,

If you subclass a diagnostic right now, isn’t the RTTI information available to the handler, which can then achieve the same dispatching / custom handling per type of diagnostic?

(I’m not advocating the diagnostic system, which I found less convenient to use than what you are proposing)

I have to confess I haven’t looked at the diagnostic classes closely. I’ll take a look and get back to you on this one. :slight_smile:

For that you need RTTI, so this patch introduces a new RTTI scheme that I think is more suitable for error types*, since unlike LLVM’s existing RTTI system it doesn’t require you to enumerate the types up-front.
It looks like I’m missing a piece of something, as it is not clear why this is strictly needed. I may have to actually look closer at the code itself.

For a generic error class it is not an option indeed, but I was looking at it in the context of LLVM internal use, so just like our RTTI is not an option for “generic RTTI” but fits our need, we could (not should) do the same with ErrorHandling.

Nice, and since this is on the error path we don’t care if it is not “as fast as” the custom LLVM RTTI.

TypedError bar() {
  TypedError Err = foo();
  if (auto E2 =
        catchTypedErrors(std::move(Err),
          handleTypedError([&](std::unique_ptr<MyError> M) {
            // Deal with ‘M’ somehow.
            return TypedError();
          })))
    return E2;

  // Proceed with ‘bar’ if ‘Err’ is handled.
}

OK got it now, the “empty” TypedError() is the key :slight_smile:
(and I was using the success/failure terminology reversed compared to you)

Hi Lang,

I expect to encounter more issues like this as I continue work on the MachO side of LLD. I see tackling the error modeling problem as a first step towards improving error handling in general: if we make it easy to model errors, it may pave the way for better error handling in many parts of our libraries.

+1, I'd like to use this throughout lib/ProfileData. It's a bit frustrating to see crashers which simply state: "Malformed profile data". It'd be great to know _where_ the issue actually is.

vedant

Hi Craig,

TypedError Err = foo();
// no checking in between
Err = foo();

This will cause an abort - the assignment operator for TypedError checks that you’re not overwriting an unhandled error.

TypedError Err = foo();
functionWithHorribleSideEffects();
if (Err) return;

This is potentially reasonable code - it’s impossible to distinguish in general from:

TypedError Err = foo();
functionWithPerfectlyReasonableSideEffects();
if (Err) return;

That said, to avoid problems related to this style we can offer style guidelines. Idiomatic usage of the system looks like:

if (auto Err = foo())
return Err;
functionWithHorribleSideEffects();

This is how people tend to write error checks in most of the LLVM code I’ve seen to date.

Do you anticipate giving these kinds of errors to out of tree projects? If so, is there any kind of binary compatibility guarantee?

Out of tree projects can use the TypedError.h header and derive their own error classes. This is all pure C++, I don’t think there are binary compatibility issues.

What about errors that should come out of constructors? Or destructors?

TypedError can’t be “thrown” in the same way that C++ exceptions can. It’s an ordinary C++ value. You can’t return an error from a constructor, but you can pass a reference to an error in and set that. In general the style guideline for “may-fail” constructors would be to write something like this:

class Foo {
public:
  static TypedErrorOr<Foo> create(int X, int Y) {
    TypedError Err;
    Foo F(X, Y, Err);
    if (Err)
      return std::move(Err);
    return std::move(F);
  }

private:
  Foo(int x, int y, TypedError &Err) {
    if (x == y) {
      Err = make_typed_error();
      return;
    }
  }
};

Then you have:

TypedErrorOr F = Foo::create(X, Y);

The only way to catch failure of a destructor is for the class to hold a reference to a TypedError, and set that. This is extremely difficult to do correctly, but as far as I know all error schemes suffer from poor interaction with destructors. In LLVM failing destructors are very rare, so I don’t anticipate this being a problem in general.

If a constructor fails and doesn’t establish its invariant, what will prevent the use of that invalid object?

If the style guideline above is followed the invalid object will never be returned to the user. Care must be taken to ensure that the destructor can destruct the partially constructed object, but that’s always the case.
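To make the "never returned to the user" point concrete, here is a self-contained sketch of the same named-constructor pattern in standard C++. `ValueOrError` is a hypothetical stand-in for TypedErrorOr, and `Range` is an invented example class; neither comes from the proposal.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal result type standing in for TypedErrorOr<T>.
template <typename T> struct ValueOrError {
  std::unique_ptr<T> Value; // set on success
  std::string Error;        // non-empty on failure
  explicit operator bool() const { return Value != nullptr; }
};

class Range {
public:
  // Named constructor: the only public way to obtain a Range.
  static ValueOrError<Range> create(int Lo, int Hi) {
    std::string Err;
    std::unique_ptr<Range> R(new Range(Lo, Hi, Err));
    if (!Err.empty())
      return {nullptr, std::move(Err)}; // R is destroyed when create returns
    return {std::move(R), std::string()};
  }

  int size() const { return Hi - Lo; }

private:
  // May-fail constructor: reports a broken invariant through Err.
  Range(int Lo, int Hi, std::string &Err) : Lo(Lo), Hi(Hi) {
    if (Lo > Hi)
      Err = "lower bound exceeds upper bound";
  }

  int Lo, Hi;
};
```

Because the constructor is private, a caller can only ever see a `Range` whose invariant holds; the half-built object lives and dies inside `create`.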

How many subclasses do you expect to make of TypedError? Less than 10? More than 100?

This is a support library, so it’s not possible to reason about how many external clients will want to use it in their projects, or how many errors they would define. In LLVM I’d like to see us adopt a ‘less-is-more’ approach: New error types should be introduced sparingly, and each new error type should require a rationale for its existence. In particular, distinct error types should only be introduced when it’s reasonable for some client to make a meaningful distinction between them. If an error is only being returned in order to produce a string diagnostic, a generic StringDiagnosticError should suffice.

Answering your question more directly: In the LLVM code I’m familiar with I can see room for more than 10 error types, but fewer than 100.

How common is it to want to handle a specific error code in a non-local way? In my experience, I either want a specific error handled locally, or a fail / not-failed from farther away. The answer to this question may influence the number of subclasses you want to make.

Errors usually get handled locally, or just produce a diagnostic and failure, however there are some cases where we want non-local recovery from specific errors. The archive-walking example I gave earlier is one such case. You’re right on the point about subclasses too - that’s what I was hoping to capture with my comment above: only introduce an error type if it’s meaningful for a client to distinguish it from other errors.

Are file, line number, and / or call stack information captured? I’ve found file and line number information to be incredibly useful from a productivity standpoint.

I think that information is helpful for programmatic errors, but those are better represented by asserts or “report_fatal_error”. This system is intended to support modelling of non-programmatic errors - bad input, resource failures and the like. For those, the specific point in the code where the error was triggered is less useful. If such information is needed, this system makes it easy to break on the failure point in a debugger.

Cheers,
Lang.

Hi James,

The complexity involved in runtime checking is minimal. In terms of source complexity the checking code runs only ~20 lines total (it’s orthogonal to the RTTI system and utilities, which take up the bulk of the code). The runtime overhead would be minimal in debug builds, and non-existent in release if we turn off checking there.

Runtime checking is significantly more powerful too. Take an anti-pattern that I’ve seen a few times:

for (auto &Elem : Collection) {
  if (std::error_code Err = foo(Elem))
    if (Err == recoverable_error_code) {
      // Skip items with ‘recoverable’ failures.
      continue;
    }
  // Do stuff with elem.
}

This is the kind of code I want to stop: The kind where we pay just enough lip service to the error to feel like we’ve “handled” it, so we can get on with what we really want to do. This code will fail at runtime with unpredictable results if Err is anything other than ‘success’ or ‘recoverable_error_code’, but it does inspect the return value, so an attribute won’t generate any warning.
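For contrast, a sketch of the same loop with the unrecognised case propagated instead of dropped, written against plain std::error_code. `process` and `sumValid` are hypothetical functions invented for the example, and the choice of `invalid_argument` as the "recoverable" code is arbitrary.

```cpp
#include <cassert>
#include <system_error>
#include <vector>

// Hypothetical operation: negative input is a recoverable failure, zero is
// an unrecoverable one, anything else succeeds.
std::error_code process(int Elem) {
  if (Elem < 0)
    return std::make_error_code(std::errc::invalid_argument);
  if (Elem == 0)
    return std::make_error_code(std::errc::io_error);
  return std::error_code();
}

// Sums the elements that process() accepts. Recoverable failures are
// skipped; every other failure is returned to the caller.
std::error_code sumValid(const std::vector<int> &Items, int &Sum) {
  Sum = 0;
  for (int Elem : Items) {
    if (std::error_code EC = process(Elem)) {
      if (EC == std::errc::invalid_argument)
        continue; // known recoverable case
      return EC;  // anything else must not be ignored
    }
    Sum += Elem;
  }
  return std::error_code();
}
```

Nothing in the type system forces the `return EC;` line to exist, which is the gap that runtime checking is meant to close.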

The advantage of catchTypedErrors over an if statement is that it lets you defer errors, which we wanted to be able to do in our archive walking code:

TypedError processArchive(Archive &A) {
  TypedError Errs;

  for (auto &Obj : A) {
    if (auto Err = processObject(Obj))
      if (Err.isA()) {
        // processObject failed because our object was bad. We want to report
        // this to the user, but we also want to walk the rest of the archive
        // to collect further diagnostics, or take other meaningful actions.
        // For now, just append ‘Err’ to the list of deferred errors.
        Errs = join_error(std::move(Errs), std::move(Err));
        continue;
      } else
        return join_error(std::move(Err), std::move(Errs));

    // Do more work.
  }

  return Errs;
}

and now in main, you can have:

catchTypedErrors(processArchive(A),
  handleTypedError([&](std::unique_ptr OE) {

  })
);

And this one handler will be able to deal with all your deferred object errors.
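The deferral idea can be tried out in isolation with ordinary containers. The sketch below is not the proposed API: `Member`, `ObjectError`, `ErrorList`, `joinErrors` and `handleAll` are hypothetical names, and a plain vector of errors stands in for a compound TypedError.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Member {
  std::string Name;
  bool Corrupt;
};
struct ObjectError {
  std::string MemberName;
};

// An empty list plays the role of the success value.
using ErrorList = std::vector<std::unique_ptr<ObjectError>>;

// Append B's errors to A's, preserving order.
ErrorList joinErrors(ErrorList A, ErrorList B) {
  for (auto &E : B)
    A.push_back(std::move(E));
  return A;
}

ErrorList processObject(const Member &M) {
  ErrorList Errs;
  if (M.Corrupt)
    Errs.push_back(std::unique_ptr<ObjectError>(new ObjectError{M.Name}));
  return Errs;
}

// Walk every member, deferring per-object failures instead of stopping.
ErrorList processArchive(const std::vector<Member> &Archive, int &Processed) {
  ErrorList Deferred;
  Processed = 0;
  for (const Member &M : Archive) {
    ErrorList Errs = processObject(M);
    if (!Errs.empty()) {
      Deferred = joinErrors(std::move(Deferred), std::move(Errs));
      continue;
    }
    ++Processed; // the "do more work" step for healthy members
  }
  return Deferred;
}

// A single handler is invoked once per deferred error.
void handleAll(ErrorList Errs,
               const std::function<void(const ObjectError &)> &Handler) {
  for (auto &E : Errs)
    Handler(*E);
}
```

A caller would pass the result of `processArchive` straight to `handleAll` with one lambda, mirroring the single-handler usage described above.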

For clients who know up-front that they’ll never have to deal with compound errors, the if-statement would be fine, but I think it’s better not to assume that.

I want to stress that I appreciate the distaste for boilerplate, but as I mentioned in the proposal, actually catching errors is the uncommon case, so it’s probably ok if it’s a little bit ugly.

Cheers,
Lang.

Hi Mehdi,

For a generic error class it is not an option indeed, but I was looking at it in the context of LLVM internal use, so just like our RTTI is not an option for “generic RTTI” but fits our need, we could (not should) do the same with ErrorHandling.

Definitely. If this was LLVM only there’d be a strong case for using the existing RTTI system. The reason for the new RTTI system is that it would let us re-use this error class in other LLVM projects (LLD, LLDB, etc) without having to enumerate their errors in LLVM.
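One way an open-ended RTTI scheme like this can work is to identify each class by the address of a static member, so no central enum of error kinds is required. The sketch below is an assumption about the general technique, not the prototype's code; `ErrorBase`, `matches`, `StringDiagnosticError` and `BadObjectError` are names chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical base class; the names are illustrative, not the proposal's API.
class ErrorBase {
public:
  static char ID; // only its address matters
  virtual ~ErrorBase() {}
  virtual std::string message() const = 0;

  // True if this object's dynamic type is, or derives from, the class whose
  // ID has the given address.
  virtual bool matches(const void *ClassID) const { return ClassID == &ID; }

  template <typename ErrT> bool isA() const { return matches(&ErrT::ID); }
};
char ErrorBase::ID = 0;

// Generic error that carries only a diagnostic string.
class StringDiagnosticError : public ErrorBase {
public:
  static char ID;
  explicit StringDiagnosticError(std::string Msg) : Msg(std::move(Msg)) {}
  std::string message() const override { return Msg; }
  bool matches(const void *ClassID) const override {
    return ClassID == &ID || ErrorBase::matches(ClassID);
  }

private:
  std::string Msg;
};
char StringDiagnosticError::ID = 0;

// A distinct type that some client might want to recover from. Any project
// can add one like this without touching a shared list of error kinds.
class BadObjectError : public StringDiagnosticError {
public:
  static char ID;
  explicit BadObjectError(std::string Msg)
      : StringDiagnosticError(std::move(Msg)) {}
  bool matches(const void *ClassID) const override {
    return ClassID == &ID || StringDiagnosticError::matches(ClassID);
  }
};
char BadObjectError::ID = 0;
```

Each subclass must declare its own `ID`; if it forgets, `isA` silently tests against the parent's. The lookup walks one virtual call per inheritance level, which is acceptable on the error path.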

Nice, and since this is on the error path we don’t care if it is not “as fast as” the custom LLVM RTTI.

Yep.

OK got it now, the “empty” TypedError() is the key :)
(and I was using success/failure terminology reversed compared to you)

Yeah - this is confusing. It’s no worse than ‘std::error_code()’, but it’s no better either. Maybe introducing a utility wrapper like ‘TypedErrorSuccess()’ or even ‘ESuccess()’ would make things more readable.

- Lang.

I think it is great :)

Side question on the related work because I’m curious: did you look for a similar generic scheme for error handling offered by other C++ libraries? Maybe the common scheme is to use C++ exceptions, but since many folks disable them (hello Google ;) ) there have probably been many other attempts to address this.

On this topic of not paying the price on the non-error code path, it would be nice to not pay for constructing all the lambda captures when there is no error. I can imagine many ways of expressing a “try/catch” like syntax to achieve this using macros, but not really without that.
Have you thought about this?

Thanks,

Mehdi

This is mostly in line with what I thought the answers would be, so +1 from me, at least for the concept. I haven’t peered into the implementation.

Hi Mehdi,

Side question on the related work because I’m curious: did you look for a similar generic scheme for error handling offered by other C++ libraries? Maybe the common scheme is to use C++ exceptions, but since many folks disable them (hello Google ;) ) there have probably been many other attempts to address this.

I did look around, but not very hard. Among the people I asked, everyone was either using exceptions, std::error_code, or had turned on the diagnostic that James Knight suggested. I did some preliminary web-searching, but that didn’t turn up anything interesting. If anybody has seen other error handling schemes of note I’d love to hear about them.

On this topic of not paying the price on the non-error code path, it would be nice to not pay for constructing all the lambda captures when there is no error.

If your catchTypedErrors call is under an if-test then the lambdas are in a scope that won’t be entered on the non-error path:

if (auto Err = foo())
  if (auto E2 = catchTypedErrors(std::move(Err), /* handlers */))
    return;

I think (though I haven’t tested this) that most lambdas should inline away to next to nothing, so I don’t expect the overhead to be noticeable in either case.
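The claim about scoping can be checked with a small experiment: a by-value lambda capture placed under the if-test is only constructed when the test succeeds. `Context`, `runHandler`, `step` and `process` are hypothetical names, and `runHandler` merely stands in for a call such as catchTypedErrors.

```cpp
#include <cassert>

// Hypothetical context object that counts how often it is copied.
struct Context {
  static int Copies;
  Context() {}
  Context(const Context &) { ++Copies; }
};
int Context::Copies = 0;

// Stand-in for an error-handling call: it simply runs the handler.
template <typename HandlerT> void runHandler(HandlerT &&Handler) { Handler(); }

// Returns true on failure, mimicking the "error converts to true" convention.
bool step(bool Fail) { return Fail; }

void process(bool Fail, const Context &Ctx) {
  if (step(Fail)) {
    // The closure, including its by-value capture of Ctx, is created only
    // when control reaches this branch.
    runHandler([Ctx] { (void)Ctx; });
  }
}
```

On the success path no copy of `Context` is made at all; on the failure path exactly one is made for the capture. Whether the handler itself inlines away is a separate question that this does not measure.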

- Lang.