Renaming lldb_private::Error

I have a patch locally that renames lldb_private::Error to lldb_private::LLDBError. As you might expect, this is a pretty large patch so I don’t plan to put it up for review, but since it’s kind of a fundamental class I figured I should at least start a discussion.

The primary motivation for this is to enable cleaner interop between lldb’s Error class and llvm::Error. Currently there is an ambiguity between the two. Obviously they are scoped in different namespaces, but it’s still confusing when looking at code to see such similarly named classes.

There are a number of advantages to llvm::Error over lldb Error which I’m happy to elaborate on if anyone needs some clarification (I’ve also cc’ed lang on here, who wrote the original llvm Error implementation).

Long term I would eventually like to deprecate lldb’s Error class entirely, and switch entirely to llvm::Error. An intermediate transition phase would likely involve making something like LLDBWrapError which interoperates with llvm::Error and wraps an lldb_private::LLDBError.

For now though, I’m only proposing a very simple rename with minimal invasiveness.

Comments?

Hi Zachary,

I’m new to LLDB so take my opinion with a grain of salt, but this sounds like a good idea to me. LLDB is likely to encounter more and more LLVM APIs using llvm::Error in the future, so renaming lldb_private::Error to reduce confusion seems sensible.

Replacing lldb_private::Error at some point in the future probably makes sense too. The author of lldb_private::Error seems to have had a similar idea:

/// …In the future we may wish to switch to a
/// registration mechanism where new error types can be registered at
/// runtime instead of a hard coded scheme.

The challenge here will be interoperability with the python APIs, which look like they map the current lldb_private::Error into Python. That will take some thought, but I think it should be possible.

For any LLDB devs who are interested in llvm::Error, the lightning talk that introduced it is at: https://www.youtube.com/watch?v=Wq8fNK98WGw , and the API is covered in more detail in the LLVM programmer’s manual: http://llvm.org/docs/ProgrammersManual.html#recoverable-errors .

Cheers,
Lang.

/// …In the future we may wish to switch to a
/// registration mechanism where new error types can be registered at
/// runtime instead of a hard coded scheme.

I immediately regret my decision to copy-paste from terminal. :confused:

  • Lang.

I think it should be fairly easy to keep interoperability with SBError. SBError just wraps an lldb_private::Error, so as long as we can construct some implementation of ErrorInfoBase from an SBError, we should be good to go. As an extreme example, we could keep LLDBError exactly as is and provide an ErrorInfoBase implementation that wraps it, but keep it around only for compatibility but be deprecated otherwise. Obviously we can do better than this, but the point is it’s not a big obstacle

I'm mostly but not entirely tongue in cheek wondering why we aren't calling llvm::Error llvm::LLVMError, as the lldb error class much preceded it, but that's a minor point.

If it is actually causing confusion (I haven't experienced such yet) I don't mind typing some extra letters.

As we've discussed several times in the past, we often use errors for informational purposes (for instance in the ValueObject system) with no programmatic requirement they be checked. So the llvm::Error class is not a drop-in replacement for our uses of lldb_private::Error in subset of its uses. More generally, the environment the debugger lives in is often pretty dirty, bad connections to devices, uncertain debug information, arguments with clang about what types mean, weird user input, etc. But the job of the debugger is to keep going as well/long as it can in the face of this. For something like a compiler, if some operation goes bad that whole execution is likely rendered moot, and so bagging out early is the right thing to do. For lldb, if the debug info for a frame is all horked up, users can still resort to memory reading and casts, or some other workaround, provided the debugger stays alive. This makes me a little leery of adopting an infrastructure whose default action is to abort on mishandling.

BTW, I don't think the comment Lang cited had to do with replacing the errors with some other error backend. It was more intended to handle a problem that came up with gdb where we tried to multiplex all various system error numbers into one single error. lldb errors have a flavor (eErrorTypePosix, eErrorTypeWin32, etc) which allows you to use each native error number by annotating it with the flavor. At some point we envisioned process plugins, for instance, being dynamically loadable. If you had lldb working that way, when you dynamically loaded a process plugin for a system with a different error numbering scheme, you would also have to dynamically load the new error flavor.

Jim

I’m mostly but not entirely tongue in cheek wondering why we aren’t calling llvm::Error llvm::LLVMError, as the lldb error class much preceded it, but that’s a minor point.

FWIW I think the naming chosen by LLVM is correct. It’s intended to be a generic utility class, extensible enough to be used by anything that links against LLVM. As such, calling it LLVMError kind of gives off the false impression that it should only be used by errors that originate from LLVM, when in fact it’s much more general purpose.

If it is actually causing confusion (I haven’t experienced such yet) I don’t mind typing some extra letters.

I think that’s in part because llvm::Error isn’t very prevalent inside of LLDB (yet).

As we’ve discussed several times in the past, we often use errors for informational purposes (for instance in the ValueObject system) with no programmatic requirement they be checked. So the llvm::Error class is not a drop-in replacement for our uses of lldb_private::Error in subset of its uses. More generally, the environment the debugger lives in is often pretty dirty, bad connections to devices, uncertain debug information, arguments with clang about what types mean, weird user input, etc. But the job of the debugger is to keep going as well/long as it can in the face of this. For something like a compiler, if some operation goes bad that whole execution is likely rendered moot, and so bagging out early is the right thing to do. For lldb, if the debug info for a frame is all horked up, users can still resort to memory reading and casts, or some other workaround, provided the debugger stays alive. This makes me a little leery of adopting an infrastructure whose default action is to abort on mishandling.

Just re-iterating from previous discussions, but it only does that in debug mode. When you have a release build, it will happily continue on without aborting. The point of all this is that you catch unhandled errors immediately the first time you run the test suite.

Even if you have a bad connection, uncertain debug information, etc you still have to propagate that up the callstack some number of levels until someone knows what to do. All this class does is make sure (when in debug mode) that you’re doing that instead of silently ignoring some condition.

That said, it certainly seems plausible that we could come up with some kind of abstraction for informational status messages. With that in mind, I’ll change my original renaming proposal from LLDBError to Status. This way we will have llvm::Error and lldb_private::Status.

In the future, perhaps we can discuss with Lang and the larger community about whether such a class makes in LLVM as well. Maybe there’s a way to get both checked and unchecked errors into LLVM using a single consistent interface. But at least then the person who generates the error is responsible for deciding how important it is.

BTW, I don’t think the comment Lang cited had to do with replacing the errors with some other error backend. It was more intended to handle a problem that came up with gdb where we tried to multiplex all various system error numbers into one single error. lldb errors have a flavor (eErrorTypePosix, eErrorTypeWin32, etc) which allows you to use each native error number by annotating it with the flavor.

FWIW, using the llvm::Error model, the way this is handled is by doing something like this:

return make_error(::GetLastError());

return make_error(errno);

but it is general enough to handle completely different categories of errors as well, so you can “namespace” out your command interpreter errors, debug info errors, etc.

return make_error(“Incorrect command usage”);

return make_error(“Invalid DIE specification”);

etc

I'm mostly but not entirely tongue in cheek wondering why we aren't calling llvm::Error llvm::LLVMError, as the lldb error class much preceded it, but that's a minor point.
FWIW I think the naming chosen by LLVM is correct. It's intended to be a generic utility class, extensible enough to be used by anything that links against LLVM. As such, calling it LLVMError kind of gives off the false impression that it should only be used by errors that originate from LLVM, when in fact it's much more general purpose.

If it is actually causing confusion (I haven't experienced such yet) I don't mind typing some extra letters.
I think that's in part because llvm::Error isn't very prevalent inside of LLDB (yet).

As we've discussed several times in the past, we often use errors for informational purposes (for instance in the ValueObject system) with no programmatic requirement they be checked. So the llvm::Error class is not a drop-in replacement for our uses of lldb_private::Error in subset of its uses. More generally, the environment the debugger lives in is often pretty dirty, bad connections to devices, uncertain debug information, arguments with clang about what types mean, weird user input, etc. But the job of the debugger is to keep going as well/long as it can in the face of this. For something like a compiler, if some operation goes bad that whole execution is likely rendered moot, and so bagging out early is the right thing to do. For lldb, if the debug info for a frame is all horked up, users can still resort to memory reading and casts, or some other workaround, provided the debugger stays alive. This makes me a little leery of adopting an infrastructure whose default action is to abort on mishandling.
Just re-iterating from previous discussions, but it only does that in debug mode. When you have a release build, it will happily continue on without aborting. The point of all this is that you catch unhandled errors immediately the first time you run the test suite.

Yup, we do that for assertions. But asserting isn't appropriate even in the testsuite for cases where we don't require the errors be checked.

Even if you have a bad connection, uncertain debug information, etc you still have to propagate that up the callstack some number of levels until someone knows what to do. All this class does is make sure (when in debug mode) that you're doing that instead of silently ignoring some condition.

That said, it certainly seems plausible that we could come up with some kind of abstraction for informational status messages. With that in mind, I'll change my original renaming proposal from LLDBError to Status. This way we will have llvm::Error and lldb_private::Status.

That seems better.

In the future, perhaps we can discuss with Lang and the larger community about whether such a class makes in LLVM as well. Maybe there's a way to get both checked and unchecked errors into LLVM using a single consistent interface. But at least then the person who generates the error is responsible for deciding how important it is.

It's not "how important it is" but "does this error need to be dealt with programmatically proximate to the code that produces it." For instance, if an error makes it to the SB API level - something that is quite appropriate for the SBValues for instance, we wouldn't want to use an llvm::Error. After all forcing everybody to check this at the Python layer would be really annoying. I guess you could work around this by hand-checking off any error when you go from lldb_private -> SBError. But that seems like now you're just pretending to be doing something you aren't, which I don't think is helpful.

Probably better as you say to make everything into lldb_private::Status behaving as it does now, to side-step the name collision, and then start with all the uses where the error doesn't propagate very far, and try converting those to use llvm::Error and working judiciously out from there. 'Course you can't change the SB API names, so there will always be a little twist there.

Jim

BTW, I'm interested to know if you have some analysis which leads you to think that propagating unchecked errors actually is a big problem in lldb, or are you just doing this to remove a spelling collision? I see a lot of bugs for lldb come by, but I really haven't seen many that this sort of forced checking would have fixed.

Jim

The rename is just to avoid the spelling collision. Nothing in particular leads me to believe that unchecked errors are a source of major bugs.

That said, I have some short term plans to begin making use of some llvm library classes which deal in llvm::Error, and doing this first should make those changes less confusing. Additionally I’d like to be able to start writing new LLDB code against llvm::Error where appropriate, so it would be nice if this collision weren’t present.

BTW, I’m curious why you think asserting is still bad even in the test suite when errors don’t need to be checked. I think of it as something like this:

void foo(int X) {
return;
}

And your compiler giving you a warning that you’ve got an unused parameter. So to silence it, you write:

void foo(int X) {
(void)X;
}

The point here being, it’s only the function foo() that knows whether the parameter is needed. Just like if you write:

Error E = foo();

the function foo() cannot possibly know whether the error needs to be checked, because it depends on the context in which foo() is called. One caller might care about the error, while the other doesn’t. So foo() should assume that the caller will check the error (otherwise why even bother returning one if it’s just going to be ignored), and the caller can explicitly opt out of this behavior by writing:
consumeError(foo());

which suppresses the assertion.

So yes, the error has to be “checked”, but you can “check” it by explicitly ignoring it at a particular callsite.

If it is actually causing confusion (I haven't experienced such yet) I
don't mind typing some extra letters.

I think that's in part because llvm::Error isn't very prevalent inside of
LLDB (yet).

I actually tried to use llvm:Error a couple of times, but I had given
up on it each time because it make all files which do "using namespace
llvm && lldb_private;" not compile because of the ambiguity, so I
welcome the rename wholeheartedly.

For me the biggest advantage of llvm::Error is the integration with
Expected<T>, avoiding the "do I make the Error the return value (and
return the *real* result by reference), or the other way around",
dilemma.

As we've discussed several times in the past, we often use errors for
informational purposes (for instance in the ValueObject system) with no
programmatic requirement they be checked. So the llvm::Error class is not a
drop-in replacement for our uses of lldb_private::Error in subset of its
uses. More generally, the environment the debugger lives in is often pretty
dirty, bad connections to devices, uncertain debug information, arguments
with clang about what types mean, weird user input, etc. But the job of the
debugger is to keep going as well/long as it can in the face of this. For
something like a compiler, if some operation goes bad that whole execution
is likely rendered moot, and so bagging out early is the right thing to do.
For lldb, if the debug info for a frame is all horked up, users can still
resort to memory reading and casts, or some other workaround, provided the
debugger stays alive. This makes me a little leery of adopting an
infrastructure whose default action is to abort on mishandling.

Just re-iterating from previous discussions, but it only does that in debug
mode. When you have a release build, it will happily continue on without
aborting. The point of all this is that you catch unhandled errors
immediately the first time you run the test suite.

Even if you have a bad connection, uncertain debug information, etc you
still have to propagate that up the callstack some number of levels until
someone knows what to do. All this class does is make sure (when in debug
mode) that you're doing that instead of silently ignoring some condition.

That said, it certainly seems plausible that we could come up with some kind
of abstraction for informational status messages. With that in mind, I'll
change my original renaming proposal from LLDBError to Status. This way we
will have llvm::Error and lldb_private::Status.

I think Status is better than LLDBError, although the fact that we
will then have gazillion of declarations "Status error;" almost
counterbalances that.

The rename is just to avoid the spelling collision. Nothing in particular leads me to believe that unchecked errors are a source of major bugs.

That said, I have some short term plans to begin making use of some llvm library classes which deal in llvm::Error, and doing this first should make those changes less confusing. Additionally I'd like to be able to start writing new LLDB code against llvm::Error where appropriate, so it would be nice if this collision weren't present.

BTW, I'm curious why you think asserting is still bad even in the test suite when errors don't need to be checked.

I think I was making a more limited statement that you took it to be.

Errors that should be checked locally because you know locally that it is fatal not to check them should always be checked - testsuite or no. But a lot of lldb's surface area goes out to the SB API's, and we don't control the callers of those. All the errors of that sort can't be checked before they pass the boundary (and are more appropriate as Status's instead.) The failure to check those errors shouldn't propagate to the SB API's or we are just making an annoying API set... So test suite asserting for this class of errors would not be appropriate.

But a larger point about asserting as a result of errors is that it makes it seem to the lldb developer like once you've raised an assert on error your job is done. You've stopped the error from propagating, two points! For the debugger, you should really be thinking "oh, that didn't go right, how can I back out of that so I can leave the debug session alive." There's nothing about force checked errors for code you can reason on locally that enforces one way of resolving errors or the other. But IME it does favor the "bag out early" model.

Jim

I think we agree about the SB layer. You can’t have mandatory checking on things that go through the SB API. I think we could code around that though.

Regarding the other point, I actually think force checked errors help you think about how to back out and leave the debug session alive. Specifically because they force you think think about it at all. With unchecked errors, a caller might forget that a function even returns an error (or Status) at all, and so they may just call a function and proceed on assuming it worked as expected. With a checked error, this would never happen because the first time you called that function in a test, regardless of whether it passed or failed, you would get an assertion saying you forgot to check the error. Then you can go back and think about what the most appropriate thing to do is in that situation, and if the appropriate thing to do is ignore it and continue, then you can do that.

Most of these error conditions are things that rarely happen (obviously), and it’s hard to get test coverage to make sure the debugger does the right thing when it does happen. Checked errors is at least a way to help you identify all the places in your code that you may have overlooked a possible failure condition. And you can always just explicitly ignore it.

I think we agree about the SB layer. You can't have mandatory checking on things that go through the SB API. I think we could code around that though.

Regarding the other point, I actually think force checked errors *help* you think about how to back out and leave the debug session alive. Specifically because they force you think think about it at all. With unchecked errors, a caller might forget that a function even returns an error (or Status) at all, and so they may just call a function and proceed on assuming it worked as expected. With a checked error, this would never happen because the first time you called that function in a test, regardless of whether it passed or failed, you would get an assertion saying you forgot to check the error. Then you can go back and think about what the most appropriate thing to do is in that situation, and if the appropriate thing to do is ignore it and continue, then you can do that.

Most of these error conditions are things that rarely happen (obviously), and it's hard to get test coverage to make sure the debugger does the right thing when it does happen. Checked errors is at least a way to help you identify all the places in your code that you may have overlooked a possible failure condition. And you can always just explicitly ignore it.

Sure, it is the policy not the tool to enforce it that really matters. But for instance lldb supports many debug sessions in one process (a mode it gets used in all the time under Xcode) and no matter how bad things go in one debug session, none of the other debug sessions care about that. So unless you know you're about to corrupt memory in some horrible and unavoidable way, no action in lldb should take down the whole lldb session. Playing with tools that do just that - and automatically too - means you've equipped yourself with something you are going to have to be very careful handling.

Jim

Does Xcode ship with a build of LLDB that has asserts enabled? Because if not it shouldn’t affect anything. If so, can you shed some light on why?

Does Xcode ship with a build of LLDB that has asserts enabled? Because if not it shouldn't affect anything. If so, can you shed some light on why?

Not sure that's entirely relevant. The point is not to make failure points assert then turn them off in production because they shouldn't assert. The point is to make sure that instead of asserting you always do the best job you can backing out from any error.

Jim

I’m confused. By having the library assert, you are informed of places where you didn’t do a good job of backing from errors, thereby allowing you to do a better job.

You said this earlier:

But a larger point about asserting as a result of errors is that it makes it seem to the lldb developer like once you’ve raised an assert on error your job is done. You’ve stopped the error from propagating, two points!

But when you’re using llvm::Error, no developer is actively thinking about asserts. Nobody is thinking "well the library is going to assert, so my job is done here " because that doesn’t make any sense. !!!It’s going to assert even if the operation was successful!!!

Your job can’t possibly be done because if you don’t check the error, you will assert 100% of the time you execute that codepath. You might as well have just written int x = *nullptr; Surely nobody could agree that their job is done after writing int x = *nullptr; in their code.

If you write this:

Error foo(int &x) {
x = 42;
return Error::success();
}

void bar() {
int x;
foo(x);
cout << x;
}

Then this code is going to assert. It doesn’t matter that no error actually occurred. That is why I’m saying it is strictly a win, no matter what, in all situations. If you forget to check an error code, you necessarily aren’t doing the best possible job backing out of the code in case an error does occur. Now you will find it and be able to fix it.

I'm confused. By having the library assert, you are informed of places where you didn't do a good job of backing from errors, thereby allowing you to do a *better* job.

You said this earlier:

> But a larger point about asserting as a result of errors is that it makes it seem to the lldb developer like once you've raised an assert on error your job is done. You've stopped the error from propagating, two points!

But when you're using llvm::Error, no developer is actively thinking about asserts. Nobody is thinking "well the library is going to assert, so my job is done here " because that doesn't make any sense. !!!!It's going to assert even if the operation was successful!!!!

Your job can't possibly be done because if you don't check the error, you will assert 100% of the time you execute that codepath. You might as well have just written int x = *nullptr; Surely nobody could agree that their job is done after writing int x = *nullptr; in their code.

If you write this:

Error foo(int &x) {
  x = 42;
  return Error::success();
}

void bar() {
  int x;
  foo(x);
  cout << x;
}

Then this code is going to assert. It doesn't matter that no error actually occurred. That is why I'm saying it is strictly a win, no matter what, in all situations. If you forget to check an error code, you *necessarily* aren't doing the best possible job backing out of the code in case an error does occur. Now you will find it and be able to fix it.

Yeah, Lang was just explaining this. I think I was over-reacting to the asserts part because llvm's aggressive use of early failure was a real problem for lldb. So my hackles go up when something like it comes up again.

In practical terms, lldb quite often uses another measure than the error to decide how it's going to proceed. I ask for some symbols, and I get some, but at the same time, one of 10 object files had some bad DWARF so an error was produced. I'll pass that error along for informational purposes, but I don't really care, I'm still going to set breakpoints on all the symbols I found. Lang said it is possible to gang something like the "number of symbols" and the error, so that checking the number of symbols automatically ticks the error box as well. If eventually ever comes we'll have to deal with this sort of complication.

As for Error -> Status to avoid confusion, that seems fine, though if you are going to do it, I agree with Pavel it would be gross to have "Status error;" all over the place.

Jim

Yea, grouping the error and the result together is one of the most compelling features of it. It’s called Expected, so where we would currently write something like:

int getNumberOfSymbols(Error &Err) {}

or

Error getNumberOfSymbols(int &Count) {}

You would now write:

Expected getNumberOfSymbols() {
if (foo) return 1;
else return make_error(“No symbols!”);

}

and on the caller side you write:

Error processAllSymbols() {
if (auto Syms = getNumberOfSymbols()) {
outs() << “There are " << *Syms << " symbols!”;
} else {
return Syms.takeError();
// alternatively, you could write:
// consumeError(Syms.takeError());
// return Error::success();
}
}

Yea, grouping the error and the result together is one of the most compelling features of it. It's called Expected<T>, so where we would currently write something like:

int getNumberOfSymbols(Error &Err) {}

or

Error getNumberOfSymbols(int &Count) {}

You would now write:

Expected<int> getNumberOfSymbols() {
   if (foo) return 1;
   else return make_error<DWARFError>("No symbols!");
}

and on the caller side you write:

Error processAllSymbols() {
  if (auto Syms = getNumberOfSymbols()) {
    outs() << "There are " << *Syms << " symbols!";
  } else {
    return Syms.takeError();
    // alternatively, you could write:
    // consumeError(Syms.takeError());
    // return Error::success();
  }
}
  
Interesting.

This pattern doesn't quite work for fetching symbols - maybe that really is more suitable as a Status than an Error. After all, number of symbols == 0 is not necessarily an error, there just might not have been any symbols (e.g. a fully stripped main); and I'm going to work on whatever symbols I get back, since there's nothing I can do about the ones that didn't make it. I just want to propagate the error so the user knows that there was a problem.

Jim

For instances where we know that we just want to log errors and continue with some sensible default, we could introduce a new utility template along the lines of:

template
T logUnwrap(Expected Result, T DefaultValue = T()) {
if (Result)
return std::move(*Result);
else {
log(Result.takeError());
return DefaultValue;
}
}

Now we can write:

T Result = logUnwrap(foo(…));

and have a strong guarantee that errors will be logged, and that we know the value of Result if an error occurs.

Cheers,
Lang.