RFC: Error handling in release builds aka "can I use lldbassert or not?"

The Motivating Case

On a recent change of mine it was pointed out that asserts like this
(https://github.com/llvm/llvm-project/blob/24060db3a93297ca77abef8feae301af2e9163b5/lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp#L290)
were potentially dangerous.

This assert basically checks that our register information is not corrupted
in such a way that we would read (or in the case of a live process, write)
outside of the buffer we have the register stored in.

If this happened with assert, an asserts build would fail at that point.
However a release build would simply carry on, which would return bogus results
or in the case of a write, make the situation worse.

Initally I proposed a new macro LLDB_ASSERT_OR.

#ifdef NDEBUG
#define LLDB_ASSERT_OR_RETURN(expr, retval)                                    \
  if (!(expr))                                                                 \
    return retval;
#else
#define LLDB_ASSERT_OR_RETURN(expr, retval) assert(expr);
#endif

If we have asserts on, use the assert. If not, check the condition and if it does
not hold, return some value. In the case of ReadRegister, a Status, elsewhere it
might be false, nullptr, etc.

However, some did argue that given that we know state is corrupted, we shouldn’t
continue at all even in release mode.

Which led me to llvm::report_fatal_error(), but then, looking at our guidelines
(Contributing - 🐛 LLDB)
we have:

Assertions. Assertions (from assert.h) should be used liberally to assert internal consistency.

So in some sense, assert for the mentioned cases is fine because register information
isn’t a user input (not usually at least) and should be consistent.

Also:

Fatal errors. Aborting LLDB’s process using llvm::report_fatal_error() or abort() should be avoided at all costs. It’s acceptable to use llvm_unreachable() for actually unreachable code such as the default in an otherwise exhaustive switch statement.

On the general theme of don’t crash the debugger when you could continue and still be
of some use.

I could argue that if you can’t read a register, the debugger is probably not much use
though.

The guidelines for lldbassert say:

New code should not be using lldbassert() and existing uses should be replaced by other means of error handling.

This was added in ⚙ D59911 Don't abort() in lldb_assert and document why. where there was some previous discussion.

Previous Discussion

Some points from there (some editorialisation when abbreviating):

  • Pavel is annoyed that he has to write lldbassert of a condition, then immediately follow
    it with if that same condition. And wonders if lldbassert can be made to combine the two.
  • Returning error/result types is better if it can be done. Especially with anything
    that’s user input.
  • Having a way to assert in development builds but log a message on release builds is useful.
    The example given is a new DWARF item, this way a user will at least get some hint that
    their version of lldb doesn’t support this.
  • Overusing that would cause a lot of noise, Pavel suggests a construct that reports only once to address this.
  • lldbassert is for things we can recover from.
  • Adrian points out that even when lldbassert prints the message, it’s not in a place you
    can easily see it if it’s behind an IDE.

My Current Understanding

Half baked pseudocode ahead. I mostly align with the existing docs except that lldbassert is still hazy for me.

Result Types

Use these when parsing user input, opening files, etc.

llvm::Expected<SomeDWARFAttribute> GetSomeDwarfAttribute(...) {
   return llvm error "DWARF DIE should have had SomeDwarfAttribute but nothing was found"

Assertions

Used for internal state, things that lldb makes and constructs all on its own so they should
be correct.

void DoThingWithTarget(TargetSP target) {
	assert(target && "Oops, this is supposed to be non-null but couldn't be a ref for some reason")

I would put the aforementioned asserts in core file here too. The things they assert are 90% built into lldb, they are checking that lldb’s logic is sound (that 10% is things like variable sized notes).

I suppose one could craft a strange corefile and trip these asserts, but I don’t think that proves much. Said corefile wouldn’t be produced by any standard system.

Logging

Non fatal issues that are easily recovered from and do not result in an unusable debugger.

/* ... Parse some fields from XML register information ... */
if (!some_required_field)
  log("Dropping XML entity foo because some_required_field was missing");
/* ... carry on procesing the rest of the XML info ... */

I do this in the register field parsing. Since it is an extra bonus feature it’s fine to drop it.

Soft Assertions

Some situation that should be obvious to a developer using an asserts build but will only
log in release builds. Like the example from Phab about some new DWARF type being added.

	default:
		lldbassert("Found type unknown type value %d", type);
		return {};

Where this new type being present wouldn’t cause everything else to break. For example
if we expected to see things in a very specific order, then we might use an assert
or Expected to raise a “proper” error to the user.

This situation seems like the most vague and case by case to me.

Questions

I’d like to put a few code/pseudocode examples in the docs eventually, feel free to use some to back up your comments.

I know we are all busy and asserts aren’t the top of anyone’s TODO list but at minimum I can record the conclusions into a tracking issue that can give context to the comments we already have in the documentation.

@JDevlieghere you mentioned there were conversations at the dev meeting around this topic.

Adrian suggests (I think, correct me if I misinterpreted) on my PR that the assert message could go in a Status just as well. So one route is to convert assert messages to Status messages, and convert interfaces that only return true/false into Status to do the same there.

Thanks for putting up this RFC! The current wording in the documentation is the result of discussion in a code review. While that’s fine, this has historically been a somewhat contentious topic and therefore I think it’s important to have a broader discussion and agreement through the RFC process.

My mental model of an assert is a way to convey a condition that must hold. External input is not something the developer has control over (unless the input was checked, in which case an assert is appropriate). Internal consistency (e.g. I have checked a pointer previously) is something the developer has control over and therefore fair game for an assert. In other words, barring any bugs, all assert must always hold.

In that context LLDB_ASSERT_OR_RETURN sends a conflicting message: this statement must hold, but it might not. I think what you’re trying to convey here is that this statement should hold, but it might not and if it doesn’t, it’s bad enough that it warrants taking down the debugger (in an asserts build). I’m not trying to be pedantic here but instead highlight the difference in expectation between the two.

Coming back to some of the things you said in this post:

To me that sounds like user input (maybe better called external input) and therefore shouldn’t be an assert. I also would caution against making assumptions about producers. I’ve seen some really broken input but never has that warranted crashing the debugger.

These have the same problem I outlined above and therefore I think the consensus is/was that we should get rid of them. This might be a variant of what I described: a situation where we think something is bad enough that it warrants bugging the user to file a bug report (and crash in a debug build). I think there’s value in that, but it shouldn’t have assert in the name. The latter is what I was arguing for with @labath at the dev meeting.

We can address this issue by using Debugger::ReportError() instead of printing to stderr.

Suggestion

Here’s my suggestion:

  • We get rid of lldbassert and convert the remaining ones to plain asserts.
  • We introduce a new concept (name TBD) that you can call to crash the debugger in a debug build and potentially prompt the user to file a bug in release mode (similar to what lldbassert does today, but minus the assert part and using ReportError):
#ifdef NDEBUG
#define LLDB_FAULT(msg) 
#else
#define LLDB_FAULT(msg) llvm::report_fatal_error(msg)
#endif

I have a very strong opinion on this topic. The debugger should never assert when reporting an error is sufficient. The only time the debugger should assert (to a user, crash) is when it’s going to crash anyway, and it’s basically dead in the water.

The example from Phab that @DavidSpickett posted:
default:
lldbassert(“Found type unknown type value %d”, type);
return {};
Should not be an assert. If I load a new image with new DWARF info that lldb doesn’t handle, it shouldn’t crash. It should try to keep going as best it can. Print out an error message. If I’m working on lldb, I can grep for the message, and run lldb in a debugger to see why it hit that spot. If I’m a user, the debugger should NEVER crash.

In the case of the assert in RegisterContextPOSIXCore_arm64.cpp, report an error and fail. I might have another target open in lldb, or another core loaded, and my session just got destroyed.

I’m speaking as an embedded developer who’s customers debug multiple heterogeneous cores, running Android, Linux, QNX and multiple RTOSes. If I’m debugging AArch64 RedHat and multiple Hexagon QuRT instances, and the AArch64 session asserts and takes down my entire debug session, I’m going to be mad.

Just say no to asserts! :smiley:

2 Likes

I hadn’t considered that we as developers have many ways to find the location of a problem or failure. Which means that adding a crash on top of those methods isn’t such an important requirement.

Sure, if this was a security assessment I would say the same thing but for whatever reason decided not to apply the same logic here.

So I agree, even if every time you tried to use the core file lldb just spewed warnings about how broken it was - at least lldb is still running so it can do that reporting.

(flashbacks to Windows apps that run in cmd and print things but you only see it for a fraction of a second)

So the asserts in corefile handling would be a perfect candidate for using Status messages instead (and converting bool returns to Status).

I agree with Ted here. The debugger should never assume that the world is in a good state. Core-file writers can get interrupted mid-stream and produce bogus core files. The person working on a new loader can have made a mistake that causes the load information in a target program to be wrong in some way and they would really appreciate the debugger not crashing. Ditto for the person working on the DWARF writer… Memory corruption means you can’t guarantee anything in the stack or heap (and in TEXT if the program is compromised) is in a good state.

Asserts are useful to say “I am not in a good place to handle checking this, so my callers have to have”. But you should never assert because something in the target looks odd, because that’s when you need the debugger the most.

Jim

On Nov 7, 2023, at 5:01 PM, Ted Woodward via LLVM Discussion Forums notifications@llvm.discoursemail.com wrote:

tedwoodward
November 8

I have a very strong opinion on this topic. The debugger should never assert when reporting an error is sufficient. The only time the debugger should assert (to a user, crash) is when it’s going to crash anyway, and it’s basically dead in the water.

The example from Phab that @DavidSpickett posted:
default:
lldbassert(“Found type unknown type value %d”, type);
return {};
Should not be an assert. If I load a new image with new DWARF info that lldb doesn’t handle, it shouldn’t crash. It should try to keep going as best it can. Print out an error message. If I’m working on lldb, I can grep for the message, and run lldb in a debugger to see why it hit that spot. If I’m a user, the debugger should NEVER crash.

In the case of the assert in RegisterContextPOSIXCore_arm64.cpp, report an error and fail. I might have another target open in lldb, or another core loaded, and my session just got destroyed.

I’m speaking as an embedded developer who’s customers debug multiple heterogeneous cores, running Android, Linux, QNX and multiple RTOSes. If I’m debugging AArch64 RedHat and multiple Hexagon QuRT instances, and the AArch64 session asserts and takes down my entire debug session, I’m going to be mad.

Just say no to asserts! :smiley:


Visit Topic or reply to this email to respond.

To unsubscribe from these emails, click here.

Apparently I already did that in [lldb] Print lldbassert to debugger diagnostics · llvm/llvm-project@83a6a0a · GitHub

I’m not an lldb user, but I use other libraries from the llvm ecosystem and share similar pain points.
I believe we need a wider audience and reach a conclusion with common guidance across the whole project.

In the end, the whole llvm-poject is advertised to be made of libraries [1,2], and libraries might be used in contexts where a crash might not be a desired or acceptable error reporting, let alone have UB due to the lack of having asserts in released products guarding e.g. container accesses etc.
One such use case is tooling, and I’m in particular involved with the Clang frontend component. However, even the llvm-project offers products such as clangd where I think these thoughts should also directly apply.

The liberal use of asserts [3] and the lack of handling the case when the assert is gone (like in release build), we often face UB. Downstream, we have a growing set of patches working around cases like that. For example, in Clang Sema, or in the parser language spec invariants are sometimes assumed to hold, which is fine for well-formed code, but falls short after parsing invalid code, and having recovery expression nodes. But the sentiment remains true: we might have invalid input, but it’s still undesired to just crash. The line between what is an invariant that should be always true and an assumption derived from some user input is blurry, but it’s frequently possible to handle the situation gracefully.
One example of handling it is by checking the condition, and have an assert but also having a return/continuation path basically limiting the damage. This approach worked for us, but in the end, usually, the code needs to have this in mind when designed, and hard to retrofit.

I believe, the elaborate use of asserts, as advertised in [3], and similarly unjustified llvm unreachables [4] cause significant pain for vendors building on top of the llvm-project. And I think we, as a project, should be aligned when should we use assert, and when do more sophisticated error detection/handling.


[1] GitHub project About: “The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.”
[2] llvm.org “[…] are great examples of the sort of tools that can be built using the Clang frontend as a library to parse C/C++ code.”
[3] CodingStandards.html#assert-liberally
[4] Discrourse: llvm_unreachable is widely misused