Summary
We propose adding a new Clang builtin, __builtin_verbose_trap(string-literal), that makes the program stop execution abnormally and (potentially) shows a human-readable description of the reason for the termination when a debugger is attached or in a symbolicated crash log. The description message is stored in the debug info and has no impact on the binary size.
Motivation
For a tool that detects runtime bugs that can lead to a security vulnerability (e.g. an out-of-bounds access), it is desirable to stop execution as fast as possible when a runtime issue is detected. This makes using __builtin_trap a natural choice; however, this builtin doesn’t provide the best user experience because it does not contain any details about what exactly went wrong. Adding the ability to attach a descriptive message to a trap would significantly simplify debugging a crashing executable; this new builtin would be useful for the ongoing libc++ hardening and the C++ buffer hardening efforts.
Implementation
The challenge is encoding the message with minimal size overhead. From this perspective, storing the string literal inside the executable is suboptimal: it would increase the size of the binary with strings that are never actually referenced at runtime. Instead, we plan to store the messages in the debug info.
Specifically, the idea is to encode the message as the name of an artificial “synthesized” function; a call to __builtin_verbose_trap will be lowered to llvm.trap in the resulting LLVM IR, but the compiler will additionally emit debug metadata that represents an artificial inline frame whose name consists of a “magic” prefix followed by the given string literal. A similar technique is used by the Swift standard library for traps on, e.g., arithmetic overflow, as well as by the -fbounds-safety effort; we would need to make sure that the infrastructure that adds the debug info supports different kinds of traps.
The string literal would have to be known at compile time (we don’t see this as a significant limitation). Because the name of the artificial function only exists in the debug info, it is not limited to the character set of a valid C identifier and can use any UTF-8 characters.
To give an example, consider the following code:
void foo(int* p) {
  if (p == nullptr)
    __builtin_verbose_trap("Argument must not be null");
}
The debug metadata would look as if it were produced for the following code:
__attribute__((always_inline))
inline void __llvm_verbose_trap_Argument_must_not_be_null() {
  __builtin_trap();
}

void foo(int* p) {
  if (p == nullptr)
    __llvm_verbose_trap_Argument_must_not_be_null();
}
(However, as mentioned above, the LLVM IR would not actually contain a call to the artificial function; it exists only in the debug metadata.)
On the LLDB side, we would add logic to translate the name of the artificial inline frame into a human-readable stop reason. When a program traps, LLDB would recognize that the trap instruction was executed inside a specially prefixed frame, parse the name of the function and present the resulting message as the stop reason.
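The frame-name translation described above could look roughly like the following sketch. It is not LLDB code: the function names ExtractTrapMessage and DescribeTrap are hypothetical, the prefix is simply borrowed from the illustrative function name in the example above, and the sketch assumes that everything after the prefix is the message stored verbatim. The actual encoding is left to the implementation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical prefix, taken from the illustrative function name in the
// example above; the real "magic" prefix is up to the implementation.
constexpr std::string_view kVerboseTrapPrefix = "__llvm_verbose_trap_";

// Returns the trap message if `frame_name` names an artificial verbose-trap
// frame, or std::nullopt for any other function name.
// Assumption: everything after the prefix is the message, stored verbatim.
std::optional<std::string> ExtractTrapMessage(std::string_view frame_name) {
  if (frame_name.size() < kVerboseTrapPrefix.size() ||
      frame_name.substr(0, kVerboseTrapPrefix.size()) != kVerboseTrapPrefix)
    return std::nullopt;
  frame_name.remove_prefix(kVerboseTrapPrefix.size());
  return std::string(frame_name);
}

// Builds the text a debugger could present as the stop reason for a trap,
// given the name of the innermost (possibly artificial) inline frame.
std::string DescribeTrap(std::string_view innermost_frame_name) {
  if (auto message = ExtractTrapMessage(innermost_frame_name))
    return "trap: " + *message;
  return "trap";  // No special frame: same as a plain __builtin_trap.
}

// Usage example covering a prefixed frame and an ordinary function.
inline void VerboseTrapExample() {
  assert(DescribeTrap("__llvm_verbose_trap_Argument must not be null") ==
         "trap: Argument must not be null");
  assert(DescribeTrap("foo") == "trap");
}
```

Keeping the recognition step separate from the presentation step means that a frame without the prefix falls through to the ordinary trap stop reason.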
As the name of the function is stored in the debug info, this would have no impact on the size of the binary. If the debug info is not available, calling __builtin_verbose_trap results in the same user experience as the existing __builtin_trap.
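Because the builtin degrades to an ordinary trap, hardening code could wrap it in a macro that falls back to __builtin_trap where the builtin is unavailable. The sketch below is one possible shape for such a wrapper, not part of the proposal: HARDENING_TRAP, HARDENING_CHECK, checked_at and the configuration macro HYPOTHETICAL_HAS_VERBOSE_TRAP are all made-up names, and the sketch assumes the single-argument form of the builtin described here.

```cpp
#include <cassert>
#include <cstddef>

// HYPOTHETICAL_HAS_VERBOSE_TRAP is a made-up configuration macro for this
// sketch; define it only when the compiler implements the single-argument
// builtin proposed here. Otherwise the plain trap is used.
#if defined(HYPOTHETICAL_HAS_VERBOSE_TRAP)
#  define HARDENING_TRAP(msg) __builtin_verbose_trap(msg)
#else
#  define HARDENING_TRAP(msg) __builtin_trap()
#endif

// Traps when `cond` does not hold. `msg` must be a string literal, because
// the message has to be known at compile time.
#define HARDENING_CHECK(cond, msg) \
  do {                             \
    if (!(cond))                   \
      HARDENING_TRAP(msg);         \
  } while (false)

// Bounds-checked access to a built-in array, in the spirit of the
// hardening efforts mentioned in the motivation.
template <typename T, std::size_t N>
T& checked_at(T (&array)[N], std::size_t index) {
  HARDENING_CHECK(index < N, "Array index out of bounds");
  return array[index];
}

// Usage example: in-bounds accesses behave like ordinary indexing.
inline void HardeningExample() {
  int values[3] = {10, 20, 30};
  assert(checked_at(values, 1) == 20);
}
```

In the fallback configuration the message argument is dropped by the preprocessor, so it costs nothing in the binary there either.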
By default, the compiler is free to merge the traps, even if they have different messages. This is good for code size, but it can make it harder to determine which exact source location a crash corresponds to. At least in the first iteration, we intend to let the compiler merge traps, but we are open to exploring ways of improving the debugging experience in the future.
As a matter of implementation quality, we would need to make sure the proposed mechanism interacts well with diagnostics that display source locations (e.g. optimization remarks).