In this RFC we propose rewriting the builtin libc++ LLDB data formatters in Python and moving them into the libc++ subproject.
Issues with builtin LLDB formatters
- libc++ layouts/field names change frequently, which causes the LLDB data formatters to break. We made efforts to run the formatters on libc++ CI, but there are gaps:
  - Only the bootstrapping libc++ runners (i.e., the ones that build Clang/LLDB) run the formatter tests. This means that if we wanted to test LLDB formatters against non-default libc++ configurations (e.g., hardened libc++), we’d have to set up a new bootstrapping builder with the changed libc++ configuration. That’s a resource/maintenance burden that the libc++ maintainers don’t want to carry.
- When libc++ developers see that their pre-merge CI fails because of a formatter test, they have to reach out to LLDB developers to fix the formatters or try to build/fix LLDB themselves. Both add developer friction.
- The main difficulty in maintaining/developing data formatters is familiarity with the data types being formatted. Putting the burden on LLDB developers increases the likelihood of outdated and imprecise formatters.
- If the libc++ community gets more familiar with writing formatters, new formatters could land around the same time that the corresponding libc++ type does. This would reduce bug reports to LLDB about missing formatter support for long-released data types.
- Once an LLDB release is shipped, the formatters cannot be fixed until the next release. We try hard not to break the formatters, but there are still occasional issues in the field. At that point the workaround boils down to upgrading/downgrading LLDB or libc++, neither of which is ideal.
Goals/Benefits
- The libc++ community maintains the formatters instead of LLDB devs.
- Lower barrier to contribution, since Python-based formatters tend to have far less boilerplate and fewer foot-guns.
- Having the formatters written against just the public SBAPI lets us dog-food the public APIs.
- Formatters get tested against all libc++ configurations (since they could be run from the libc++ test-suite on all build bots).
- LLDB can drop the requirement of top-of-tree libc++ from the API test-suite (this was only ever enforced on Darwin and has been getting increasingly difficult to keep up with as libc++/libc++abi requirements evolve on macOS).
- Fixing formatters could be done by patching the Python script on the user’s machine.
Prior art for formatter distribution
- libstdc++ GDB formatters
  - On Linux these get installed into `/usr/share/gdb` (or similar, depending on distribution)
  - GDB can be configured by packagers to auto-load from certain directories (using the `--with-auto-load-dir` GDB build option). Additionally, the GDB build option `--with-auto-load-safe-path` determines a list of “directories trusted for automatic loading and execution of scripts”.
  - Various `auto-load` settings control which kinds of scripts GDB should auto-load (GDB scripts vs. Python scripts vs. command scripts) and an `auto-load safe-path` setting controls which paths are safe to load from (as mentioned above).
- @DebugDescription macro
  - The Swift compiler will turn any class annotated with this attribute into an LLDB formatter bytecode program and embed it into a special section in the binary. LLDB knows how to load and interpret this bytecode. Summary providers and synthetic child providers are both supported.
- libc++ GDB formatters
  - Initial check-in: https://reviews.llvm.org/D65609
  - Formatters are written in Python and are distributed with libc++ (though Linux distributions don’t seem to install them into `/usr/` like they do for libstdc++)
  - Tests live in the libc++ test-suite
- This proposal takes the “libstdc++ GDB formatters” approach since that’s what many are already familiar with and it is a more mature ecosystem. (Note: transitioning to formatter bytecode is still a possibility in the future and is compatible with the direction of this RFC.)
Proposal
- Rewrite the libc++ formatters in Python and ship them alongside libc++ headers
  - On macOS the formatters would be placed somewhere inside the SDK (where the libc++ headers live)
  - On Linux this would be up to the distribution but should mimic the installation of libstdc++'s GDB formatters (which get installed into `/usr/share/gdb`)
  - On Windows this would be up to the toolchain maintainer
    - E.g., the Swift toolchain installer will be made to place the formatters into some location that LLDB knows about.
- On target launch, LLDB would auto-load formatters from these blessed locations.
  - For this we would introduce a setting (similar to GDB’s `auto-load`) whose default differs by platform but which can be overridden by users/distributions/toolchains in their `.lldbinit`. Discussed slightly further in the Auto-loading section below.
- The formatters infrastructure shouldn’t require any other changes. Instead of adding type summaries/synthetic providers using C++ function pointers, we add them using the Python class names from the loaded formatter scripts.
Considerations
Auto-loading
Since the location of the formatters is not up to LLDB and will vary between distributions/vendors/etc., we want a configurable way to specify where to load the formatters from. A natural precedent for this is GDB’s auto-load infrastructure (discussed above). The current proposal omits GDB’s auto-load booleans and implements the equivalent of a safe-path setting for scripts specifically. If we want more granular control over this, we could provide separate “should auto-load this kind of script” and “should auto-load scripts from” settings. But we could also say that if you don’t want to auto-load, you unset the paths setting.
Existing Settings
For scripts
target.load-script-from-symbol-file -- Allow LLDB to load scripting resources embedded in symbol files when available.
AFAIK, the main use of this is for formatters distributed in dSYMs. When a dSYM gets loaded it would only automatically load the contained scripts if this setting is set. From a security perspective, loading scripts distributed with dSYMs automatically is more of a risk than loading from a system path set up by the vendor. So explicitly opting into auto-loading from dSYMs makes sense. This proposal does not change the semantics or existence of this setting. It seems like a natural counterpart (auto-load from symbol file vs. auto-load from path).
For .lldbinit
--local-lldbinit
--no-lldbinit
These control automatic loading of .lldbinit files. This proposal would not affect these. If we wanted to create auto-load-paths counterparts for these we could add a target.auto-load-paths.init setting.
Build-time Setting
We would introduce a new CMake variable that takes a list of paths and gets embedded into LLDB (e.g., `-DLLDB_AUTO_LOAD_PATHS_SCRIPTS`; the idea being that if we want some other kinds of auto-load paths they would be called `AUTO_LOAD_PATHS_FOO`). This would be the primary way we expect the auto-load paths to be configured. The list would include the system path but also local build paths.
Runtime Setting
A setting like `settings set target.auto-load-paths.scripts path/to/1;path/to/2` would allow users to modify the auto-load paths. The default value of this setting would be configured at LLDB build-time. We want this to be configurable per-target, since one could be debugging different targets with their own libc++ formatter setup. Turning off auto-load entirely could be done by assigning an empty string to the setting. Overriding the paths at runtime would be useful for local development, testing or more bespoke setups.
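As a rough illustration of the intended semantics (not of how LLDB would implement them internally), the sketch below turns a `;`-separated setting value into a list of directories and then into `command script import` commands. Everything here is an assumption made for the example: the helper names are hypothetical, and the rule "import every `.py` file directly inside each directory" is just one possible discovery convention that this RFC does not specify.

```python
import os


def parse_auto_load_paths(setting_value):
    """Split a ';'-separated setting value into a list of directories.

    An empty string yields an empty list, i.e. auto-loading is off.
    """
    parts = (part.strip() for part in setting_value.split(";"))
    return [part for part in parts if part]


def find_formatter_scripts(paths):
    """Collect Python scripts from each directory, in a stable order.

    Assumption: every '.py' file directly inside an auto-load directory
    is a formatter script. Missing directories are skipped silently.
    """
    scripts = []
    for directory in paths:
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if name.endswith(".py"):
                scripts.append(os.path.join(directory, name))
    return scripts


def import_commands(scripts):
    """Build the LLDB commands that would load each script."""
    return [f'command script import "{script}"' for script in scripts]
```

With this model, `parse_auto_load_paths("")` returns an empty list, so no scripts are found and nothing is imported, which matches the "empty string turns auto-load off" behaviour described above.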
Formatter backwards compatibility
Ideally we would load the libc++ formatter script that matches the libc++ version that the target was compiled against. This might be hard to check/enforce, so a better heuristic could be to load the formatters from the *newest* toolchain. If the only available libc++ toolchain is newer than the one the target was compiled against, the newer libc++ formatters should still be able to format the old layouts. I.e., libc++ formatter scripts should be backwards compatible. This is already true for the builtin LLDB formatters and will be enforced when porting them to Python (more on testing in the Testing section below).
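One common way to keep a Python formatter backwards compatible is to probe for the member names used by each known layout, newest first, and fall back gracefully if none match. The sketch below shows that pattern under stated assumptions: the member names `__size_` and `__sz` are hypothetical stand-ins for "new" and "old" layouts of some container, not actual libc++ field names. The `SBValue` methods used (`GetChildMemberWithName`, `IsValid`, `GetValueAsUnsigned`) are existing public API.

```python
# Hypothetical member names for the same logical field across layouts,
# ordered newest first. Real formatters would list the actual names.
SIZE_MEMBER_CANDIDATES = ("__size_", "__sz")


def first_valid_member(valobj, candidate_names):
    """Return the first child that exists in this layout, or None."""
    for name in candidate_names:
        child = valobj.GetChildMemberWithName(name)
        if child.IsValid():
            return child
    return None


def container_summary(valobj, internal_dict):
    """Summary that works for both the old and the new layout."""
    size = first_valid_member(valobj, SIZE_MEMBER_CANDIDATES)
    if size is None:
        # Neither known layout matched; report that instead of guessing.
        return "<unknown layout>"
    return f"size={size.GetValueAsUnsigned(0)}"
```

When a layout changes, supporting it means adding one more name to the candidate tuple, while the older names stay in place so that targets built against older headers keep formatting correctly.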
Other STLs
We also have MSVC STL and libstdc++ formatters in LLDB. Some are written in Python (many of the libstdc++ formatters) and some are written in C++ (all of the MSVC formatters).
This RFC only concerns itself with libc++, so we keep them untouched.
Rollout
To iron out any issues in the rewritten formatters, we could consider a period of time (a single LLVM release?) where we have both the C++ and Python formatters. These could be switched using an LLDB setting (the default being the Python formatters).
Testing
The tests will live in libc++ so they run on all the libc++ builders. Backwards compatibility of the formatters could be tested by copy-pasting the headers with the old layouts into the test-suite (this is what we already do in the LLDB test-suite for a handful of types).
We could also make all the LLDB formatter tests run against the built-in C++ formatters *and* the new auto-loaded Python formatters.