A Minimal Python Install for LLDB

LLDB’s dependency on Python is a longstanding source of friction. Because we use Python’s private API, LLDB requires the exact version of Python it was built against. This can be confusing for users:

  • They want to import lldb from the Python interpreter in their path, which they reasonably expect to work.
  • They want to use the command line driver, or a tool linking against LLDB as a library, but we can’t load the matching Python library, because it isn’t available in the location we expect.

These issues are so common that we have a dedicated explanation on the website.

What We Do Today

Here’s how we handle linking Python on the various platforms we support today:

  • macOS: LLDB links against a copy of Python that ships with Xcode. However, Xcode is relocatable, which means this only works when LLDB is part of Xcode and can use a relative RPATH to load Python. For the nightly Swift toolchains (which get installed in ~/Library/Toolchains), we use an absolute RPATH that only works if you have Xcode installed in /Applications/Xcode.app with the expected Python version.
  • Linux and BSD: LLDB links against the default Python for that system. This works because we build releases for specific distributions.
  • Windows: We link a specific Python version and require users to download the matching release from python.org.

Embedding Python

Shipping a copy of Python is the most robust solution because it guarantees that you load the exact version LLDB was built against. However, this essentially means maintaining our own Python distribution. Our users often want to import third-party Python packages in their LLDB scripts, or import LLDB from scripts that use a variety of packages. Maintaining a Python distribution for every platform is not something the LLDB community can shoulder.

Breaking the Revlock

As mentioned above, LLDB uses the Python private (unstable) API, which is why we had to load the matching Python library. We were also limited by SWIG, which generated code that wasn’t part of the limited API. This has changed:

By using only the Python Limited C API, we can import LLDB in a Python 3.8 or newer interpreter, and similarly, load a Python 3.8 or later shared library. While we’ve made significant progress by transitioning to the Python Limited C API, there’s still more work to be done to enable loading LLDB in a different Python interpreter. However, this opens up a path towards a hybrid solution.

The Hybrid Solution

I’m proposing we ship LLDB with a copy of the Python shared library, but without everything else that makes up a Python distribution. This means a Python shared library and standard library, but no standalone interpreter and no third-party packages. By default, users get a minimal installation of Python that guarantees the embedded interpreter (i.e., the script command) in LLDB works.

What about import lldb?

To run a Python script that uses import lldb, users will need a Python interpreter, which means they need to install a complete Python distribution. Because we use the stable API, they can use any version as long as it’s Python 3.8 or newer.
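
A script meant to run under an external interpreter could check the version floor before attempting the import. This is only a minimal sketch: the 3.8 minimum comes from the text above, while the helper name `supports_lldb_import` is hypothetical and not part of LLDB.

```python
import sys

# Oldest Python that a stable-API build of LLDB is expected to load in,
# according to the RFC text. Adjust if the supported floor changes.
MINIMUM_PYTHON = (3, 8)


def supports_lldb_import(version_info=None):
    """Return True if the (major, minor, ...) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MINIMUM_PYTHON
```

A script would call `supports_lldb_import()` first and only then run `import lldb`. The import can still fail for other reasons, for example if LLDB's Python module is not on `sys.path`.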

What about packages?

Users wouldn’t be able to use third-party packages with the minimal Python distribution that comes with LLDB, and there’s no pip module to install them. Similarly, they would need to install a Python distribution. The proposed hybrid approach hinges on LLDB automatically picking up and preferring the full distribution if it’s available.

  • On macOS: We use an absolute RPATH to /Library/Frameworks/Python.framework/Python and have a fallback to the minimal embedded library.
  • On Windows: Since #162509, we can use SetDllDirectoryW to set the load path dynamically.
  • On Linux: TBD (?)
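
The preference order described above (full distribution first, embedded library as fallback) can be illustrated with a small sketch. In practice the choice is made by the dynamic loader through RPATH entries or the DLL search path, not by Python code, so this only models the ordering. The function name and the embedded path in the usage note are hypothetical.

```python
import os


def pick_python_library(full_candidates, embedded_fallback, exists=os.path.exists):
    """Model the hybrid lookup order.

    Returns (path, kind), where kind is "full" if one of the full
    distribution candidates exists, and "embedded" otherwise.
    """
    for path in full_candidates:
        if exists(path):
            return path, "full"
    # Nothing else was found: fall back to the minimal library shipped with LLDB.
    return embedded_fallback, "embedded"
```

On macOS, `full_candidates` would contain `/Library/Frameworks/Python.framework/Python` as mentioned above, and `embedded_fallback` would be wherever the distributor places the bundled library (for example a hypothetical path inside `LLDB.framework`).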

Next Steps

I’m sharing this RFC to outline my plan for how we can move forward with Python distribution in LLDB. It’s important to note that adoption of this hybrid approach depends on the various LLDB distributors and whether they choose to follow this model.

What’s outlined here is the path I’m planning to pursue for macOS. @charles-zablit has started working on a similar approach for LLDB on Windows on the Swift fork.

Feedback Requested

I’m looking for feedback from the community on:

  • Feasibility on your platform: Does this approach make sense for Linux, Windows, or other platforms you maintain?
  • Technical concerns: Are there edge cases or compatibility issues I haven’t considered?
  • Implementation details: Particularly for Linux, what’s the best way to implement the dynamic library loading preferences?

What will the criteria be for lldb to update its embedded python and standard library?

It seems there is the potential for version (Python or stdlib) mismatches? For example some lldb python code works as a command in lldb, but fails when loaded as a library in an external python (or vice versa).

The proposed plan wouldn’t change that: it would remain at the discretion of the distributor.

Yes, that’s a good callout. By using the stable API, that’s even possible between different versions of the Python interpreter. The only reason we don’t hit that today is because we crash, so I’d say we’re still strictly better off.

For anyone distributing lldb as part of some package/set of packages which includes python, there’s no incentive to ship a python shared library separate from python itself. Presumably this will continue to work.

For anyone distributing a toolchain independently, without python, the proposal is… the lldb build system somehow installs a minimal python alongside lldb? Or does some packaging script do this? How is the python install built? Is there some set of build flags to make a python build “minimal”?

Do you expect any of the releases on the llvm/llvm-project GitHub Releases page to include Python?

Is there some reason we don’t want to just print “error: please install python” if a user uses a command that requires python, but python isn’t installed? Anyone who actually uses python has it installed, and it sounds like the exact version doesn’t matter anymore.

I realize I should have been more explicit about the goals of this RFC in my original post. The main purpose of this RFC is to inform the community about what I’m planning to implement on macOS and how it will work. I want everyone to understand the big picture, especially if you encounter changes related to this work. I want to ensure the community has visibility into the approach and has an opportunity to identify limitations that I may not have considered.

To be clear, I’m not proposing to change how things currently work for anyone else. This RFC is about adding support for distributing LLDB with a minimal Python install, not replacing or modifying how we distribute LLDB today.

That’s correct.

Yes, depending on the platform. For macOS, I would let CMake handle putting a copy of the Python shared library in LLDB.framework. On the Swift fork, Charles is handling this in build.ps1 which is a wrapper around CMake to build Swift & LLDB on Windows.

Similar to how Python is specified today, it would remain an external dependency that’s passed to the LLDB build.

I don’t plan to change the releases. However, I’ve seen the same complaint in that context, so they could certainly benefit from this solution, but then we would need to answer the questions you’ve outlined above.

That’s a great question. During the Python 2-to-3 transition I looked into this. Doing what you’re suggesting would require either weak linking Python or using dlsym. Both require code changes and the majority of our glue code is generated by SWIG, which to my knowledge doesn’t support either.
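
For readers unfamiliar with the dlsym approach: the idea is to look a symbol up by name at runtime and handle its absence, rather than having the linker require it at load time. The sketch below shows that idea using Python's own `ctypes` module against the running interpreter. It is an analogy only, not the C++ change LLDB would need, and the helper name `lookup_symbol` is made up for illustration.

```python
import ctypes


def lookup_symbol(library, name):
    """Return the named function from a loaded library, or None if it is absent.

    ctypes resolves the symbol lazily on attribute access and raises
    AttributeError when it cannot be found, which getattr turns into None.
    """
    return getattr(library, name, None)


# ctypes.pythonapi is a handle to the Python C API of the running interpreter.
get_version = lookup_symbol(ctypes.pythonapi, "Py_GetVersion")
if get_version is None:
    print("Python C API symbol not available")
else:
    get_version.restype = ctypes.c_char_p  # Py_GetVersion returns const char *
    print(get_version().decode())
```

Doing the equivalent for LLDB would mean routing every Python C API call through such a lookup, which is what makes it impractical for SWIG-generated code.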

On Windows, since #162509, we take advantage of the delay loading feature of MSVC. It allows us to search for python3.dll in specific locations and add it to the PATH at runtime and even print a warning. This is not possible on other platforms.

We’ve been discussing Python internally. For the last 10 years, we’ve shipped Python with our Hexagon tools, to enable LLDB. As part of research to answer “what does upstream do?”, I downloaded the latest 21.1 Windows release. I ran LLDB, and got the Windows “can’t find python310.dll” error dialog.

I also downloaded the x64 Linux version, and saw that liblldb.so required libpython3.10.so, in the Ubuntu 22.04 system Python location. This means our LLDB release on GitHub won’t run on a default Ubuntu 24.04 install, which uses Python 3.12. And it won’t run on a bunch of other distros.
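
A quick way to spot this kind of version pin is to scan the shared-library names a binary depends on (as printed by tools such as `ldd` or `readelf -d`). The sketch below only does the name parsing. The function name and the sample input are illustrative assumptions, not output captured from a real release.

```python
import re

# Matches version-specific names such as "libpython3.10.so" or
# "libpython3.10.so.1.0". The version-agnostic "libpython3.so" does not match.
_LIBPYTHON = re.compile(r"libpython(\d+)\.(\d+)\.so")


def required_python_versions(needed):
    """Return (major, minor) for every version-specific libpython in `needed`."""
    versions = []
    for name in needed:
        match = _LIBPYTHON.search(name)
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return versions


# Illustrative dependency list, not taken from an actual liblldb.so.
print(required_python_versions(["libpython3.10.so.1.0", "libc.so.6"]))
```

An empty result suggests the binary is not pinned to one minor version by name, although it says nothing about whether the binary restricts itself to the limited API.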

I couldn’t find installation instructions that said that Python was required, which Python version was required, or where to get it. I think we need to update the release notes to state these, along with what we built and tested LLVM against.

What happens if you load up lldb with the embedded (bundled?) dylib for, say, py3.10 and then debug another executable (or whatever) that links against py3.10? Where will dlsym resolve symbols from? If you were statically linking, you could build lldb with default-visibility=private (or strip all symbols), but with a dylib you can’t? Also, isn’t glibc an issue? On Linux you could use the Python dylib from a manylinux distro, but that doesn’t exist for Mac/Windows?

Sorry if I’ve misunderstood your plan.

This is a large issue when using the prebuilt versions of LLDB from GitHub. There are at least seven issues on GitHub about this (see #137467 for a list), so it’s great that we make some progress here and that a proper error is printed since #162509.

This plan sounds reasonable to me. It’s a good thing to do - and I also agree on keeping the changes primarily outside of the main LLDB CMake build, as every distributor may wish to do this in different ways.

For llvm-mingw, I have bundled it with a minimal install of Python for a couple of years; I have the Python DLL copied into the LLVM/LLDB `bin` directory, so it already worked before https://github.com/llvm/llvm-project/pull/162509. But it’s up to each distributor how to set it up for their case.

I am having trouble working out what role the stable C API work has here.

“We now”, but the issue is still open. Do you mean “we would”?

I think you meant “we would” because this implies that there’s more changes to make.

Unless the “more work to be done” is not to do with using the stable C API. In other words: even when issue #151617 (Make LLDB compatible with the Python Limited C API) is complete, there is more work to do.

Using the stable C API is only the first step. That’s the message, right?

So assuming we finish the stable C API work. We could say users must install some version of Python, and LLDB will find that…somehow.

You’d prefer that they can do that but do not have to do that. Right?

In a previous job we did a similar thing except we did ship a full Python. The reasons there were:

  1. Out of the box experience (your main motivation too it seems).
  2. Consistency. If the customer used our Python, they could expect all our libraries to work as described and bug reporting was smoother.

#2 is in theory a problem for LLDB but our scripts get tested on a range of versions from 3.8 on up, so it’s less likely to come up. Users can always match the internal Python version (which they can find from the internal Python prompt) and we can ask them to reproduce on the built-in Python as one more datapoint if needed.

I’m assuming we’ll add tags and logs in the right places so we can tell which was used when there’s a bug report.
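
As a rough idea of what such a log entry could contain, the sketch below collects a few facts about the running interpreter using only the standard library. It could be pasted at the embedded Python prompt. The function name and the choice of fields are assumptions, not an existing LLDB feature.

```python
import importlib.util
import sys


def describe_python():
    """Collect details that help tell a bundled Python from an external one."""
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "prefix": sys.prefix,
        # sys.executable can be empty or None when Python is embedded.
        "executable": sys.executable or "<embedded>",
        # A minimal install as described in the RFC would not have pip.
        "has_pip": importlib.util.find_spec("pip") is not None,
    }


print(describe_python())
```

Attaching this output to a bug report would show whether the script ran in the minimal bundled Python or in a full distribution.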

I think this explains my earlier confusion.

  • LLDB would ship with this “internal” Python.
  • It interacts with that Python using the stable C API.
  • User installs a full external Python
  • LLDB switches to this and it can do that because LLDB is using only the stable C API

I wonder how clear we can be to users. Just a rough idea: what if failed imports in the built-in Python included a message such as “external packages cannot be installed into this Python…”?

Even someone who is used to the maze of Python paths and versions may assume at first that the script Python is their system one and wonder why the thing they just pip installed doesn’t show up.

If we don’t want to customise it that much, we could print a message when the interpreter starts.
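
One low-tech way to attach such a hint is to wrap the import and re-raise with extra text. The sketch below assumes a helper called `import_with_hint` and a hint wording that are both made up here. A real implementation inside the embedded interpreter might hook the import machinery instead.

```python
import importlib

# Hypothetical wording, loosely based on the suggestion above.
HINT = (
    "External packages cannot be installed into the Python bundled with LLDB; "
    "install a full Python distribution to use third-party packages."
)


def import_with_hint(name):
    """Import a module, adding a hint to the error if it cannot be found."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        # Keep the original exception type and module name so existing
        # `except ImportError` handlers continue to work.
        raise ModuleNotFoundError(f"{err}. {HINT}", name=err.name) from err
```

Calling `import_with_hint("numpy")` in a minimal install would then fail with the usual "No module named 'numpy'" message followed by the hint.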

Using the stable API is a prerequisite for being able to import lldb in a Python interpreter that does not match the exact version LLDB was built against. This is a pillar of the proposed hybrid approach.

The issue is still open because:

  1. We still use two non-stable APIs, though they’re guarded by a define.
  2. This requires SWIG 4.1 while the official minimum SWIG version is 4.0 and not everyone is ready to upgrade yet.

Using the stable API is a prerequisite, but it seems like there’s at least a little bit more work to be done. When I tried importing LLDB, built against the stable API, in a different Python I hit an import error.

Yes, in the sense that a user wouldn’t have to, unless they want to import a third-party package in their LLDB script, or import LLDB into their external script (which requires a Python interpreter anyway). But if we found a full Python installation, we would prefer it.

Yep, that’s a great summary!

I think that’s a good idea to improve the user experience and given we have a lot of control over the embedded interpreter, that sounds feasible.

Can we move the lldb python glue into a separate liblldbpython.so library, and dlopen that?

I guess this is a longer-term project, in any case, but it would be good to know if that’s what we want to work towards.

A quick nm on LLDBWrapPython.cpp.o shows 12k symbols. I’m sure we don’t need all of them, but with our extensive Python bindings, it’s a huge API surface.

What I did for the Python 2-to-3 transition for macOS was make liblldbPluginScriptInterpreterPython a shared library (normally we link it statically, like all the other “plugins”) and have two copies: one linked with Python 2 and one linked with Python 3. We then dynamically loaded the requested one using dlopen.

That was certainly a hack and not something I think we should aspire to. It requires re-exporting a bunch of lldb_private and llvm symbols from libLLDB. You can’t link them statically to the shared library because then you end up with a different copy there, which breaks a variety of things. There’s also no plugin ABI, so on every rebase I had to change the export list. And of course a project accidentally started relying on those exported symbols, and we had to maintain a duplicate copy of some of the Support & ADT classes to not break them.

Just to clarify, the error printed here is not accurate. We would need to iterate over the possible DLL paths to make sure that there is no python.dll. If we don’t find any, then we can emit the error.

At some point I’ll be releasing LLDB for the platform I’m currently working on, and I see myself doing something like what you are proposing. Nowadays lots of devs want to use numpy or similar libraries to analyze big chunks of data, which can be gotten from the debugger, and if LLDB can pick up any stable python from the system, it would be amazing, because users will have access to their tools.

Hi,

As it happens, I’ve got some experience in this area. For a while, my CodeLLDB project was being released in a Python version-agnostic configuration: it would use whatever Python was available on the system (as long as it was >= v3.4). If you are curious, here’s my LLVM fork with the relevant changes, specifically commits 9c78d4f and e8b840d.

However, I pretty quickly realized that there are a lot of broken Python installations in the wild, and these result in cryptic bug reports.

In the end, I opted for bundling a full Python runtime with CodeLLDB (sans some modules which are unlikely to be useful for debugging, like Tk). It’s based on builds provided by the Standalone Python project, which solves the problem of binary Python extensions depending on files shipped with the specific Linux distro. On the downside, they don’t support all of the platforms that LLDB does.

We currently ship Python with our Hexagon toolset. Many years ago someone built a visualizer using Tk in LLDB, with the Python we shipped. He’d display graphs of data from LLDB memory reads. My point - Tk can be useful!

I filed a new top-level issue to track the progress on being able to import lldb in a different Python interpreter than the one we built against: Support `import lldb` in a Python interpreter that's different from the one we link against · Issue #167001 · llvm/llvm-project · GitHub
