RFC: Separation of embedded Python from the rest of LLDB.

A little background: The single biggest pain point for working with LLDB on Windows currently is Python. There is a long documented history of the problems with Python extension modules on Windows. Part of it is Microsoft's fault, part of it is Python's fault (PEP 384 attempts to solve this, but it appears stalled), but the end result is that it's really terrible and there's nothing anyone can do about it.

The implications of this for LLDB on Windows are the following:

  1. Anyone building LLDB on Windows needs to compile Python from source
  2. Even after doing so, configuring the LLDB build is difficult and requires a lot of manual steps
  3. You can’t use any other extension modules in your custom built version of python, including any of the over 50,000 from PyPI, unless you also build them from source, which isn’t even always possible.

If you want to be compatible with the binary release of Python and interoperate with the version of Python that probably 99% of Windows people have on their machine, you must compile your extension module with a very old version of the compiler. So old, in fact, that it is almost end-of-lifed; if it weren't for Python, it would have been end-of-lifed already. Microsoft even ships a free version of this toolchain for the express purpose of compiling Python extension modules.

I've been thinking about this for many months, and I believe I finally have a solution (two solutions actually, and I think we should do both; I'll get to that later). Both will probably be painful, but in the end for the better on all platforms.

Solution 1: Decouple embedded python code from LLDB
Rationale: Currently, calls into the embedded Python code live in a few different places: primarily in source/Interpreter (e.g. ScriptInterpreterPython) and the generated SWIG code, but there are also a few utility headers scattered around.

I'm proposing moving all of this code into a single shared library containing nothing else, and placing some heavy restrictions on what kind of code can go into this library, primarily to satisfy the requirement that it be compilable with the old version of the compiler in question. These restrictions would be:

  1. Can’t use fancy C++. It’s hard to say exactly what subset of C++ we could use here, but a good rough approximation is to say C++98.
  2. Can’t depend on any LLVM headers or even other LLDB libraries. Other LLDB projects could depend on it, but it has to stand alone to guarantee that it doesn’t pick up C++ that won’t compile under the old MSVC compiler.

I understand that the first restriction is very annoying, but it would only be imposed upon a very small amount of source code. About 2 or 3 source files.

The second restriction, and the decoupling as a whole, will probably cause some pain and breakage for out-of-tree code, but I believe it's a positive in the long run. In particular, it opens the door to replacing the embedded interpreter with that of a different language. By separating the code in this fashion, all one has to do is write a new module for their own language and initialize it instead of ScriptInterpreterPython. I've already seen people asking on the list about generating bindings for other languages and replacing the interpreter, and I believe a separation of this kind is a prerequisite anyway.
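
To make the restrictions a bit more concrete, here is a rough sketch of the kind of interface the new library could expose. The declarations are illustrative rather than the actual LLDB headers, but they show the flavor: plain C++98, const char* and raw pointers instead of llvm::StringRef or C++11 types, and no LLVM or LLDB-internal includes.

  // Illustrative public header for the proposed library (dllexport/dllimport
  // decoration omitted for brevity). Everything here compiles with the old
  // MSVC toolchain that matches the python.org binaries.
  class ScriptInterpreter
  {
  public:
      // Example method only; the real interface is whatever ScriptInterpreter
      // already provides today.
      virtual bool ExecuteOneLine(const char *line) = 0;
  protected:
      virtual ~ScriptInterpreter() {}
  };

  // Factory functions implemented inside the library. A binding for another
  // language would ship its own library exporting the same kind of factory.
  ScriptInterpreter *CreateScriptInterpreterPython();
  void DestroyScriptInterpreter(ScriptInterpreter *interp);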

Solution 2: Upgrade to Python 3.5
Rationale: Hopefully I didn’t send you running for the hills. This is going to have to happen someday anyway. Python 2.7 end of life is set to 2020. Seems like a long time, but it’ll be here before we know it, and if we aren’t prepared, it’s going to be a world of hurt.

Why does this help us on Windows? Visual Studio 2015, which releases sometime this year, is finally set to have a stable ABI for its CRT. This means that any application compiled against VS2015 or later will be forward compatible with future versions of the CRT forever. Python 3.5 is not yet released, but the current proposal is for Python 3.5 to ship its binary release compiled with VC++ 2015, effectively killing this problem.

I understand that there is a lot of out-of-tree code that is written against Python 2.7. I would encourage the people who own this code to begin thinking about migrating to Python 3 soon. In the meantime, I believe we can begin to address this in-tree in 3 main phases.

  1. Port the test suite to Python 3.5, using a subset of Python 3.5 that is also compatible with 2.7. This ensures no out-of-tree code is broken.
  2. Upgrade ScriptInterpreterPython with preprocessor flags to select whether you link against Python 2.7 or 3.5, and expose these options from CMake and Xcode so you can choose what you link against (a rough sketch follows this list).
  3. Stand up multiple buildbots with different configurations and different versions of Python to get better code coverage and ensure that tests work under both language versions.
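
For phase 2, the in-source changes mostly amount to small shims keyed off the interpreter version. A minimal sketch of the idea (PY_MAJOR_VERSION comes from Python.h; the wrapper name here is made up):

  #include <Python.h>

  // Python 3 dropped the PyString_* API; a tiny shim keeps the rest of
  // ScriptInterpreterPython identical under either interpreter.
  #if PY_MAJOR_VERSION >= 3
  static PyObject *LLDBStringToPyObject(const char *s) { return PyUnicode_FromString(s); }
  #else
  static PyObject *LLDBStringToPyObject(const char *s) { return PyString_FromString(s); }
  #endif

Which interpreter you get would then just be a CMake/Xcode option that sets the include and library paths accordingly.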

I realize that the impact of both of these changes is high, but we have a very strong desire to solve this problem on Windows and so we need to push some kind of solution forward. I think both solutions actually contribute to the long term benefit of LLDB as a whole, and so I think it’s worth seriously considering both of these and trying to come up with a path forward.

This seems like a lot of work to support Windows users who might want to use pre-compiled python modules.

I think we should distribute a VS2015 based version of Python 2.7 binaries and call it a day.

We can worry about 2020 problems in 2020. =)

Reasonable people may disagree.

IANAL, but I floated this idea once and people weren’t super thrilled about the idea of checking in binaries or headers. That reminds me though that I should ask again.

It also doesn't solve the problem for users of LLDB on Windows who need support for using other extension modules in their scripts. While I'm not one of them, I believe there are some people who care about that and would like to make it happen.

This seems like a lot of work to support Windows users who might want to use pre-compiled python modules.

To me, the primary benefit is that you can build LLDB with Python support without building Python from source. What's more onerous is that once you build Python from source, you have to maintain two separate environments: your normal development environment with normal Python on PATH, and one with LLDB's version of Python on PATH and PYTHONPATH.

I think we should distribute a VS2015 based version of Python 2.7 binaries and call it a day.

We can worry about 2020 problems in 2020. =)

Reasonable people may disagree.

I disagree. I guess that makes me reasonable. :wink:

The main thing for me is that once we put together a pre-built Python, the instructions to rebuild it will immediately rot. In a matter of months, some user will arrive needing LLDB and Python to be built with a different VS version, and I give it 50/50 odds that the instructions will work.

Personally, I think the lowest impact change for everyone not affected by this is to split the Python API code out into a shared library. LLDB has more Python source than C++ source interacting with Python. Maintaining Python source compat with 2.7 and 3.5 will be fragile and will require testing more configurations. Reducing the number of obscure ways that non-Windows developers can break the Windows build is good.

I’d like to also add my support to solution #1.

My use case is literally what Zachary has outlined – Building on Windows, wanting to interface with LLDB not using Python but a different language, say C#.

The Python requirement (and having to build it from source, at that) was a bit of an odd step for a non-Python user, and probably the biggest impediment to getting started.

I'm all for language interop, but this special treatment Python is getting in the LLDB project is going to accrue (and I'd argue already has accrued) long-term technical debt.

I agree with you for the most part, including that the code separation has the lowest impact. But it's worth mentioning that, for us, EOL for VC++ 2008 is 2018. So while technically moving to 3.x is not anyone else's problem until 2020, the fact that it's going to be our problem 2 years earlier means we will have to do something about it by then. And at that point, 3.5 is going to be the only option on the table. We don't need to take action now, but we should start thinking about it. Luckily the part we care about isn't much: there are 4 or 5 Python scripts to port and maintain in compatibility mode. And I suspect (hope) that by that time the people with the out-of-tree 2.7 code will be starting to do something about it.

In any case, I’m mostly taking issue with your statement that “Reducing the number of obscure ways that non-Windows developers can break the Windows build is good”. We shouldn’t be thinking of Python 3.5 in this context as “the windows build”. It’s probably going to be “the build” eventually.

It's not just a Windows issue; it crops up on Linux if you're not running LLDB on the OS/version it was built on.

We build on Win 7 64, with Python 2.7.8 that I built using VS 2013, without any of the extras like SSL. We also build on SLES 11 Linux using clang 3.4 and Python 2.7.6 that I built (because of a crasher that Todd Fiala fixed upstream at python.org, but the fix hasn't made it into a Python release yet). We test on Win 7 and SLES 11, Ubuntu 10 and Ubuntu 12.

The big question is "which version of Python am I running with?" I found that I can't rely on the system Python on Linux, because different versions of Python don't play well with each other. I don't just mean the shared libraries; the modules cause big problems, even when you're only loading modules written in Python, not .pyd libraries. LLDB built with 2.7.6 (with libpython2.7.so in …/lib) seems to be OK with Ubuntu 10's Python 2.6 modules, but with SLES's modules it throws lots of warnings, and with Ubuntu 12's modules (2.7.3) it crashes. Never mind the Ubuntu 10/12 OpenSSL split (1.0 → 1.1), which means I can't load hashlib (loaded by default) because it loads SSL, which will fail on the wrong side of the OpenSSL divide.

My solution – ship a partial Python install with LLDB, and set PYTHONHOME/PYTHONPATH to point to the right place. So on Windows I ship 2.7.8 modules and on Linux I ship 2.7.6 modules with LLDB. This way I don’t care what system LLDB is run on; it just works.

Saying you can’t build Python with VS2013 isn’t true. I did. I’m having issues with some of the bindings (specifically “print lldb.debugger” gives “No value”, even though it’s there), but for the most part it works.

Also, you don’t always need to build 3rd party modules. Many of them are only implemented in Python, and don’t load a DLL (.pyd), so you don’t have to rebuild them.

Well sure, I did too, as it's the only way to run the tests. I think his point is just that it's a big barrier to entry. I've got 3 guys who sit around me. One works on the Windows linker, and the other two are responsible for much of clang-cl. None of them feel like going through these hoops to build LLDB, even though there are bugs in LLDB on Windows where their expertise would be a great asset in making progress.

I mean yes, I could do it for them, and yes, they could ultimately do it themselves, but the point is that it shouldn't be this difficult. What about people without the same amount of technical background, but who still want to hack around on the debugger? I don't want to discourage anyone from being able to work on the project. Barriers close doors. I want the doors to be open.

I’m all for making it simpler. Believe me, coming up with my solution was a pain. But once I built Python, I was done with that part. We save off the binaries/modules, build against them, then copy them into the lib directory. I’ve got a patch that lets the builder set a default PYTHONHOME and PYTHONPATH in cmake, and if they don’t exist when LLDB is run it sets them in the environment based on the defaults.
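
For what it's worth, the runtime half of that patch doesn't need to be much more than the following sketch. LLDB_DEFAULT_PYTHONHOME here is a made-up macro standing in for whatever the CMake option actually defines, and PYTHONPATH would get the same treatment.

  #include <stdlib.h>

  // If the builder baked in a default and the user hasn't set one, point
  // Python at the copy shipped alongside LLDB before the interpreter starts.
  static void SetDefaultPythonHome()
  {
  #if defined(LLDB_DEFAULT_PYTHONHOME)
      if (getenv("PYTHONHOME") == NULL)
      {
  #if defined(_WIN32)
          _putenv_s("PYTHONHOME", LLDB_DEFAULT_PYTHONHOME);
  #else
          setenv("PYTHONHOME", LLDB_DEFAULT_PYTHONHOME, 1);
  #endif
      }
  #endif
  }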

We need to build Python on Windows with VS2013 because the binaries from python.org are built with VS 2009, and will cause crashes in LLDB built with VS2013 because of incompatibilities between the 2009 C++ library and the 2013 C++ library. Our options seem to be:

  1. Build Python and save an artifact

  2. Require everyone build Python

  3. Separate the Python interface into its own shared library and hope that when it calls the 2009 C++ runtime it won’t crash

I went for #1 for Hexagon LLDB because I couldn’t rely on the user having the correct Python installed, on Windows or Linux.

Just to be clear: with solution #1 proposed in the original message, your option #3 is guaranteed to work (also, minor pedantic nit - it's VC 2008, not 2009). You won't have to build Python; it will literally just work. The reason for this is that I will have the CMake build automatically compile the separate shared library with Microsoft's toolchain whose sole purpose in life is to build Python extension modules that interoperate with the binary release of Python 2.7.

The icon for pcbuild.sln has a 9 in it; I assumed that meant VS 2009. Silly me. :)

I’m not confident that a DLL built with VS 2008 that has any C++ code will not crash when loaded into a program built with VS2013, because of the incompatible C++ library issue.

How do we get around the debug-vs-release issue that makes us have python27_d.dll on a debug build right now?

I’m not confident that a DLL built with VS 2008 that has any C++ code will not crash when loaded into a program built with VS2013, because of the incompatible C++ library issue.

This is actually a fairly easy problem to solve. The interface boundary between the VS 2008 DLL and the VS2013 (or whatever version) code simply needs to have some rules on it. C++ classes are fine; the only thing that isn't fine is allocating memory on one side which is freed on the other side. So you can have all the C++ you want, you just have to keep that C++ in the DLL, and not give the user access to any of its internal state that would cause code on the other side to trigger a free.

This sounds scary, but in practice it's really not that difficult. For example, suppose you have ScriptInterpreterPython defined in the DLL. It derives from a ScriptInterpreter base interface. You don't expose ScriptInterpreterPython outside the DLL, only ScriptInterpreter. Anyone who uses a pointer to ScriptInterpreter is fine, no matter which side of the boundary they're on, because the actual code for the implementation of these methods lives in the DLL. If you expose the full object, though, you run the risk of someone passing it by value and ending up with a new on one side and a delete on the other. Another thing you can't do is pass std::strings or similar by value (which also means you can't return them from dllexported functions).

Since a side benefit of this refactor is to open the door for someone dropping in a new embedded interpreter, we would want to hide the implementation details of the interpreter anyway, so this problem is solved as a natural consequence of the design. We expose only a pointer to the base interface, hide the rest, and pass the pointer across boundaries, making it clear that the pointer is always owned by the DLL.

Lots of Microsoft system DLLs have C++ code in them, so this isn’t really an issue in practice as long as you define your interface.
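
In code form, a call site on the VS2013 side would look roughly like this, reusing the illustrative Create/Destroy functions from the sketch earlier in the thread. The object is constructed and destroyed inside the DLL; this side only ever holds the pointer and calls virtual methods through it.

  // Sketch only -- names are illustrative. Everything this code allocates or
  // frees happens behind the DLL boundary, never with a local new/delete.
  ScriptInterpreter *interp = CreateScriptInterpreterPython();
  if (interp)
  {
      interp->ExecuteOneLine("script.setup()");
      DestroyScriptInterpreter(interp);   // hand it back to the DLL; never 'delete interp'
  }

Passing anything heavier than a pointer or a const char* across that line is where the trouble starts, which is exactly the std::string-by-value case above.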

How do we get around the debug-vs-release issue that makes us have python27_d.dll on a debug build right now?

The reason we need python27_d.dll on a debug build right now is that all of LLDB is compiled into a single massive DLL that is basically lldb.exe, but as a DLL that Python can load. So that's actually the point of the split. Python only needs just enough that it can call into LLDB's public API. The C++ implementations of the public API classes are already dllexported, so the SWIG-generated code that wraps them will just call into some dllexported methods, across the boundary, and into code that is compiled with VS2013.

If a developer is hacking on the extension module itself (e.g. the implementation of ScriptInterpreterPython), they will need to compile Python themselves with VS2013 in order to step through the code. But that is low-traffic code and relatively stable.

Think about it like kernel32.dll and yourdll.dll. Who knows what version of the compiler kernel32.dll is built with. But they can call into each other no problem, and you can still debug your program with no problem, even though kernel32 is a release binary.
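
To make that concrete, here is roughly what one of those wrappers boils down to (hand-written and simplified here; the real code is generated by SWIG). The module is built with the old toolchain against the release python27.dll, but the only C++ it actually runs is on the far side of a dllexported call:

  #include <Python.h>
  #include "lldb/API/SBDebugger.h"

  // The SB headers only declare the classes; the implementation lives in the
  // main LLDB DLL, built with VS2013. This call crosses the DLL boundary.
  static PyObject *
  Wrap_SBDebugger_GetVersionString(PyObject * /*self*/, PyObject * /*args*/)
  {
      const char *version = lldb::SBDebugger::GetVersionString();
      return PyString_FromString(version);   // Python 2.7 C API; PyUnicode_FromString on 3.x
  }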

We had planned on Python being replaceable by another language, and abstracted all Python-specific code into the pure virtual ScriptInterpreter and the one and only scripting subclass, ScriptInterpreterPython. What part of this abstraction isn't working for people? I don't see the need for heroic efforts that are proposed without a specific reason. All Python-specific stuff is in ScriptInterpreterPython and a few SWIG classes. Adding support for another language should be as simple as configuring SWIG, making it generate all of the classes, and subclassing ScriptInterpreter. Python can be disabled for people that don't want/need Python support, as any language that we add should be.

So I like solution 2, where only Windows uses Python 3.5 and we allow the test suite to be updated so that it can run with 3.5 while still running on 2.7.

Most Python stuff is in ScriptInterpreterPython (see Include/Utility/PythonPointer.h for an example of code that isn’t). But ScriptInterpreterPython is just compiled straight into Interpreter, instead of into its own library. That’s the part that isn’t working. If someone wants to add support for another language, it would be nice to do it in a way that’s upstreamable. The way to do that in my opinion isn’t to just keep sticking new implementations into Interpreter, it’s to separate the python interpreter code into its own library that can be conditionally linked in or not.

If we separate the library, then there is a little bit of upfront pain to fix out-of-tree code, and after that initial hurdle is overcome, everything is back to normal. If we fragment the testing environments into one subset of people that only cares about Python 2.7 and another subset that only cares about 3.5, then there will be a continuous burden on everyone to not break the other group. So I don't think we should do the Python 3.5 solution unless everyone is on board to start moving to 3.5. Anyway, Python 3.5 isn't even released yet, nor is the C++ compiler. The compiler is targeting summer of this year and Python 3.5 is targeting September of this year (I believe).

Also, I don’t think the fixups necessary for out of tree code associated with separating the library will be anything larger than what would need to be done for someone adding another language anyway. Some header and source files will move around, and the build dependency structure will change. The big thing this will break is people who do

#include "lldb/lldb-python.h"

Sorry, hit send too soon. The big thing that will change is people who do:

#include "lldb-python.h"

and

#if !defined(LLDB_DISABLE_PYTHON)

There are currently a lot of instances of this in the code. The path forward would be to gradually change the #if !defined(LLDB_DISABLE_PYTHON) blocks to simply checking whether GetScriptInterpreter() returns null. I scanned about 10% of the occurrences and there's nothing in any of them that is Python-specific.
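
In other words, a typical guarded call site (hypothetical code, not a verbatim excerpt from the tree) goes from this:

  #if !defined(LLDB_DISABLE_PYTHON)
      ScriptInterpreter *interp = GetCommandInterpreter().GetScriptInterpreter();
      if (interp)
          interp->ExecuteOneLine(command, &result);
  #endif

to just the unguarded body, where a null return from GetScriptInterpreter() means "no script interpreter was built in or configured":

      ScriptInterpreter *interp = GetCommandInterpreter().GetScriptInterpreter();
      if (interp)
          interp->ExecuteOneLine(command, &result);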

I don’t know how much of this kind of thing is out of tree, but either way it’s a purely mechanical change. If there are places that are actually relying on it being a ScriptInterpreterPython, then that’s an error in the code, because as you said it was designed to be pluggable.

Hello,

A little background: The single biggest painpoint for working with LLDB on Windows currently is Python. There is a long <https://mail.python.org/pipermail/distutils-sig/2013-February/020006.html> documented <https://docs.python.org/2/extending/windows.html> history of the problems with python extension modules on Windows. Part <https://msdn.microsoft.com/en-us/library/ms235460.aspx> of it is Microsoft's <https://msdn.microsoft.com/en-us/library/bb531344.aspx> fault, part of it is Python's fault (PEP 384 <https://www.python.org/dev/peps/pep-0384/> attempts to solve this, but it appears stalled), but the end result is that it's really terrible and there's nothing anyone can do about it.

So, I'm not sure why you say PEP 384 is stalled. The stable ABI is definitely supported. It's true that it's not properly advocated or documented, so it probably sees relatively little use by extension authors. Also, due to the stability constraint, the stable ABI can only expose a subset of the full API. Nevertheless, the ABI is implemented.

Solution 2: Upgrade to Python 3.5
Rationale: Hopefully I didn't send you running for the hills. This is going to have to happen someday anyway. Python 2.7 end of life is set to 2020 <https://hg.python.org/peps/rev/76d43e52d978>. Seems like a long time, but it'll be here before we know it, and if we aren't prepared, it's going to be a world of hurt.

Why does this help us on Windows? Visual Studio 2015, which releases sometime this year, is finally set to have a stable ABI <http://blogs.msdn.com/b/vcblog/archive/2014/06/10/the-great-crt-refactoring.aspx> for its CRT. This means that any application compiled against VS2015 or later will be forward compatible with future versions of the CRT forever <https://mail.python.org/pipermail/python-dev/2014-June/134888.html>. Python 3.5 is not yet released, but the current proposal is for Python 3.5 <https://mail.python.org/pipermail/python-dev/2014-June/134866.html> to ship its binary release compiled with VC++ 2015, effectively killing this problem.

Note this is effectively a done deal. Python 3.5's build procedure is now officially ported to VS 2015 (even though both are still unreleased).

Regards

Antoine.