RFC: full support for python files, and avoid using FILE* internally

Hi lldb-dev.

I want to be able to use LLDB inside of iPython, so I can have a mixed Python and LLDB debug session.

To this end, I’d like to update LLDB to have full support for Python file objects, so the outputs of debugger commands can be redirected into iPython’s own streams.

This, however, is difficult to do, because LLDB makes use of FILE* streams in a number of places. This presents two problems. The first is that there is no really correct way to create SWIG typemaps that handle conversion to FILE* and get the ownership semantics correct. The second problem is that there is no portable way to make a FILE* with arbitrary callbacks for reading and writing. On Darwin and BSD there’s funopen, on Linux (glibc) there’s fopencookie, and I don’t know if there’s any way on Windows.

I made an attempt at this a while ago using funopen, here:

https://reviews.llvm.org/D38829

Zachary Turner suggested a more thorough approach, where instead of trying to use funopen to paper over all the uses of FILE* streams, we should make lldb_private::File capable of doing the dynamic dispatch and excise all the unnecessary FILE* stuff in favor of lldb_private::File.
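A rough sketch of what "File doing the dynamic dispatch" could look like, assuming a virtual Write() on a base class. The class names (File, NativeFile, CallbackFile) and signatures here are illustrative only and are not the actual lldb_private::File API.

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>

// Hypothetical base class: callers only rely on Write(); the backing
// implementation (FILE*, descriptor, Python object, ...) is hidden.
class File {
public:
  virtual ~File() = default;
  // Writes len bytes and returns how many were written.
  virtual size_t Write(const void *buf, size_t len) = 0;
  // Only some implementations can hand back a FILE*; the default cannot.
  virtual FILE *GetStream() { return nullptr; }
};

// Backed by a FILE* (not owned here, to keep the sketch short).
class NativeFile : public File {
public:
  explicit NativeFile(FILE *stream) : m_stream(stream) {}
  size_t Write(const void *buf, size_t len) override {
    return std::fwrite(buf, 1, len, m_stream);
  }
  FILE *GetStream() override { return m_stream; }

private:
  FILE *m_stream;
};

// Backed by an arbitrary callback, e.g. one that calls a Python
// file object's write() method.
class CallbackFile : public File {
public:
  using Writer = std::function<void(const std::string &)>;
  explicit CallbackFile(Writer writer) : m_writer(std::move(writer)) {}
  size_t Write(const void *buf, size_t len) override {
    m_writer(std::string(static_cast<const char *>(buf), len));
    return len;
  }

private:
  Writer m_writer;
};

// Output code works the same whichever kind of File it is given.
void PrintMessage(File &out, const std::string &msg) {
  out.Write(msg.data(), msg.size());
}
```

The point of the design is that code like PrintMessage never needs a FILE*, so a Python-backed File can be dropped in wherever output is produced.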

That’s what I’ve done here: https://github.com/smoofra/llvm-project/tree/files

I’ve posted the first few patches to Phabricator for review.

https://reviews.llvm.org/D67793
https://reviews.llvm.org/D67792
https://reviews.llvm.org/D67789

What do you think?

Hello Larry,

thanks for starting this thread.

So, judging by your problem description, it sounds to me like you're primarily interested in the SBCommandInterpreter::HandleCommand family of functions (and by extension, the SBCommandReturnObject class). Would that be a fair thing to say?

The reason I am asking this is that I'm wondering what the scope of your proposal is (and then, whether this is the best way to accomplish that). For instance, if we were only interested in the HandleCommand API, then it might be possible to plug Python in at a higher level (Stream instead of File). I am hoping that doing that might be easier, as the Stream class has a simpler interface and already supports multiple backing implementations (StreamFile, StreamString, ...).
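For comparison, a Stream-level hook could be as small as the following sketch. It assumes a Stream-like base class whose only virtual method is a raw write. The names MiniStream, MiniStreamString and MiniStreamCallback are hypothetical and heavily simplified. They are not LLDB's Stream classes.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical simplified Stream: formatting helpers are built on top of
// a single virtual raw-write hook.
class MiniStream {
public:
  virtual ~MiniStream() = default;
  MiniStream &operator<<(const std::string &s) {
    WriteImpl(s.data(), s.size());
    return *this;
  }

protected:
  virtual size_t WriteImpl(const void *src, size_t len) = 0;
};

// Collects output in memory, similar in spirit to StreamString.
class MiniStreamString : public MiniStream {
public:
  const std::string &GetString() const { return m_data; }

protected:
  size_t WriteImpl(const void *src, size_t len) override {
    m_data.append(static_cast<const char *>(src), len);
    return len;
  }

private:
  std::string m_data;
};

// Forwards each write to a callback (e.g. one calling a Python file's
// write()); only this one small method has to be implemented.
class MiniStreamCallback : public MiniStream {
public:
  using Sink = std::function<void(const char *, size_t)>;
  explicit MiniStreamCallback(Sink sink) : m_sink(std::move(sink)) {}

protected:
  size_t WriteImpl(const void *src, size_t len) override {
    m_sink(static_cast<const char *>(src), len);
    return len;
  }

private:
  Sink m_sink;
};
```

The trade-off is coverage: anything that writes to a File or FILE* directly, rather than through a Stream, would bypass such a hook.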

Also, doing that would allow us to sidestep some complicated questions. One of the reasons why getting rid of FILE* is so complicated (you're not the first person to try that) is that there are some APIs (mainly libedit) that we just cannot change, and which require a FILE*.

If you do want to go with the more general change, then I'd like to ask you to give a bit more detail about your vision of the new role of the lldb_private::File class and its interaction with other major lldb components (SBFile, StreamFile, ???). My understanding (it's been a while since I looked at this in detail) is that the File class can be constructed from both a FILE* and a file descriptor and (crucially) it is also able to give back these underlying objects, including converting between the two. Now, I am assuming you're intending to add a third method of constructing a File object (using some Python callbacks), but I assume that (due to the mentioned lack of funopen etc.) you won't be trying to convert between these types. So, it would be good to spell out what exactly the File class promises to do, and what happens when (e.g.) a pythonified File object makes its way to code (libedit) which requires a FILE*.

regards,
pavel

So, judging by your problem description, it sounds to me like you're primarily interested in the SBCommandInterpreter::HandleCommand family of functions (and by extension, the SBCommandReturnObject class). Would that be a fair thing to say?

Not really. I want to be able to embed a full LLDB session inside of iPython, which means redirecting anything that prints to the debugger's main output and error streams. Yes, in most cases that will be coming from HandleCommand(), but I really want to avoid the situation where some output that would normally be printed to the terminal is missed under iPython.

The reason I am asking this is that I'm wondering what the scope of your proposal is (and then, whether this is the best way to accomplish that). For instance, if we were only interested in the HandleCommand API, then it might be possible to plug Python in at a higher level (Stream instead of File). I am hoping that doing that might be easier, as the Stream class has a simpler interface and already supports multiple backing implementations (StreamFile, StreamString, ...).

Also, doing that would allow us to sidestep some complicated questions. One of the reasons why getting rid of FILE* is so complicated (you're not the first person to try that) is that there are some APIs (mainly libedit) that we just cannot change, and which require a FILE*.

I saw that. My strategy for dealing with that was to audit the codebase for any use of File::GetStream(). I found that the only two places where I could not remove the use of GetStream() were libedit and IOHandlerCursesGUI. In my prototype, I deal with that by checking for NULL from GetStream() before libedit or IOHandlerCursesGUI are enabled. In other words, if a File can produce a FILE*, it will. But you can still have a valid File that will return NULL from GetStream(). If you set your debugger streams to Files that return NULL from GetStream(), then libedit and the curses GUI will be disabled. I think this is a reasonable approach. For my use case in particular, there is no need for either libedit or the curses GUI, because the whole point is to use iPython as the GUI. In general, libedit and curses only really make sense if the IO streams are a terminal anyway, so it’s not a problem to disable these features if the IO streams are redirected to Python.
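A minimal sketch of this gating, assuming a File base class whose GetStream() returns nullptr when no FILE* is available. `CanUseLineEditor` is a hypothetical helper, and requiring all three debugger streams to be FILE*-backed is an assumption made for illustration.

```cpp
#include <cstdio>

// Hypothetical minimal File: by default it cannot produce a FILE*.
class File {
public:
  virtual ~File() = default;
  virtual FILE *GetStream() { return nullptr; }
};

// A File built from an existing FILE* can give it back.
class StreamBackedFile : public File {
public:
  explicit StreamBackedFile(FILE *stream) : m_stream(stream) {}
  FILE *GetStream() override { return m_stream; }

private:
  FILE *m_stream;
};

// A File backed by Python callbacks would keep the default (nullptr).
class PythonBackedFile : public File {};

// Enable libedit / the curses GUI only if every stream yields a FILE*;
// otherwise the caller falls back to plain line-based I/O.
bool CanUseLineEditor(File &in, File &out, File &err) {
  return in.GetStream() != nullptr && out.GetStream() != nullptr &&
         err.GetStream() != nullptr;
}
```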

If you do want to go with the more general change, then I'd like to ask you to give a bit more detail about your vision of the new role of the lldb_private::File class and its interaction with other major lldb components (SBFile, StreamFile, ???). My understanding (it's been a while since I looked at this in detail) is that the File class can be constructed from both a FILE* and a file descriptor and (crucially) it is also able to give back these underlying objects, including converting between the two. Now, I am assuming you're intending to add a third method of constructing a File object (using some Python callbacks), but I assume that (due to the mentioned lack of funopen etc.) you won't be trying to convert between these types. So, it would be good to spell out what exactly the File class promises to do, and what happens when (e.g.) a pythonified File object makes its way to code (libedit) which requires a FILE*.

OK. My vision for File is that its main promise is to implement File::Read and/or File::Write. Files can be constructed from descriptors or FILE* streams, and in that case they should be able to give those underlying objects back. But Files may also be constructed in other ways. Clients should avoid calling GetDescriptor() or GetStream() if they can help it. If they can’t help it, such as in the case of libedit or IOHandlerCursesGUI, then they should check that they got a valid descriptor or stream before proceeding.

Files may also implement Seek and Tell, or not. If not, they should return an “operation not supported” error from Seek() and Tell() and from the versions of Read() and Write() that take offsets.
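The contract described above could be sketched like this, assuming std::error_code for errors. The method names, and the separate hypothetical WriteAt() for offset-based writes (which avoids C++ name hiding of an overloaded virtual Write()), are illustrative assumptions rather than the real lldb_private::File interface.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <system_error>

// Hypothetical File contract: Write() is the core promise; seeking,
// telling and offset-based I/O are optional and default to
// "operation not supported".
class File {
public:
  virtual ~File() = default;

  // num_bytes: in = bytes to write, out = bytes actually written.
  virtual std::error_code Write(const void *buf, size_t &num_bytes) = 0;

  virtual std::error_code Seek(long offset, long &new_pos) {
    (void)offset;
    (void)new_pos;
    return std::make_error_code(std::errc::operation_not_supported);
  }
  virtual std::error_code Tell(long &pos) {
    (void)pos;
    return std::make_error_code(std::errc::operation_not_supported);
  }
  // Hypothetical offset-based variant; a distinct name means overriding
  // Write() in a subclass does not hide it.
  virtual std::error_code WriteAt(const void *buf, size_t &num_bytes,
                                  long offset) {
    (void)buf;
    (void)offset;
    num_bytes = 0;
    return std::make_error_code(std::errc::operation_not_supported);
  }

  // Clients should avoid these; nullptr / -1 mean "not available".
  virtual FILE *GetStream() { return nullptr; }
  virtual int GetDescriptor() { return -1; }
};

// A non-seekable File (like a Python stream) that only appends to memory.
class AppendOnlyFile : public File {
public:
  std::error_code Write(const void *buf, size_t &num_bytes) override {
    m_data.append(static_cast<const char *>(buf), num_bytes);
    return {};
  }
  const std::string &Contents() const { return m_data; }

private:
  std::string m_data;
};
```

A subclass like AppendOnlyFile only has to implement the core promise; everything optional reports "not supported" unless overridden.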

So, judging by your problem description, it sounds to me like you're primarily interested in the SBCommandInterpreter::HandleCommand family of functions (and by extension, the SBCommandReturnObject class). Would that be a fair thing to say?

Not really. I want to be able to embed a full LLDB session inside of iPython, which means redirecting anything that prints to the debugger's main output and error streams. Yes, in most cases that will be coming from HandleCommand(), but I really want to avoid the situation where some output that would normally be printed to the terminal is missed under iPython.

Ok, that's fair.

The reason I am asking this is that I'm wondering what the scope of your proposal is (and then, whether this is the best way to accomplish that). For instance, if we were only interested in the HandleCommand API, then it might be possible to plug Python in at a higher level (Stream instead of File). I am hoping that doing that might be easier, as the Stream class has a simpler interface and already supports multiple backing implementations (StreamFile, StreamString, ...).

Also, doing that would allow us to sidestep some complicated questions. One of the reasons why getting rid of FILE* is so complicated (you're not the first person to try that) is that there are some APIs (mainly libedit) that we just cannot change, and which require a FILE*.

I saw that. My strategy for dealing with that was to audit the codebase for any use of File::GetStream(). I found that the only two places where I could not remove the use of GetStream() were libedit and IOHandlerCursesGUI. In my prototype, I deal with that by checking for NULL from GetStream() before libedit or IOHandlerCursesGUI are enabled. In other words, if a File can produce a FILE*, it will. But you can still have a valid File that will return NULL from GetStream(). If you set your debugger streams to Files that return NULL from GetStream(), then libedit and the curses GUI will be disabled. I think this is a reasonable approach. For my use case in particular, there is no need for either libedit or the curses GUI, because the whole point is to use iPython as the GUI. In general, libedit and curses only really make sense if the IO streams are a terminal anyway, so it’s not a problem to disable these features if the IO streams are redirected to Python.

Ok, that also sounds like a reasonable position to take. Might be the only reasonable position, even. Theoretically, one might go the extra mile and try to synthesize a FILE* using fopencookie et al. on platforms that support that (the only platforms that support libedit and curses also happen to have an fopencookie equivalent). That's probably overkill now, but it is nice to have that option open for the future.

If you do want to go with the more general change, then I'd like to ask you to give a bit more detail about your vision of the new role of the lldb_private::File class and its interaction with other major lldb components (SBFile, StreamFile, ???). My understanding (it's been a while since I looked at this in detail) is that the File class can be constructed from both a FILE* and a file descriptor and (crucially) it is also able to give back these underlying objects, including converting between the two. Now, I am assuming you're intending to add a third method of constructing a File object (using some Python callbacks), but I assume that (due to the mentioned lack of funopen etc.) you won't be trying to convert between these types. So, it would be good to spell out what exactly the File class promises to do, and what happens when (e.g.) a pythonified File object makes its way to code (libedit) which requires a FILE*.

OK. My vision for File is that its main promise is to implement File::Read and/or File::Write. Files can be constructed from descriptors or FILE* streams, and in that case they should be able to give those underlying objects back. But Files may also be constructed in other ways. Clients should avoid calling GetDescriptor() or GetStream() if they can help it. If they can’t help it, such as in the case of libedit or IOHandlerCursesGUI, then they should check that they got a valid descriptor or stream before proceeding.

Files may also implement Seek and Tell, or not. If not, they should return an “operation not supported” error from Seek() and Tell() and from the versions of Read() and Write() that take offsets.

Ok, this all sounds perfectly reasonable, but thanks for spelling that out. Now we have this description ready to attach as a comment in one of the patches. :)

I think the only remaining thing that bothers me about all of this is the proliferation of shared pointers. Right now, each StreamFile object holds an lldb_private::File instance as a member (so it is uniquely owned). Your patches change this to shared_ptr<File>, which means that now we can have multiple StreamFiles sharing ownership of a single File object. Since Stream objects are already passed around as shared pointers, this seems like it gives us more flexibility (== opportunity to mess things up) than we really need. I kind of get why that might be necessary, and I can imagine that the only reason we did not need that so far is because the File class allows you to "cheat" and create multiple File instances pointing to a single FILE* (as long as at most one of them owns that FILE*).
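To illustrate the "cheat" mentioned above, here is a minimal sketch. It assumes a File that stores a FILE* plus an ownership flag, and a StreamFile that holds its File by value. Both classes are simplified hypothetical stand-ins, not LLDB's definitions.

```cpp
#include <cstdio>

// Hypothetical File: a FILE* plus a flag saying whether to close it.
class File {
public:
  File(FILE *stream, bool owned) : m_stream(stream), m_owned(owned) {}
  File(const File &) = delete;
  File &operator=(const File &) = delete;
  ~File() {
    if (m_owned && m_stream != nullptr)
      std::fclose(m_stream);  // only the owning File closes the stream
  }
  FILE *GetStream() const { return m_stream; }
  bool IsOwned() const { return m_owned; }

private:
  FILE *m_stream;
  bool m_owned;
};

// Each StreamFile owns its File by value (unique ownership), yet two
// StreamFiles can still share one FILE* through a non-owning File.
class StreamFile {
public:
  StreamFile(FILE *stream, bool owned) : m_file(stream, owned) {}
  void Puts(const char *s) { std::fputs(s, m_file.GetStream()); }
  File &GetFile() { return m_file; }

private:
  File m_file;
};
```

This only works while the owning StreamFile outlives the borrowing ones; nothing in the types enforces that, which is the kind of lifetime bookkeeping shared ownership would replace.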

However, I still can't escape the feeling that there should be some way to avoid that. Since you're now probably the most familiar with these classes, what do you think about all of this?

regards,
pl

A bit of a tangent, but I've been getting requests to debug Python and C++ together. Things like TensorFlow start in Python, then call into C++ libraries. Users want to be able to debug the Python code as Python (not debugging into Python itself), then step into the C++ libraries. They want to go up and down the stack, switching languages as needed. "What was my Python code doing when it called into the library. Now what is the library doing?"

I don’t think the shared_ptrs are avoidable once SBFile is introduced as an API. If the user creates an SBFile and then assigns it to the debugger as the output file, then both the user script and the debugger have a reference to it. There has to be a shared_ptr somewhere, right? Unless the user script implements some kind of move semantics, where assigning the file to the debugger means the user script loses access to it. And I don’t like that idea: move semantics are pretty alien to Python, and it would be less useful anyway. debugger.GetOutputFile().Flush() should do what it looks like it does.
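A minimal sketch of the shared-ownership scenario described above, assuming both the user-facing handle and the debugger hold the same File through std::shared_ptr. SBFileHandle and Debugger here are hypothetical stand-ins for SBFile and lldb's debugger, not their real interfaces.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical buffered File: Write() buffers, Flush() "delivers".
class File {
public:
  void Write(const std::string &s) { m_pending += s; }
  void Flush() {
    m_delivered += m_pending;
    m_pending.clear();
  }
  const std::string &Delivered() const { return m_delivered; }

private:
  std::string m_pending;
  std::string m_delivered;
};

// Stand-in for an SBFile held by a user script.
class SBFileHandle {
public:
  explicit SBFileHandle(std::shared_ptr<File> file) : m_file(std::move(file)) {}
  std::shared_ptr<File> GetFile() const { return m_file; }
  void Flush() { m_file->Flush(); }

private:
  std::shared_ptr<File> m_file;
};

// Stand-in for the debugger, which keeps its own reference to the output.
class Debugger {
public:
  void SetOutputFile(const SBFileHandle &handle) { m_out = handle.GetFile(); }
  SBFileHandle GetOutputFile() const { return SBFileHandle(m_out); }
  void Print(const std::string &s) { m_out->Write(s); }

private:
  std::shared_ptr<File> m_out;
};
```

With shared ownership, `dbg.GetOutputFile().Flush()` flushes the very object the script passed in, and the script's handle stays valid afterwards, which is the behavior argued for above.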

However, it may be possible to get rid of the shared_ptr. I’ll look into that.

Python does have a tool called faulthandler that can produce Python stack traces when the interpreter experiences a segfault. I wonder if there’s a way for LLDB to invoke that to get Python backtraces in the debugger?