Getting ConnectionFileDescriptor working on Windows

I’m trying to get ConnectionFileDescriptor working on Windows. Because select() has completely different semantics on Windows as on posix, it won’t work for this purpose, so It hink the best approach is to re-write ConnectionFileDescriptor for windows. I’ve already submitted a patch that moves CFD to Host so that I can provide a different implementation for Windows.

However, I’m not sure if just providing a Windows implementation is sufficient, due to the very nature of this class. In short, the things you can do with a generic file descriptor on Windows are very different than the things you can do with a file descriptor on posix. And unfortunately, certain assumptions about what you can do with a file descriptor are baked into the code that uses CFD as well as CFD itself (after all, its design alone implies that all these different use cases of file descriptors are interchangeable).

One of the more awkward things to deal with is that CFD provides a constructor that takes a file descriptor. Inside, it assumes that this fd represents an actual file, but in practice we sometimes give it other things to this constructor.

I’ve broken down all the cases of where we use this constructor as follows:

SBCommunication::AdoptFileDesriptor - passes in whatever fd the user decides to give it, and treats it as a file.

ScriptInterpreterPython::ExecuteOneLine - passes in the fd of a pipe, CFD doesn’t have any way to know it’s a pipe though, and treats it as a file.

GDBRemoteCommunicationServer::SetSTDIOFileDescriptor - Not sure, I think this is always an actual file, but maybe not.

Process::SetSTDIOFileDescriptor - Same as before, I think this is always an actual file, but maybe not.

I’m not really sure how to deal with this on Windows. We need to know what it actually is. Furthermore, file descriptors in general don’t provide the necessary functionality required to be able to implement interrupting as used by CFD. For that we need the native OS type for whatever these objects are. For sockets that means an object of type SOCKET. For pipes and files it means a HANDLE. “Here’s an fd, do something with it” just isn’t enough of, or the right kind of information to be able to do the right thing.

AFAICT I only need to deal with the above 4 cases.

The first case I can deal with by #ifdefing it out on Windows returning an error that says “sorry, not supported on Windows”.

For the second case, I’m not sure what to do because I don’t actually understand how this function works or why this level of complexity is required.

For the third and fourth cases, maybe I can just assume they’re files?

I was thinking about splitting CFD into multiple classes, one for each type of descriptor. A FileConnection, PipeConnection, TcpSocketConnection, ListeningConnection, etc. Then, the caller would have to instantiate the right kind. This has its own set of problems, but at least seems to solve the problem of requiring the creator of the CFD to specify what they’re actually giving us. On posix in cases where it’s user specified or don’t know, it seems we could just create a FileConnection and that would be equivalent to the current behavior, since it seems to treat everything the same anyway.

thoughts?

The more I think about this, the more I feel like it's the right approach
(it might even be the only approach that even works, I can't really come up
with anything else that solves the problem, much less does it nicely).
We've already got cases where people create a ConnectionFileDescriptor of a
specific type, and use it in a way that would break if it weren't of that
type. Separating out the cases into different classes this way would allow
these cases to be cleaner, as many methods that are publicly exposed on
CFD, and some of the class members as well are specific to the type of fd
being wrapped. So the interfaces of the other types could become cleaner
as well.

Hey Zachary,

On posix in cases where it’s user specified or don’t know, it seems we could just create a FileConnection and that would be equivalent to the current behavior, since it seems to treat everything the same anyway.

What would the instantiation sites for these classes look like? Would it be a call into some kind of Host level “'Create{Socket,File,Pipe}(…)” factory method that knows what kind of underlying object to create based on that? In which case you can fork off to the right versions for Windows, while POSIX-y systems would (as you noted) be able to use a single file-descriptor-based impl?

Unless you’re using a connection string, the instantion site would just instantiate the exact thing it wants. e.g.

ConnectionTcpListen *listener = new ConnectionTcpListen();
listener->Connect();
listener->GetBoundPort();

something like that.

For the connection string case, you would call a factory method that returns a Connection *

Unless you’re using a connection string, the instantion site would just instantiate the exact thing it wants. e.g.
ConnectionTcpListen *listener = new ConnectionTcpListen();

Ok - so these classes will exist across all OS builds, where on Windows you get impls that have different details for each derived class, and on POSIXy items the derived classes are either derive with no impl different from base class, or typedefs (?) that map to the single base class, and everything just works. Is that about right?

I’m just looking at it from the angle of shared code using those new classes when they’re not using the connection string approach.

Sounds reasonable to me.

That’s correct. ConnectionTcpListen in particular will probably have a different impl on posix, because that one use case is the only reason for the GetBoundPort() method. So removing all of the bound port logic from the interface and implementation of the other types of connections seems desirable.

The concept all sounds good to me.

I suppose as long as the constructors/factories that create these make it hard for the non-Windows platforms to accidentally introduce shared code that really uses the wrong constructor (which at the file-descriptor level might be entirely not obvious and hidden on non-Windows), then that would be good. Otherwise I could see you having run-time breaks that only show up on Windows but merrily work on other platforms.

(By wrong constructor I mean “wrong class” — for instance if the Posix ones actually took file descriptors in a constructor for any/all of the impls on Posix-y systems, it would be easy to construct the wrong kind of class in a POSIX environment and aside from the bound port part, it would be hard to tell the wrong derived class was being used).

I plan to define some additional typedefs similar to lldb::process_t and lldb::pid_t. They will be lldb::pipe_t, lldb::socket_t, and lldb::file_t. On posix these will all just be typedefed to int. On Windows they will be typedefed to HANDLE, SOCKET, HANDLE, respectively.

That should address that, although Windows may go through some pains getting this to work at first since right now fds are pretty prevalent in the code. That’s the idea anyway. If it doesn’t work, as a fallback we can create a fd based constructor on Windows too and only use it in the cases where it’s too difficult to propagate the native type up as far as it needs to go, but mark it deprecated and encourage people to use the appropriate native types going forward.

BTW, in case it wasn’t clear, these new typedefs would be used in the constructors of the derived classes, and not just int. This way if someone created the wrong type of derived class, it would break the build on Windows.

Ok cool yeah I’m just looking for a way to know about semantic errors before runtime. Sounds good to me.

NOTE: I haven't read the other responses to this thread, so I am going to start with this email and work my way through. More comments below.

I'm trying to get ConnectionFileDescriptor working on Windows. Because select() has completely different semantics on Windows as on posix, it won't work for this purpose, so It hink the best approach is to re-write ConnectionFileDescriptor for windows. I've already submitted a patch that moves CFD to Host so that I can provide a different implementation for Windows.

However, I'm not sure if just providing a Windows implementation is sufficient, due to the very nature of this class. In short, the things you can do with a generic file descriptor on Windows are very different than the things you can do with a file descriptor on posix. And unfortunately, certain assumptions about what you can do with a file descriptor are baked into the code that uses CFD as well as CFD itself (after all, its design alone implies that all these different use cases of file descriptors are interchangeable).

One of the more awkward things to deal with is that CFD provides a constructor that takes a file descriptor. Inside, it *assumes* that this fd represents an actual file, but in practice we sometimes give it other things to this constructor.

I've broken down all the cases of where we use this constructor as follows:

SBCommunication::AdoptFileDesriptor - passes in whatever fd the user decides to give it, and treats it as a file.

This is in the public API, so this needs to stay here for non windows builds. On windows we could add a new windows only API that uses a HANDLE if needed, or abstract this in some way through the public interface.

ScriptInterpreterPython::ExecuteOneLine - passes in the fd of a pipe, CFD doesn't have any way to know it's a pipe though, and treats it as a file.

This makes generic use of the CFD that uses a FD. Since we need to interface with the native python layer, we will need to do whatever we need to do to give python a file handle it can use for in/out/err. This probably can be special cased for windows to make it work as needed.

GDBRemoteCommunicationServer::SetSTDIOFileDescriptor - Not sure, I think this is always an actual file, but maybe not.

Process::SetSTDIOFileDescriptor - Same as before, I think this is always an actual file, but maybe not.

These are all internal, so we can re-architect as needed.

I'm not really sure how to deal with this on Windows. We need to know what it *actually* is. Furthermore, file descriptors in general don't provide the necessary functionality required to be able to implement interrupting as used by CFD. For that we need the native OS type for whatever these objects are. For sockets that means an object of type SOCKET. For pipes and files it means a HANDLE. "Here's an fd, do something with it" just isn't enough of, or the right kind of information to be able to do the right thing.

the lldb_private::Connection class is currently a very simple abstraction for reading/writing something that would go over a wire to talk to a remote debug client, but it doesn't provide interruption.

AFAICT I only need to deal with the above 4 cases.

The first case I can deal with by #ifdefing it out on Windows returning an error that says "sorry, not supported on Windows".

Or adding a new windows specific "HANDLE" variation for windows only and then make sure it interfaces with the internal classes (whatever we do with CFD).

For the second case, I'm not sure what to do because I don't actually understand how this function works or why this level of complexity is required.

Again, it will need to interface with the native python layer on windows. I don't know what that looks like or if they change any files around. Sometimes the current debugger has the IOHandler to to stdin/out/err and it needs to just use that, other times we are executing python code and want to catch the output so we can copy it into the result object in case the python code prints any output. So it needs to be generic enough to work. Everything right now on the IOHandler stack uses "FILE *". I really don't want to change that. Windows has an abstraction for "FILE *" that works, so I would just try to make things work on windows using the "FILE *", or there will be very major re-architecture required.

The main reason for you wanting to change things is because there is no select on windows?

For the third and fourth cases, maybe I can just assume they're files?

Both of these two could just be changed to accept a lldb_private::Connection subclass. Easy fix.

I was thinking about splitting CFD into multiple classes, one for each type of descriptor. A FileConnection, PipeConnection, TcpSocketConnection, ListeningConnection, etc. Then, the caller would have to instantiate the right kind. This has its own set of problems, but at least seems to solve the problem of requiring the creator of the CFD to specify what they're actually giving us. On posix in cases where it's user specified or don't know, it seems we could just create a FileConnection and that would be equivalent to the current behavior, since it seems to treat everything the same anyway.

Feel free to do this on windows, but please leave the POSIX stuff alone. All file descriptor calls mostly just work with each other. There is no need to split these out as they are all related. They could be split out, but then a very intelligent base file descriptor class would be needed and then just the special cases can be broken out. I would rather not have to go to 5 different classes and change all of the implementations if they are all the same (open, close, read/write, etc).

The main reason for wanting to change things is because there is no select on windows (technically there is, but it works completely differently), and because just a file descriptor isn’t enough information to make ConnectionFileDescriptor work for all the required types of objects. The implementation details are completely different whether you have a pipe, socket, disk file, or something else.

I think the way to do everything with the lowest possible impact change on posix is the following (this was mentioned earlier in the thread but since you started with the OP i’ll rehash the summary):

  1. Make separate classes for Windows. ConnectionGenericFile, ConnectionPipe, ConnectionSocket, ConnectionSocketAccept, etc.
  2. On Posix, keep ConnectionFileDescriptor, but add ConnectionSocketAccept which derives from ConnectionFileDescriptor and the accept-specific stuff goes in the derived class. No other posix changes. Might need to change the name from ConnectionFileDescriptor to ConnectionGenericFile, for syntax-parity so that any platform can instantiate it regardless of the platform.
  3. In Connection (the base interface), add a static method called CreateGenericConnection which takes a connection string. Essentially this is a factory method. This function has an #ifdef _WIN32 which will look at the connection string and create the approprioately derived type. On Windows this will create either a ConnectionGenericFile, ConnectionPipe, ConnectionSocket, ConnectionSocketAccept. On non-windows it will create either a ConnectionFileDescriptor or a ConnectionSocketAccept.
  4. Any call-sites that currently instantiate a CFD and call Connect() with a string will be updated to use the factory method.
  5. Any call-sites that instantiate a CFD with the fd constructor where the fd comes from the user, or where there’s not enough information to know specifically what type it is will use ConnectionGenericFile (windows) or ConnectionFileDescriptor (posix).
  6. Any call-sites that instantiate a CFD with the fd constructor but where we know what type it is will explicitly instantiate the derived type.

The creation of the ConnectionSocketAccept on posix is necessary because it doesn’t have the same interface as the others. It has GetBoundPort(). But we only use this in cases where we know in advance it’s a listening socket, so there should be no issue here.