About lldbHost

After https://reviews.llvm.org/D29964, we finally have a starting point at which we can begin unravelling the cross-project cyclic dependencies in LLDB. lldb/Utility now is very similar in spirit to llvm/Support.

But llvmSupport goes one step further and includes what lldb would normally put under Host. I think this makes some sense. Practically all parts of a codebase have need of a single OS abstraction layer. So, I think that a lot of the functionality currently in lldbHost is in fact needed by the rest of LLDB.

So, I wonder if it makes sense to follow the path that LLVM has taken, and start moving some of this code from Host down to Utility. Doing so would allow us to break one of the biggest links in the dependency cycle in the entire codebase, which is that Host depends on everything, and everything depends on Host.

Of course, it can’t just be a straight move. Some things in Host interact with Target, with CommandInterpreter, and with many other things. And stuff going into Utility can’t take a dependency.

So there will be some splitting, some moving, some refactoring, etc. But to me tackling Host seems like the logical next step, in large part because Host is where FileSpec is, and it’s going to be hard to break any dependencies without first addressing FileSpec.

The way LLVM handles cross-platform differences in Support is that you include a single header and it does conditional includes into a platform specific subdirectory for the parts that differ.

I’m thinking to follow the same pattern here in lldb/Utility, and begin looking for ways to get pieces of Host down into Utility this way, until ultimately I can get FileSpec down there.

Thoughts?

I agree that the next module that needs figuring on is the host one.
However I am not sure if the decision on what to put in what module
should be motivated by the FileSpec class, as I think it's current
interface has a some issues, and our choice on how to resolve them can
greatly affect what things it depends on.

The main thing that bugs me with FileSpec is that it is used for a two
distinct purposes:
- manipulations on abstract path names (e.g. PrependPathComponent)
- manipulations of actual files present on the host (e.g. MemoryMapFileContents)
For the first one you don't need the files to exist on the host (in
fact they may not even use the same path syntax as the host). For the
other one, they *must* exist and *must* use the same path syntax. This
is currently not very well separated and enforced, and I think it
should be. I believe this is the reason the FileSystem pseudo-class
was created.

So, my counter-proposal would be to finish moving the host-specific
things out of the FileSpec class (basically just replace
file_spec.Foo(...) with FileSystem::Foo(file_spec, ...). At that point
FileSpec should only depend on string manipulation functions, and we
should be able to move it without much hassle. After that, we can take
another look and decide what to do next.

The thing I really like about this idea is that we will end up with
two classes that very closely mirror llvm functionality (FileSpec is a
version of Support/Path.h that does not assume host path syntax,
FileSystem is similar to Support/FileSystem.h, but it supports some
more fancy operations like mmap()). Then we could proceed to merge
this functionality with llvm pretty much independently of any other
refactoring we will be doing.

What do you think?

Yes, in fact that mirrors how i had planned to tackle this.

The question is, can we put in Utility code that is separated by platform, which typically has been for Host? This mirrors what llvm does

Right, I see. In that case, I guess my response would be "let's wait
and see how things look like after FileSpec is moved".

I kinda like how we have the all the host code in a separate module (I
expect we will have a lot more of it than llvm), but I am not against
it if it turns out there is no other way to organize dependencies. I
just don't think we've reached that point yet -- right now I don't see
why we couldn't have FileSpec in Utility and FileSystem in Host.

Or have you thought ahead and found a problem already?

Having FileSpec in Utility and FileSystem in Host would work for now.

Any opinions on the FileSpec interface? Llvm’s path and file system libraries use free functions for everything. in a perfect world I feel that’s the best design, and with lldb we could even do slightly better and make all functions pure (returning copies instead of mutating) since everything is ConstString.

If we were starting over from scratch this would definitely be the way to go imo, but I’m not sure if it’s worth the effort now. What do you think?

I prefer free functions as well, but I think switching FileSpec to
that would be a non-trivial task. If you want to try it out I would
encourage it, but I don't think it's a requirement.

I would prefer things stay object oriented when possible. I don't think it adds anything by making everything a static function. It is ok where it makes sense, but I would vote to add static functions in the FileSpec class over making purely stand alone functions as they indicate that it should be used on a FileSpec object. It is also ok to have both methods and static functions where the methods (like AppendPathComponent() can call through to the static versions if needed.

We also have the "File.h" class that defines lldb_private::File. This is the object that wraps the access to the data in the file (the file descriptor integers and the "FILE *" abstraction. It would be interesting to move any functions in lldb_private::FileSpec, like the mmap functions, over into lldb_private::File where needed. And then see if there is even a need for the FileSystem class at all. If there is we can divvy it up as needed.

I would say lets start with purely moving things around and get the file locations all set and the correct abstractions where FileSpec is for representing a file specification efficiently and all operations on that affect file spec mutations, moving anything that is accessing file contents over into lldb_private::File, and then seeing if we have any functions left over and if we even need FileSystem.

That being said I generally find it useful to do something like:

FileSpec f("/tmp/a.txt");
if (f.Exists())
   ...

It seems like the suggestions would be to move to something like:

FileSpec f("/tmp/a.txt");
if (File::Exists(f))
   ...

Both are fine, but the latter is much less discoverable. Think about when you use IDEs with auto completion. So while it might be cleaner from a code linking standpoint, I don't think it actually improves the code for users. I much prefer the object oriented methodology and I would prefer code linking considerations don't override the clean API we currently have.

I think the main improvement that it provides is that you need something which represents only a path. Because we use FileSpecs to refer to paths on remote machines, and then what does it mean to call Exists or Resolve? So, we could have something like PathSpec and then have FileSpec contain a PathSpec, and PathSpec’s only purpose would be to know about the grammar of a path.

I agree though, in the spirit of doing things incrementally, purely moving things around and not making major substantive changes is a good starting point and reduces the scope of the problem.

I think the main improvement that it provides is that you need something which represents only a path. Because we use FileSpecs to refer to paths on remote machines, and then what does it mean to call Exists or Resolve? So, we could have something like PathSpec and then have FileSpec contain a PathSpec, and PathSpec’s only purpose would be to know about the grammar of a path.

Agreed. Are you ok with starting with moving as much stuff from FileSpec over into File to start with?

I agree though, in the spirit of doing things incrementally, purely moving things around and not making major substantive changes is a good starting point and reduces the scope of the problem.

Agreed.

Yes. In some cases I want to try deleting LLDB’s implementation and seeing if we can fall back on LLVM. For example, MemoryMapFileContents comes to mind. LLVM has some code to memory map files as well. I’ve spent 0 time looking into how widely FileSpec::MemoryMapFileContents is used, so it’s possible that after 1 minute I realize it’s not possible / too much work. But I’d like to at least do due diligence and see if there’s any parts of the FileSpec interface that can be removed in favor of LLVM classes first.

Yes. In some cases I want to try deleting LLDB’s implementation and seeing if we can fall back on LLVM. For example, MemoryMapFileContents comes to mind. LLVM has some code to memory map files as well. I’ve spent 0 time looking into how widely FileSpec::MemoryMapFileContents is used, so it’s possible that after 1 minute I realize it’s not possible / too much work. But I’d like to at least do due diligence and see if there’s any parts of the FileSpec interface that can be removed in favor of LLVM classes first.

I am fine with switching mmap over to llvm if it works. One important thing to LLDB is we have a “mmap if not on remote file system” that must continue to work. If you mmap something from a network drive and then it gets disconnected, you can crash LLDB. So we have a function we used that implements mmap if local, read all contents if remote, that must work to avoid crashes.

Greg

I don't think File is a good place for at least some of these
functions. If File represents an open file, then at the point where
you already have an open file, it's not really worth asking whether it
exists. What makes more sense is to have an external entity (e.g. a
FileSystem) that you give an abstract pathname (FileSpec) to and it
tells you whether that pathname exists within the context of the
entity. Then you tell that entity "please open me this FileSpec" and
it gives you an object representing an open file (File class). In the
case of remote paths you would not ask the FileSystem class but maybe
the Platform class, but the API would be the same.

Similarly for Resolve: I think of it as an operation that takes an
abstract path and returns another abstract path, resolved within the
context of some external entity.

I can imagine MemoryMapFileContents to live within the File class, as
you have to have the file open to mmap it, but I don't think a
filesystem would be a bad place either.

I don't believe this make code completion harder. E.g. you can do
llvm::sys::fs::<tab> to see all the file manipulation functions in
llvm even though they are free functions.

LLVM's MemoryBuffer API already serves too many different use cases.
Initially it was designed to be a utility for efficiently reading source
file inputs. It has a bunch of functionality around ensuring that the
buffer is null terminated, and a boolean to avoid using mmap when the user
might modify the file concurrently. It's too complicated. I wouldn't
recommend using it without a good reason beyond just reusing a platform
abstraction. mmap and MapViewOfFile are not that complicated. LLDB is
probably better off doing its own thing unless it needs to pass mapped file
contents to other parts of LLVM, like maybe clang's VFS.

My main complaint here is if we have a FileSpec and I want to do some operation on it, I have no idea to look in llvm::sys::fs::…

I just took a look and it seems like almost a drop in replacement. Only thing that would need changing is updating shouldUseMmap() to return false if a file is on a network share. But this seems like a good change independently of lldb.

After that all lldb has to do is say MemoryBuffer::getOpenFile() and everything works.

Is there a good reason not to use it? Evenif internally the implementation is complex, the external interface is not

The other thing is on Mac we add new flags to mmap that allow us not to crash if the backing store (network mount) goes away. There is also a flag that says "if code signature is borked, please still allow me to read the file without crashing. That functionality will need to be ported into the LLVM code somehow so we don’t start crashing again. Previously we would crash if someone would do:

(lldb) file /tmp/invalid_codesig_app

And also if the network mounted share would go away on something that was mmap’ed and someone would touch any byte within it. Our version of mmap works around this and we need that to be in any version we adopt.

Greg

If you only ever call MemoryMapContentsIfLocal, then is that first flag even doing anything? And if you are passing that flag, then can you just mmap it always even if non-local?

I searched the code and only in the windows minidump plugin are we unconditionally mmaping, and that could be changed to a conditional-on-local just like everywhere else. If we do that, then nobody is ever mmaping any non-local file AFAICT

If you only ever call MemoryMapContentsIfLocal, then is that first flag even doing anything? And if you are passing that flag, then can you just mmap it always even if non-local?

If that is the only call people are using, then we don’t really need the flag. Not sure how else as local file could go away such that the backing store wouldn’t be available. mmap() on unix will keep the file around as long as its needed AFAIK, so even if someone deletes it, it would be kept around. So if those are our only clients we should be ok. The code signing bit would need to be added for Mac though.

I searched the code and only in the windows minidump plugin are we unconditionally mmaping, and that could be changed to a conditional-on-local just like everywhere else. If we do that, then nobody is ever mmaping any non-local file AFAICT

That currently is true, but it would be shame to lose the ability to be resilient in cases where you do want to mmap. If a file is too large, we really should probably have an upper end cutoff on file size that would mmap even if remote. If we have a 4 GB file that we want to access, probably not a great thing to just pop into a heap based memory buffer…

Greg

Yea, the flag seems like one you would want to use almost always, so I’m not opposed to having it. I’ll see about making the changes in LLVM, even if we end up not using it in LLDB, they seem useful in LLVM independently.

BTW, one difference in LLVM’s mmap code is that in LLDB we always use MAP_PRIVATE, but in LLVM if you are opening for readwrite it uses MAP_SHARED. Are you against using MAP_SHARED when mmaping with readwrite privileges?

Yea, the flag seems like one you would want to use almost always, so I’m not opposed to having it. I’ll see about making the changes in LLVM, even if we end up not using it in LLDB, they seem useful in LLVM independently.

BTW, one difference in LLVM’s mmap code is that in LLDB we always use MAP_PRIVATE, but in LLVM if you are opening for readwrite it uses MAP_SHARED. Are you against using MAP_SHARED when mmaping with readwrite privileges?

You will need to do some testing. If you do MAP_SHARED and the file goes away, you might crash as it won’t keep the file around as long as it needs it. I could be wrong on this. But I do remember explicitly picking MAP_PRIVATE for some reason in the past…