RFC: A virtual file system for clang

Hi all,

I’ve been hacking on a virtual file system for clang and this seemed like the right time to start getting some feedback. Briefly, the idea is to interpose a virtual file system layer between llvm::sys::fs and Clang’s FileManager that allows us to mix virtual files/links/etc. with the ‘real’ file system in a general way.

Motivation

The use case that I have in mind is to allow a build system to provide a file/directory layout to clang without having to construct it “for real” on disk. For example, I am building a project containing two modules, and module A imports module B. It would be useful if we could bundle up the headers and module.map file for module B from wherever they may exist in the source directories and provide clang with a notion of the file layout of B as it will be installed. Right now, I know of two existing ways to accomplish this:

  1. Copy the files into a fake installation during build. This is unsatisfying, as it requires tracking and copying files every time they are changed. And diagnostics, debug info, etc. do not refer back to the original source file.

  2. Header maps provide this functionality for header files. However, header maps work from within the header search logic, which does not extend well to other kinds of files. They are also insufficient for bundling modules, as clang needs to see the framework for the module laid out as described in the module map.

Description

The idea is to abstract the view of the file system using an AbstractFileSystem class that mimics the llvm::sys::fs interface:

class AbstractFileSystem {
public:
  class Status { … };
  // openFileForRead
  // status, and maybe ‘stat’
  // getBuffer
  // getBufferForOpenFile
  // recursive directory iteration
};

that can be implemented by any concrete file system that we want. Clients that want to look up files/directories (notably the FileManager) can operate on an AbstractFileSystem object. One leaky part of this interface is that clients that need to care whether they are working with a ‘real path’ will need to explicitly ask for it. For example, debug information and diagnostics should ask for the real path. I suggest putting that information into the AbstractFileSystem::Status object.
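As a rough illustration of the interface described above, here is a minimal C++17 sketch. It is not the proposed implementation: the Status fields, the status() signature, and pathForDiagnostics() are assumptions made up for this example, and std::filesystem stands in for llvm::sys::fs so that the snippet compiles on its own.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Sketch only: member names and signatures are assumptions for illustration.
class AbstractFileSystem {
public:
  struct Status {
    std::string Path;     // the path the client asked about
    std::string RealPath; // where the file actually lives on disk
    bool IsDirectory = false;
  };

  virtual ~AbstractFileSystem() = default;

  // Returns std::nullopt when the path does not exist in this file system.
  virtual std::optional<Status> status(const std::string &Path) const = 0;
};

// Wrapper over the real file system. The RFC defers to llvm::sys::fs; this
// sketch uses std::filesystem instead.
class RealFileSystem : public AbstractFileSystem {
public:
  std::optional<Status> status(const std::string &Path) const override {
    std::error_code EC;
    std::filesystem::file_status St = std::filesystem::status(Path, EC);
    if (EC || !std::filesystem::exists(St))
      return std::nullopt;
    Status Result;
    Result.Path = Path;
    Result.RealPath = Path; // the real file system performs no remapping
    Result.IsDirectory = std::filesystem::is_directory(St);
    return Result;
  }
};

// Clients such as diagnostics or debug info ask for the real path explicitly.
inline std::string pathForDiagnostics(const AbstractFileSystem &FS,
                                      const std::string &Path) {
  std::optional<AbstractFileSystem::Status> St = FS.status(Path);
  return St ? St->RealPath : Path;
}
```

A virtual implementation would return a RealPath that differs from Path, which is the "leaky" part: only clients that care have to look at it.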

Some non-goals (at least for a first iteration):

  1. File system modification operations (create_directory, rename, etc.). Clients will continue to use the real file system for these operations, and we don’t intend to detect any conflicts this might create.
  2. Completely virtual file buffers that do not exist on disk.

One implementation of the AbstractFileSystem interface would be a wrapper over the ‘real’ file system, which would just defer to llvm::sys::fs.

class RealFileSystem : public AbstractFileSystem { … };

And to provide a unified view of the file system, we can create an overlay file system, similar to [1].

class OverlayFileSystem : public AbstractFileSystem { … };

To support a build system providing clang with a virtual file layout, we could add an option to clang that accepts a file describing the layout of a virtual file system. In a first iteration, this could be a simple JSON file describing the mapping from virtual paths to real paths, and a corresponding class VFSFromJSONFile : public AbstractFileSystem. Later we can evolve a more efficient binary format for this. In addition, we should provide functions in libclang to produce these files.
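To make the layering concrete, here is a small in-memory C++17 sketch of an overlay on top of a base file system. Everything beyond the class names AbstractFileSystem and OverlayFileSystem is an assumption: realPathFor(), pushOverlay(), MappedFileSystem (standing in for whatever VFSFromJSONFile would build after parsing its input) and PassThroughFileSystem (standing in for RealFileSystem without touching the disk) are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class AbstractFileSystem {
public:
  virtual ~AbstractFileSystem() = default;
  // Hypothetical query: the on-disk path backing Path, or std::nullopt if
  // this file system has no entry for it.
  virtual std::optional<std::string>
  realPathFor(const std::string &Path) const = 0;
};

// Stand-in for RealFileSystem: every path maps to itself. A real version
// would check that the file exists on disk.
class PassThroughFileSystem : public AbstractFileSystem {
public:
  std::optional<std::string>
  realPathFor(const std::string &Path) const override {
    return Path;
  }
};

// Holds a virtual-path to real-path table, as a layout file might describe.
class MappedFileSystem : public AbstractFileSystem {
  std::map<std::string, std::string> VirtualToReal;

public:
  void addMapping(std::string Virtual, std::string Real) {
    VirtualToReal[std::move(Virtual)] = std::move(Real);
  }
  std::optional<std::string>
  realPathFor(const std::string &Path) const override {
    auto It = VirtualToReal.find(Path);
    if (It == VirtualToReal.end())
      return std::nullopt;
    return It->second;
  }
};

// Consults the most recently pushed layer first, then falls through.
class OverlayFileSystem : public AbstractFileSystem {
  std::vector<const AbstractFileSystem *> Layers; // Layers.front() is the base

public:
  explicit OverlayFileSystem(const AbstractFileSystem &Base) : Layers{&Base} {}
  void pushOverlay(const AbstractFileSystem &FS) { Layers.push_back(&FS); }

  std::optional<std::string>
  realPathFor(const std::string &Path) const override {
    assert(!Layers.empty() && "an overlay always has a base layer");
    for (auto It = Layers.rbegin(); It != Layers.rend(); ++It)
      if (std::optional<std::string> Real = (*It)->realPathFor(Path))
        return Real;
    return std::nullopt;
  }
};
```

With a mapping from a (made-up) installed path such as /install/B.framework/Headers/B.h to /src/B/B.h pushed on top of the base layer, lookups of the installed path resolve to the source tree, while unmapped paths fall through to the base file system unchanged.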

I would appreciate any feedback you might have,

Ben

[1] https://git.kernel.org/cgit/linux/kernel/git/mszeredi/vfs.git/plain/Documentation/filesystems/overlayfs.txt?h=overlayfs.current

Also, thanks to everyone who has already given me feedback thus far.

I’d just like to point out that this might be the right time to fix the handling of Unicode paths on Windows. Calls to ::open, ::stat and friends don’t support UTF-8 encoded input on Windows: http://llvm.org/bugs/show_bug.cgi?id=10348

Nikola, thanks for bringing the issue to our attention, but let’s decouple the Windows issue from this proposal. It would be good to focus the discussion on the enhancement we are proposing, without mixing in general file system issues.
The Windows issue can still be addressed separately.

For NetBSD's GCC, I added the -iremap option to deal with __FILE__ at
least. This is useful for a number of applications that go beyond "map
prefix X to path Y". In pkgsrc, we filter the visibility of headers to
avoid configure programs picking up dependencies that are not desirable.

Joerg

Hi Ben,

ah, the good old discussion :) To get some more context in, see http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045657.html (the discussion goes into December, which the web interface doesn’t seem to be able to cope with). More comments inline.

Prefer to share a gmane link. Not only for the above reason, but because it
gives a threaded view.

http://thread.gmane.org/gmane.comp.compilers.clang.devel/18567

Thanks,

Steve.

Hi Manuel,

Looks like I have some reading to do :)

I mean HeaderMap in the HeaderSearchOptions, which is somewhat similar but is considered only during header search, not at the file system layer.

The RemappedFileBuffers is closer to the infrastructure we would need to map files, but it doesn’t allow recursive directory iteration, which is another requirement for handling modules. We could always teach FileManager to support more operations, but increasing the duplication between FileManager and sys::fs seems like a bad idea.

How do you imagine changing clients that currently expect to be able to get a file descriptor? Do you remove that concept and provide only higher-level APIs, like “getBuffer” and “getRawOstream”, or create some opaque file descriptor that can be returned from openFileForReading and openFileForWriting? The latter seems like it doesn’t need to be built in from the start, since we can continue to have the usual file descriptor APIs, and update clients later when we change what a file descriptor is. That’s what I was imagining, but you may have a better idea. Also, even if adding fully virtual files doesn’t make a first iteration harder to implement, what about testing it?

Can you expand on this? Dumb question: why can’t we just ask the OS for a canonical path and work from that?

Thanks,

Ben

By "Header map", do you mean the RemappedFileBuffers in
PreprocessorOptions? This seems to actually be enough for us to map files
even in the presence of modules (at least in C++, I don't know how
different Framework handling in Obj-C is here).

I mean HeaderMap in the HeaderSearchOptions, which is somewhat similar but
is considered only during header search, not at the file system layer.

The RemappedFileBuffers is closer to the infrastructure we would need to
map files, but it doesn’t allow recursive directory iteration, which is
another requirement for handling modules. We could always teach
FileManager to support more operations, but increasing the duplication
between FileManager and sys::fs seems like a bad idea.

I'm fully with you here, just wanted to make sure all related issues are
clear...

Some non-goals (at least for a first iteration):
1) File system modification operations (create_directory, rename, etc.).
Clients will continue to use the real file system for these operations,
and we don’t intend to detect any conflicts this might create.
2) Completely virtual file buffers that do not exist on disk.

I'd vote for making that an explicit goal; two reasons:
1. I don't think it'll make a first iteration harder to implement
2. saying that we'll do things like that later will almost certainly make
it super-hard to do later

For us, the ability to have virtual file buffers that do not exist on disk
is one of the core requirements we have for all our tools; I think in a
more and more network-based world this will also become more necessary in
general in the future.

How do you imagine changing clients that currently expect to be able to
get a file descriptor? Do you remove that concept and provide only
higher-level APIs, like “getBuffer” and “getRawOstream”, or create some
opaque file descriptor that can be returned from openFileForReading and
openFileForWriting? The latter seems like it doesn’t need to be built in
from the start, since we can continue to have the usual file descriptor
APIs, and update clients later when we change what a file descriptor is.
That’s what I was imagining, but you may have a better idea. Also, even
if adding fully virtual files doesn’t make a first iteration harder to
implement, what about testing it?

Well, that's an interesting question :) So, do you want a virtual file
system that we can plug below the file manager pretty much "as-is", and it
just works? In that case, I'd guess we need to do the latter. I'm also not
sure which clients we might be able to convert later.

If you don't want to provide a vfs below the file manager, where do you
want to use it? Currently, all Tooling/ stuff relies on being able to use
the file overlaying logic to inject into PPOptions / SourceManager /
FileManager to allow (nearly) fully file system independent replays of
compilations. Would you propose to break this behavior as part of the
transition?

The rest sounds generally good.

One concern I have that has not been brought up is the old problem of
making file operations relative to a directory entry that is taken once at
the start of an operation (imagine starting a compilation from a symlinked
directory, and somebody changing the link). I think this will probably not
be a goal for phase 1, but it would be nice if we could keep it in the back
of our heads ;)

Can you expand on this? Dumb question: why can’t we just ask the OS for a
canonical path and work from that?

The main problem is less whether the path is canonical than that an
unrelated happenstance in the file system might lead to inconsistencies in
a compile step.

Imagine a source tree in src-head and one in src-branch.
$ ln -s src-head src
$ cd src
$ clang file.cc &
$ cd ..
$ rm src
$ ln -s src-branch src

Now if the clang process takes a while to run and resolves files it opens
via "absolute" paths rather than relative to the current working dir file
entry, it can see inconsistent state (for example, get a header from
src-branch instead of from src-head).

Cheers,
/Manuel

Hi Manuel,

Some non-goals (at least for a first iteration):
1) File system modification operations (create_directory, rename, etc.).
Clients will continue to use the real file system for these operations,
and we don’t intend to detect any conflicts this might create.
2) Completely virtual file buffers that do not exist on disk.

I'd vote for making that an explicit goal; two reasons:
1. I don't think it'll make a first iteration harder to implement
2. saying that we'll do things like that later will almost certainly make
it super-hard to do later

We don’t have the bandwidth to design / implement / test fully virtual
files.

I'm curious why you think it will be a lot more effort; my gut feeling
would be that this is probably going to be less effort (depending on what
exactly you want to use the VFS layer for) if we don't want to break all of
the Tooling layers in the process.

We also don’t have uses for them (apart from replacing the remapping of
buffers in the SourceManager) so I think this should be driven by someone
that actually needs this and is going to dogfood it.

Apart from that, we are definitely trying to make sure we will not do
anything that will make adding virtual files prohibitively difficult, we
think it can be added on top with maybe some refinements on the interface.
While we are implementing the VFS if you think we are doing something that
will make virtual file buffers “super-hard to do later” please let us know.

Will do.

Cheers,
/Manuel

I definitely want to put this below the FileManager. FileManager would just keep a reference to an AbstractFileSystem (although I’m not sure who should actually own that object), that we use to represent the ‘unified’ file system and do all of its operations through it. Any existing uses of FileManager should continue to work as-is. This way we can phase in the VFS without changing the way overriding files works now.

Hmm, haven’t thought about this one. I guess I agree this is not a near-term goal :)

Ben

I definitely want to put this below the FileManager. FileManager would
just keep a reference to an AbstractFileSystem (although I’m not sure who
should actually own that object), that we use to represent the ‘unified’
file system and do all of its operations through it. Any existing uses of
FileManager should continue to work as-is. This way we can phase in the
VFS without changing the way overriding files works now.

Sounds good.

Hmm, haven’t thought about this one. I guess I agree this is not a
near-term goal :)

I'd like to consider it design-wise at least on a straw-man level (to be
shot down ;)

If we want to support this in the future, it might affect both the
ownership and the API design question. For example, one design straw-man
would be to have interfaces for FileSystem, Directory and File, where
FileSystem can give you Directories and those again can give you Files.
That would trivially support using directory-entry based OS interfaces
where they are available, but would mean more code overhead per FileSystem
implementation.

A different approach would be to use descriptors for files and directories,
and have only a single FileSystem interface that can handle the instances.
Seems slightly less "nice" from a user point of view, but potentially
simpler and less overhead for the implementation.

From the other mails in this thread it sounds to me more like you want to
basically punt on those questions and just provide the interface and access
methods to get buffers for files. That might also be fine for now, but I'd
prefer if it is a conscious decision rather than an accidental one :)
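
For reference, the second straw-man (a single FileSystem interface that hands out descriptors) could look roughly like the C++17 sketch below. All names are hypothetical (DirectoryID, openDirectory, getBuffer, and the in-memory implementation with createDirectory/addFile/rebind exist only for this example); the first straw-man would instead return Directory and File objects from the FileSystem.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Opaque descriptor; only meaningful to the FileSystem that issued it.
struct DirectoryID {
  std::size_t Index;
};

// Single interface that resolves names to descriptors once and then serves
// file contents relative to a descriptor.
class FileSystem {
public:
  virtual ~FileSystem() = default;
  virtual std::optional<DirectoryID>
  openDirectory(const std::string &Path) const = 0;
  virtual std::optional<std::string>
  getBuffer(DirectoryID Dir, const std::string &Name) const = 0;
};

// Toy in-memory implementation, just enough to exercise the interface.
class InMemoryFileSystem : public FileSystem {
  std::vector<std::map<std::string, std::string>> Directories; // name -> text
  std::map<std::string, std::size_t> Names;                    // path -> index

public:
  DirectoryID createDirectory(const std::string &Path) {
    Directories.emplace_back();
    Names[Path] = Directories.size() - 1;
    return DirectoryID{Directories.size() - 1};
  }
  void addFile(DirectoryID Dir, const std::string &Name, std::string Text) {
    Directories.at(Dir.Index)[Name] = std::move(Text);
  }
  // Makes Path refer to an existing directory, like re-pointing a symlink.
  void rebind(const std::string &Path, DirectoryID Target) {
    Names[Path] = Target.Index;
  }

  std::optional<DirectoryID>
  openDirectory(const std::string &Path) const override {
    auto It = Names.find(Path);
    if (It == Names.end())
      return std::nullopt;
    return DirectoryID{It->second};
  }
  std::optional<std::string>
  getBuffer(DirectoryID Dir, const std::string &Name) const override {
    if (Dir.Index >= Directories.size())
      return std::nullopt;
    auto It = Directories[Dir.Index].find(Name);
    if (It == Directories[Dir.Index].end())
      return std::nullopt;
    return It->second;
  }
};
```

A descriptor obtained before a rebind keeps referring to the directory it was opened on, which is the consistency property under discussion; the cost is that clients have to carry the descriptor around instead of plain paths.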

Cheers,
/Manuel

You can always prove me wrong, after we have the simpler case (mapping virtual paths to real paths) working ;)

I just want to make clear that we are going to focus on what we need, which is to get modules for ObjC user frameworks working and fully supported. Virtual file buffers are not something we need at the moment (but to reiterate, we don’t want to prohibit them either).

Not sure why they would break, if these are based on the FileManager. Also as long as the related “vfs” option is not passed to a compiler invocation to introduce a virtual layout via a configuration file, there should be no behavior change anywhere.

What filesystem modifications would you consider to be ‘unrelated’ in a compilation? E.g. what if clang is invoked from the ‘real location’ src-head, but there are command-line options that refer to absolute paths in ‘src’. Would we try to recover? I think I would need a concrete idea of what the desired model of consistency is before I could be convinced this is a solvable problem.

Now, specifically about the straw-man proposal: one of the things I like about reusing the llvm::sys::fs interface is that it makes the change very small for users. Both of these options seem to require giving that up for a handle-based API where clients need to think about a new object(s) for any file operations they want to use.

I wonder if handling the clang-invocation-location issue can be solved by virtually mapping `pwd` to `pwd -P` right at the beginning of the compilation? That would allow path-based operations to still work. However, that probably doesn’t scale if you want to treat *every* path that the compiler looks up this way, since you would explode the number of mappings… So it really depends on what our model of consistency is.
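
A sketch of what that single mapping could look like, in C++17 with std::filesystem. The names PwdMapping, capturePwdMapping() and remap() are made up for this example, and the logical directory is passed in explicitly (for instance from $PWD), since the C++ library itself only reports the physical working directory.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// One mapping, captured at the start of the compilation.
struct PwdMapping {
  fs::path Logical;  // the directory as the user named it (like `pwd`)
  fs::path Physical; // the same directory with symlinks resolved (`pwd -P`)
};

// Resolves symlinks in LogicalDir once. Returns std::nullopt if the
// directory cannot be resolved.
inline std::optional<PwdMapping> capturePwdMapping(const fs::path &LogicalDir) {
  std::error_code EC;
  fs::path Physical = fs::canonical(LogicalDir, EC);
  if (EC)
    return std::nullopt;
  return PwdMapping{LogicalDir, Physical};
}

// Rewrites paths under the logical directory to the physical one. Paths
// outside of it are returned unchanged.
inline fs::path remap(const PwdMapping &M, const fs::path &P) {
  fs::path Rel = P.lexically_relative(M.Logical);
  if (Rel.empty() || *Rel.begin() == "..")
    return P;
  if (Rel == ".")
    return M.Physical;
  return M.Physical / Rel;
}
```

This only covers paths that are lexically under the starting directory, which matches the scaling concern above: any other path the compiler looks up would need a mapping of its own.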

Ben

You can always prove me wrong, after we have the simpler case (mapping
virtual paths to real paths) working ;)

I just want to make clear that we are going to focus on what we need,
which is to get modules for ObjC user frameworks working and fully
supported. Virtual file buffers are not something we need at the moment
(but to reiterate, we don’t want to prohibit them either).

Ok :) I'm probably mainly confused because I think it'll be hard to *not*
support virtual file buffers with any design I can think of, so I was
concerned when you said that it's an explicit non-goal (as opposed to being
just not a primary goal).

if we don't want to break all of the Tooling layers in the process.

Not sure why they would break, if these are based on the FileManager. Also
as long as the related “vfs” option is not passed to a compiler invocation
to introduce a virtual layout via a configuration file, there should be no
behavior change anywhere.

Sounds good.

What filesystem modifications would you consider to be ‘unrelated’ in a
compilation? E.g. what if clang is invoked from the ‘real location’
src-head, but there are command-line options that refer to absolute paths
in ‘src’. Would we try to recover? I think I would need a concrete idea
of what the desired model of consistency is before I could be convinced
this is a solvable problem.

The proposed model is basically implemented today - as long as all paths
you give to the compiler are relative subpaths of the cwd, you'll stay
relative to the same directory inode.

Our main problem today is when we think about multi-threading the
compilation (mainly for tooling - I want to be able to parse multiple TUs
from within one program). There, just chdir'ing into the right directory
for the TU doesn't work any more.

Which brings me to a different question: multi-threading; since your main
use case is remapping on the lowest level, do you think you'll want
different mappings per TU?

Now, specifically about the straw-man proposal: one of the things I like
about reusing the llvm::sys::fs interface is that it makes the change very
small for users. Both of these options seem to require giving that up for
a handle-based API where clients need to think about a new object(s) for
any file operations they want to use.

Just for the record: I'm totally in favor of modeling stuff after the
llvm::sys::fs interface - that was the same plan we came up with back in the
day (but always were able to work around spending time on).

I wonder if handling the clang-invocation-location issue can be solved by
virtually mapping `pwd` to `pwd -P` right at the beginning of the
compilation? That would allow path-based operations to still work.
However, that probably doesn’t scale if you want to treat *every* path
that the compiler looks up this way, since you would explode the number of
mappings… So it really depends on what our model of consistency is.

I tried to explain above. Does that make sense?

Cheers,
/Manuel

Is this because FileManager caches the DirectoryEntries along the way?

I’m still not sure I get what you’re trying to solve. Is it to ensure that multiple threads have a consistent picture of the file system, similarly to if they all shared a FileManager, even in the presence of a chdir?

I don’t have a good answer to this yet. My plan was to start by having the compiler instance own the virtual file system, and see where that got us.

Cool - I’ll probably post an initial patch based on this approach in the near future as a first step.

Just want to make sure, are you aware of the “-working-directory” option? This is what it is intended to solve.
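
Conceptually, the alternative to chdir'ing is to carry the working directory as per-translation-unit state and resolve relative paths against it. The C++17 sketch below shows only that idea; it is not a description of how the “-working-directory” option is implemented, and TranslationUnitContext is a hypothetical name.

```cpp
#include <filesystem>
#include <utility>

namespace fs = std::filesystem;

// Each translation unit carries its own working directory instead of relying
// on the process-wide current directory, so several TUs can be processed
// from one program (or from several threads) without calling chdir.
class TranslationUnitContext {
  fs::path WorkingDir;

public:
  explicit TranslationUnitContext(fs::path Dir) : WorkingDir(std::move(Dir)) {}

  // Absolute paths are used as-is; relative ones are anchored at this TU's
  // working directory.
  fs::path resolve(const fs::path &Path) const {
    return Path.is_absolute() ? Path : WorkingDir / Path;
  }
};
```

Two contexts created for different (made-up) directories, say /proj/a and /proj/b, resolve the same relative name file.cc to different files without touching global state.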

The proposed model is basically implemented today - as long as all paths
you give to the compiler are relative subpaths of the cwd, you'll stay
relative to the same directory inode.

Is this because FileManager caches the DirectoryEntries along the way?

Nope, it's because clang just stays in one directory, and that basically
keeps all relative file operations inside the original working directory.

Our main problem today is when we think about multi-threading the
compilation (mainly for tooling - I want to be able to parse multiple TUs
from within one program). There, just chdir'ing into the right directory
for the TU doesn't work any more.

I’m still not sure I get what you’re trying to solve. Is it to ensure
that multiple threads have a consistent picture of the file system,
similarly to if they all shared a FileManager, even in the presence of a
chdir?

Yep.

Which brings me to a different question: multi-threading; since your main
use case is remapping on the lowest level, do you think you'll want
different mappings per TU?

I don’t have a good answer to this yet. My plan was to start by having
the compiler instance own the virtual file system, and see where that got
us.

I like the plan (similar to what I'd intuitively have done).

Now, specifically about the straw-man proposal: one of the things I like
about reusing the llvm::sys::fs interface is that it makes the change very
small for users. Both of these options seem to require giving that up for
a handle-based API where clients need to think about a new object(s) for
any file operations they want to use.

Just for the record: I'm totally in favor of modeling stuff after the
llvm::sys::fs interface - that was the same plan we came up with back in the
day (but always were able to work around spending time on).

Cool - I’ll probably post an initial patch based on this approach in the
near future as a first step.

Great! Looking forward to it :)

I remember looking into it, but I don't remember why it didn't work for us
- I'll make sure to look into it once more, perhaps things have changed by
now (which would be awesome). Thanks for the tip.

Cheers,
/Manuel