LLVM & Clang file management

r4nt · November 28, 2011, 10:49am

Hi,

while working on tooling on top of clang/llvm we found the file system
abstractions in clang/llvm to be one of the points that could be nicer
to integrate with. I’m writing this mail to propose a strawman and get
some feedback on what you guys think the right way forward is (or
whether we should just leave things as they are).

First, the FileManager we have in clang has helped us a lot for our
tooling - when we run clang in a mapreduce we don’t need to lay out
files on a disk, we can just map files into memory and happily clang
over them. We’re also using the same mechanism to map builtin
includes; in short, the FileManager has made it possible to do clang
at scale.

Now we’re aware that it was not really the intention of the
FileManager to allow doing the things we do with it: not every module
in clang uses the FileManager, and the moment we hit llvm there is no
FileManager at all. For example, in case of the Driver we hack around
the fact that the header search tries to access the file system
directly in rather brittle ways, relying on implementation details and
#ifdefs.

So why not make FileManager a more principled (and still blazing fast)
file system abstraction?
Pro:
- only one interface for developers to learn on the project (no more
PathV1 vs PathV2 vs FileManager)
- only one implementation (per-platform) for easier maintenance of the
file system platform abstraction
- one point to insert synchronization guarantees for tools / IDE
integration that wants to run clang in multiple threads at once (for
example when re-indexing on 12-ht-core machines)
- being able to replay compilations by injecting a virtual file system
that exactly “copies” the original file system’s content, which allows
easy scaling of replays, running tools against dirty edit buffers at a
lower level than the SourceManager, and unit testing
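
To make the replay idea concrete, here is a minimal sketch of what injecting a virtual file system could look like. Every name in it (FileSystem, InMemoryFileSystem, countLines) is hypothetical and invented for illustration; none of it is an existing clang or llvm interface.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical interface; not an existing LLVM or Clang class.
class FileSystem {
public:
  virtual ~FileSystem() = default;
  // Returns the file's bytes, or std::nullopt if the path is unknown.
  virtual std::optional<std::string> read(const std::string &path) const = 0;
};

// Serves files purely from memory, e.g. a recorded snapshot of a compilation.
class InMemoryFileSystem : public FileSystem {
public:
  void addFile(std::string path, std::string contents) {
    files_[std::move(path)] = std::move(contents);
  }

  std::optional<std::string> read(const std::string &path) const override {
    auto it = files_.find(path);
    if (it == files_.end())
      return std::nullopt;
    return it->second;
  }

private:
  std::map<std::string, std::string> files_;
};

// Stand-in for a tool: it only ever talks to the abstract interface, so it
// cannot tell a replayed snapshot from a real disk.
inline std::size_t countLines(const FileSystem &fs, const std::string &path) {
  auto contents = fs.read(path);
  if (!contents)
    return 0;
  std::size_t lines = 0;
  for (char c : *contents)
    if (c == '\n')
      ++lines;
  return lines;
}
```

A unit test or a replay job would fill an InMemoryFileSystem with the recorded contents and hand it to the tool in place of the real file system.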

Con:
- there would be yet another try at unifying the APIs which would be
in an intermediate state while being worked on (and PathV1 vs PathV2
is already bad enough)
- making it the canonical file system interface is a lot of effort
that requires touching a lot of systems (while we’re volunteering to
do the work, it will probably eat up other people’s time, too)

What parts (if any) of this type of transition make sense?
1. Figure out the “correct” interface we’d want for FileManager to be
more generally useful
2. Change FileManager to that interface
3. Sink FileManager into llvm, so it can be used by other projects
4. Use it throughout clang
5. Use it throughout llvm
We don’t need to do all of them at once, and should be able to
evaluate the results along the way.

Thoughts? If folks are generally happy, I’d start up an email thread
to drive the target design of the FileManager to get things rolling.

/Manuel

ddunbar · November 28, 2011, 8:07pm

Hi Manuel,

I'm +2 on the general idea.

I have had various thoughts in this direction as well (although no
implementation). See:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-July/009903.html
for my RFC from last year (focused on bug reporting, but involved
defining a VFS layer).

My one main implementation-level comment is that I don't think FileManager
is the right API layer to abstract at (it is too specific to Clang's
usage, and too hard to propagate through the rest of LLVM). My
intuition is that it is better to set out to define a lower level VFS
layer that is rich enough to support everything we do and the vagaries
of Win32/Unix, but is otherwise minimal.

One requirement I hope any proposed VFS design will support is
emulating Win32 on Unix (and vice versa), which imposes assorted API
complications but I think is worth it overall.

I see many positive future technologies we could build if we had a
good VFS layer; I'd absolutely love to see work in this direction.

- Daniel

r4nt · November 28, 2011, 9:04pm

> I have had various thoughts in this direction as well (although no
> implementation). See:
> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-July/009903.html
> for my RFC from last year (focused on bug reporting, but involved
> defining a VFS layer).

Cool, that sounds like another use case very similar to our replaying
at scale use case.

> My one main implementation level comment is I don't think FileManager
> is the right API layer to abstract at (it is too specific to Clang's
> usage, and too hard to propagate through the rest of LLVM). My
> intuition is that it is better to set out to define a lower level VFS
> layer that is rich enough to support everything we do and the vagaries
> of Win32/Unix, but is otherwise minimal.

What about FileManager is too high level / too clang specific? The
uniquing logic? The possibility to add in stat caches?
Do you think we'd want to have a CachingFileSystem on top of the VFS
layer? That would sound more orthogonal; on the other hand, FileManager
is doing pretty OS-specific stuff to unique the inodes where possible.
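
As a rough sketch of the "CachingFileSystem on top of the VFS layer" idea, the cache can be written as a decorator around any other file system. The names below (Status, FileSystem, CachingFileSystem, CountingFileSystem) are hypothetical and exist only for this example; this is not how FileManager's stat cache is actually implemented.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, deliberately tiny stat result.
struct Status {
  std::uint64_t size = 0;
};

// Hypothetical VFS interface reduced to a single stat-like call.
class FileSystem {
public:
  virtual ~FileSystem() = default;
  virtual std::optional<Status> status(const std::string &path) = 0;
};

// Decorator: remembers every answer, including "file not found", so each
// path is looked up at most once in the underlying file system.
class CachingFileSystem : public FileSystem {
public:
  explicit CachingFileSystem(FileSystem &underlying)
      : underlying_(underlying) {}

  std::optional<Status> status(const std::string &path) override {
    auto it = cache_.find(path);
    if (it != cache_.end())
      return it->second;
    std::optional<Status> result = underlying_.status(path);
    cache_.emplace(path, result);
    return result;
  }

private:
  FileSystem &underlying_;
  std::unordered_map<std::string, std::optional<Status>> cache_;
};

// Test double that counts how often the "real" lookup runs.
class CountingFileSystem : public FileSystem {
public:
  int calls = 0;

  std::optional<Status> status(const std::string &path) override {
    ++calls;
    if (path == "/a.h")
      return Status{42};
    return std::nullopt;
  }
};
```

Keeping the cache in its own layer leaves the underlying file system free of caching policy. The OS-specific inode uniquing would still need a home of its own.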

> One requirement I hope any proposed VFS design will support is
> emulating Win32 on Unix (and vice versa), which imposes assorted API
> complications but I think is worth it overall.

I'm not sure I understand what you mean by "emulating win32". I'd
hope to get win32 / unix stuff hidden behind the VFS; do you expect
that not to be possible performance-wise?

Cheers,
/Manuel

Chandler_Carruth · November 28, 2011, 9:13pm

What about simulating case-insensitive behaviors? (I’m stabbing in the dark… I should leave the win32 stuff to those who know how it works.)
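
A toy illustration of simulating case-insensitive lookup, assuming an in-memory file system whose keys are optionally case-folded. The class and the foldCase helper are hypothetical. Real Win32 semantics are richer than this: folding is not ASCII-only and path separators differ.

```cpp
#include <cctype>
#include <map>
#include <optional>
#include <string>
#include <utility>

// ASCII-only case folding; a simplification made for this sketch.
inline std::string foldCase(std::string path) {
  for (char &c : path)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return path;
}

// Hypothetical in-memory file system with a switchable lookup policy.
class InMemoryFileSystem {
public:
  explicit InMemoryFileSystem(bool caseSensitive)
      : caseSensitive_(caseSensitive) {}

  void addFile(const std::string &path, std::string contents) {
    files_[key(path)] = std::move(contents);
  }

  std::optional<std::string> read(const std::string &path) const {
    auto it = files_.find(key(path));
    if (it == files_.end())
      return std::nullopt;
    return it->second;
  }

private:
  // Paths are normalized once, on the way into the map and on lookup.
  std::string key(const std::string &path) const {
    return caseSensitive_ ? path : foldCase(path);
  }

  bool caseSensitive_;
  std::map<std::string, std::string> files_;
};
```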

ddunbar · November 29, 2011, 8:29pm

> What about FileManager is too high level / too clang specific? The
> uniquing logic? The possibility to add in stat caches?
> Do you think we'd want to have a CachingFileSystem on top of the VFS
> layer? That would sound more orthogonal; on the other hand, FileManager
> is doing pretty OS-specific stuff to unique the inodes where possible.

I guess I was thinking that it might be more cumbersome to move the
other parts of LLVM / Clang that do direct file access to use
FileManager, and would require expanding the FileManager interface
much beyond what it currently is (e.g., there are no interfaces at all
for output).

It's mostly an intuitive guess at this point, but that led me to
think it would be better to have the VFS be slightly lower. But this
also depends on the design goal of the VFS, discussed a bit in the
reply below.

> I'm not sure I understand what you mean by "emulating win32". I'd
> hope to get win32 / unix stuff hidden behind the VFS; do you expect
> that not to be possible performance-wise?

I'd like to distinguish between "hidden" and virtualized. What I was
thinking was to virtualize the interfaces so that LLVM/Clang would
still be aware of the differences between win32 / unix (when
necessary, like in relation to inodes), but that would all be based on
going through a VFS layer. So one could then emulate any FS on another
one, but the definition of the VFS would still expose the underlying
differences between Unix/Win32/etc.
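
One way to picture the "virtualized" option is sketched below, with heavy simplification. The VFS reports which platform flavor it emulates, and the client stays aware of the difference. All names are hypothetical, and treating the Win32 flavor as having no inode at all is an assumption made only to keep the example short.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

enum class Flavor { Unix, Win32 };

// Hypothetical stat result: the inode is only present for the Unix flavor.
struct Status {
  std::optional<std::uint64_t> inode;
};

// Hypothetical VFS that exposes, rather than hides, the platform flavor.
class FileSystem {
public:
  virtual ~FileSystem() = default;
  virtual Flavor flavor() const = 0;
  virtual std::optional<Status> status(const std::string &path) const = 0;
};

// Emulates either flavor from an in-memory table, on any host platform.
class EmulatedFileSystem : public FileSystem {
public:
  explicit EmulatedFileSystem(Flavor flavor) : flavor_(flavor) {}

  void addFile(const std::string &path, std::uint64_t inode) {
    inodes_[path] = inode;
  }

  Flavor flavor() const override { return flavor_; }

  std::optional<Status> status(const std::string &path) const override {
    auto it = inodes_.find(path);
    if (it == inodes_.end())
      return std::nullopt;
    Status s;
    if (flavor_ == Flavor::Unix)
      s.inode = it->second; // simplification: Win32 flavor reports none
    return s;
  }

private:
  Flavor flavor_;
  std::map<std::string, std::uint64_t> inodes_;
};

// Client code still branches on the flavor, but only through the VFS.
inline bool sameFile(const FileSystem &fs, const std::string &a,
                     const std::string &b) {
  auto sa = fs.status(a);
  auto sb = fs.status(b);
  if (!sa || !sb)
    return false;
  if (fs.flavor() == Flavor::Unix)
    return sa->inode == sb->inode;
  return a == b; // fallback when no inode is available
}
```

Under the "hiding" option, sameFile would instead be a method of the file system, and the client would never see the flavor.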

Was your plan directed more at hiding? In that case I can see why you
would want to start at the FileManager level.

I think both approaches probably can work, although hiding makes me a
bit more nervous because I think the API design ends up being much
harder (and more likely to incur performance tradeoffs). I'm always
pretty leery of attempts to paper over the differences between
platforms.

Did that explanation make sense? If not I can sketch pseudocode to
make it more obvious.

- Daniel

DougGregor · December 3, 2011, 9:33pm

Hi Manuel,

> So why not make FileManager a more principled (and still blazing fast)
> file system abstraction?

Yes, please!

Having a proper virtual file system across Clang and LLVM would be a huge boon, especially for pushing Clang into more applications that aren't simply "grab stuff from the local file system." The current FileManager/SourceManager dance used to provide in-memory content for a (virtual or old) file is quite the mess.

> Pro:
> - only one interface for developers to learn on the project (no more
> PathV1 vs PathV2 vs FileManager)
> - only one implementation (per-platform) for easier maintenance of the
> file system platform abstraction
> - one point to insert synchronization guarantees for tools / IDE
> integration that wants to run clang in multiple threads at once (for
> example when re-indexing on 12-ht-core machines)
> - being able to replay compilations by injecting a virtual file system
> that exactly “copies” the original file system’s content, which allows
> easy scaling of replays, running tools against dirty edit buffers on a
> lower level than the SourceManager and unit testing

… and making sure that all of the various stages of compilation see the same view of the file system.

> Con:
> - there would be yet another try at unifying the APIs which would be
> in an intermediate state while being worked on (and PathV1 vs PathV2
> is already bad enough)

I'm fine with intermediate states so long as the direction and benefits are clear. The former we can certainly discuss, and the latter is obvious already.

> - making it the canonical file system interface is a lot of effort
> that requires touching a lot of systems (while we’re volunteering to
> do the work, it will probably eat up other people’s time, too)

I doubt I'll have much time to directly hack on this, but I'll be happy to review / discuss / help with adoption. libclang is one of the huge beneficiaries of such a change, so I care a lot about getting that to work well.

> What parts (if any) of this type of transition make sense?
> 1. Figure out the “correct” interface we’d want for FileManager to be
> more generally useful
> 2. Change FileManager to that interface
> 3. Sink FileManager into llvm, so it can be used by other projects
> 4. Use it throughout clang
> 5. Use it throughout llvm
> We don’t need to do all of them at once, and should be able to
> evaluate the results along the way.

I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly-shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away.

I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness.

- Doug

r4nt · December 4, 2011, 4:33pm

> Was your plan directed more at hiding? In that case I can see why you
> would want to start at the FileManager level.

Well, the answer to that question depends highly on the performance
characteristics we can get.
I usually prefer hiding, unless performance requires us to break the
abstraction.

Ok, after browsing the implementations of PathV2 and FileSystem, this
stuff already looks pretty close to what I'd want to write anyway,
minus putting it into classes to enable run-time virtualization (and
it doesn't look like it would be too hard to switch FileManager to run
on top of FileSystem...), and splitting up FileSystem into a
FileSystem and an OperatingSystemPaths or something (how system
libraries are found, etc.).

Do you know whether there were roadblocks to the PathV1->V2
transition? (Or was somebody with enough stamina just missing?)

> Did that explanation make sense? If not I can sketch pseudocode to
> make it more obvious.

Sure, code always helps

Also, do you have any objections to just virtualizing
Support/FileSystem and basing FileManager back on top of that?

Cheers,
/Manuel

r4nt · December 4, 2011, 5:06pm

> Yes, please!

Great /me jumps right into the design discussion then.

> I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly-shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away.
>
> I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness.

So, as I noted in my reply to Daniel, after working through
llvm/Support (and bringing FileManager back to my mind) I think I'm
actually seeing a way forward, tell me if I'm crazy:
1. morph FileSystem (I don't know whether that would include PathV2,
but I currently don't think so) into a class that exports a nice
interface for all FileSystem functions that we can override; to be
able to do that step-by-step, we could for example introduce a static
FileSystem pointer that is initialized with the default system file
system on startup (I like being able to do baby-steps)
2. add methods to FileSystem to support opening MemoryBuffers; the
path forward will be to move all calls to MemoryBuffer::get*File
through the FileSystem interface, but again that can be handled
incrementally
3. at that point we'd have enough stuff in FileSystem to rebase
FileManager on top of it; once 1 and 2 are finished for clang/.* we'll
be able to completely move the virtual file support over into a nice
OverlayFileSystem implementation (argh, I've coded too many of those
in my life);
4. add methods to FileSystem to support opening raw_fd_ostreams; this
basically mirrors the process for reading
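
The steps above can be sketched roughly as follows. This is an assumption-laden illustration, not a design: the class names, the read method, and the activeFileSystem accessor are all made up here, and the real llvm/Support functions are not used.

```cpp
#include <fstream>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Step 1: an overridable interface (hypothetical, reduced to one call).
class FileSystem {
public:
  virtual ~FileSystem() = default;
  virtual std::optional<std::string> read(const std::string &path) const = 0;
};

// The default implementation goes to the real disk.
class RealFileSystem : public FileSystem {
public:
  std::optional<std::string> read(const std::string &path) const override {
    std::ifstream in(path, std::ios::binary);
    if (!in)
      return std::nullopt;
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
  }
};

class InMemoryFileSystem : public FileSystem {
public:
  void addFile(std::string path, std::string contents) {
    files_[std::move(path)] = std::move(contents);
  }
  std::optional<std::string> read(const std::string &path) const override {
    auto it = files_.find(path);
    if (it == files_.end())
      return std::nullopt;
    return it->second;
  }

private:
  std::map<std::string, std::string> files_;
};

// Step 3: virtual files shadow whatever lies underneath.
class OverlayFileSystem : public FileSystem {
public:
  OverlayFileSystem(const FileSystem &upper, const FileSystem &lower)
      : upper_(upper), lower_(lower) {}
  std::optional<std::string> read(const std::string &path) const override {
    if (auto hit = upper_.read(path))
      return hit;
    return lower_.read(path);
  }

private:
  const FileSystem &upper_;
  const FileSystem &lower_;
};

// Step 1's baby-step: a process-wide pointer that starts out as the real
// file system and can be swapped while callers are migrated one by one.
inline FileSystem *&activeFileSystem() {
  static RealFileSystem real;
  static FileSystem *active = &real;
  return active;
}
```

A global pointer like this is convenient for an incremental migration, but it works against the multi-threading goal from the original proposal, so it would presumably be a transitional device only.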

Thoughts? Completely broken approach? Broken order?

On a different note, switching to the SourceManager topic - I know
enough about SourceManager to be dangerous but not enough to ever
claim I would have understood the crazy buffer management that's going
on in ContentCache. So I'd need a lot of help to pry that box open
eventually. Currently I'd think that this can be done in a subsequent
step after the file system is sorted out, but I might be wrong...

Cheers,
/Manuel

Bigcheese · December 6, 2011, 1:11am

Just for some background about why we have PathV2.

In my quest to improve Windows support across LLVM and Clang I ran
into many issues with the way PathV1 worked. A few were:
* PathV1, and most of LLVM, use std::string to handle errors. This
makes code more verbose than needed, and loses OS-level error
information.
* PathV1 makes it difficult to handle Unicode on Windows. Although
apparently I didn't solve the problem correctly either :P.
* PathV1 requires constructing a Path object before calling any
functions. This is inefficient when most of the time you have
something StringRef'able.

Thus when I designed PathV2 I made it stateless, UTF-8 only, and used
error_code.
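
For readers who have not seen the style, here is a small sketch of a stateless free function that takes something string-like and reports failure through an error_code. It uses only the C++ standard library; readFile is a hypothetical name and is not the llvm::sys::fs API.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper in the described style: no Path object to construct,
// and the OS-level error survives as a std::error_code instead of being
// flattened into a message string.
inline std::error_code readFile(std::string_view path, std::string &out) {
  out.clear();
  std::string nullTerminated(path); // fopen needs a C string
  errno = 0;
  std::FILE *file = std::fopen(nullTerminated.c_str(), "rb");
  if (!file)
    return std::error_code(errno, std::generic_category());

  char buffer[4096];
  std::size_t n = 0;
  while ((n = std::fread(buffer, 1, sizeof buffer, file)) > 0)
    out.append(buffer, n);

  std::error_code ec; // default-constructed means success
  if (std::ferror(file))
    ec = std::make_error_code(std::errc::io_error);
  std::fclose(file);
  return ec;
}
```

A caller can compare the result against std::errc::no_such_file_or_directory, or print ec.message(), without parsing text.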

The reason I bring this up is because I support a VFS, however, I want
to make sure that we keep in mind the reasons PathV2 was created while
writing it.

The PathV1 -> PathV2 transition stopped because I ran out of time to do
it. There's so much code that uses it, and some of the changes are
non-trivial in the cases where the Path class is stored and accessed in
many places instead of just being used to access the path functions.

The approach and order seem good to me. The llvm::sys::path parts can
stay separate, only the llvm::sys::fs parts need to be virtualized.

- Michael Spencer

ddunbar · December 6, 2011, 5:04am

Hi Manuel,

> So, as I noted in my reply to Daniel, after working through
> llvm/Support (and bringing FileManager back to my mind) I think I'm
> actually seeing a way forward, tell me if I'm crazy:

The following seems like a good plan and breakdown to me. And +1 on
working baby-steps.

> 1. morph FileSystem (I don't know whether that would include PathV2,
> but I currently don't think so) into a class that exports a nice
> interface for all FileSystem functions that we can override; to be
> able to do that step-by-step, we could for example introduce a static
> FileSystem pointer that is initialized with the default system file
> system on startup (I like being able to do baby-steps)
> 2. add methods to FileSystem to support opening MemoryBuffers; the
> path forward will be to move all calls to MemoryBuffer::get*File
> through the FileSystem interface, but again that can be handled
> incrementally
> 3. at that point we'd have enough stuff in FileSystem to rebase
> FileManager on top of it; once 1 and 2 are finished for clang/.* we'll
> be able to completely move the virtual file support over into a nice
> OverlayFileSystem implementation (argh, I've coded too many of those
> in my life);

Is this true (enough support to base FileManager on top)? I'm
specifically thinking about some of the places we look at inodes. Or
are you expecting to expose some kind of abstracted representation of
an inode?
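
To make the question concrete, an "abstracted representation of an inode" could be an opaque identity value that the VFS hands out and the uniquing logic keys on. The sketch below is hypothetical throughout (UniqueID, FileUniquer, FakeFileSystem are invented names) and only illustrates the shape of the idea.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Opaque file identity: on Unix this could wrap (device, inode); other
// implementations are free to fill it differently.
struct UniqueID {
  std::uint64_t device = 0;
  std::uint64_t file = 0;

  friend bool operator<(const UniqueID &a, const UniqueID &b) {
    return std::tie(a.device, a.file) < std::tie(b.device, b.file);
  }
};

// Hypothetical VFS call: returns false if the path does not exist.
class FileSystem {
public:
  virtual ~FileSystem() = default;
  virtual bool uniqueID(const std::string &path, UniqueID &out) const = 0;
};

struct FileEntry {
  std::string firstSeenPath;
};

// FileManager-like uniquing built purely on the abstract identity.
class FileUniquer {
public:
  explicit FileUniquer(const FileSystem &fs) : fs_(fs) {}

  // Two paths naming the same file yield the same FileEntry pointer.
  const FileEntry *get(const std::string &path) {
    UniqueID id;
    if (!fs_.uniqueID(path, id))
      return nullptr;
    auto it = entries_.try_emplace(id, FileEntry{path}).first;
    return &it->second;
  }

private:
  const FileSystem &fs_;
  std::map<UniqueID, FileEntry> entries_;
};

// Fake file system for the example: identities come from a table.
class FakeFileSystem : public FileSystem {
public:
  void add(const std::string &path, UniqueID id) { ids_[path] = id; }

  bool uniqueID(const std::string &path, UniqueID &out) const override {
    auto it = ids_.find(path);
    if (it == ids_.end())
      return false;
    out = it->second;
    return true;
  }

private:
  std::map<std::string, UniqueID> ids_;
};
```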

> On a different note, switching to the SourceManager topic - I know
> enough about SourceManager to be dangerous but not enough to ever
> claim I would have understood the crazy buffer management that's going
> on in ContentCache. So I'd need a lot of help to pry that box open
> eventually. Currently I'd think that this can be done in a subsequent
> step after the file system is sorted out, but I might be wrong...

I'd stay away from SourceManager. I would hope that the VFS layer stuff
doesn't interact (or interacts only minimally) with SourceManager (although
SourceManager is also aware of inodes, which is sad).

- Daniel

DougGregor · December 6, 2011, 5:23am

Hi Manuel,

Hi Manuel,

Hi,

while working on tooling on top of clang/llvm we found the file system
abstractions in clang/llvm to be one of the points that could be nicer
to integrate with. I’m writing this mail to propose a strawman and get
some feedback on what you guys think the right way forward is (or
whether we should just leave things as they are).

First, the FileManager we have in clang has helped us a lot for our
tooling - when we run clang in a mapreduce we don’t need to lay out
files on a disk, we can just map files into memory and happily clang
over them. We’re also using the same mechanism to map builtin
includes; in short, the FileManager has made it possible to do clang
at scale.

Now we’re aware that it was not really the intention of the
FileManager to allow doing the things we do with it: not every module
in clang uses the FileManager, and the moment we hit llvm there is no
FileManager at all. For example, in case of the Driver we hack around
the fact that the header search tries to access the file system
driectly in rather brittle ways, relying on implementation details and
#ifdefs.

So why not make FileManager a more principled (and still blazing fast)
file system abstraction?

Yes, please!

Great /me jumps right into the design discussion then.

Having a proper virtual file system across Clang and LLVM would be a huge boon, especially for pushing Clang into more applications that aren't simply "grab stuff from the local file system." The current FileManager/SourceManager dance used to provide in-memory content for a (virtual or old) file is quite the mess.

Pro:
- only one interface for developers to learn on the project (no more
PathV1 vs PathV2 vs FileManager)
- only one implementation (per-platform) for easier maintenance of the
file system platform abstraction
- one point to insert synchronization guarantees for tools / IDE
integration that wants to run clang in multiple threads at once (for
example when re-indexing on 12-ht-core machines)
- being able to replay compilations by injecting a virtual file system
that exactly “copies” the original file system’s content, which allows
easy scaling of replays, running tools against dirty edit buffers on a
lower level than the SourceManager and unit testing

… and making sure that all of the various stages of compilation see the same view of the file system.

Con:
- there would be yet another try at unifying the APIs which would be
in an intermediate state while being worked on (and PathV1 vs PathV2
is already bad enough)

I'm fine with intermediate states so long as the direction and benefits are clear. The former we can certainly discuss, and the latter is obvious already.

- making it the canonical file system interface is a lot of effort
that requires touching a lot of systems (while we’re volunteering to
do the work, it will probably eat up other people’s time, too)

I doubt I'll have much time to directly hack on this, but I'll be happy to review / discuss / help with adoption. libclang is one of the huge beneficiaries of such a change, so I care a lot about getting that to work well.

What parts (if any) of this type of transition make sense?
1. Figure out the “correct” interface we’d want for FileManager to be
more generally useful
2. Change FileManager to that interface
3. Sink FileManager into llvm, so it can be used by other projects
4. Use it throughout clang
5. Use it throughout llvm
We don’t need to do all of them at once, and should be able to
evaluate the results along the way.

I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly-shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away.

I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness.

So, as I noted in my reply to Daniel, after working through
llvm/Support (and bringing FileManager back to my mind) I think I'm
actually seeing a way forward, tell me if I'm crazy:

The following seems like a good plan and breakdown to me. And +1 on
working baby-steps.

1. morph FileSystem (I don't know whether that would include PathV2,
but I currently don't think so) into a class that exports a nice
interface for all FileSystem functions that we can override; to be
able to do that step-by-step, we could for example introduce a static
FileSystem pointer that is initialized with the default system file
system on startup (I like being able to do baby-steps)
2. add methods to FileSystem to support opening MemoryBuffers; the
path forward will be to move all calls to MemoryBuffer::get*File
through the FileSystem interface, but again that can be handled
incrementally
3. at that point we'd have enough stuff in FileSystem to rebase
FileManager on top of it; once 1 and 2 are finished for clang/.* we'll
be able to completely move the virtual file support over into a nice
OverlayFileSystem implementation (argh, I've coded too many of those
in my life);

Is this true (enough support to base FileManager on top)? I'm
specifically thinking about some of the places we look at inodes. Or
are you expecting to expose some kind of abstracted representation of
an inode?

4. add methods to FileSystem to support opening raw_fd_ostreams; this
is basically the process for reading mirrored

Thoughts? Completely broken approach? Broken order?

On a different note, switching to the SourceManager topic - I know
enough about SourceManager to be dangerous but not enough to ever
claim I would have understood the crazy buffer management that's going
on in ContentCache. So I'd need a lot of help to pry that box open
eventually. Currently I'd think that this can be done in a subsequent
step after the file system is sorted out, but I might be wrong...

I'd try to stay away from SourceManager. I would hope that the VFS layer stuff
doesn't interact (or interacts only minimally) with SourceManager (although
SourceManager is also aware of inodes, which is sad).

SourceManager has some code for overriding on-disk files with alternative buffers and for detecting when the underlying file system has changed from underneath us. That functionality should eventually move into FileSystem.
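
To make the "changed from underneath us" check concrete, here is a minimal C++17 sketch of one common heuristic: remember a file's size and modification time when its buffer is loaded, and compare later. This is an illustration only and not necessarily how SourceManager does it; `FileSnapshot`, `takeSnapshot` and `hasChangedSince` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical record of what a file looked like when its buffer was loaded.
struct FileSnapshot {
  std::uintmax_t Size = 0;
  fs::file_time_type ModTime{};
};

// Captures size and modification time. Returns false if the file cannot be
// inspected (for example because it does not exist).
inline bool takeSnapshot(const fs::path &Path, FileSnapshot &Out) {
  std::error_code EC;
  std::uintmax_t Size = fs::file_size(Path, EC);
  if (EC)
    return false;
  fs::file_time_type ModTime = fs::last_write_time(Path, EC);
  if (EC)
    return false;
  Out.Size = Size;
  Out.ModTime = ModTime;
  return true;
}

// True if the file disappeared or no longer matches the snapshot.
inline bool hasChangedSince(const fs::path &Path, const FileSnapshot &Old) {
  FileSnapshot Now;
  if (!takeSnapshot(Path, Now))
    return true;
  return Now.Size != Old.Size || Now.ModTime != Old.ModTime;
}
```

If a check like this lived behind the file system interface, a virtual file system could answer it from its own bookkeeping instead of calling stat().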

r4nt · December 6, 2011, 10:14am

Ah, yes, I've already asked myself what the purpose of the overriding
is in SourceManager, as it seems like the FileManager's overriding
abilities would already be enough to achieve the same purpose (apart
from the direct call to stat() in SourceManager.cpp I see). If I'm not
missing anything I'd assume that all that craziness would go away once
there is a full virtual file system?

Cheers,
/Manuel

r4nt · December 6, 2011, 10:18am

Hi Manuel,

Hi,

while working on tooling on top of clang/llvm we found the file system
abstractions in clang/llvm to be one of the points that could be nicer
to integrate with. I’m writing this mail to propose a strawman and get
some feedback on what you guys think the right way forward is (or
whether we should just leave things as they are).

First, the FileManager we have in clang has helped us a lot for our
tooling - when we run clang in a mapreduce we don’t need to lay out
files on a disk, we can just map files into memory and happily clang
over them. We’re also using the same mechanism to map builtin
includes; in short, the FileManager has made it possible to do clang
at scale.

Now we’re aware that it was not really the intention of the
FileManager to allow doing the things we do with it: not every module
in clang uses the FileManager, and the moment we hit llvm there is no
FileManager at all. For example, in case of the Driver we hack around
the fact that the header search tries to access the file system
directly in rather brittle ways, relying on implementation details and
#ifdefs.

So why not make FileManager a more principled (and still blazing fast)
file system abstraction?

Yes, please!

Great /me jumps right into the design discussion then.

Having a proper virtual file system across Clang and LLVM would be a huge boon, especially for pushing Clang into more applications that aren't simply "grab stuff from the local file system." The current FileManager/SourceManager dance used to provide in-memory content for a (virtual or old) file is quite the mess.

Pro:
- only one interface for developers to learn on the project (no more
PathV1 vs PathV2 vs FileManager)
- only one implementation (per-platform) for easier maintenance of the
file system platform abstraction
- one point to insert synchronization guarantees for tools / IDE
integration that wants to run clang in multiple threads at once (for
example when re-indexing on 12-ht-core machines)
- being able to replay compilations by injecting a virtual file system
that exactly “copies” the original file system’s content, which allows
easy scaling of replays, running tools against dirty edit buffers on a
lower level than the SourceManager and unit testing

… and making sure that all of the various stages of compilation see the same view of the file system.

Con:
- there would be yet another try at unifying the APIs which would be
in an intermediate state while being worked on (and PathV1 vs PathV2
is already bad enough)

I'm fine with intermediate states so long as the direction and benefits are clear. The former we can certainly discuss, and the latter is obvious already.

- making it the canonical file system interface is a lot of effort
that requires touching a lot of systems (while we’re volunteering to
do the work, it will probably eat up other people’s time, too)

I doubt I'll have much time to directly hack on this, but I'll be happy to review / discuss / help with adoption. libclang is one of the huge beneficiaries of such a change, so I care a lot about getting that to work well.

What parts (if any) of this type of transition make sense?
1. Figure out the “correct” interface we’d want for FileManager to be
more generally useful
2. Change FileManager to that interface
3. Sink FileManager into llvm, so it can be used by other projects
4. Use it throughout clang
5. Use it throughout llvm
We don’t need to do all of them at once, and should be able to
evaluate the results along the way.

I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly-shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away.

I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness.

So, as I noted in my reply to Daniel, after working through
llvm/Support (and bringing FileManager back to my mind) I think I'm
actually seeing a way forward, tell me if I'm crazy:

The following seems like a good plan and breakdown to me. And +1 on
working baby-steps.

1. morph FileSystem (I don't know whether that would include PathV2,
but I currently don't think so) into a class that exports a nice
interface for all FileSystem functions that we can override; to be
able to do that step-by-step, we could for example introduce a static
FileSystem pointer that is initialized with the default system file
system on startup (I like being able to do baby-steps)
2. add methods to FileSystem to support opening MemoryBuffers; the
path forward will be to move all calls to MemoryBuffer::get*File
through the FileSystem interface, but again that can be handled
incrementally
3. at that point we'd have enough stuff in FileSystem to rebase
FileManager on top of it; once 1 and 2 are finished for clang/.* we'll
be able to completely move the virtual file support over into a nice
OverlayFileSystem implementation (argh, I've coded too many of those
in my life);

Is this true (enough support to base FileManager on top)? I'm
specifically thinking about some of the places we look at inodes. Or
are you expecting to expose some kind of abstracted representation of
an inode?

Exactly. FileSystem already supports that implicitly, we just need to
export it in a sensible way.

4. add methods to FileSystem to support opening raw_fd_ostreams; this
is basically the process for reading mirrored

Thoughts? Completely broken approach? Broken order?

On a different note, switching to the SourceManager topic - I know
enough about SourceManager to be dangerous but not enough to ever
claim I would have understood the crazy buffer management that's going
on in ContentCache. So I'd need a lot of help to pry that box open
eventually. Currently I'd think that this can be done in a subsequent
step after the file system is sorted out, but I might be wrong...

I'd try to stay away from SourceManager. I would hope that the VFS layer stuff
doesn't interact (or interacts only minimally) with SourceManager (although
SourceManager is also aware of inodes, which is sad).

I think the concept of unique system wide file IDs makes sense (and
like Douglas said we can probably push most of that stuff down from
SourceManager once the FileSystem is providing all the hooks we need),
and I'm confident we can express that in an OS independent way.
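
A rough C++17 sketch of what an OS-independent unique file ID could look like follows. All names here (`UniqueFileID`, `InMemoryFileSystem`, `addAlias`) are hypothetical and only illustrate the idea: two paths name the same file exactly when their IDs compare equal, and a virtual file system can hand out synthetic IDs without any inode.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical OS-independent file identity.
struct UniqueFileID {
  std::uint64_t Volume; // e.g. st_dev on POSIX
  std::uint64_t File;   // e.g. st_ino on POSIX

  friend bool operator==(const UniqueFileID &A, const UniqueFileID &B) {
    return std::tie(A.Volume, A.File) == std::tie(B.Volume, B.File);
  }
  friend bool operator<(const UniqueFileID &A, const UniqueFileID &B) {
    return std::tie(A.Volume, A.File) < std::tie(B.Volume, B.File);
  }
};

// Toy in-memory file system that assigns synthetic IDs to its files.
class InMemoryFileSystem {
public:
  UniqueFileID addFile(const std::string &Path) {
    UniqueFileID ID{VirtualVolume, NextFile++};
    IDs[Path] = ID;
    return ID;
  }

  // Registers Alias as a second name for an existing file, much like a
  // hard link. Returns false if Target is unknown.
  bool addAlias(const std::string &Alias, const std::string &Target) {
    auto It = IDs.find(Target);
    if (It == IDs.end())
      return false;
    IDs[Alias] = It->second;
    return true;
  }

  std::optional<UniqueFileID> getUniqueID(const std::string &Path) const {
    auto It = IDs.find(Path);
    if (It == IDs.end())
      return std::nullopt;
    return It->second;
  }

private:
  // Arbitrary volume number reserved for virtual files in this sketch.
  static constexpr std::uint64_t VirtualVolume = ~std::uint64_t{0};
  std::uint64_t NextFile = 1;
  std::map<std::string, UniqueFileID> IDs;
};
```

Clients that key their caches on such an ID would not need to know whether it came from a real inode or from a virtual file system.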

Cheers,
/Manuel

r4nt · December 6, 2011, 10:27am

Hi Manuel,

Hi,

while working on tooling on top of clang/llvm we found the file system
abstractions in clang/llvm to be one of the points that could be nicer
to integrate with. I’m writing this mail to propose a strawman and get
some feedback on what you guys think the right way forward is (or
whether we should just leave things as they are).

First, the FileManager we have in clang has helped us a lot for our
tooling - when we run clang in a mapreduce we don’t need to lay out
files on a disk, we can just map files into memory and happily clang
over them. We’re also using the same mechanism to map builtin
includes; in short, the FileManager has made it possible to do clang
at scale.

Now we’re aware that it was not really the intention of the
FileManager to allow doing the things we do with it: not every module
in clang uses the FileManager, and the moment we hit llvm there is no
FileManager at all. For example, in case of the Driver we hack around
the fact that the header search tries to access the file system
directly in rather brittle ways, relying on implementation details and
#ifdefs.

So why not make FileManager a more principled (and still blazing fast)
file system abstraction?

Yes, please!

Great /me jumps right into the design discussion then.

Having a proper virtual file system across Clang and LLVM would be a huge boon, especially for pushing Clang into more applications that aren't simply "grab stuff from the local file system." The current FileManager/SourceManager dance used to provide in-memory content for a (virtual or old) file is quite the mess.

Pro:
- only one interface for developers to learn on the project (no more
PathV1 vs PathV2 vs FileManager)
- only one implementation (per-platform) for easier maintenance of the
file system platform abstraction
- one point to insert synchronization guarantees for tools / IDE
integration that wants to run clang in multiple threads at once (for
example when re-indexing on 12-ht-core machines)
- being able to replay compilations by injecting a virtual file system
that exactly “copies” the original file system’s content, which allows
easy scaling of replays, running tools against dirty edit buffers on a
lower level than the SourceManager and unit testing

… and making sure that all of the various stages of compilation see the same view of the file system.

Con:
- there would be yet another try at unifying the APIs which would be
in an intermediate state while being worked on (and PathV1 vs PathV2
is already bad enough)

I'm fine with intermediate states so long as the direction and benefits are clear. The former we can certainly discuss, and the latter is obvious already.

- making it the canonical file system interface is a lot of effort
that requires touching a lot of systems (while we’re volunteering to
do the work, it will probably eat up other people’s time, too)

I doubt I'll have much time to directly hack on this, but I'll be happy to review / discuss / help with adoption. libclang is one of the huge beneficiaries of such a change, so I care a lot about getting that to work well.

What parts (if any) of this type of transition make sense?
1. Figure out the “correct” interface we’d want for FileManager to be
more generally useful
2. Change FileManager to that interface
3. Sink FileManager into llvm, so it can be used by other projects
4. Use it throughout clang
5. Use it throughout llvm
We don’t need to do all of them at once, and should be able to
evaluate the results along the way.

I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly-shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away.

I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness.

So, as I noted in my reply to Daniel, after working through
llvm/Support (and bringing FileManager back to my mind) I think I'm
actually seeing a way forward, tell me if I'm crazy:
1. morph FileSystem (I don't know whether that would include PathV2,
but I currently don't think so) into a class that exports a nice
interface for all FileSystem functions that we can override; to be
able to do that step-by-step, we could for example introduce a static
FileSystem pointer that is initialized with the default system file
system on startup (I like being able to do baby-steps)
2. add methods to FileSystem to support opening MemoryBuffers; the
path forward will be to move all calls to MemoryBuffer::get*File
through the FileSystem interface, but again that can be handled
incrementally
3. at that point we'd have enough stuff in FileSystem to rebase
FileManager on top of it; once 1 and 2 are finished for clang/.* we'll
be able to completely move the virtual file support over into a nice
OverlayFileSystem implementation (argh, I've coded too many of those
in my life);
4. add methods to FileSystem to support opening raw_fd_ostreams; this
is basically the process for reading mirrored

Thoughts? Completely broken approach? Broken order?
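
Steps 1 to 3 above can be sketched roughly as follows. This is a minimal C++17 illustration under stated assumptions, not the actual llvm::sys::fs or FileManager API: the `FileSystem` interface, `RealFileSystem`, `OverlayFileSystem`, `readFile` and `currentFileSystem` are hypothetical names, and plain strings stand in for MemoryBuffers.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical overridable file system interface (step 1).
class FileSystem {
public:
  virtual ~FileSystem() = default;
  // Returns the file contents, or std::nullopt if the file cannot be read.
  // A real version would return a MemoryBuffer (step 2).
  virtual std::optional<std::string> readFile(const std::string &Path) = 0;
};

// Default implementation backed by the real disk.
class RealFileSystem : public FileSystem {
public:
  std::optional<std::string> readFile(const std::string &Path) override {
    std::ifstream In(Path, std::ios::binary);
    if (!In)
      return std::nullopt;
    std::ostringstream Contents;
    Contents << In.rdbuf();
    return Contents.str();
  }
};

// Overlay (step 3): in-memory files shadow whatever the base provides.
class OverlayFileSystem : public FileSystem {
public:
  explicit OverlayFileSystem(FileSystem &Base) : Base(Base) {}

  void addVirtualFile(const std::string &Path, std::string Contents) {
    Virtual[Path] = std::move(Contents);
  }

  std::optional<std::string> readFile(const std::string &Path) override {
    auto It = Virtual.find(Path);
    if (It != Virtual.end())
      return It->second;
    return Base.readFile(Path); // fall through to the underlying file system
  }

private:
  FileSystem &Base;
  std::map<std::string, std::string> Virtual;
};

// The "static FileSystem pointer" from step 1: it starts out pointing at the
// real file system and can be swapped by tools.
inline FileSystem *&currentFileSystem() {
  static RealFileSystem Real;
  static FileSystem *Current = &Real;
  return Current;
}
```

Call sites can then migrate one at a time from direct disk access to `currentFileSystem()->readFile(...)`, which is what makes the baby-step approach possible.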

On a different note, switching to the SourceManager topic - I know
enough about SourceManager to be dangerous but not enough to ever
claim I would have understood the crazy buffer management that's going
on in ContentCache. So I'd need a lot of help to pry that box open
eventually. Currently I'd think that this can be done in a subsequent
step after the file system is sorted out, but I might be wrong...

Cheers,
/Manuel

Just for some background about why we have PathV2.

In my quest to improve Windows support across LLVM and Clang I ran
into many issues with the way PathV1 worked. A few were:
* PathV1, and most of LLVM, use std::string to handle errors. This
makes code more verbose than needed, and loses OS-level error
information.
* PathV1 makes it difficult to handle Unicode on Windows. Although
apparently I didn't solve the problem correctly either :P.

Are there open bugs? A quick search for unicode on llvm.org/bugs
didn't show anything windows specific.

* PathV1 requires constructing a Path object before calling any
functions. This is inefficient when most of the time you have
something StringRef'able.

Thus when I designed PathV2 I made it stateless, utf-8 only, and used
error_code.
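
As an illustration of that style (stateless free functions that take something string-like and report failure through an error code), here is a small C++17 sketch. It uses std::filesystem and std::error_code rather than the real llvm::sys::fs functions, `fileSize` is a hypothetical name, and the sketch does not address the UTF-8 handling mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string_view>
#include <system_error>

// Hypothetical stateless query: no Path object has to be constructed by the
// caller, and the OS-level error is preserved in the returned error_code
// instead of being flattened into a std::string message.
inline std::error_code fileSize(std::string_view Path, std::uint64_t &Result) {
  std::error_code EC;
  std::uintmax_t Size =
      std::filesystem::file_size(std::filesystem::path(Path), EC);
  if (!EC)
    Result = Size; // only written on success
  return EC;
}
```

A caller checks the returned code and can still inspect its value, category and message when something goes wrong.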

The reason I bring this up is that I support a VFS; however, I want
to make sure that we keep in mind the reasons PathV2 was created while
writing it.

Yep, that's an important point. As I said, I've looked into PathV2 and
I really like the distinction between path manipulation and file
system access, and the general design of both PathV2 and
Support/FileSystem.

The PathV1 -> PathV2 transition stopped because I ran out of time to do
it. There's so much code that uses PathV1, and some of the changes are
nontrivial in the cases where the Path class is stored and accessed in
many places instead of just being used to access the path functions.

The approach and order seem good to me. The llvm::sys::path parts can
stay separate, only the llvm::sys::fs parts need to be virtualized.

Yep, that was exactly my thought. Thanks for confirming and providing
all the background information!

Cheers,
/Manuel

DougGregor · December 6, 2011, 2:59pm

Hi Manuel,

Hi,

while working on tooling on top of clang/llvm we found the file system
abstractions in clang/llvm to be one of the points that could be nicer
to integrate with. I’m writing this mail to propose a strawman and get
some feedback on what you guys think the right way forward is (or
whether we should just leave things as they are).

First, the FileManager we have in clang has helped us a lot for our
tooling - when we run clang in a mapreduce we don’t need to lay out
files on a disk, we can just map files into memory and happily clang
over them. We’re also using the same mechanism to map builtin
includes; in short, the FileManager has made it possible to do clang
at scale.

Now we’re aware that it was not really the intention of the
FileManager to allow doing the things we do with it: not every module
in clang uses the FileManager, and the moment we hit llvm there is no
FileManager at all. For example, in case of the Driver we hack around
the fact that the header search tries to access the file system
directly in rather brittle ways, relying on implementation details and
#ifdefs.

So why not make FileManager a more principled (and still blazing fast)
file system abstraction?

Yes, please!

Great /me jumps right into the design discussion then.

Having a proper virtual file system across Clang and LLVM would be a huge boon, especially for pushing Clang into more applications that aren't simply "grab stuff from the local file system." The current FileManager/SourceManager dance used to provide in-memory content for a (virtual or old) file is quite the mess.

Pro:
- only one interface for developers to learn on the project (no more
PathV1 vs PathV2 vs FileManager)
- only one implementation (per-platform) for easier maintenance of the
file system platform abstraction
- one point to insert synchronization guarantees for tools / IDE
integration that wants to run clang in multiple threads at once (for
example when re-indexing on 12-ht-core machines)
- being able to replay compilations by injecting a virtual file system
that exactly “copies” the original file system’s content, which allows
easy scaling of replays, running tools against dirty edit buffers on a
lower level than the SourceManager and unit testing

… and making sure that all of the various stages of compilation see the same view of the file system.

Con:
- there would be yet another try at unifying the APIs which would be
in an intermediate state while being worked on (and PathV1 vs PathV2
is already bad enough)

I'm fine with intermediate states so long as the direction and benefits are clear. The former we can certainly discuss, and the latter is obvious already.

- making it the canonical file system interface is a lot of effort
that requires touching a lot of systems (while we’re volunteering to
do the work, it will probably eat up other people’s time, too)

I doubt I'll have much time to directly hack on this, but I'll be happy to review / discuss / help with adoption. libclang is one of the huge beneficiaries of such a change, so I care a lot about getting that to work well.

What parts (if any) of this type of transition make sense?
1. Figure out the “correct” interface we’d want for FileManager to be
more generally useful
2. Change FileManager to that interface
3. Sink FileManager into llvm, so it can be used by other projects
4. Use it throughout clang
5. Use it throughout llvm
We don’t need to do all of them at once, and should be able to
evaluate the results along the way.

I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly-shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away.

I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness.

So, as I noted in my reply to Daniel, after working through
llvm/Support (and bringing FileManager back to my mind) I think I'm
actually seeing a way forward, tell me if I'm crazy:

The following seems like a good plan and breakdown to me. And +1 on
working baby-steps.

1. morph FileSystem (I don't know whether that would include PathV2,
but I currently don't think so) into a class that exports a nice
interface for all FileSystem functions that we can override; to be
able to do that step-by-step, we could for example introduce a static
FileSystem pointer that is initialized with the default system file
system on startup (I like being able to do baby-steps)
2. add methods to FileSystem to support opening MemoryBuffers; the
path forward will be to move all calls to MemoryBuffer::get*File
through the FileSystem interface, but again that can be handled
incrementally
3. at that point we'd have enough stuff in FileSystem to rebase
FileManager on top of it; once 1 and 2 are finished for clang/.* we'll
be able to completely move the virtual file support over into a nice
OverlayFileSystem implementation (argh, I've coded too many of those
in my life);

Is this true (enough support to base FileManager on top)? I'm
specifically thinking about some of the places we look at inodes. Or
are you expecting to expose some kind of abstracted representation of
an inode?

4. add methods to FileSystem to support opening raw_fd_ostreams; this
is basically the process for reading mirrored

Thoughts? Completely broken approach? Broken order?

On a different note, switching to the SourceManager topic - I know
enough about SourceManager to be dangerous but not enough to ever
claim I would have understood the crazy buffer management that's going
on in ContentCache. So I'd need a lot of help to pry that box open
eventually. Currently I'd think that this can be done in a subsequent
step after the file system is sorted out, but I might be wrong...

I'd try to stay away from SourceManager. I would hope that the VFS layer stuff
doesn't interact (or interacts only minimally) with SourceManager (although
SourceManager is also aware of inodes, which is sad).

SourceManager has some code for overriding on-disk files with alternative buffers and for detecting when the underlying file system has changed from underneath us. That functionality should eventually move into FileSystem.

Ah, yes, I've already asked myself what the purpose of the overriding
is in SourceManager, as it seems like the FileManager's overriding
abilities would already be enough to achieve the same purpose (apart
from the direct call to stat() in SourceManager.cpp I see). If I'm not
missing anything I'd assume that all that craziness would go away once
there is a full virtual file system?

That's my hope as well.

ddunbar · December 6, 2011, 4:36pm

Hi Manuel,

Hi,

while working on tooling on top of clang/llvm we found the file system
abstractions in clang/llvm to be one of the points that could be nicer
to integrate with. I’m writing this mail to propose a strawman and get
some feedback on what you guys think the right way forward is (or
whether we should just leave things as they are).

First, the FileManager we have in clang has helped us a lot for our
tooling - when we run clang in a mapreduce we don’t need to lay out
files on a disk, we can just map files into memory and happily clang
over them. We’re also using the same mechanism to map builtin
includes; in short, the FileManager has made it possible to do clang
at scale.

Now we’re aware that it was not really the intention of the
FileManager to allow doing the things we do with it: not every module
in clang uses the FileManager, and the moment we hit llvm there is no
FileManager at all. For example, in case of the Driver we hack around
the fact that the header search tries to access the file system
directly in rather brittle ways, relying on implementation details and
#ifdefs.

So why not make FileManager a more principled (and still blazing fast)
file system abstraction?

Yes, please!

Great /me jumps right into the design discussion then.

Having a proper virtual file system across Clang and LLVM would be a huge boon, especially for pushing Clang into more applications that aren't simply "grab stuff from the local file system." The current FileManager/SourceManager dance used to provide in-memory content for a (virtual or old) file is quite the mess.

Pro:
- only one interface for developers to learn on the project (no more
PathV1 vs PathV2 vs FileManager)
- only one implementation (per-platform) for easier maintenance of the
file system platform abstraction
- one point to insert synchronization guarantees for tools / IDE
integration that wants to run clang in multiple threads at once (for
example when re-indexing on 12-ht-core machines)
- being able to replay compilations by injecting a virtual file system
that exactly “copies” the original file system’s content, which allows
easy scaling of replays, running tools against dirty edit buffers on a
lower level than the SourceManager and unit testing

… and making sure that all of the various stages of compilation see the same view of the file system.

Con:
- there would be yet another try at unifying the APIs which would be
in an intermediate state while being worked on (and PathV1 vs PathV2
is already bad enough)

I'm fine with intermediate states so long as the direction and benefits are clear. The former we can certainly discuss, and the latter is obvious already.

- making it the canonical file system interface is a lot of effort
that requires touching a lot of systems (while we’re volunteering to
do the work, it will probably eat up other people’s time, too)

I doubt I'll have much time to directly hack on this, but I'll be happy to review / discuss / help with adoption. libclang is one of the huge beneficiaries of such a change, so I care a lot about getting that to work well.

What parts (if any) of this type of transition make sense?
1. Figure out the “correct” interface we’d want for FileManager to be
more generally useful
2. Change FileManager to that interface
3. Sink FileManager into llvm, so it can be used by other projects
4. Use it throughout clang
5. Use it throughout llvm
We don’t need to do all of them at once, and should be able to
evaluate the results along the way.

I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly-shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away.

I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness.

So, as I noted in my reply to Daniel, after working through
llvm/Support (and bringing FileManager back to my mind) I think I'm
actually seeing a way forward, tell me if I'm crazy:
1. morph FileSystem (I don't know whether that would include PathV2,
but I currently don't think so) into a class that exports a nice
interface for all FileSystem functions that we can override; to be
able to do that step-by-step, we could for example introduce a static
FileSystem pointer that is initialized with the default system file
system on startup (I like being able to do baby-steps)
2. add methods to FileSystem to support opening MemoryBuffers; the
path forward will be to move all calls to MemoryBuffer::get*File
through the FileSystem interface, but again that can be handled
incrementally
3. at that point we'd have enough stuff in FileSystem to rebase
FileManager on top of it; once 1 and 2 are finished for clang/.* we'll
be able to completely move the virtual file support over into a nice
OverlayFileSystem implementation (argh, I've coded too many of those
in my life);
4. add methods to FileSystem to support opening raw_fd_ostreams; this
is basically the process for reading mirrored

Thoughts? Completely broken approach? Broken order?

On a different note, switching to the SourceManager topic - I know
enough about SourceManager to be dangerous but not enough to ever
claim I would have understood the crazy buffer management that's going
on in ContentCache. So I'd need a lot of help to pry that box open
eventually. Currently I'd think that this can be done in a subsequent
step after the file system is sorted out, but I might be wrong...

Cheers,
/Manuel

Just for some background about why we have PathV2.

In my quest to improve Windows support across LLVM and Clang I ran
into many issues with the way PathV1 worked. A few were:
* PathV1, and most of LLVM, use std::string to handle errors. This
makes code more verbose than needed, and loses OS-level error
information.
* PathV1 makes it difficult to handle Unicode on Windows. Although
apparently I didn't solve the problem correctly either :P.

Are there open bugs? A quick search for unicode on llvm.org/bugs
didn't show anything Windows-specific.

* PathV1 requires constructing a Path object before calling any
functions. This is inefficient when most of the time you have
something StringRef'able.

Thus when I designed PathV2 I made it stateless, UTF-8 only, and used
error_code.
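
To make the contrast concrete, here is a small sketch of a call in that style: a free function that takes a plain string (no Path object to construct) and returns a std::error_code, so the OS-level error survives instead of being flattened into a message string. The name fileSize is hypothetical and is not the PathV2 API; the sketch also assumes a POSIX-like platform where a failed fopen sets errno.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <string>
#include <system_error>

// Hypothetical helper in the stateless, error_code-returning style.
// On success, Result holds the file size and the returned code is empty.
inline std::error_code fileSize(const std::string &Path, std::uint64_t &Result) {
  assert(!Path.empty() && "caller must pass a path");
  errno = 0;
  std::FILE *F = std::fopen(Path.c_str(), "rb");
  if (!F)
    return std::error_code(errno, std::generic_category()); // keeps errno
  std::error_code EC;
  long Size = -1;
  if (std::fseek(F, 0, SEEK_END) != 0 || (Size = std::ftell(F)) < 0)
    EC = std::error_code(errno, std::generic_category());
  else
    Result = static_cast<std::uint64_t>(Size);
  std::fclose(F);
  return EC;
}
```

A caller can then test the result directly, for example `if (std::error_code EC = fileSize(P, Size))`, and compare EC against std::errc values when it needs to tell a missing file apart from a permission problem.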

I bring this up because I support a VFS; however, I want
to make sure that we keep in mind the reasons PathV2 was created while
writing it.

Yep, that's an important point. As I said, I've looked into PathV2 and
I really like the distinction between path manipulation and file
system access, and the general design of both PathV2 and
Support/FileSystem.

The PathV1 -> PathV2 transition stopped because I ran out of time to do
it. There's so much code that uses it, and some of the changes are
non-trivial in the cases where the Path class is stored and accessed in
many places instead of just used to access the path functions.

The approach and order seem good to me. The llvm::sys::path parts can
stay separate, only the llvm::sys::fs parts need to be virtualized.

Yep, that was exactly my thought. Thanks for confirming and providing
all the background information!

Not sure if I follow here, but we will need to do some amount of work
on sys::path (not virtualization per se).

The current PathV2 API has embedded into it an assumption of working
with the native path type.

In the system I was originally imagining, we would have something like:
(1) sys::path::unix and sys::path::windows, so client code can use a
Windows-specific version if it wanted to for some reason. These would
have the same functions in them.
(2) sys::path, which would just be the same as one of the two
previous namespaces, selected to match the host.
(3) some other path variants, which would take a FileSystem object,
and then call the appropriate path functions for the FileSystem type.

Eventually, any code which we want to be virtualizable would need to
move to not using the sys::path functions that don't take a FileSystem
object.
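
A minimal sketch of how those three layers could fit together, assuming a single path function (filename) and hypothetical names throughout. The namespaces are called posix_style and windows_style here instead of unix and windows, because some compilers predefine `unix` as a macro. The Windows variant ignores drive-relative paths such as "C:foo".

```cpp
#include <cassert>
#include <string>

namespace path {

// (1) Two style-specific namespaces exposing the same functions.
namespace posix_style {
inline std::string filename(const std::string &P) {
  auto Pos = P.find_last_of('/');
  return Pos == std::string::npos ? P : P.substr(Pos + 1);
}
} // namespace posix_style

namespace windows_style {
inline std::string filename(const std::string &P) {
  // Windows accepts both separators.
  auto Pos = P.find_last_of("/\\");
  return Pos == std::string::npos ? P : P.substr(Pos + 1);
}
} // namespace windows_style

// (2) An alias selected to match the host.
#ifdef _WIN32
namespace host = windows_style;
#else
namespace host = posix_style;
#endif

// (3) A variant that takes a (hypothetical) FileSystem object and calls
// the path functions appropriate for that file system's path style.
enum class PathStyle { Posix, Windows };
struct FileSystem {
  PathStyle Style;
};

inline std::string filename(const FileSystem &FS, const std::string &P) {
  switch (FS.Style) {
  case PathStyle::Posix:
    return posix_style::filename(P);
  case PathStyle::Windows:
    return windows_style::filename(P);
  }
  assert(false && "unknown path style");
  return P;
}

} // namespace path
```

Code that should be virtualizable would call the overload that takes a FileSystem, while code that only ever touches the real disk could keep using path::host.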

- Daniel

Ruben_Van_Boxem · December 6, 2011, 5:16pm

2011/12/6 Daniel Dunbar <daniel@zuster.org>

In the system I was originally imagining, we would have something like:
(1) sys::path::unix and sys::path::windows, so client code can use a
Windows-specific version if it wanted to for some reason. These would
have the same functions in them.

Worst. Idea. Ever. Sorry to be blunt, but how does that help higher-level code at all? The underlying implementation (type) of the path objects could be different, but the API itself should really be platform-independent. No use for a Windows path in a Unix app, and if so, it’s not LLVM’s place to provide that unneeded functionality.

(2) sys::path, which would just be the same as one of the two
previous namespaces, selected to match the host.

This is better. LLVM/Clang shouldn’t know what a sys::path is (UTF8 Unix path or UTF16 Windows path), they just need a “path”.

(3) some other path variants, which would take a FileSystem object,
and then call the appropriate path functions for the FileSystem type.

Eventually, any code which we want to be virtualizable would need to
move to not using the sys::path functions that don’t take a FileSystem
object.

You guys probably know more than I do about LLVM/Clang’s needs wrt a file system cache, but apart from that, why not model according to the Boost implementation? Futureproof and proven useful.

Ruben

r4nt · December 6, 2011, 5:23pm

Not sure if I follow here, but we will need to do some amount of work
on sys::path (not virtualization per se).

The current PathV2 API has embedded into it an assumption of working
with the native path type.

I'm guessing that you're concerned about path virtualization regarding
your original proposal of being able to regression test problems on one
platform on a different one?

In the system I was originally imagining, we would have something like:
(1) sys::path::unix and sys::path::windows, so client code can use a
Windows-specific version if it wanted to for some reason. These would
have the same functions in them.
(2) sys::path, which would just be the same as one of the two
previous namespaces, selected to match the host.
(3) some other path variants, which would take a FileSystem object,
and then call the appropriate path functions for the FileSystem type.

Eventually, any code which we want to be virtualizable would need to
move to not using the sys::path functions that don't take a FileSystem
object.

I think virtualizing the file system and virtualizing the host OS are
related, but different beasts, so I'd really like to handle them
separately.
For example, I do not think that the host OS should be in the
FileSystem - that sounds like mixed abstractions. I'm not sure yet how
to abstract that part out. I'll start thinking more about it.
As a straw-man (that is probably not applicable to llvm at this point
any more), in a previous project I've successfully implemented the
strategy of handling all paths internally as unix paths and converting
to the needed representation when necessary.
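
A minimal sketch of that straw-man, assuming separator conversion is the only difference that needs handling (real code would also have to deal with drive letters, UNC paths and encoding). The names PathStyle, toInternal and toNative are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

enum class PathStyle { Posix, Windows };

// Convert a path received from the outside into the internal unix form.
// Backslash is a legal file name character on POSIX, so it is only
// rewritten when the input is known to be a Windows path.
inline std::string toInternal(std::string Native, PathStyle Source) {
  if (Source == PathStyle::Windows)
    std::replace(Native.begin(), Native.end(), '\\', '/');
  return Native;
}

// Convert an internal unix path to the representation the target needs.
inline std::string toNative(std::string Internal, PathStyle Target) {
  switch (Target) {
  case PathStyle::Posix:
    return Internal;
  case PathStyle::Windows:
    std::replace(Internal.begin(), Internal.end(), '/', '\\');
    return Internal;
  }
  assert(false && "unknown path style");
  return Internal;
}
```

With this split, everything between the two conversion points only ever sees forward slashes, and the host-specific representation shows up at the boundary where a path is handed to the OS or shown to the user.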

Cheers,
/Manuel

r4nt · December 6, 2011, 5:28pm

As far as I understand that's the case currently with Support/PathV2
and Support/FileSystem.
The point of the exercise is not to radically change the interface
(the interface looks fine and makes sense), but to virtualize it.

Cheers,
/Manuel

ddunbar · December 6, 2011, 8:58pm

I think virtualizing the file system and virtualizing the host OS are
related, but different beasts, so I'd really like to handle them
separately.

That seems reasonable to me.

- Daniel
