[RFC] Write support for LLVM Virtual File System (VFS) to virtualize compiler outputs

Hello,

Our ROCm compiler team at AMD is working on performance and stability improvements in the Clang compiler and is proposing the following RFC.

Proposed Changes:

Write support for LLVM Virtual File System (VFS) to virtualize compiler outputs.

Background:

  • Throughout its execution, the Clang Driver frequently interacts with the file system. For example, it reads input files, reads and writes bitcode and object files, produces linked object files, and writes executables.

  • Reading from and writing to disk can be expensive compared with doing the same in memory, particularly during online compilation. An in-memory file system abstraction could also improve stability, since it potentially removes a class of errors that arise when interacting with a real file system. Moreover, it opens the possibility of running Clang where no file system is available, as in some embedded environments. Hence, we propose updating the Clang Driver to optionally support and leverage a read/write Virtual File System (VFS). When VFS compilation is enabled, intermediate files are managed in memory and in-process via the VFS instead of being read from and written to disk.
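To make the in-memory idea concrete, here is a minimal C++ sketch, assuming a toy string-keyed store in place of a real read/write VFS. `InMemoryStore`, `compileStage`, and `linkStage` are hypothetical names used only for illustration; they are not LLVM or Clang APIs. The point is that the intermediate objects handed from one stage to the next never touch the disk.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-process store standing in for a read/write VFS; it is not an
// LLVM class. Paths are plain strings and file contents are raw bytes.
class InMemoryStore {
public:
  void write(const std::string &Path, std::string Bytes) {
    assert(!Path.empty() && "output path must be non-empty");
    Files[Path] = std::move(Bytes);
  }

  std::optional<std::string> read(const std::string &Path) const {
    auto It = Files.find(Path);
    if (It == Files.end())
      return std::nullopt; // plays the role of "file not found"
    return It->second;
  }

private:
  std::map<std::string, std::string> Files;
};

// Toy compile step (hypothetical): emits its object into the store instead of
// a temporary file on disk.
void compileStage(InMemoryStore &FS, const std::string &Src,
                  const std::string &Obj) {
  FS.write(Obj, "obj(" + Src + ")");
}

// Toy link step (hypothetical): reads the intermediates back from the store
// and writes the final image into the same store.
bool linkStage(InMemoryStore &FS, const std::vector<std::string> &Objs,
               const std::string &Exe) {
  std::string Image;
  for (const std::string &Obj : Objs) {
    std::optional<std::string> Bytes = FS.read(Obj);
    if (!Bytes)
      return false; // a missing intermediate aborts the link
    Image += *Bytes;
  }
  FS.write(Exe, std::move(Image));
  return true;
}
```

In this sketch a missing intermediate surfaces as an empty `std::optional`, which loosely corresponds to a file-not-found error when reading from disk.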

Proposed solution / approach:

  • Updates are needed to allow both reading and writing of files to the VFS.

  • Extend existing file system classes, such as RealFileSystem, OverlayFileSystem, and InMemoryFileSystem, to use the VFS write infrastructure.

  • We are currently working on an internal implementation of write support for VFS. However, we came across an implementation of VFS write support by Duncan P. N. Exon Smith, for which he requested an RFC ([cfe-dev] RFC: Add an llvm::vfs::OutputManager to allow Clang to virtualize compiler outputs). We are seriously considering resurrecting his work, since it has already been discussed on llvm-dev and partially reviewed.

  • We also need VFS support in LLD to provide an end-to-end VFS for the Clang driver, because the Clang driver calls LLD internally.

    • Hence, once write support for VFS is available, we want to make the necessary updates to LLD for an end-to-end VFS.

With these proposed changes, we expect improved performance and stability in the Clang driver.

@dexonsmith, what are your plans for the VFS support patches? Would it be OK if we picked up where you left off and commandeered your reviews?

We appreciate any other thoughts and feedback regarding this.

@dexonsmith's Discourse RFC link:

Thanks for reading!

Thanks for bringing this up; OutputBackend is long overdue. Feel free to ping me or @blangmuir if Duncan is not available to review or provide feedback, as we are going to take over the patches and upstream an improved version of OutputBackend. You can find the latest version in our experimental branch here (only some headers are listed):

The main goal of the output backend is to provide an abstraction for outputs, so that output producers don't need to deal with the file system when writing outputs, and output consumers can decide where each output goes by constructing the corresponding output backend.
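A rough sketch of that separation, assuming a much-simplified interface: the producer only asks a backend for an output file, writes bytes, and then calls `keep()` or `discard()`, while the consumer decides where kept bytes end up by choosing the backend. All class and function names here are hypothetical illustrations and are not the API from D133504 or the experimental branch.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical output file: buffers what the producer writes and hands the
// bytes to a commit callback only when keep() is called.
class OutputFile {
public:
  explicit OutputFile(std::function<void(std::string)> Commit)
      : Commit(std::move(Commit)) {}
  std::ostream &os() { return Buffer; }
  void keep() {
    assert(!Done && "output already finalized");
    Done = true;
    Commit(Buffer.str());
  }
  void discard() { Done = true; } // bytes are dropped; nothing is committed

private:
  std::ostringstream Buffer;
  std::function<void(std::string)> Commit;
  bool Done = false;
};

// Hypothetical backend interface: the only thing a producer sees.
class OutputBackend {
public:
  virtual ~OutputBackend() = default;
  virtual std::unique_ptr<OutputFile> createFile(const std::string &Path) = 0;
};

// Consumer choice 1: keep committed outputs in memory, keyed by path.
class InMemoryBackend : public OutputBackend {
public:
  std::map<std::string, std::string> Files;
  std::unique_ptr<OutputFile> createFile(const std::string &Path) override {
    return std::make_unique<OutputFile>(
        [this, Path](std::string Bytes) { Files[Path] = std::move(Bytes); });
  }
};

// Consumer choice 2: swallow every output.
class NullBackend : public OutputBackend {
public:
  std::unique_ptr<OutputFile> createFile(const std::string &) override {
    return std::make_unique<OutputFile>([](std::string) {});
  }
};

// A producer (think: code generator) that never touches the file system.
void emitObject(OutputBackend &Backend, const std::string &Path) {
  std::unique_ptr<OutputFile> Out = Backend.createFile(Path);
  Out->os() << "OBJ:" << Path;
  Out->keep();
}
```

Because `emitObject` is written only against the backend interface, swapping `InMemoryBackend` for `NullBackend` (or an on-disk backend) requires no change on the producer side.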

In my opinion, this is a different approach to handling in-memory outputs than a writable VFS, but it does not conflict with one. The output backend interface is cleaner than writing to a file system, and it is much more configurable than a VFS. On the other hand, you can also layer an output backend on top of a writable VFS if that turns out to work better for your use case.
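Layering an output backend on a writable VFS can be sketched as a small adapter, again assuming hypothetical, heavily reduced interfaces; `WritableVFS`, `VFSOutputBackend`, `MirroringBackend`, and `RecordingBackend` are illustrative names only. The mirroring backend is just one example of the kind of configurability a backend allows: a single output can be stored in the VFS for a later stage and also recorded elsewhere.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a writable VFS that later stages (e.g. a linker)
// would read from.
struct WritableVFS {
  std::map<std::string, std::string> Files;
  void writeFile(const std::string &Path, const std::string &Bytes) {
    Files[Path] = Bytes;
  }
};

// Hypothetical reduced backend interface: receives each finished output.
class OutputBackend {
public:
  virtual ~OutputBackend() = default;
  virtual void commit(const std::string &Path, const std::string &Bytes) = 0;
};

// Adapter: routes every committed output into the writable VFS.
class VFSOutputBackend : public OutputBackend {
public:
  explicit VFSOutputBackend(WritableVFS &FS) : FS(FS) {}
  void commit(const std::string &Path, const std::string &Bytes) override {
    FS.writeFile(Path, Bytes);
  }

private:
  WritableVFS &FS;
};

// Fans each output out to several backends in the order they were added.
class MirroringBackend : public OutputBackend {
public:
  void add(std::shared_ptr<OutputBackend> Backend) {
    assert(Backend && "backend must be non-null");
    Backends.push_back(std::move(Backend));
  }
  void commit(const std::string &Path, const std::string &Bytes) override {
    for (const auto &Backend : Backends)
      Backend->commit(Path, Bytes);
  }

private:
  std::vector<std::shared_ptr<OutputBackend>> Backends;
};

// Records which paths were produced, e.g. for writing a dependency file.
class RecordingBackend : public OutputBackend {
public:
  std::vector<std::string> Paths;
  void commit(const std::string &Path, const std::string &) override {
    Paths.push_back(Path);
  }
};
```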

I will start posting patches for our newer implementation ASAP. In the meantime, you can read the existing code on our experimental branch to see whether it works for you. Any feedback is welcome.

Patch posted. See D133504 (Support: Add vfs::OutputBackend and OutputFile to virtualize compiler outputs) and the other patches in the stack.

Reviews and feedback are welcome.