[RFC] Modernizing and formalizing the File abstraction interface

The current File abstraction in the libc is functional but has the following stylistic and ease-of-use problems:

  1. Old style function parameters: The interface functions take old style
    parameters - the read and write methods take a pointer and a corresponding
    size argument.
  2. Use of function pointers: An implementation of the abstraction has to
    provide pointers to the functions which perform the platform read, write
    and other unbuffered primitive operations. The main reason for using
    function pointers is because of the restriction that we cannot use virtual
    functions in the libc.
  3. Concrete implementations have to use inheritance: A concrete implementation
    has to subclass the abstract File base class.

This RFC proposes improving the interface with the following changes:

  1. Modern function parameters: Use cpp::span<uint8_t> instead of pointer and size arguments where relevant.
  2. Composition over inheritance: Concrete platform implementations of the File class will use the preferred composition pattern instead of inheritance.
  3. Instead of function pointers, use an instance with a specific interface: Instead of requiring the platform implementation to provide pointers to the functions performing the unbuffered primitive file operations (read, write etc.), require the implementation to provide an instance of a class which provides a certain interface. If C++20 is available, then that interface will be formalized using a concept.

Using cpp::span<uint8_t> as arguments to file operation functions

This is a straightforward change. The elements of the interface affected by
this change will be discussed in detail at relevant places in other parts of this RFC.
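To make the difference concrete, here is a minimal sketch comparing the two parameter styles. The `cpp::span` below is a tiny stand-in written only for this example (the real libc-internal class has more to it), and `fill_old`, `fill_new` and `demo_span_call` are hypothetical names, not part of the proposed interface.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Stand-in for the libc-internal cpp::span; only what this sketch needs.
namespace cpp {
template <typename T> class span {
  T *ptr;
  size_t len;

public:
  constexpr span(T *p, size_t n) : ptr(p), len(n) {}
  template <size_t N> constexpr span(T (&arr)[N]) : ptr(arr), len(N) {}
  constexpr T *data() const { return ptr; }
  constexpr size_t size() const { return len; }
};
} // namespace cpp

// Old style: the caller passes the pointer and the length separately and
// has to keep the two in sync by hand.
size_t fill_old(void *data, size_t len, uint8_t value) {
  memset(data, value, len);
  return len;
}

// New style: the span carries the pointer and the length together.
size_t fill_new(cpp::span<uint8_t> buf, uint8_t value) {
  memset(buf.data(), value, buf.size());
  return buf.size();
}

// Usage: both calls fill the same number of bytes with the same value.
void demo_span_call() {
  uint8_t a[4] = {0, 0, 0, 0};
  uint8_t b[4] = {0, 0, 0, 0};
  fill_old(a, sizeof(a), 7);
  fill_new(b, 7); // The array converts to a span; no separate size argument.
  assert(memcmp(a, b, sizeof(a)) == 0);
}
```

With the span version the call site cannot pass a length that disagrees with the buffer it names, which is the main ease-of-use gain.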

The new File class

The new File class will be a final class with the interface as listed below. Most of the differences in comparison to the current File class are straightforward to see - the read and write functions take a
cpp::span argument instead of separate data and size arguments. The more involved changes are as follows:

  1. The File constructor takes a much shorter list of arguments. Apart from the fact that it is now a template, the other noteworthy change is that instead of taking pointers to functions implementing platform specific file operations as arguments, it takes a single object supporting a specific interface. This interface, and other requirements on that object, are discussed in detail in a later section.
  2. The destructor of the File class is public and defaulted. This ensures that the global objects for stdout etc. do not require a destructor call via an atexit callback.
  3. The file resources will be cleaned up by the close method - the old static method cleanup has been removed. The close method should be called to close and clean up the file.

class File final {
  ...

public:
  template <typename T>
  File(
      T &platform_file, // The platform implementation of PlatformFile
      cpp::span<uint8_t> buffer, // Memory to use for buffering
      int buffer_mode, // Buffering mode to use
      bool owned, // If the buffer is owned by the new File instance
      ModeFlags modeflags // Mode in which the file was opened
  );

  ~File() = default;

  // Read bytes into |buf| under the file lock.
  FileIOResult read(cpp::span<uint8_t> buf);

  // Read bytes into |buf| without the file lock.
  FileIOResult read_unlocked(cpp::span<uint8_t> buf);

  // Write bytes from |buf| under the file lock.
  FileIOResult write(cpp::span<const uint8_t> buf);

  // Write bytes from |buf| without the file lock.
  FileIOResult write_unlocked(cpp::span<const uint8_t> buf);

  ErrorOr<int> seek(long offset, int whence);

  ErrorOr<long> tell();

  // Flush under the file lock.
  int flush();

  // Flush without the file lock.
  int flush_unlocked();

  int ungetc_unlocked(int c);

  int ungetc(int c);

  bool error();

  bool error_unlocked() const;

  void clearerr();

  void clearerr_unlocked();

  bool iseof();

  bool iseof_unlocked() const;

  // Previously, close was a private method. In the new file,
  // it is a public function which also cleans up the file.
  int close();
};

Interface of the platform implementation

The new File constructor takes an instance of a class which has to support
a special interface as follows:

  1. The class should be copyable.
  2. The class should have a trivial destructor.
  3. The class should have the following non-static methods which perform
    the corresponding unlocked platform-specific file operations:
    // Performs the platform specific file read operation.
    FileIOResult read(cpp::span<uint8_t> buf);
    
    // Performs the platform specific file write operation.
    FileIOResult write(cpp::span<const uint8_t> buf);
    
    // Performs the platform specific file flush operation.
    int flush();
    
    // Performs the platform specific file seek operation.
    ErrorOr<int> seek(long offset, int whence);
    
    // Performs the platform specific file close operation.
    int close();
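As an illustration of these requirements, below is a minimal sketch of a class that would satisfy them: an in-memory "file" over a caller-provided byte array. Everything here is an assumption made for the example: `MemFile` and `demo_mem_file` are hypothetical names, and `cpp::span`, `FileIOResult` and `ErrorOr` are simplified stand-ins for the real libc-internal types.

```cpp
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h> // SEEK_SET
#include <string.h>
#include <type_traits>

// Simplified stand-ins for the libc-internal types (assumptions).
namespace cpp {
template <typename T> struct span {
  T *ptr;
  size_t len;
  T *data() const { return ptr; }
  size_t size() const { return len; }
};
} // namespace cpp
struct FileIOResult { size_t value; int error; };
template <typename T> struct ErrorOr { T value; int error; };

// Hypothetical platform implementation backed by caller-owned memory.
class MemFile {
  uint8_t *storage; // Not owned, so the destructor stays trivial.
  size_t capacity;
  size_t pos;

public:
  MemFile(uint8_t *s, size_t cap) : storage(s), capacity(cap), pos(0) {}

  FileIOResult read(cpp::span<uint8_t> buf) {
    size_t avail = capacity - pos;
    size_t n = buf.size() < avail ? buf.size() : avail;
    memcpy(buf.data(), storage + pos, n);
    pos += n;
    return {n, 0};
  }

  FileIOResult write(cpp::span<const uint8_t> buf) {
    if (buf.size() > capacity - pos)
      return {0, ENOSPC};
    memcpy(storage + pos, buf.data(), buf.size());
    pos += buf.size();
    return {buf.size(), 0};
  }

  int flush() { return 0; }

  // Only SEEK_SET is handled, to keep the sketch short.
  ErrorOr<int> seek(long offset, int whence) {
    if (whence != SEEK_SET || offset < 0 || size_t(offset) > capacity)
      return {-1, EINVAL};
    pos = size_t(offset);
    return {0, 0};
  }

  int close() { return 0; }
};

// Requirements 1 and 2 from the list above, checked at compile time.
static_assert(std::is_copy_constructible<MemFile>::value &&
                  std::is_copy_assignable<MemFile>::value,
              "MemFile must be copyable");
static_assert(std::is_trivially_destructible<MemFile>::value,
              "MemFile must have a trivial destructor");

// Usage: write three bytes, seek back to the start, read them again.
void demo_mem_file() {
  uint8_t backing[8] = {0};
  MemFile mf(backing, sizeof(backing));
  const uint8_t msg[3] = {1, 2, 3};
  assert(mf.write({msg, sizeof(msg)}).value == 3);
  assert(mf.seek(0, SEEK_SET).error == 0);
  uint8_t out[3] = {0, 0, 0};
  assert(mf.read({out, sizeof(out)}).value == 3);
  assert(out[0] == 1 && out[2] == 3);
}
```

Note that the class does not own any resource that needs a destructor; cleanup, if any, would happen in `close`.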
    

Formalizing the requirements with C++20 concepts

If C++20 is available, then the above interface can be formalized using the
following concept:

struct FileIOResult;

template <typename T, typename U>
concept IsSameAs = cpp::is_same_v<T, U>;

template <typename T>
concept PlatformFile = requires (T file) {
  {file.read(cpp::span<uint8_t>())} -> IsSameAs<FileIOResult>;
  {file.write(cpp::span<const uint8_t>())} -> IsSameAs<FileIOResult>;
  {file.flush()} -> IsSameAs<int>;
  {file.seek(long(), int())} -> IsSameAs<ErrorOr<int>>;
  {file.close()} -> IsSameAs<int>;
} && __is_trivially_destructible(T) &&
cpp::is_copy_assignable_v<T> && cpp::is_copy_constructible_v<T>;
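When C++20 is not available, roughly the same checks could be approximated with type traits. The following is only a sketch of one possible fallback (the classic detection idiom), not something the RFC itself proposes. `is_platform_file`, `GoodFile`, `FlushOnly` and `HasDtor` are hypothetical names, the `void_t` helper and the `cpp::span`, `FileIOResult` and `ErrorOr` definitions are stand-ins written for this example, and `FileIOResult` is used as the read/write result type to match the interface listing earlier.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <type_traits>
#include <utility>

// Simplified stand-ins for the libc-internal types (assumptions).
namespace cpp {
template <typename T> struct span {
  T *ptr = nullptr;
  size_t len = 0;
};
} // namespace cpp
struct FileIOResult { size_t value; int error; };
template <typename T> struct ErrorOr { T value; int error; };

// Local void_t helper so the sketch does not depend on C++17.
template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Primary template: chosen when any required call expression is ill-formed.
template <typename T, typename = void>
struct is_platform_file : std::false_type {};

// Specialization: viable only if all five calls compile for T. The value
// additionally checks return types, copyability and trivial destruction.
template <typename T>
struct is_platform_file<
    T,
    void_t<decltype(std::declval<T &>().read(cpp::span<uint8_t>())),
           decltype(std::declval<T &>().write(cpp::span<const uint8_t>())),
           decltype(std::declval<T &>().flush()),
           decltype(std::declval<T &>().seek(long(), int())),
           decltype(std::declval<T &>().close())>>
    : std::integral_constant<
          bool,
          std::is_same<decltype(std::declval<T &>().read(
                           cpp::span<uint8_t>())),
                       FileIOResult>::value &&
              std::is_same<decltype(std::declval<T &>().write(
                               cpp::span<const uint8_t>())),
                           FileIOResult>::value &&
              std::is_same<decltype(std::declval<T &>().flush()),
                           int>::value &&
              std::is_same<decltype(std::declval<T &>().seek(long(), int())),
                           ErrorOr<int>>::value &&
              std::is_same<decltype(std::declval<T &>().close()),
                           int>::value &&
              std::is_trivially_destructible<T>::value &&
              std::is_copy_constructible<T>::value &&
              std::is_copy_assignable<T>::value> {};

// Hypothetical test types.
struct GoodFile {
  FileIOResult read(cpp::span<uint8_t>) { return {0, 0}; }
  FileIOResult write(cpp::span<const uint8_t>) { return {0, 0}; }
  int flush() { return 0; }
  ErrorOr<int> seek(long, int) { return {0, 0}; }
  int close() { return 0; }
};
struct FlushOnly { int flush() { return 0; } }; // Missing four operations.
struct HasDtor : GoodFile { ~HasDtor() {} };   // Non-trivial destructor.

static_assert(is_platform_file<GoodFile>::value, "full interface accepted");
static_assert(!is_platform_file<FlushOnly>::value, "missing ops rejected");
static_assert(!is_platform_file<HasDtor>::value, "non-trivial dtor rejected");
```

The File constructor could then guard itself with a `static_assert(is_platform_file<T>::value, ...)`, at the cost of less readable diagnostics than the concept gives.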

Using the above concept, the File class’s template constructor can be declared as:

template <PlatformFile T>
File(T &platform_file, // The platform implementation of PlatformFile
     cpp::span<uint8_t> buffer, // Memory to use for buffering
     int buffer_mode, // Buffering mode to use
     bool owned, // If the buffer is owned by the new File instance
     ModeFlags modeflags // Mode in which the file was opened
);

Creating a File object

Very similar to how it is done today, the platform implementation should provide a function named openfile with the following signature:

ErrorOr<File *> openfile(const char *path, const char *mode);

The difference between the new approach and the old style is in the way in which the File instance is created and returned - instead of returning an instance of the platform specific subclass of File, it should return an appropriately constructed File instance. At a high level, it will have the following structure:

ErrorOr<File *> openfile(const char *path, const char *mode) {
  ... // Platform specific code
  PlatformFileImpl pf(...);
  AllocChecker ac;
  auto file = new (ac) File(pf, ....);
  if (!ac) {
    ... // Handle allocation failure appropriately
  }
  return file;
}

Type erasure in the File class

Type erasure is achieved using cpp::function instead of function pointers. For example, the platform specific read operation is stored as follows:

class File final {
  cpp::function<FileIOResult(cpp::span<uint8_t>)> platform_read;

  ...
public:
  template <PlatformFile T>
  constexpr explicit File(T &file, ...) :
    platform_read([f = file](cpp::span<uint8_t> buffer) mutable {
        return f.read(buffer); }), ... {
  }

  ...
};

If every platform, plus the mock one, has to provide this function, the semantics might silently diverge.

How about:

template <class FileSystem>
ErrorOr<File *> openfile(const char *path, const char *mode);

During offline discussions with various stakeholders, a couple of important things were pointed out:

  1. Using cpp::function for type erasure is ill-informed - true type erasure in that manner will require some kind of virtual function based polymorphism somewhere. The current implementation of cpp::function does not take ownership of the callable and does not use virtual functions.
  2. The libc’s fflush function is not supposed to trigger the platform flush operation, for example via the fsync syscall on Linux. See the “Notes” section of the fflush(3) Linux manual page.

So, the plan to take this forward is as follows:

  1. Remove the use of the platform provided flush operation. This also means that the platform will no longer be required to provide a platform level flush operation (like fsync on Linux).
  2. Retain the current use of function pointers for type erasure.
  3. Modernize the API as originally proposed by this RFC.
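To illustrate how items 2 and 3 could fit together, here is a rough sketch in which File keeps plain function pointers for type erasure but still accepts any platform object through a template constructor and a span-based read. This is only one possible shape, not the planned implementation: `PatternFile` and `demo_erasure` are hypothetical, `cpp::span` and `FileIOResult` are simplified stand-ins, and unlike the lambda in the listing above, this File only stores a non-owning pointer to the platform object.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Simplified stand-ins for the libc-internal types (assumptions).
namespace cpp {
template <typename T> struct span {
  T *ptr;
  size_t len;
  T *data() const { return ptr; }
  size_t size() const { return len; }
};
} // namespace cpp
struct FileIOResult { size_t value; int error; };

class File final {
  void *impl; // Not owned; the platform object must outlive this File.
  FileIOResult (*platform_read)(void *, cpp::span<uint8_t>);

public:
  // The captureless lambda converts to a plain function pointer. It is the
  // only place that still knows the concrete type T.
  template <typename T>
  explicit File(T &platform_file)
      : impl(&platform_file),
        platform_read([](void *p, cpp::span<uint8_t> buf) -> FileIOResult {
          return static_cast<T *>(p)->read(buf);
        }) {}

  FileIOResult read(cpp::span<uint8_t> buf) {
    return platform_read(impl, buf);
  }
};

// Hypothetical platform object: "reads" a repeating byte.
struct PatternFile {
  uint8_t byte;
  FileIOResult read(cpp::span<uint8_t> buf) {
    memset(buf.data(), byte, buf.size());
    return {buf.size(), 0};
  }
};

// Usage: File is not a template, yet it forwards to PatternFile::read.
void demo_erasure() {
  PatternFile pf = {0xAB};
  File file(pf);
  uint8_t out[3] = {0, 0, 0};
  FileIOResult r = file.read({out, sizeof(out)});
  assert(r.value == 3 && r.error == 0 && out[2] == 0xAB);
}
```

No virtual functions are involved, and the platform class does not have to inherit from anything.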

For GPU targets it’s very important that we do not rely on function pointers if avoidable. I’m already seeing issues related to them in the implementation. GPUs in general do not fare very well when control flow is unpredictable and functions can’t be inlined.

If I understand correctly, the need for the function pointer interface is to support the fopencookie implementation, which uses function pointers at its core. I’m thinking it might be possible to rely on dynamic dispatch instead; a runtime branch is likely going to be cheaper than an indirect call.

I quickly typed up a simple outline of what an approach that does a dynamic dispatch on the implementation type could look like: Compiler Explorer. It’s a bit more code, but might be necessary if we need to worry about cookie files as a separate implementation. Suggestions welcome.
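For readers without access to the linked outline, the general idea of branching on the implementation type can be sketched as follows. This is an independent illustration written for this summary, not a copy of the linked code; `PlatformImpl`, `CookieImpl`, `one_byte` and `demo_dispatch` are hypothetical, and `cpp::span` and `FileIOResult` are simplified stand-ins.

```cpp
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Simplified stand-ins for the libc-internal types (assumptions).
namespace cpp {
template <typename T> struct span {
  T *ptr;
  size_t len;
  T *data() const { return ptr; }
  size_t size() const { return len; }
};
} // namespace cpp
struct FileIOResult { size_t value; int error; };

// Hypothetical direct implementation; its read can be inlined into File.
struct PlatformImpl {
  uint8_t fill;
  FileIOResult read(cpp::span<uint8_t> buf) {
    memset(buf.data(), fill, buf.size());
    return {buf.size(), 0};
  }
};

// Hypothetical cookie-style implementation; the user callback is an
// indirect call by nature, but only this variant pays for it.
struct CookieImpl {
  void *cookie;
  size_t (*read_cb)(void *cookie, uint8_t *data, size_t len);
  FileIOResult read(cpp::span<uint8_t> buf) {
    return {read_cb(cookie, buf.data(), buf.size()), 0};
  }
};

class File final {
  enum class Kind { Platform, Cookie };
  Kind kind;
  union {
    PlatformImpl platform;
    CookieImpl cookie;
  };

public:
  explicit File(PlatformImpl p) : kind(Kind::Platform), platform(p) {}
  explicit File(CookieImpl c) : kind(Kind::Cookie), cookie(c) {}

  // A runtime branch on the stored tag replaces the indirect call.
  FileIOResult read(cpp::span<uint8_t> buf) {
    switch (kind) {
    case Kind::Platform:
      return platform.read(buf);
    case Kind::Cookie:
      return cookie.read(buf);
    }
    return {0, EBADF};
  }
};

// Hypothetical cookie callback: produces a single byte per call.
size_t one_byte(void *, uint8_t *data, size_t len) {
  if (len == 0)
    return 0;
  data[0] = 0x42;
  return 1;
}

// Usage: the same File::read reaches either implementation.
void demo_dispatch() {
  uint8_t out[2] = {0, 0};
  File plat(PlatformImpl{0x11});
  assert(plat.read({out, 2}).value == 2 && out[1] == 0x11);
  File cook(CookieImpl{nullptr, one_byte});
  assert(cook.read({out, 2}).value == 1 && out[0] == 0x42);
}
```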

Hi, sorry about the late response, some things got put on the back burner after Siva moved on.

Based on what’s been discussed I think the best design would probably be one where the use of function pointers is a compile time flag instead of runtime. It’s very unlikely that one platform will need to use both styles. The direct call design is attractive for GPUs, embedded devices, and other closed ecosystem projects, whereas the function pointer design is useful in more system-wide libc cases.

From here, my plan is to create a compile flag that will select which design to use, and use the config.json system to set its default states for our various targets. Does that work for you?
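As a rough sketch of what such a flag might look like at the source level: the macro name `LIBC_FILE_USE_FUNCTION_POINTERS` below is made up for this example (the actual option name and how config.json would set it are not decided here), `PlatformFile` and `demo_flag` are hypothetical, and `cpp::span` and `FileIOResult` are simplified stand-ins.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Simplified stand-ins for the libc-internal types (assumptions).
namespace cpp {
template <typename T> struct span {
  T *ptr;
  size_t len;
  T *data() const { return ptr; }
  size_t size() const { return len; }
};
} // namespace cpp
struct FileIOResult { size_t value; int error; };

// Hypothetical: the single platform implementation on this target.
struct PlatformFile {
  uint8_t fill;
  FileIOResult read(cpp::span<uint8_t> buf) {
    memset(buf.data(), fill, buf.size());
    return {buf.size(), 0};
  }
};

class File final {
#ifdef LIBC_FILE_USE_FUNCTION_POINTERS // Hypothetical flag name.
  void *impl;
  FileIOResult (*platform_read)(void *, cpp::span<uint8_t>);
#else
  PlatformFile *impl;
#endif

public:
#ifdef LIBC_FILE_USE_FUNCTION_POINTERS
  // Function pointer design: one indirect call per operation.
  explicit File(PlatformFile &pf)
      : impl(&pf),
        platform_read([](void *p, cpp::span<uint8_t> buf) -> FileIOResult {
          return static_cast<PlatformFile *>(p)->read(buf);
        }) {}
  FileIOResult read(cpp::span<uint8_t> buf) {
    return platform_read(impl, buf);
  }
#else
  // Direct call design: the call target is known at compile time.
  explicit File(PlatformFile &pf) : impl(&pf) {}
  FileIOResult read(cpp::span<uint8_t> buf) { return impl->read(buf); }
#endif
};

// Usage: callers see the same File API under either setting.
void demo_flag() {
  PlatformFile pf = {0x33};
  File file(pf);
  uint8_t out[2] = {0, 0};
  assert(file.read({out, sizeof(out)}).value == 2 && out[1] == 0x33);
}
```

Building with or without `-DLIBC_FILE_USE_FUNCTION_POINTERS` switches the internals while leaving call sites unchanged.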