Filesystem has Landed in Libc++

Hi All,

I recently committed <filesystem> to trunk, and I want to draw attention to some quirks it currently has.

First, it’s been put in a separate library, libc++fs, for now. Users are responsible for linking the library when they use filesystem [1].

Second, it should still not be considered ABI stable. Vendors should be aware of this before shipping it. Hopefully all the standard and implementation bugs can be resolved by the next release, and we can move it into the main dylib.

Third, libc++experimental no longer contains the symbols for <experimental/filesystem>, which is really just <filesystem> in disguise. If you’ve been using <experimental/filesystem>, you now need to link libc++fs instead.

Fourth, <filesystem> is technically available in C++11 and later. The implementation lives in the std::__fs::filesystem namespace, where __fs is marked “inline” in C++17 but not before. We should consider documenting its use without C++17 as an extension.

Happy coding,

/Eric

[1] http://libcxx.llvm.org/docs/UsingLibcxx.html#using-filesystem-and-libc-fs

Eric,

I’m curious to know what the concerns are w.r.t. providing ABI stability for filesystem right now. What do you envision may require changing the ABI in the future?

I feel like taking filesystem out of experimental/ without providing the usual guarantees provided by libc++ for non-experimental code may not be a good idea, as we’ll be pretending that we support filesystem when we really only half support it. In other words, I think the number of people that will start using filesystem while consciously knowing that it is ABI-unstable (and what that means) is quite small, and that would be doing our users a disservice.

Would it be possible to instead ship the parts we’re not quite sure we can keep ABI stable in the headers (with _LIBCPP_HIDE_FROM_ABI) for the time being, and lower them to the dylib eventually as things stabilize? This would allow us to ship filesystem with LLVM 7.0 without any compromise on the guarantees we make our users.

I’m curious to know what you think of this suggestion.
Louis

FWIW, I’d like for us to come to an agreement before the branch for LLVM 7.0 is cut. How do others feel about this? Am I wrong when I claim that shipping an ABI-unstable feature in the std:: namespace is a deviation from normal practice? Am I overcautious when I say it’s asking for trouble?

Eric, I know you’re busy and may not have time to do the work so I’m totally willing to chime in, but I’d like to have your thoughts on my objection first.

Cheers,
Louis

I only saw this now after the llvm 7 branch was cut.

It would be good to get this all figured out before the release :)

Hi,

My current understanding of the problem (based on https://reviews.llvm.org/D49774) is that we have a type, file_time_type, which is part of the ABI and is currently defined as std::chrono::time_point<_FileSystemClock>, where _FileSystemClock is an internal type represented using a __int128_t. However, C++20 will add a type called file_clock and redefine file_time_type to be std::chrono::time_point<std::chrono::file_clock> instead, which is an ABI break.

Is this correct, and is this the only concern we have with respect to the ABI stability of Filesystem as currently included in the release branch for LLVM 7?

If that is correct, then we can either
(1) define file_time_type in a way that is forward-ABI-compatible with the definition that will be used in C++20 and patch LLVM 7 accordingly
(2) do nothing and live with an ABI break in C++20
(3) revert r338093 (and perhaps some patches that depend on it) in the LLVM 7 branch, and keep shipping Filesystem as being experimental

(1) is ideal, but we have to confirm that my understanding is correct AND agree on a fix, soon.
(2) is a no-go.
(3) is the safest and simplest option.

If we can't get a solution for (1) going very soon, I would like to request that we implement (3) and remove Filesystem from the non-experimental namespace in the LLVM 7 branch. This is unfortunate, but we're currently running towards (2), which is desirable for nobody.

I suggest a deadline of Friday August 10th to have settled on a solution for (1) so as to give us time to implement it and review it in time for RC 2, on August 22nd.

Louis

This isn’t about C++20 and the changes. I have no doubt any ABI breaking changes W.R.T. that will be addressed.

During my work finishing up filesystem, I discovered/filed a number of issues with ABI breaking concerns, in particular around directory_entry, but perhaps other places as well (recursive_directory_iterator, various calls in [fs.op.funcs]). These issues have not yet been addressed in the standard, and they should be resolved in the specification before we commit to an ABI.

Note that we often put things into namespace std before they’re considered ABI stable, but up until now that has been limited to components specified by a standard still in flight. Also note that libstdc++ hasn’t committed to ABI stability for <filesystem> yet either. This is the reason we put the symbols in a separate static library.

/Eric

Why did you want the symbols moved out of libc++experimental, and for the header to be moved from <experimental/filesystem> to <filesystem>? It certainly seems like it’d be safer and clearer to move them back to the old locations, but it’s not clear to me if that’d be trading off something else of value.

Was there some other greater purpose served by the change in location, which will be hindered by moving them back where they were until ABI-stability can be promised?

Well, technically <experimental/filesystem> tracks one specification and <filesystem> another. I normally advocate for libc++'s <experimental/foo> to act like the specification for <foo>, but not all STLs do that.

Therefore, the purpose of the change was to allow users to write portable code across multiple STLs using <filesystem> (minus the changes to the build system needed to link it).

/Eric

[I have had discussions with several people, and I'm attempting to
summarize here]

Due to factors beyond our control, I do not believe that we can provide a
version of std::filesystem and promise future ABI stability at this time.

More information:

* The features for caching directory information were added late in the
C++17 cycle, and there have been some concerns about them.
LWG issue #2708 (recursive_directory_iterator::recursion_pending() is incorrectly specified) is one of them, and there are a
couple of upcoming papers about the same part of the standard.

* The clock stuff being added in C++20 has already been discussed here.

We can:

1) Not ship std::filesystem, shipping only std::experimental::filesystem.
I think that this is a disservice to our users, because people are asking
for std::filesystem, and other vendors are providing it.
Note: experimental::filesystem is *different* from std::filesystem, and
they're only going to diverge further. In an ideal world, we would have two
implementations: one for experimental::filesystem, and the other for
std::filesystem, and they would behave differently (each according to their
specification).

2) Ship std::filesystem as it is, as part of libc++.dylib.
I don't think that this is a viable option, given that we are pretty sure
(certain) that ABI changes are coming down the pike.

3) We can ship std::filesystem as a static library; marked as "not ABI
stable"
We can put it into the libc++ dylib once we're confident that we can
provide a stable ABI.

Note: libstdc++ has done exactly this.
See

for a discussion of this approach.

People can use std::filesystem, and include the object code in their
executables, and when the ABI changes, they will be affected at build time,
not at run time. The downside is that they will have to re-build to get bug
fixes.

-- Marshall

P.S. I admit that this is not the best of all possible worlds.

Right. I'm not a fan of that approach.
     <experimental/optional> matched what was in the library fundamental TS.
     <optional> matched what was in the IS.

-- Marshall

After talking to Marshall and Eric, I believe 3) is OK and I take back my objection to shipping a non-experimental filesystem in LLVM 7. The main reasons are:

  • We’re forcing users to link manually with -lc++fs. This forces them to read the documentation, which contains a fat warning about ABI stability.
  • This is what GCC and MS have done — it’s easier if we stay aligned with them.
  • Since we’re shipping c++fs as a static library, the only potential problem is that users will use ABI-unstable types in their own ABIs. We’re not going to break the symbols exported from libc++.dylib at all.
  • Filesystem is in an inline namespace __fs, so we can decide to bump that inline namespace if we break the ABI. This will cause users that may have leaked filesystem types in their ABIs to get clear link errors when we change the ABI.

So I think 3), which is the status quo, is fine.

Louis

I’ve missed the discussions on file_time_type, however I thought I should throw in my opinion here before it is too late to do anything about it.

I believe it is a mistake to model file_time_type with 128 bits. It would be acceptable if this were absolutely necessary to get the job done, but it isn’t. The 16-byte integer is unnecessarily expensive.

file_time_type does not need to model the full range and precision of timespec (which on 64-bit platforms is a 128-bit type). All file_time_type needs to model is the full range and precision of what the underlying file system libraries are capable of producing.

The latest Linux file system is ext4 (https://en.wikipedia.org/wiki/Ext4) and is capable of nanosecond resolution. However, its timestamp is only 64 bits. It has a range of approximately [1901-12-14, 2446-05-10]. Modeling ext4 would be a good design decision for libc++. libc++ could also model other file systems (Windows, macOS). All of these are based on 64-bit timestamps.

Here is a file_clock, quickly thrown together, lightly tested, that models ext4:

    #include "date/tz.h"
    #include <ctime>    // timespec
    #include <istream>
    #include <ostream>
    #include <string>

    namespace filesystem
    {

    struct file_clock
    {
        using duration = std::chrono::nanoseconds;
        using rep = duration::rep;
        using period = duration::period;
        using time_point = std::chrono::time_point<file_clock>;
        static constexpr bool is_steady = false;

        static time_point now();

        template<typename Duration>
        static
        std::chrono::time_point<std::chrono::system_clock, Duration>
        to_sys(const std::chrono::time_point<file_clock, Duration>& t) noexcept;

        template<typename Duration>
        static
        std::chrono::time_point<file_clock, Duration>
        from_sys(const std::chrono::time_point<std::chrono::system_clock, Duration>& t) noexcept;

        template<typename Duration>
        static
        std::chrono::time_point<date::local_t, Duration>
        to_local(const std::chrono::time_point<file_clock, Duration>& t) noexcept;

        template<typename Duration>
        static
        std::chrono::time_point<file_clock, Duration>
        from_local(const std::chrono::time_point<date::local_t, Duration>& t) noexcept;

        // private helpers

        static
        timespec
        to_timespec(const time_point& t) noexcept;

        static
        time_point
        from_timespec(const timespec& t) noexcept;
    };

    template <class Duration>
        using file_time = std::chrono::time_point<file_clock, Duration>;

    using file_time_type = file_clock::time_point;

    template <class Duration>
    inline
    std::chrono::time_point<std::chrono::system_clock, Duration>
    file_clock::to_sys(const std::chrono::time_point<file_clock, Duration>& t) noexcept
    {
        using namespace date;
        return sys_time<Duration>{t.time_since_epoch()} +
                                 (sys_days{2174_y/1/1} - sys_days{1970_y/1/1});
    }

    template <class Duration>
    inline
    std::chrono::time_point<file_clock, Duration>
    file_clock::from_sys(const std::chrono::time_point<std::chrono::system_clock, Duration>& t) noexcept
    {
        using namespace date;
        return file_time<Duration>{t.time_since_epoch()} -
                                  (sys_days{2174_y/1/1} - sys_days{1970_y/1/1});
    }

    template <class Duration>
    inline
    std::chrono::time_point<date::local_t, Duration>
    file_clock::to_local(const std::chrono::time_point<file_clock, Duration>& t) noexcept
    {
        using namespace date;
        return local_time<Duration>{to_sys(t).time_since_epoch()};
    }

    template <class Duration>
    inline
    std::chrono::time_point<file_clock, Duration>
    file_clock::from_local(const std::chrono::time_point<date::local_t, Duration>& t) noexcept
    {
        using namespace date;
        return file_time<Duration>{from_sys(sys_time<Duration>{t.time_since_epoch()})};
    }

    inline
    file_clock::time_point
    file_clock::now()
    {
        return from_sys(std::chrono::system_clock::now());
    }

    template <class CharT, class Traits, class Duration>
    std::basic_ostream<CharT, Traits>&
    to_stream(std::basic_ostream<CharT, Traits>& os, const CharT* fmt,
              const file_time<Duration>& t)
    {
        using namespace std::chrono;
        const std::string abbrev("UTC");
        constexpr std::chrono::seconds offset{0};
        using D128 = duration<__int128, typename Duration::period>;
        return date::to_stream(os, fmt, file_clock::to_local(time_point_cast<D128>(t)),
                               &abbrev, &offset);
    }

    template <class Duration, class CharT, class Traits, class Alloc = std::allocator<CharT>>
    std::basic_istream<CharT, Traits>&
    from_stream(std::basic_istream<CharT, Traits>& is, const CharT* fmt,
                file_time<Duration>& tp,
                std::basic_string<CharT, Traits, Alloc>* abbrev = nullptr,
                std::chrono::minutes* offset = nullptr)
    {
        using namespace date;
        using namespace std::chrono;
        using D128 = duration<__int128, typename Duration::period>;
        local_time<D128> lp;
        from_stream(is, fmt, lp, abbrev, offset);
        if (!is.fail())
            tp = file_clock::from_local(lp);
        return is;
    }

    template <class CharT, class Traits, class Duration>
    std::basic_ostream<CharT, Traits>&
    operator<<(std::basic_ostream<CharT, Traits>& os, const file_time<Duration>& t)
    {
        const CharT fmt[] = {'%', 'F', ' ', '%', 'T', CharT{}};
        return to_stream(os, fmt, t);
    }

    inline
    timespec
    file_clock::to_timespec(const time_point& t) noexcept
    {
        using namespace date;
        using namespace std::chrono;
        auto tp = to_sys(time_point_cast<std::chrono::duration<__int128, std::nano>>(t));
        auto s = floor<seconds>(tp);
        timespec ts;
        ts.tv_sec = static_cast<decltype(ts.tv_sec)>(s.time_since_epoch().count());
        ts.tv_nsec = static_cast<decltype(ts.tv_nsec)>((tp - s).count());
        return ts;
    }

    inline
    file_clock::time_point
    file_clock::from_timespec(const timespec& t) noexcept
    {
        using namespace date;
        using namespace std::chrono;
        auto d = std::chrono::duration<__int128>{t.tv_sec} + nanoseconds{t.tv_nsec};
        return time_point_cast<duration>(from_sys(sys_time<decltype(d)>{d}));
    }

    } // namespace filesystem

    #include <iostream>
    #include <sstream>

    int
    main()
    {
        using namespace std;
        using namespace date;
        std::cout << filesystem::file_clock::time_point::min() << '\n';
        std::cout << filesystem::file_clock::now() << '\n';
        std::cout << filesystem::file_clock::time_point::max() << '\n';
        std::istringstream in{"2466-04-11 23:47:16.854775807"};
        filesystem::file_clock::time_point tp;
        in >> date::parse("%F %T", tp);
        cout << tp << '\n';
        in.clear();
        in.str("1881-09-22 00:12:43.145224192");
        in >> date::parse("%F %T", tp);
        cout << tp << '\n';
        timespec ts = {15661036036, 854775807}; // or {-2785708037, 145224192}
        tp = filesystem::file_clock::from_timespec(ts);
        cout << tp << '\n';
        ts = filesystem::file_clock::to_timespec(tp);
        cout << "{" << ts.tv_sec << ", " << ts.tv_nsec << "}\n";
        using s32 = chrono::duration<int>;
        using ns64 = chrono::duration<long, nano>;
        using uns64 = chrono::duration<unsigned long, nano>;
        using ns128 = chrono::duration<__int128, nano>;
        cout << date::sys_time<s32>::min() + ns128{uns64::max()} << '\n';
        cout << date::sys_time<s32>::min() + ns128{ns64::max()} + 1ns << '\n';
    }

It is a 64-bit timestamp with nanosecond resolution that is capable of representing a superset of ext4 (about +/- 20 years on either side of the limits of ext4). It _does_ internally use __int128 for a few intermediate computations such as converting to/from a timespec. This allows it to avoid overflow out near min()/max(). However, the most common operations users will encounter are simply the arithmetic involving time_point and duration, and these are strictly 64-bit (and already provided by chrono).

This is what I advise for the libc++ filesystem library. I have also sent this code to the gcc developers. Consider it public domain.

Howard

It does seem that I overlooked the fact that EXT4 only provides 64 bits of precision despite timespec providing more.

I’ll look into other filesystems to see if any offer more than that. If not, perhaps I was mistaken trying to match the precision and range of timespec.

Part of me is still concerned with the future, and the filesystems which are yet to exist.

Me too. But it is best to target modern systems when targeting future systems adds an unnecessary cost. When future systems come into being, it is likely because future hardware is making those future systems practical.

E.g. nanosecond-precision file systems were not produced prior to the widespread adoption of 64-bit hardware, mainly because they were just too expensive on 32-bit hardware.

In the future, we will have a better shot at dealing with that future. The std::lib we write today will have to evolve, no matter what we do today. Future proof where it is practical to do so, and don’t where it isn’t.

Howard

I’ll have to write benchmarks which demonstrate the actual cost of the 128-bit operations which users might commonly cause to be performed.

As you’re well aware, ABI evolution isn’t quite that simple.

Btrfs stores a 64-bit seconds field and a 32-bit nanoseconds field on disk, and you can set and get such timestamps just fine on Linux. I think it’d be a very bad idea to use a return type which must truncate the timestamp info from the filesystem.

Demo (on a btrfs partition):
    $ touch -d '9000-04-21 09:45:12.123456789' asdf
    $ ls -l --time-style=full-iso
    -rw-r--r-- 1 jyknight users 0 9000-04-21 09:45:12.123456789 -0400 asdf

If the efficiency of 128-bit math was a concern, a different API for std::filesystem should’ve been used, with separate sec/nsec fields. But, seems too late for that now.

In fact, I now have the precise opposite concern of Howard: I’m concerned that on 32-bit systems, libc++'s std::filesystem only uses a 64-bit type (“long long”) for its timestamps, and NOT the >=94-bit type which is required for correctness.

(Note that many 32-bit platforms are already using a 64-bit time_t, and others like Linux/i386 are in the midst of planning and executing on that transition – being 32-bit is no excuse for being wrong.)

Since AFAIK no compilers support __int128 on 32-bit platforms, I suppose a custom class will need to be used, like <https://github.com/abseil/abseil-cpp/blob/master/absl/numeric/int128.h>…

A bit of input from a user here. I’ve never seen code handle timestamp aliasing properly until it’s been painstakingly debugged in the wild. Most developers assume that if a file is newer than another, its timestamp will be greater than the other’s. More accurate timestamps don’t solve the root issue, but they dramatically reduce the probability of hitting it in the wild. In general, file system operations are going to be dominated by I/O, so it seems like in this case maximum correctness should win. Even on 32-bit, I can’t imagine the 128-bit math is that slow.

A 128 bit time_point seems like a good way to model those platforms that can mount Btrfs. A 128 bit time_point does not seem like a good way to model those platforms which can only support 64 bit time stamps.

The authors of a std::lib should write non-portable code so that I don’t have to.

Howard

Sure. However, it’s not only btrfs, that’s just one example.

Take macOS again: although the default filesystem (APFS) only stores a 64-bit value containing nanoseconds since Jan 1, 1970 on disk, that’s just one filesystem, not a fundamental restriction. The kernel and userspace file APIs pass times through with the full 64-bit time_t range. (This can be easily demonstrated with a FUSE filesystem.)

IMO, the set of platforms where a 64-bit value is likely to be always-sufficient is probably just Windows – there, the file APIs themselves restrict the number of bits required (representing time as an unsigned 64-bit quantity containing 100-nanosecond intervals since January 1, 1601), rather than just a particular filesystem’s on-disk-storage method.

Just to double check: this means we don't need to do anything for the 7 branch?

Thanks,
Hans