FW: Performance of std::remove_if in F18 file I/O

Hi Rich,

Most systems, including UNIX derivatives, use a single linefeed character (^J, ASCII LF = 0x0a) as an end-of-line marker in source files. Others (DOS, Windows) use a carriage return / linefeed sequence (0x0d 0x0a). The f18 compiler normalizes source files by removing all of their carriage return characters.
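
For readers unfamiliar with the idiom, the remove_if variant of this normalization can be sketched roughly as follows. This is a minimal illustration and not the f18 source: the function name StripCarriageReturns and the use of std::string as the buffer type are assumptions made for the example.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not taken from f18): returns a copy of `text`
// with every carriage return (0x0d) deleted.
std::string StripCarriageReturns(std::string text) {
  // std::remove_if shifts the kept characters toward the front and returns
  // an iterator to the new logical end; erase() then trims the leftover tail.
  text.erase(std::remove_if(text.begin(), text.end(),
                            [](char ch) { return ch == '\r'; }),
             text.end());
  return text;
}
```

With this sketch, a DOS-style input such as "a\r\nb\r\n" becomes "a\nb\n", and input that contains no carriage returns comes back unchanged.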

I have extended the benchmark with another path that runs over source code containing no carriage return characters at all. We expect this to be the most common case outside Windows environments.

I have posted the test results. Each set has four runs, one for each combination of implementation (remove_if or RemoveCarriageReturns) and input (linefeeds only, or carriage returns and linefeeds).

In summary, RemoveCarriageReturns is faster when there actually are no carriage returns to remove. For cases with carriage returns, RemoveCarriageReturns is faster on x86 Linux, about the same on macOS, and slower on aarch64.

Linux x86-64
https://gist.github.com/sscalpone/11f58f248cc2a984be80a6b0867836ae

macOS
https://gist.github.com/sscalpone/79434dcdfd8e5d8b568a485329562063

aarch64
https://gist.github.com/sscalpone/8aaec89cadb27c7ee580c6f460c955f4

- Steve

Hi all,

I’ve got some results for Windows on an Intel Core i5-9600K, though only with the program Richard attached, not with Steve’s modified version: https://gist.github.com/DavidTruby/aa885134abcab6746ddef962fd90bfc0

The results show that on Windows with MSVC, std::remove_if is significantly faster in this case (where there are carriage returns). This matches my expectations, given how much emphasis Microsoft puts on C++ features over C features in their compiler and runtime. When using clang on Windows, all three implementations are indistinguishable in performance.

Therefore, if we want to optimise for the slow case (input with carriage returns) on Windows, since that case is more common there, the remove_if implementation would be the better choice. I don’t have any data on what happens when there are no carriage returns on Windows, though; if Steve could share his modified benchmark, I’d be happy to get those results too.

Another thing to consider: in a previous email on this topic (http://lists.llvm.org/pipermail/flang-dev/2020-February/000236.html), Steve mentioned that we may need to extend this to normalising other types of line-ending characters in future. I believe the std::remove_if implementation is trivial to extend to that, whereas the RemoveCarriageReturns implementation seems to rely on the fact that you’re only looking for one character to normalise. Please correct me if I’m wrong here.
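
To illustrate the extensibility point, here is a rough sketch of how a remove_if-based routine could take a set of characters instead of a single one. The name StripAnyOf and its parameters are hypothetical, it needs C++17 for std::string_view, and it assumes that normalisation only ever means deleting characters; a line-ending convention that requires substituting a linefeed would need more than this.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical helper (not taken from f18): returns a copy of `text` with
// every character that appears in `unwanted` deleted.
std::string StripAnyOf(std::string text, std::string_view unwanted) {
  // Only the predicate changes compared with the single-character case.
  auto isUnwanted = [unwanted](char ch) {
    return unwanted.find(ch) != std::string_view::npos;
  };
  text.erase(std::remove_if(text.begin(), text.end(), isUnwanted),
             text.end());
  return text;
}
```

Calling StripAnyOf(source, "\r") would reproduce the current behaviour, and further characters could be appended to the second argument. Whether the extra lookup per character changes the benchmark picture is not something the results in this thread cover.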

Thanks
David Truby