Plan for landing flang in monorepo

Hi llvm-dev

It’s been a few weeks since I last gave an update on F18 and our progress on readying it for inclusion into the monorepo. Last time we discussed this, the community challenged us to make the F18 source code look more like an LLVM project and to come up with a plan and schedule for completing this work (http://lists.llvm.org/pipermail/llvm-dev/2020-January/137989.html).

The full list of changes that could be made to make F18 more LLVM-like is very long. We’re interested in identifying what the absolute dealbreakers are that block inclusion into the monorepo and which changes would be acceptable to make after inclusion to the monorepo. We’ve come up with strawman lists for each and would like to propose the following plan of action:

  1. We have captured our strawman proposal for all the changes that need to happen to F18 to make it ready for inclusion into the monorepo on a github project board: https://github.com/orgs/flang-compiler/projects/8 (also repeated at the end of this mail.)
  2. We are working through this list and we believe that we can complete this work in time for a new upstreaming date of 16th March.
  3. We have captured further work that we plan to complete on F18 after merging to the monorepo: https://github.com/orgs/flang-compiler/projects/10 (also repeated below).
  4. We believe that we can complete this work before the LLVM11 branching date in June.
  5. After this date, we’ll keep improving the code as we go along and not on any specific timescale.

We’d really appreciate feedback on the two lists of changes, specifically: are these lists complete? Is everyone satisfied that, once all the items on https://github.com/orgs/flang-compiler/projects/8 are complete, we’d be happy to accept F18 into the monorepo? Are there any further changes that would need to be made to F18 for this to happen?

Thanks
Rich

More info on the lists:

Pre-merge list: https://github.com/orgs/flang-compiler/projects/8

The status today is that many of the items on the pre-merge list are well underway or complete.

  1. Integrate into the monorepo CMake
     • This will be as an optional project, and default to not building.
     • This also adds Doxygen infrastructure so we can start to improve interface documentation and continue post-merge.
  2. F18 changes to make it more LLVM-like in code style
     • Rationalise headers to put public headers in /include and not /lib
     • Examine F18’s clang-format file and minimise deviations to the LLVM clang-format
     • Rename all .cc files to .cpp
     • Capitalize the module directory names in /lib and /include (e.g. /lib/Parser)
  3. Increase use of LLVM APIs and utilities in F18
     • Switch F18 custom File handling to LLVM’s File handling (helps with non-POSIX platform support)
     • Change uses of C++ standard stream IO library to LLVM’s equivalent library
     • Audit use of std::list and consider migrating to a suitable alternative in LLVM’s API
     • Use llvm_unreachable with an error message for unreachable cases
  4. Convert the regression test suite to using lit rather than ctest
     • Porting off the custom scripts to FileCheck will continue after this but we think it should not gate inclusion to the monorepo.
  5. Ensure that F18 builds with the same compilers as the rest of the monorepo
     • One caveat is that we can only support C++17 compilers
     • We propose to defer Windows support until after we merge
     • We will specifically also check with the latest LLVM 10 rc

Post-merge list: https://github.com/orgs/flang-compiler/projects/10

This is the work that will happen right away after merging to the monorepo:

  1. F18 changes to make it more LLVM-like in code style - We will perform a one-off exercise where we audit the code to find these instances and bring them in line. We’ll look at:
     • Braces on all single-line if statements
     • Uses of else-after-return
  2. Increase use of LLVM APIs and utilities in F18 - We’ll audit the uses of these datatypes and try to move to a suitable LLVM alternative
     • std::string/std::string_view
     • std::vector
     • std::set
     • std::map
  3. Further work on porting the test suite to make it more LLVM-like
     • Port lit tests to FileCheck
     • Port unit tests to gtest
     • Implement equivalent to clang -verify and port tests to that
  4. Support Windows
     • Porting to LLVM file I/O is the main blocker - included in the post-merge worklist - but there will be more to do after this.
     • Isuru Fernando is going to lead this effort
  5. Set up official builders
     • Arm will handle bots for AArch64
     • Nvidia will handle X86
     • Tarique Islam at IBM will set up a builder for Power: http://lists.llvm.org/pipermail/flang-dev/2020-February/000232.html
     • Any further help from community bot maintainers to cover all the platforms and compilers would be greatly appreciated.
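
To make the brace and else-after-return items in the list above concrete, here is a minimal sketch of the kind of rewrite that audit implies. The function names and the classification logic are hypothetical and are not taken from F18; the "after" form follows the LLVM coding standards' advice to avoid else after a return and to omit braces around simple single-statement bodies.

```cpp
#include <cassert>
#include <string>

// Hypothetical "before" form: braces around every single-statement body
// and an else branch after each return.
std::string classifyBefore(int value) {
  if (value < 0) {
    return "negative";
  } else if (value == 0) {
    return "zero";
  } else {
    return "positive";
  }
}

// Same logic in the form the LLVM coding standards prefer: no else after
// a return, and no braces around a simple single-statement body.
std::string classifyAfter(int value) {
  if (value < 0)
    return "negative";
  if (value == 0)
    return "zero";
  return "positive";
}

// The rewrite is purely stylistic, so both forms must agree on every input.
void checkEquivalent() {
  for (int value = -2; value <= 2; ++value)
    assert(classifyBefore(value) == classifyAfter(value));
}
```

Because the change does not alter behaviour, it lends itself to a one-off mechanical pass rather than case-by-case review.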

One specific ask in the last round of feedback was on sharing lib/common/. Although we see the benefit of doing this exercise, we feel it is a bit too early to start. One design principle we wish to stick to is for the Fortran runtime and compiler to align on their implementations. For the specific example of https://github.com/flang-compiler/f18/blob/master/include/flang/common/bit-population-count.h we’d want the compiler and runtime POPCNT intrinsic to align on implementation. The F18 runtime is still a work in progress. We need to decide on how or if this could share code with LLVM libraries and then we can revisit the implementations in include/flang/common.
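
As a rough illustration of what "aligning on implementation" could mean for POPCNT, the sketch below defines a single constexpr population count that a compiler could use for constant folding and a runtime library could call at execution time. This is not the code in bit-population-count.h; the names populationCount and populationCountReference are hypothetical, and the bit-twiddling is the standard textbook technique.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shared helper: a portable, branch-free population count.
// Being constexpr, the same definition can be evaluated at compile time
// (constant folding) and called at run time.
constexpr int populationCount(std::uint64_t x) {
  // Sum adjacent bits into 2-bit fields, then 4-bit fields, then bytes.
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
  // The multiply accumulates all byte sums into the top byte.
  return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Straightforward reference: clear the lowest set bit until none remain.
int populationCountReference(std::uint64_t x) {
  int count = 0;
  for (; x != 0; x &= x - 1)
    ++count;
  return count;
}

// Compile-time use, as a constant folder would need.
static_assert(populationCount(0xF0F0u) == 8, "folded at compile time");

// Run-time use: the shared helper must agree with the reference.
void checkAgreement(std::uint64_t x) {
  assert(populationCount(x) == populationCountReference(x));
}
```

With one definition serving both sides, a folded constant and a runtime call cannot disagree, which is the property the paragraph above is asking for.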

Future work
After all the above is done, we will continue to bring the code more in line with LLVM style and API usage by fixing things as we find them during development and enforcing the new style through code review. A few specific areas that have been mentioned before that we will tackle in this way are:

  1. Add Doxygen style comments to interfaces
  2. Classes, files, names, etc. where a more LLVM-standard naming can be used.
  3. Refactor code to use early exits when suitable
  4. Audit functions in include/flang/common and port to LLVM equivalents (e.g. builtin_popcount)

No direct involvement in this, but just want to comment that the evidence of serious interest and willingness to make widespread changes to comply with community wishes is good to see here. Personally, that greatly increases my confidence that the integration will be a non-issue long term, whatever the short term schedule ends up looking like.

Philip

I agree, this is really great to see. Thank you all,

-Chris

Hi Richard,

Thanks for the work and mail with a plan. Other than the two comments below I think this plan is quite workable and am in support.

I think the only requests I’d make are:

  • to move the std::string/string_view/StringRef changes to pre-merge unless you’re going to have someone dedicated to handling them post-merge (rather than “time permits”). The C vs C++ ism here is fairly strong and I’d like to get the C-style string handling out fairly quickly. In my personal priority list I’d put this above the std::list migration.

  • large scale renames: For naming issues I think you can automate a lot of it via clang-tidy ahead of time if you wanted to go down that path. I think it could turn it from a fairly arduous task to one that’s a little easier.

Thanks!

-eric

Can you elaborate on this?

  • to move the std::string/string_view/StringRef changes to pre-merge unless you’re going to have someone dedicated to handling them post-merge (rather than “time permits”). The C vs C++ ism here is fairly strong and I’d like to get the C-style string handling out fairly quickly.

I understood this item to be looking into replacing uses of std::string with a more appropriate LLVM data structure where there is one. What is the “C-style string handling” part of it?

Tim

Last I looked there’s a lot of char * based string manipulation in f18. I’d like that moved to std::string/string_view/StringRef uses.

-eric

Hi Eric,

Old flang certainly uses C-style strings but f18 uses std::string with few exceptions. Most of the instances in f18 of “char *” aren’t really strings in the C sense – they’re not null terminated and are really just pointers into raw or cooked source files/streams. I can’t think of an instance where the compiler dynamically allocates an array of characters and uses it as a C string.
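
To illustrate the distinction, here is a minimal sketch of a “char *” that points into a source buffer rather than at a C string. The range carries its own length, so nothing depends on a terminating NUL. The helper names tokenText and firstWord are hypothetical and are not F18’s actual interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: view the characters in [begin, end) without copying.
// The length comes from the two pointers, not from a terminating '\0'.
std::string_view tokenText(const char *begin, const char *end) {
  assert(begin <= end && "range must not be reversed");
  return std::string_view(begin, static_cast<std::size_t>(end - begin));
}

// Returns the first run of non-space characters in `source` as a view
// into the same buffer; the result is empty if there is none.
std::string_view firstWord(std::string_view source) {
  const char *begin = source.data();
  const char *end = begin + source.size();
  while (begin != end && *begin == ' ')
    ++begin;
  const char *stop = begin;
  while (stop != end && *stop != ' ')
    ++stop;
  return tokenText(begin, stop);
}
```

The view returned by firstWord is only valid while the underlying buffer is alive, which is the usual trade-off for pointers into source text.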

-Steve

Hi Steve, Eric

I believe (please correct me if I’m wrong!) that the only place where C-style string handling and manipulation currently happens is in the functions that interact with the file system and perform IO, as this is currently mostly written with POSIX IO calls. I am currently working actively on changing these cases to use LLVM’s file IO functions and/or llvm::MemoryBuffer, which is certainly necessary to happen before merging. As part of this, I will transition the string handling to use C++-style strings and manipulation. I’ve already done this for the module writer in this PR for example: https://github.com/flang-compiler/f18/pull/993

Regardless of whether I’m right about that being the only place it happens, I think we should separate out two things here: replacing any C-style string handling with more modern C++ string handling, and replacing uses of std::string/std::string_view with llvm::SmallString/llvm::StringRef where appropriate. The former certainly should happen before the merge (again, regardless of whether it happens in more places than IO). What we meant in the list that Richard sent was that the latter is more open-ended and should happen after the merge.
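
As a small sketch of the first of those two things (C-style handling replaced by C++ strings), consider building a file name. Both functions and the “.mod” naming are illustrative assumptions only and are not the code in the PR above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// C-style: the caller supplies a fixed-size buffer and has to check for
// truncation by hand. Returns false if the result did not fit.
bool moduleFileNameC(const char *dir, const char *name, char *out,
                     std::size_t outSize) {
  assert(out != nullptr && outSize > 0);
  int written = std::snprintf(out, outSize, "%s/%s.mod", dir, name);
  return written >= 0 && static_cast<std::size_t>(written) < outSize;
}

// C++-style: the string owns its storage and grows as needed, so there is
// no buffer size to get wrong and no truncation case to handle.
std::string moduleFileName(const std::string &dir, const std::string &name) {
  return dir + "/" + name + ".mod";
}
```

The second step, moving from std::string to llvm::SmallString or llvm::StringRef where appropriate, would then be a change of type rather than a change of idiom.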

I hope that helps to clarify what we were talking about.

David Truby

Hi David,

This matches what I was seeing, thanks :)

-eric