(Strawman) Costed plan for LLVM-ification of F18

My comments and questions are below in line.

Tim

Hi list

As discussed on previous threads, we need to propose a plan for making F18 more LLVM-like in style and use of LLVM APIs and facilities. David is working on getting the permissions to put this into a github project in LLVM so we can collaborate . In order to make progress ahead of that happening, I’d like to have a go on email.

For each area I would like to capture a list of work items we’ll commit to doing before merge to monorepo. I want us to be happy we can achieve these in a short time period so we can propose a new merging date. I’d also like to create a list of items we’ll commit to fixing after merging to the monorepo and a timeline for getting that done.

Note that I’ll only cover API and coding style work items in this thread. The other initiatives we can discuss in other threads. I’ll just say that we’ll propose cmake changes and porting test suite to lit in the pre-merge stuff and setting up build bots in the post-merge stuff.

When he has access, David can create two projects one for pre-merge and one for post-merge and we’ll put everything in there.

Please consider the below a strawman. All dates can change, all list items can change position, new ones can be added, items can be deleted, etc.

What do people think about these lists? Do we think those two dates are do-able?

Ta

Rich

Proposal:

We will make the following style changes before merging to the monorepo

F18 changes to make it more LLVM-like in code style

  1. Rationalise headers to put public headers in /include and not /lib
  1. Unify F18’s clang-format file to match LLVMs

We should discuss what should be in .clang-format with an eye to make it more like LLVM’s.

  1. Rename all .cc files to .cpp
  1. Switch module names to starting with capital letters

What does “module” mean in this context?

Increase use of LLVM APIs and utilities in F18

a. Switch F18 custom File handling to LLVM’s File handling (helps with non-POSIX platform support)

b. Change uses of to LLVM’s streams

F18 doesn’t use .

c. Migrate use of std::list to a suitable alternative in LLVM’s API

I would say: replace std::list with a better data structure where appropriate.

d. Use llvm_unreachable with an error message for unreachable cases

What does llvm_unreachable do for us?

e. Use llvm::Error instead of error codes if and when error codes are used.

I don’t know of any use of error codes.

We would like to aim for a merge date of Monday 24th Feb to merge to the monorepo with all of the above changes committed to F18 master.

We then propose to make the following changes after merging to the monorepo.

F18 changes to make it more LLVM-like in code style

We will commit to a perform a one-off exercise where we automatically audit the code to find these instances and bring them in line.

  1. Eliminate braces from all single-line if statements

This seems like a really bad idea and would like to see a reason for it.

  1. Eliminate all uses of else-after-return

Doing this blindly is also a bad idea.

  1. Add doxygen infrastructure so we can generate docs

Increase use of LLVM APIs and utilities in F18

a. std::string → StringRef where appropriate

These aren’t similar data structures. std::string owns the string data and StringRef does not. std::string_view is now a standard class for the latter case.

b. std::vector → llvm::SmallVector where appropriate

c. std::set → llvm::SmallSet/llvm::StringSet/llvm::DenseSet where appropriate

d. std::map → llvm::StringMap/llvm::DenseMap where appropriate

As long as “where appropriate” means there is tangible benefit to changing from a standard type to a non-standard one. Not just change for the sake of change.

e. Audit functions in include/flang/common and port to LLVM equivalents (e.g. builtin_popcount)

Assuming we hit the above merge date, we think we can commit to making these changes before the LLVM11 branch is taken in June.

After that date, we will continue to make progress towards LLVM style and API usage by fixing things as we find them during development and enforce the new style through code review.

A few specific areas that have been mentioned before that we will tackle in this way are:

  • Add doxygen style comments and file comments
  • Find more expressive names for classes, files, etc.

Names are already supposed to be expressive. Improvements are always welcome.

  • Refactor code to use early exits

Where it improves the code, not blindly. As with improving names, there is no reason to wait for any specific date to make improvements to the code.

Hi Tim,

I had some input into putting this list together so I’ve added some clarifications to some things inline

Thanks
David Truby

My comments and questions are below in line.

Tim

Hi list

As discussed on previous threads, we need to propose a plan for making F18 more LLVM-like in style and use of LLVM APIs and facilities. David is working on getting the permissions to put this into a github project in LLVM so we can collaborate . In order to make progress ahead of that happening, I'd like to have a go on email.

For each area I would like to capture a list of work items we'll commit to doing before merge to monorepo. I want us to be happy we can achieve these in a short time period so we can propose a new merging date. I'd also like to create a list of items we'll commit to fixing after merging to the monorepo and a timeline for getting that done.

Note that I'll only cover API and coding style work items in this thread. The other initiatives we can discuss in other threads. I'll just say that we'll propose cmake changes and porting test suite to lit in the pre-merge stuff and setting up build bots in the post-merge stuff.

When he has access, David can create two projects one for pre-merge and one for post-merge and we'll put everything in there.

Please consider the below a strawman. All dates can change, all list items can change position, new ones can be added, items can be deleted, etc.

What do people think about these lists? Do we think those two dates are do-able?

Ta
Rich

Proposal:
We will make the following style changes before merging to the monorepo

F18 changes to make it more LLVM-like in code style
1. Rationalise headers to put public headers in /include and not /lib
2. Unify F18's clang-format file to match LLVMs

We should discuss what should be in .clang-format with an eye to make it more like LLVM's.

3. Rename all .cc files to .cpp
4. Switch module names to starting with capital letters

What does "module" mean in this context?
Module in the context of F18 means parser, semantics, evaluate etc.

Increase use of LLVM APIs and utilities in F18
a. Switch F18 custom File handling to LLVM's File handling (helps with non-POSIX platform support)
b. Change uses of <iostream> to LLVM's streams

F18 doesn't use <iostream>.
I think we are (perhaps ambiguously) using <iostream> here to refer to the entire standard stream IO library. E.g. I do see a lot of uses of <ostream> in F18. LLVM has its own replacement for these for various reasons and highly discourages the use of the standard ones.

c. Migrate use of std::list to a suitable alternative in LLVM's API

I would say: replace std::list with a better data structure where appropriate.

d. Use llvm_unreachable with an error message for unreachable cases

What does llvm_unreachable do for us?
llvm_unreachable effectively behaves like an abort in debug builds, however in release builds it adds optimiser hints abstracted across the various supported compilers that the code in question isn’t reachable (so no need to generate a branch there or whatever).

e. Use llvm::Error instead of error codes if and when error codes are used.

I don't know of any use of error codes.
What error handling facility does F18 currently use if error codes and exceptions aren’t used? I am familiar with a few places that optionals are returned, we could consider looking at changing these to using llvm::Expected to provide more context about what has gone wrong.

We would like to aim for a merge date of Monday 24th Feb to merge to the monorepo with all of the above changes committed to F18 master.

We then propose to make the following changes after merging to the monorepo.

F18 changes to make it more LLVM-like in code style
We will commit to a perform a one-off exercise where we automatically audit the code to find these instances and bring them in line.
1. Eliminate braces from all single-line if statements

This seems like a really bad idea and would like to see a reason for it.

2. Eliminate all uses of else-after-return

Doing this blindly is also a bad idea.
The LLVM Coding Standards have a strong preference for this, with justification given here:
https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return

3. Add doxygen infrastructure so we can generate docs

Increase use of LLVM APIs and utilities in F18
a. std::string → StringRef where appropriate

These aren't similar data structures. std::string owns the string data and StringRef does not. std::string_view is now a standard class for the latter case.
LLVM still currently prefers the use of StringRef over std::string_view. I am not sure if there are plans to change this when LLVM switches the majority of its code to C++17. But what we meant here was to make sure that std::string is not being used as a parameter to functions etc, these should be using StringRef (or possibly Twine where appropriate).

b. std::vector → llvm::SmallVector where appropriate
c. std::set → llvm::SmallSet/llvm::StringSet/llvm::DenseSet where appropriate
d. std::map → llvm::StringMap/llvm::DenseMap where appropriate

As long as "where appropriate" means there is tangible benefit to changing from a standard type to a non-standard one. Not just change for the sake of change.
This is indeed what we meant by where appropriate. The LLVM guidelines are perfectly happy for the use of standard data structures in general, however the LLVM specific ones are much better optimised for certain use cases. For example, the DenseMap class in particular is much more efficient than std::map when the key and value are both small objects.

e. Audit functions in include/flang/common and port to LLVM equivalents (e.g. builtin_popcount)

Assuming we hit the above merge date, we think we can commit to making these changes before the LLVM11 branch is taken in June.

After that date, we will continue to make progress towards LLVM style and API usage by fixing things as we find them during development and enforce the new style through code review.

A few specific areas that have been mentioned before that we will tackle in this way are:
   - Add doxygen style comments and file comments
   - Find more expressive names for classes, files, etc.

Names are already supposed to be expressive. Improvements are always welcome.

   - Refactor code to use early exits

Where it improves the code, not blindly. As with improving names, there is no reason to wait for any specific date to make improvements to the code.
Again, early exists are something the LLVM Coding Standards have a strong preference for, with justification given here:
https://llvm.org/docs/CodingStandards.html#early-exits

Thanks for the reply.

Just picking out a few points:

Module in the context of F18 means parser, semantics, evaluate etc.

Those are abstract things. What are the concrete things you want to capitalize? Namespaces? Directory names?

But what we meant here was to make sure that std::string is not being used as a parameter to functions etc, these should be using StringRef (or possibly Twine where appropriate).

What’s the benefit of using StringRef over std::string for parameters?

Regarding “always use early exit”:

The LLVM Coding Standards have a strong preference for this, with justification given here:

https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return

That’s a pretty thin justification. It gives an example where else should not be used and generalizes it to all cases. When you encounter exceptional conditions early exit it certainly appropriate. But in non-exceptional cases it does not always correctly express intent.

Here’s an example from resolve-names.cpp:

if (type.IsAssumedType()) {

return currScope().MakeTypeStarType();

} else if (type.IsUnlimitedPolymorphic()) {

return currScope().MakeClassStarType();

} else {

return currScope().MakeDerivedType(…);

}

You could write this as:

if (type.IsAssumedType()) {

return currScope().MakeTypeStarType();

}

if (type.IsUnlimitedPolymorphic()) {

return currScope().MakeClassStarType();

}

return currScope().MakeDerivedType(…);

That makes it look like MakeDerivedType is the “normal” case and the other two are exceptions.

But that’s wrong – they are all normal and it’s just a return based on a three-way condition.

Tim

Hi,

Thanks for the reply.

Just picking out a few points:

Module in the context of F18 means parser, semantics, evaluate etc.

Those are abstract things. What are the concrete things you want to capitalize? Namespaces? Directory names?
If we were to match LLVM and Clang, the directory names would be capitalised and the namespaces would remain lower case.

But what we meant here was to make sure that std::string is not being used as a parameter to functions etc, these should be using StringRef (or possibly Twine where appropriate).

What's the benefit of using StringRef over std::string for parameters?
If the function being passed the string doesn’t need ownership of it, you avoid an unnecessary copy. It also allows passing of a string literal or C-style character array without requiring an allocation and a copy of the array.
Twine is designed to take arguments that are usually concatenations of strings, and it makes that concatenation very efficient.

Regarding "always use early exit":

The LLVM Coding Standards have a strong preference for this, with justification given here:
https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return

That's a pretty thin justification. It gives an example where else should not be used and generalizes it to all cases. When you encounter exceptional conditions early exit it certainly appropriate. But in non-exceptional cases it does not always correctly express intent.

Here's an example from resolve-names.cpp:
    if (type.IsAssumedType()) {
      return currScope().MakeTypeStarType();
    } else if (type.IsUnlimitedPolymorphic()) {
      return currScope().MakeClassStarType();
    } else {
      return currScope().MakeDerivedType(...);
    }

You could write this as:
    if (type.IsAssumedType()) {
      return currScope().MakeTypeStarType();
    }
    if (type.IsUnlimitedPolymorphic()) {
      return currScope().MakeClassStarType();
    }
    return currScope().MakeDerivedType(...);

That makes it look like MakeDerivedType is the "normal" case and the other two are exceptions.
But that's wrong -- they are all normal and it's just a return based on a three-way condition.

Tim

Thanks for the reply.

Just picking out a few points:

Module in the context of F18 means parser, semantics, evaluate etc.

Those are abstract things. What are the concrete things you want to capitalize? Namespaces? Directory names?

But what we meant here was to make sure that std::string is not being used as a parameter to functions etc, these should be using StringRef (or possibly Twine where appropriate).

What’s the benefit of using StringRef over std::string for parameters?

StringRef is the moral equivalent of std::string_view: it expresses that you just intend to read a sequence of characters without anything else.
Using it for function parameters over a « const string & » allow to decouple the client of the API from the concrete type they hold.
For example at call site you may hold the data in another concrete type than std::string (like llvm::SmallString for example), or you may want to pass a substring to the function. Using std::string requires a copy (and potential heap alloc).

Twine goes a step further and represent the concaténation of multiple things. It is convenient if your API may or may not use the StringRef: it will materialize the concatenation only on demand. It may also be useful to materialize the concatenation directly in the final destination and save a copy.

Many of these data structure are described here
https://www.llvm.org/docs/ProgrammersManual.html#passing-strings-the-stringref-and-twine-classes

Regarding “always use early exit”:

The LLVM Coding Standards have a strong preference for this, with justification given here:

https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return

That’s a pretty thin justification. It gives an example where else should not be used and generalizes it to all cases. When you encounter exceptional conditions early exit it certainly appropriate. But in non-exceptional cases it does not always correctly express intent.

Here’s an example from resolve-names.cpp:

if (type.IsAssumedType()) {

return currScope().MakeTypeStarType();

} else if (type.IsUnlimitedPolymorphic()) {

return currScope().MakeClassStarType();

} else {

return currScope().MakeDerivedType(…);

}

You could write this as:

if (type.IsAssumedType()) {

return currScope().MakeTypeStarType();

}

if (type.IsUnlimitedPolymorphic()) {

return currScope().MakeClassStarType();

}

return currScope().MakeDerivedType(…);

That makes it look like MakeDerivedType is the “normal” case and the other two are exceptions.

But that’s wrong – they are all normal and it’s just a return based on a three-way condition.

As most coding style (like eliding trivial braces), this is always quite subjective and mostly a matter of habit.
When all the codebase is following the same convention you quickly get use to it.

Best,

Timothy Keith via flang-dev <flang-dev@lists.llvm.org> writes:

1. Eliminate braces from all single-line if statements

This seems like a really bad idea and would like to see a reason for it.

FWIW this is one of the few things in LLVM's coding standards that I
strongly disagree with, but for whatever reason others strongly support
it. I just live with it. I would enthusiastically support an RFC to
llvm-dev to change it.

2. Eliminate all uses of else-after-return

Doing this blindly is also a bad idea.

I don't think anyone is saying to do anything "blindly."

b. std::vector → llvm::SmallVector where appropriate
c. std::set → llvm::SmallSet/llvm::StringSet/llvm::DenseSet where appropriate
d. std::map → llvm::StringMap/llvm::DenseMap where appropriate

As long as "where appropriate" means there is tangible benefit to
changing from a standard type to a non-standard one. Not just change
for the sake of change.

IME, LLVM's data structures are much faster/less resource intensive than
the standard library's. That's natural as the standard library is
general-purpose code while LLVM's is tuned for specific use-cases. In
my work I always use LLVM's data structures unless there's a good reason
not to. f18 should follow the LLVM convention here.

                     -David

Hi all

Many thanks for the quick replies so far. Looking at the comments so far I clearly should have been more explicit in some of my items – sorry about that.

Please find below an updated strawman for review. I notice no discussion on the dates I proposed. Are we happy to push for those?

First a few comments of my own:

As long as "where appropriate" means there is tangible benefit to changing from a standard type to a non-standard one. Not just change for the sake of change.

I wonder if we are thinking about this the wrong way around. Our aim to become an LLVM project and we hope that will happen soon. When it does, and we are sitting in the monorepo as flang then all uses of standard C++ types instead of equivalent LLVM types without good technical grounds would look strange or to paraphrase, seem like "differences for the sake of difference".

We need to be thinking of ourselves as an LLVM project. If we adopt that mindset, then rather than looking for a reason to adopt the LLVM types we should really be checking if there is a good reason _not_ to adopt them. I don’t doubt there will be some occasions where we have good justification to not adopt LLVM types and stay with the standard type and we certainly will back up our arguments not to chane. But I don’t think the default position should be not switch to LLVM types.

As with improving names, there is no reason to wait for any specific date to make improvements to the code.

I fully agree with that and we should continue improving F18 regardless of upstreaming. My intention with the statement in my mail is to make explicit to the LLVM community that we don’t intend to perform a line-by-line audit and make every change we could possibly do. We’re being asked to commit to a timeline and I want to make sure that the list of things we need to complete in that timeline is not open ended. I'd like to make explicit what we are not planning to do by some particular date and that we'll do later over time instead.

No else-after-return ...
Here's an example from resolve-names.cpp:

I see where you are coming from here and we can discuss these sorts of details in depth on code review for the specific examples. In cases like these, perhaps there is some refactoring we could do to capture the three way condition in the type class such that the if/elif/else block is not needed at all, or in such a way that the rewrite to if/if/return seems more natural, or perhaps with enough comments it will look ok anyway. All we need to do at this stage is commit to looking at it seriously and state when we'll do that by.

Ta
Rich

Updated Strawman

Hi Rich,

Apologies for my delayed response.

Many thanks for the quick replies so far. Looking at the comments so
far I clearly should have been more explicit in some of my items –
sorry about that.

Please find below an updated strawman for review. I notice no
discussion on the dates I proposed. Are we happy to push for those?

First a few comments of my own:

> As long as "where appropriate" means there is tangible benefit to
> changing from a standard type to a non-standard one. Not just change
> for the sake of change.

I wonder if we are thinking about this the wrong way around. Our aim
to become an LLVM project and we hope that will happen soon. When it
does, and we are sitting in the monorepo as flang then all uses of
standard C++ types instead of equivalent LLVM types without good
technical grounds would look strange or to paraphrase, seem like
"differences for the sake of difference".

We need to be thinking of ourselves as an LLVM project. If we adopt
that mindset, then rather than looking for a reason to adopt the LLVM
types we should really be checking if there is a good reason _not_ to
adopt them. I don’t doubt there will be some occasions where we have
good justification to not adopt LLVM types and stay with the standard
type and we certainly will back up our arguments not to chane. But I
don’t think the default position should be not switch to LLVM types.

+1

I agree.

If we (want to) argue F18 is developed as an LLVM subproject and it is
supposed to be(come) an LLVM subproject, we have to deal with what that
means. Non-F18 folks raised a bunch of concerns during the first merge
attempt for reasons that a decent amount of LLVM developers considered
valid. Addressing these concerns is part of being in this community. I
am glad we started resolving them and it seems a lot of pain points are
almost gone.

No-one in the community is expected to blindly follow things like coding
rules if there is a good reason *not to*, however, the rules are there
for a reason too, even if it is just to settle disputes on how
semantically equivalent code should look like. If we feel they need to
change, we can propose such a change as we propose other changes. If we
feel there is a good reason to make a more widespread exception, maybe
we can do that too. What we cannot, or at least should not do, is to
blindly reject the rules with the argument that the LLVM way needs to
prove it is superior before we adopt it. This is a slippery slope that
starts here at merge time and will continue to cause discussions over
discussions every time a change is made that violates the LLVM rules or
the "F18" rules.

We need to remember, this is a *large community (project)* that would
not work without rules, even if we don't agree with all of them.

> As with improving names, there is no reason to wait for any specific
> date to make improvements to the code.

I fully agree with that and we should continue improving F18
regardless of upstreaming. My intention with the statement in my mail
is to make explicit to the LLVM community that we don’t intend to
perform a line-by-line audit and make every change we could possibly
do. We’re being asked to commit to a timeline and I want to make sure
that the list of things we need to complete in that timeline is not
open ended. I'd like to make explicit what we are not planning to do
by some particular date and that we'll do later over time instead.

> No else-after-return ...
> Here's an example from resolve-names.cpp:

I see where you are coming from here and we can discuss these sorts of
details in depth on code review for the specific examples. In cases
like these, perhaps there is some refactoring we could do to capture
the three way condition in the type class such that the if/elif/else
block is not needed at all, or in such a way that the rewrite to
if/if/return seems more natural, or perhaps with enough comments it
will look ok anyway. All we need to do at this stage is commit to
looking at it seriously and state when we'll do that by.

At the end of the day these things are very much questions of style.

If you want to emphasize no case is special you could even put all in
conditionals and end with `llvm_unreachable` instead. I guess, because
LLVM strongly encourages early exists patterns like that, they are very
common to begin with and I would not necessarily think something is the
default. There is also always the comment option.

Ta
Rich

Updated Strawman

Proposal:
We will make the following style changes before merging to the monorepo

F18 changes to make it more LLVM-like in code style
  1. Rationalise headers to put public headers in /include and not /lib
  2. Examine F18's clang-format file and ensure we are comfortable with any deviations to the LLVM style
  3. Rename all .cc files to .cpp
  4. Capitalize the module directory names in /lib and /include (e.g. /lib/Parser)

1, 3, 4 seem to be easy to do. 2 might cause some discussion but
minimizing the difference would not only align us closer but also
benefit people that want to start working on Flang.

Increase use of LLVM APIs and utilities in F18
  a. Switch F18 custom File handling to LLVM's File handling (helps with non-POSIX platform support)
  b. Change uses of C++ standard stream IO library to LLVM's equivalent library
  c. Migrate use of std::list to a suitable alternative in LLVM's API
  d. Use llvm_unreachable with an error message for unreachable cases

LLVM has custom list data structures as well if we don't want to switch
to SmallVector or std::vector (if we really expect a lot of elements).

We would like to aim for a merge date of Monday 24th Feb to merge to
the monorepo with all of the above changes committed to F18 master.

That would be great!

We then propose to make the following changes after merging to the monorepo.

F18 changes to make it more LLVM-like in code style
We will perform a one-off exercise where we audit the code to find these instances and bring them in line. We'll look at:
  1. Braces on all single-line if statements
  2. Uses of else-after-return.
  3. Add doxygen infrastructure so we can generate docs

I think 3 is in the short term one of the most important changes we need
to focus on. Basically every function/class/method/member that is
modified (post-merge) will need to have proper documentation.

Increase use of LLVM APIs and utilities in F18
  a. std::string/std::string_view → StringRef where appropriate

Same for char* in some places.

  b. std::vector → llvm::SmallVector where appropriate
  c. std::set → llvm::SmallSet/llvm::StringSet/llvm::DenseSet where appropriate
  d. std::map → llvm::StringMap/llvm::DenseMap where appropriate
  e. Audit functions in include/flang/common and port to LLVM equivalents (e.g. builtin_popcount)
  f. Audit use of any error codes and switch to use llvm::Error instead
  
Assuming we hit the above merge date, we think we can commit to making
these changes before the LLVM11 branch is taken in June.
  
After that date, we will continue to make progress towards LLVM style
and API usage by fixing things as we find them during development and
enforce the new style through code review.

Arguably, we should adapt LLVM style though code review now but for sure
post-merge.

A few specific areas that have been mentioned before that we will tackle in this way are:
   - Add doxygen style comments and file comments
   - Classes, files, names, etc. where a more LLVM-standard naming can be used.
   - Refactor code to use early exits

At some point we should also introduce unit tests where direct lit
testing is not possible, e.g., for F18 data structures.

Cheers,
  Johannes

Hi all,

I’ve created a project board and associated GitHub issues to track this list at https://github.com/orgs/flang-compiler/projects/8. Hopefully this will be useful to see what we’re making progress on.
Please let me know if there’s anything that needs adding or fixing on here.

Thanks
David Truby