Flang landing in the monorepo

Hi All,

The flang project (a Fortran compiler) is getting ready to join the
monorepo. We intend to preserve the existing history by rewriting the
existing commits as a linear series of commits on top of llvm-project.

I understand the flang community would like to do this before the LLVM
10 branch in due in mid January, so please speak up soon if you see
anything needing fixing in what I write below.

I've taken into account the discussion raised during the MLIR landing
discussion found at
http://lists.llvm.org/pipermail/llvm-dev/2019-November/136813.html. As
with MLIR, we rewrite the commits so that flang's work all appears to
happen in the flang directory, starting with llvm-project master as it
appears today. The topology of the f18 history was fairly interesting,
which is why I ended up writing a new program to rewrite it rather than
using an existing one.

=== Key links

* Resulting tree of the rewrite:
https://github.com/peterwaller-arm/f18/tree/rewritten-history-v2-llvm-project-merge

* Rewritten history, with flang commits applied on top of llvm-project
master:
https://github.com/peterwaller-arm/f18/commits/rewritten-history-v2-llvm-project-merge

* The history rewriting program is published here:
https://github.com/flang-compiler/f18/pull/854

* Latest mailing list discussion of rewrite on flang-dev:
http://lists.llvm.org/pipermail/flang-dev/2019-December/000122.html

=== Additional considerations

* Existing references to pull request and issue numbers are rewritten so
that they point at the originals as, e.g. flang-compiler/f18#123. This
prevents those patches from generating bogus references to Issues/PRs of
llvm-project if/when those appear in the llvm-project repository.

* Developers using the llvm-project repo, when they pull after this
push, will see 2,700ish commits appear on the tip. These will follow on
as normal commits from wherever master is at the time of the push. The
fetch takes 40s and I see my ".git" directory grow by approximately
90MiB when I simulate this.

* Rewriting and validating the rewritten f18 history is sufficiently
fast that I don't think it will be necessary to pause commits to LLVM.
The script runs in a few seconds. Before this is done though, I think
new commits should no longer be accepted on the original repository.

* You can simulate the experience of the fresh merge with `git remote
add peterwaller-arm https://github.com/peterwaller-arm/f18 && time git
fetch peterwaller-arm rewritten-history-v2-llvm-project-merge`, and then
look at the peterwaller-arm/rewritten-history-v2-llvm-project-merge
branch with git log.

* Remember that you can restrict the "git log" output to what you are
interested in by specifying a directory, e.g. `git log clang/`.

That's all for now. Season's greetings!

- Peter

Hi All,

The flang project (a Fortran compiler) is getting ready to join the
monorepo. We intend to preserve the existing history by rewriting the
existing commits as a linear series of commits on top of llvm-project.

I understand the flang community would like to do this before the LLVM
10 branch in due in mid January, so please speak up soon if you see
anything needing fixing in what I write below.

I've taken into account the discussion raised during the MLIR landing
discussion found at
[llvm-dev] MLIR landing in the monorepo. As
with MLIR, we rewrite the commits so that flang's work all appears to
happen in the flang directory, starting with llvm-project master as it
appears today. The topology of the f18 history was fairly interesting,
which is why I ended up writing a new program to rewrite it rather than
using an existing one.

=== Key links

* Resulting tree of the rewrite:
GitHub - peterwaller-arm/f18 at rewritten-history-v2-llvm-project-merge

* Rewritten history, with flang commits applied on top of llvm-project
master:
Commits · peterwaller-arm/f18 · GitHub

* The history rewriting program is published here:
   Add script to flatten git history for llvm monorepo submission by peterwaller-arm · Pull Request #854 · flang-compiler/f18 · GitHub

* Latest mailing list discussion of rewrite on flang-dev:
[flang-dev] Rewriting f18's history for inclusion in llvm monorepo (third attempt, C rewrite)

=== Additional considerations

* Existing references to pull request and issue numbers are rewritten so
that they point at the originals as, e.g. flang-compiler/f18#123. This
prevents those patches from generating bogus references to Issues/PRs of
llvm-project if/when those appear in the llvm-project repository.

* Developers using the llvm-project repo, when they pull after this
push, will see 2,700ish commits appear on the tip. These will follow on
as normal commits from wherever master is at the time of the push. The
fetch takes 40s and I see my ".git" directory grow by approximately
90MiB when I simulate this.

I think we may want to disable the commit emailer before all these
new commits are pushed. Once you are ready to push, can you pick a
specific time and date for the push and then coordinate with Mike (cc'd)
and myself, so we can avoid spamming the mail server.

Thanks,
Tom

I think it would probably make the most sense to land this as a single merge-commit (from the 2700-commit rewritten history you’ve created) onto llvm-project master, rather than as 2700 individual toplevel commits to master. (Which means: disable the merge-commit prohibition in the github configuration, temporarily, push this commit, and then enable it again).

Hi Peter,

I have a few concerns and questions so far:

  • Supported C++ compilers: It looks like nothing too recent has been used to test? And the flang page still has version 7 listed as the “latest llvm”.
  • “It will be closely aligned with LLVM best practices and written in the style of LLVM and clang”:
  • The directory structure is quite different from clang’s directory structure.
  • IR generation still appears to be text string based?
  • I didn’t see a single reference to ADT or any of llvm’s libraries on a cursory look through the f18 directory (grep -ri ADT and grep -ri llvm)
  • It looks like flang is a C based project and f18 is C++? Looking at the flang directory itself for IR generation is quite confusing and while named C++ in some places is actually just C with few abstractions?

I haven’t done a full reading of the code, this is just a starting point for discussion and review.

Mostly it seems I have a lot of questions and so getting an updated idea of what you expect to merge, what the sources are, and more would be a first start I think. If some of my concerns hold through discussion I can’t see this being merged as-is.

Thanks!

-eric

How about [flang] :slight_smile: The reviewed-on tag in the commit messages has
the issue number so no information is lost.

[flang] Merge pull request flang-compiler#862 from flang...

Original-commit: flang-compiler/f18@8d2b296
Reviewed-on: flang-compiler#862

Hi Eric,

Apologies, I failed to disambiguate clearly, because there are multiple projects named flang. I was referring to the “new” flang, whose repository is currently found at . It will land in the monorepo under a directory called “/flang/”.

f18 has been approved to join, for reference see “[llvm-dev] f18 is accepted as part of LLVM project!”, Chriss Lattner, April 10 2019: .

I would like to emphasize that it will not be turned on by default in the llvm build.

Regards,

  • Peter

Hi,

Adding the pull request number to the individual commits was requested
in https://github.com/flang-compiler/f18/pull/854#discussion_r355516596,
but granted since then we also chose to keep the empty merge commits for
their commit message metadata as well, so maybe it is surplus to
requirements now. I've put your suggestion over on the pull request for
the history rewriting program.
https://github.com/flang-compiler/f18/pull/854#issuecomment-566958262

Regards,

- Peter

I understood that the LLVM project did not accept any merge commit so
far. It seems a shame to make an exception if we don't have to.

Can you elaborate on the perceived benefits of having it as a merge?

It's of course technically straightforward to push either "follow-on
commits" or a merge commit.

I think there could be a benefit to having it as a "follow-on commits",
in that checking out an old flang commit won't have the effect of
deleting the rest of the llvm monorepo. Granted, this may be a slim
benefit since it will be a fairly arbitrary revision of the llvm
monorepo, taken at the point of the initial push.

Hi Tom,

Great suggestion. I've put it on our checklist, which I'm recording on
https://github.com/flang-compiler/f18/issues/876.

Regards,

- Peter

I think it would probably make the most sense to land this as a single
merge-commit (from the 2700-commit rewritten history you’ve created)
onto llvm-project master, rather than as 2700 individual toplevel
commits to master. (Which means: disable the merge-commit prohibition
in the github configuration, temporarily, push this commit, and then
enable it again).

I understood that the LLVM project did not accept any merge commit so
far. It seems a shame to make an exception if we don’t have to

Can you elaborate on the perceived benefits of having it as a merge?

The rule against merge commits is primarily because we want to encourage people to make reasonable separate commits to master which are sensible (reviewable, buildable, etc) in isolation. And, to make it so the history is more easily understandable to humans. It’s not only that we don’t want merge commits – we actually don’t really want people doing merges, in general.

But, here we actually do have a merge, because flang was developed externally. This is an exceptional circumstance (and I’m sure we’ll have a few more). Using a merge commit in this circumstance makes it easier for humans looking at history to understand what’s going on here, rather than harder – because it actually marks the merge as being a merge. That’s the main reason why I think we ought to use a merge commit here.

Additionally (and less importantly), I’m going to presume that many/most of the 2700 commits cannot actually be built. So, having a single merge commit within which the unbuildable sub-commits are contained will also make things better for autobuilders.

It’s of course technically straightforward to push either “follow-on
commits” or a merge commit.

I think there could be a benefit to having it as a “follow-on commits”,
in that checking out an old flang commit won’t have the effect of
deleting the rest of the llvm monorepo. Granted, this may be a slim
benefit since it will be a fairly arbitrary revision of the llvm
monorepo, taken at the point of the initial push.

I’m suggesting to build the history the exact same way you are currently, and then simply merging it onto master with a single merge commit (“git merge --no-ff”), rather than “fast-forwarding” the entire set of commits. So this is not a benefit, it’d be the same either way.

I'm suggesting to build the history the exact same way you are currently, and then simply merging it onto master with a single merge commit ("git merge --no-ff"), rather than "fast-forwarding" the entire set of commits. So this is not a benefit, it'd be the same either way.

Ah, thanks for clarifying. I was imagining merging the 0-parent history. I agree with you.

I think that it would be beneficial for the maintainers of the downstream llvm-project repositories if there was a single merge commit instead of 2700. In my organization, the auto merging infrastructure tests and merges upstream changes one commit at a time, without exceptions, unless there’s a span of commits with a broken build. It’s not cut out to handle 2700 incoming commits, but it can handle a merge commit that merges in 2700 commits just fine, as it’s only merging in the “first-parent” commits from the upstream branch. If the community were to decide to commit 2700 commits directly, we would need to shut down our infrastructure for several hours, and merge things through manually. Additionally, I’m not sure if our CI (http://lab.llvm.org:8080/green/) will be able to handle 2700 commits coming in at the same time, so we’d need to possibly manually hand hold it through as the commits are coming in.

Thanks,
Alex

I think that it would be beneficial for the maintainers of the downstream llvm-project repositories if there was a single merge commit instead of 2700. In my organization, the auto merging infrastructure tests and merges upstream changes one commit at a time, without exceptions, unless there’s a span of commits with a broken build. It’s not cut out to handle 2700 incoming commits, but it can handle a merge commit that merges in 2700 commits just fine, as it’s only merging in the “first-parent” commits from the upstream branch. If the community were to decide to commit 2700 commits directly, we would need to shut down our infrastructure for several hours, and merge things through manually. Additionally, I’m not sure if our CI (http://lab.llvm.org:8080/green/) will be able to handle 2700 commits coming in at the same time, so we’d need to possibly manually hand hold it through as the commits are coming in.

Thanks,
Alex

In order to allow bisecting, we also merge one commit at a time in the CHERI fork of LLVM.
However, we don’t (yet) have the infrastructure to test each individual commit. Therefore, in our case this would just increase the time it takes us to perform the merge (~0-3 seconds per commit, so about 1 hour of CPU time since there shouldn’t be any conflicts) and will generate 2700 additional merge commits in our history.
I would prefer a single merge commit since that is easier for us but I don’t have a strong preference.

Alex

Hi Peter,

Yes, I looked through those sources and a number of my questions around which clang versions have been supported and directory structure. I think the only difference is removing the direct questions about earlier flang, but I still don’t see code generation or uses of llvm libraries that would conform to “written in the style of LLVM and clang”. Can you perhaps point me to where I’m missing these things?

Chris’s earlier acceptance aside I don’t see any evidence of code review as part of that and so I’d expect we’d see more here.

Thanks.

-eric

> I think that it would be beneficial for the maintainers of the downstream
> llvm-project repositories if there was a single merge commit instead of
> 2700. In my organization, the auto merging infrastructure tests and merges
> upstream changes one commit at a time, without exceptions, unless there's a
> span of commits with a broken build. It's not cut out to handle 2700
> incoming commits, but it can handle a merge commit that merges in 2700
> commits just fine, as it's only merging in the "first-parent" commits from
> the upstream branch. If the community were to decide to commit 2700 commits
> directly, we would need to shut down our infrastructure for several hours,
> and merge things through manually. Additionally, I'm not sure if our CI (
> http://lab.llvm.org:8080/green/) will be able to handle 2700 commits
> coming in at the same time, so we'd need to possibly manually hand hold it
> through as the commits are coming in.
>
> Thanks,
> Alex
>

In order to allow bisecting, we also merge one commit at a time in the
CHERI fork of LLVM.
However, we don't (yet) have the infrastructure to test each individual
commit. Therefore, in our case this would just increase the time it takes
us to perform the merge (~0-3 seconds per commit, so about 1 hour of CPU
time since there shouldn't be any conflicts) and will generate 2700
additional merge commits in our history.
I would prefer a single merge commit since that is easier for us but I
don't have a strong preference.

So long as all the commits where in a single batch, we could also work around
this in CHERI-LLVM by merging up to the commit before the batch and then
merging the whole lot as a single merge commit. We'd lose the ability
to bisect within the block, but that would be same as if it landed as a
single commit.

-- Brooks

This is a great summary of both sides of the policy, thanks James. I agree,

-Chris

I take it from this conversation that we should do a --no-ff merge of
the rewritten history.

The final history will look like this:

[llvm project/master] ---------------------> [empty merge commit]
\_ 2,700ish commits _/^

This means that `git log --first-parent` will only show the merge
commit, and not those from the "initial flang" branch.

Currently it looks like I'm the person chosen to execute the rewrite,
merge and push.

So far, I know I need to coordinate with Tom & Mike for the build
emailers and the existing f18 repository to freeze submissions there.

Who is able to suspend merge-commit prohibition in github?
Alternatively, could I (github: peterwaller-arm) be given the power
temporarily whilst executing this? That way I could turn it off only for
the few minutes required to do the push, and give up the permissions
immediately after. Alternatively this could be synchronized with a phone
call or email with someone with the capability. I intend to do this in
the morning GMT (around 10:00am) leaving plenty of time in the workday
to fix problems.

If anyone else wants to coordinate with me with respect to when the
merge happens, please get in touch.

The current plan lives at
<https://github.com/flang-compiler/f18/issues/876&gt;, if there are other
considerations please email me or comment on that thread and I will
update the check-list before it is executed.

The release branch is scheduled for Wed 15th January. I propose to
execute the plan on the Mon 13th. That gives the week beginning 6th Jan
to finalize the plans and gather any remaining feedback.

Is everyone happy with 13th January? Please speak up if not.

I'm on leave until the 6th of January and will respond to any further
comments as soon as I can after that point.

Regards,

- Peter

Yes, I looked through those sources and a number of my questions
around which clang versions have been supported and directory
structure. I think the only difference is removing the direct
questions about earlier flang, but I still don't see code generation
or uses of llvm libraries that would conform to "written in the style
of LLVM and clang". Can you perhaps point me to where I'm missing
these things?

I can't speak with authority on all of these issues.

In terms of clang versions, I understand that clang version 7 and 8 are
currently supported. We would expect it to work with newer LLVM
versions, and
the readme is currently out of date. The intent is that it will work
with all branches of LLVM and the community will build up CI to protect
this.

It's worth mentioning that merging flang in at this point does not
affect the existing LLVM build in any way. Patches to integrate the
build system are expected in the near future, and be subject to the
normal LLVM code review processes.

I understand that code generation is a work in progress and is expected
to start landing in the not too distant future. Other people (Steve
Scalpone, cc'd, and others) can perhaps speak to this more than me.

In terms of using LLVM ADTs, etc, I expect that once flang is part of
the monorepo, there will be a greater usage of those things.

Chris's earlier acceptance aside I don't see any evidence of code
review as part of that and so I'd expect we'd see more here.

Code review has been happening all along in the the f18 github repository:
https://github.com/flang-compiler/f18/pulls

I am operating on the assumption that the code will land with no
additional review by members of the LLVM community, and future code
review will happen with the same mechanisms that the wider LLVM project use.

Regards,

- Peter

Hi Peter,

At this point I’m very confused at the point of landing the code. Outside of the flang name there doesn’t appear to be a single thing that says this is an llvm project. There’s no evidence of llvm code or style guide review, no use of llvm APIs, or design similar to existing front ends. In addition, the license files also don’t appear to match the current license of the project - this appears to be nvidia copyright (though under apache 2 as is correct as far as I can tell).

I’d be very curious in reading the minutes from the board session where this was discussed and what conditions were given for commit to the repository.

-eric

Hi Eric,

The LICENSE file in the f18 github repo reflects the new llvm licensing. We expect to change the file headers to reflect the relicensing in the next few days. The NVIDIA copyright will be removed.

  • Steve