MLIR landing in the monorepo

Hi,

(bcc: mlir@tensorflow.org FYI)

I am following up on the integration of MLIR in LLVM as a subproject (Re: http://lists.llvm.org/pipermail/llvm-dev/2019-October/135687.html ).

We’re aiming to integrate into the monorepo next month. Right now our intent is for MLIR to live in a top-level directory in parallel to clang, lldb, lld, etc.
Our top option for the integration is to perform a git subtree merge to bring the MLIR history into the monorepo; here is a prototype: https://github.com/joker-eph/llvm-project-with-mlir

If you’re curious to try it, at the moment it needs a specific CMake invocation:

cmake -G Ninja …/llvm/ -DLLVM_TARGETS_TO_BUILD="host" -DLLVM_EXTERNAL_PROJECTS=mlir -DLLVM_EXTERNAL_MLIR_SOURCE_DIR={path to repo}/mlir/

We’ll hook into -DLLVM_ENABLE_PROJECTS after landing.

Let me know if you have any comments about this!

Cheers,

We’re aiming to integrate into the monorepo next month. Right now our intent is for MLIR to live in a top-level directory in parallel to clang, lldb, lld, etc.

Sounds right.

Our top option for the integration is to perform a git subtree merge to bring the MLIR history into the monorepo; here is a prototype: https://github.com/joker-eph/llvm-project-with-mlir

I’ll note that this would be the very-first merge commit on master. I’m not opposed to this, but others may be. (To allow this, we’d temporarily reconfigure github to allow pushing merge-commits, for this one commit, and then disable it again.)

However, another issue is that subtree merges have really weird artifacts when trying to look through history, with e.g. git log <filename>. I think I’d really prefer to avoid utilizing a subtree-merge for this.

Right: git blame works well though. The alternative would be to have a commit that moves the files under an “mlir” directory and then do a normal merge?
I am interested to hear other advice here :)

Thanks,

Mehdi

The alternative I had in mind would be to rewrite the commits on the branch so that all the files are under an mlir/ subdirectory, and then do a normal merge from that.
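
To make the proposed rewrite concrete, here is a minimal Python sketch of what it does to each path, plus the command line that could perform it. It assumes git-filter-repo is used and relies on its `--to-subdirectory-filter` option; the helper names and the example file path are hypothetical, chosen only for illustration.

```python
import posixpath
from typing import List


def to_subdirectory(path: str, subdir: str = "mlir") -> str:
    """Return the path a file would have once the whole tree is moved under subdir.

    Git paths are always relative and use forward slashes, hence posixpath.
    """
    return posixpath.join(subdir, path)


def filter_repo_command(subdir: str = "mlir") -> List[str]:
    """Build the git-filter-repo invocation that rewrites every commit this way.

    Assumption: this is run inside a fresh clone of the standalone repository.
    """
    return ["git", "filter-repo", "--to-subdirectory-filter", subdir]


if __name__ == "__main__":
    # Hypothetical file path, used only to show the mapping.
    print(to_subdirectory("lib/IR/Example.cpp"))
    print(" ".join(filter_repo_command()))
```

After such a rewrite, every historical commit only contains files under `mlir/`, so an ordinary merge into the monorepo does not collide with the other top-level directories.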

Since you are going to rewrite the mlir history anyway, you can
probably delete accidentally checked in large files if any.

* I don't know whether the file CONTRIBUTING.md is still appropriate,
at least for the Code of Conduct, LLVM has its own version.
* g3doc/ seems a very Google specific name. Does `docs/` work?
* bindings/python/pybind.cpp - does it have to be an in-tree plugin?
* The Apache 2 license headers are verbose. LLVM uses
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.

Since you are going to rewrite the mlir history anyway, you can
probably delete accidentally checked in large files if any.

Good point, I checked and this is the largest file in the history of the repo as far as I can tell: https://github.com/joker-eph/llvm-project-with-mlir/blob/master/mlir/g3doc/includes/img/view-operation.svg (155kB)
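
One way such a check could be done is sketched below. It assumes the object list comes from piping `git rev-list --objects --all` into `git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`; the function name is hypothetical and the sample input is made up for illustration, not real repository data.

```python
from typing import List, Tuple


def largest_blobs(batch_check_output: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the largest blobs as (size_in_bytes, path), biggest first.

    Each input line is expected to look like '<type> <sha> <size> <path>'.
    Commits and trees are skipped; only blobs (file contents) are counted.
    """
    blobs = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)  # the path may itself contain spaces
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)[:top]


if __name__ == "__main__":
    # Made-up sample in the assumed format; object names and sizes are illustrative.
    sample = (
        "commit aaa 250 \n"
        "blob bbb 155000 mlir/g3doc/includes/img/view-operation.svg\n"
        "blob ccc 1200 mlir/README.md\n"
        "tree ddd 90 mlir\n"
    )
    for size, path in largest_blobs(sample, top=2):
        print(size, path)
```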

  • I don’t know whether the file CONTRIBUTING.md is still appropriate,
    at least for the Code of Conduct, LLVM has its own version.
  • g3doc/ seems a very Google specific name. Does docs/ work?
  • bindings/python/pybind.cpp - does it have to be an in-tree plugin?
  • The Apache 2 license headers are verbose. LLVM uses
    SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.

Absolutely! All of these points (except the python bindings) are on my TODO list for the time of the merge. At the moment there is just a script continuously performing the merge in my test repository, so this is an exact view of the current state of the public repo.
I could also try to rewrite the license header throughout the history of the repository, but I’m not sure it won’t be brittle in practice, so I was planning to update all the files before pushing.

I’ll send the final version of the repo next month, I can CC you if you’d like to review this before we push it?

For the python bindings, these are intended to provide some equivalent facility to the LLVM python bindings: https://github.com/joker-eph/llvm-project-with-mlir/tree/master/llvm/bindings/python ; while they need some work at the moment, I think we will want to have bindings though.

I'll send the final version of the repo next month, I can CC you if you'd like to review this before we push it?

Sure!

Another thing is that after you make CMake work with
-DLLVM_ENABLE_PROJECTS='...;mlir;...', it'd be nice to reorder that
commit to the very beginning of the MLIR history. People want to
bisect and build at any commit in the pre-monorepo history. If they
can only build at the last commit, that will still be very
inconvenient.

Another thing. Repository issue/PR references in the descriptions no
longer work.

  Closes #211
  Fix #211

You probably want to change them, probably to the full URI.
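
A minimal sketch of such a rewrite follows. The base URL is an assumption (it presumes the standalone repository lived at github.com/tensorflow/mlir), and the function name is hypothetical. Something along these lines could be applied to each commit message during the history rewrite, for instance through git-filter-repo's `--message-callback`, which works on bytes rather than str.

```python
import re

# Assumption: the issues and PRs referenced as "#N" belong to this repository.
ISSUE_BASE = "https://github.com/tensorflow/mlir/issues/"

# Match "#123" only when it is not glued to a preceding word character or slash,
# so URL fragments and things like "foo#1" are left alone.
_ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def expand_issue_refs(message: str, base: str = ISSUE_BASE) -> str:
    """Replace short references such as '#211' with a full URL."""
    return _ISSUE_REF.sub(lambda m: base + m.group(1), message)


if __name__ == "__main__":
    print(expand_issue_refs("Closes #211"))
    print(expand_issue_refs("Fix #211"))
```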

I don’t think it is possible: it would require changes in LLVM itself, and the pre-merge revisions contain only MLIR and not LLVM, so these revisions won’t build as is.
I think bisection in the pre-merge history will have to use the commit date to find a compatible revision in LLVM, and at that point the need for the more verbose syntax I sent earlier isn’t the most annoying part.

Hopefully going back far in time is not something that will be common.
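
The commit-date lookup mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the sample data are hypothetical, and the LLVM commits are assumed to be available as (timestamp, hash) pairs sorted by commit date. In a real checkout, `git rev-list -n 1 --before=<date> <branch>` answers the same question.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def latest_commit_at_or_before(
    commits: List[Tuple[int, str]], when: int
) -> Optional[str]:
    """Return the hash of the newest commit whose timestamp is <= when.

    commits must be sorted by ascending Unix timestamp.
    Returns None if every commit is newer than when.
    """
    timestamps = [ts for ts, _ in commits]
    idx = bisect_right(timestamps, when)
    return commits[idx - 1][1] if idx else None


if __name__ == "__main__":
    # Made-up LLVM history: (commit timestamp, abbreviated hash).
    llvm_history = [(100, "aaa"), (200, "bbb"), (300, "ccc")]
    # A pre-merge MLIR commit dated 250 would be paired with LLVM commit "bbb".
    print(latest_commit_at_or_before(llvm_history, 250))
```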

The alternative I had in mind would be to rewrite the commits on the branch so that all the files are under an mlir/ subdirectory, and then do a normal merge from that.

OK, this is done now: the repo reflects this.

Good point! I’ll look into this as well.

For the python bindings, these are intended to provide some equivalent facility to the LLVM python bindings: https://github.com/joker-eph/llvm-project-with-mlir/tree/master/llvm/bindings/python ; while they need some work at the moment, I think we will want to have bindings though.

We actually already have users of these bindings. However, these bindings are using the pybind11 library, not just core Python/C interop functionality. I’m not sure that we would want to bring in an additional dependency.

James Y Knight via llvm-dev <llvm-dev@lists.llvm.org> writes:

However, another issue is that subtree merges have really weird artifacts
when trying to look through history, with e.g. git log <filename>. I think
I'd really prefer to avoid utilizing a subtree-merge for this.

I agree. Subtree-merges are really strange when walking back through
history. If someone checks out an early commit from mlir, all they will
see in their working directory is the mlir sources. All other
components will have disappeared.

If you want to maintain history, you can look at the scripts James, I, and
others worked on for the LLVM git migration:

In particular, have a look at this pull request:

This provides a tool to graft an existing git repository into a
subdirectory of the monorepo. It has the option of rewriting commits so
that checkouts of those commits will maintain the other subprojects
alongside. That functionality isn't well tested but I'd be happy to
work on it some more if needed.

I think this could work well for MLIR.

                      -David

James Y Knight via llvm-dev <llvm-dev@lists.llvm.org> writes:

The alternative I had in mind would be to rewrite the commits on the branch
so that all the files are under an mlir/ subdirectory, and then do a normal
merge from that.

I just sent a message about import-downstream-repo.py, shown here:

https://github.com/jyknight/llvm-git-migration/pull/6/commits

The default mode of operation does exactly what James says here. It
rewrites the commits so all blobs are under a specific subdirectory.
Then you can do a merge from the rewritten MLIR HEAD.

With the --import-list option you can tell the tool to preserve blobs
from other subprojects alongside the MLIR blobs. I did not test that
functionality much, though.

With default operation, a checkout of an early MLIR commit would show
only an "mlir" directory in the working directory. With --import-list
you'd see all of the other subproject directories, though the contents
of those other directories wouldn't change as you walked back through
early MLIR history.

                      -David

James Y Knight via llvm-dev <llvm-dev@lists.llvm.org> writes:

The alternative I had in mind would be to rewrite the commits on the branch
so that all the files are under an mlir/ subdirectory, and then do a normal
merge from that.

I just sent a message about import-downstream-repo.py, shown here:

https://github.com/jyknight/llvm-git-migration/pull/6/commits

The default mode of operation does exactly what James says here. It
rewrites the commits so all blobs are under a specific subdirectory.
Then you can do a merge from the rewritten MLIR HEAD.

I used git-filter-repo, but that is exactly what I’ve been doing actually.

With the --import-list option you can tell the tool to preserve blobs
from other subprojects alongside the MLIR blobs. I did not test that
functionality much, though.

With default operation, a checkout of an early MLIR commit would show
only an “mlir” directory in the working directory.

Right, that’s what I have right now in the repo, for example: https://github.com/joker-eph/llvm-project-with-mlir/tree/291a8e7ca113c4a8fc597fc0ec1a3a4e4e639f78

With --import-list
you’d see all of the other subproject directories, though the contents
of those other directories wouldn’t change as you walked back through
early MLIR history.

OK, but that seems like a “wrong” history: the state would seem quite misleading to me by mixing a recent LLVM with an old MLIR (and the code wouldn’t be able to build successfully at any of these revisions).
Can you clarify why you would prefer this over just a single mlir directory?

Thanks,

Mehdi AMINI <joker.eph@gmail.com> writes:

The default mode of operation does exactly what James says here. It
rewrites the commits so all blobs are under a specific subdirectory.
Then you can do a merge from the rewritten MLIR HEAD.

I used `git-filter-repo`, but that is exactly what I've been doing actually.

Great!

With default operation, a checkout of an early MLIR commit would show
only an "mlir" directory in the working directory.

Right, that's what I have right now in the repo, for example:
https://github.com/joker-eph/llvm-project-with-mlir/tree/291a8e7ca113c4a8fc597fc0ec1a3a4e4e639f78

Ok.

With --import-list
you'd see all of the other subproject directories, though the contents
of those other directories wouldn't change as you walked back through
early MLIR history.

OK, but that seems like a "wrong" history: the state would seem quite
misleading to me by mixing a recent LLVM with an old MLIR (and the code
wouldn't be able to build successfully at any of these revisions).
Can you clarify why you would prefer this over just a single `mlir`
directory?

I don't necessarily prefer it, I was just pointing it out as an option.
I'd be perfectly happy going with what you've got now!

                 -David