[RFC] Move py-mlir-release to new top-level repo in the LLVM org

Hi folks, I would like to propose that we create a new top-level repo in the LLVM organization for organizing the Python MLIR Releases (both daily and official numbered releases, whenever we are ready for such a thing) and corresponding pushes to package repositories, etc.

I have prototyped such a release process in a personal repo: https://github.com/stellaraccident/mlir-py-release

Additional development on that release process is currently blocked on more work on the shared library organization in LLVM (discussed here https://lists.llvm.org/pipermail/llvm-dev/2021-January/147567.html and being worked on independently) but it is useful as is and a reasonable starting point for further work.

I would propose that we just fork my current repo into a new one in the LLVM organization and then take the necessary steps to get credentials/permissions/secrets set up in the new context.

Some answers to questions that may come up:

  • Why should this be a repo separate from llvm-project? These kinds of automation repos tend to accumulate a lot of “garbage” commits that are best kept out of the main repo (and also shouldn’t face contention with automatic jobs bumping things, etc). They also tend to require special permissions and secrets that we will want to control more tightly. They also make use of other GitHub features that we would rather not have polluting the main development flow (“Releases” tab, Actions, etc). Also, this is the kind of thing that tends to get revised en masse periodically, and again, it would be good not to pollute the monorepo.

  • Why not land this in llvm-zorg? llvm-zorg claims to be for “LLVM Testing Infrastructure” and seems well scoped to that statement. What I am managing above is periodic, automated release tooling based on open-source CI systems (currently GitHub Actions), which are fairly standardized across the Python releasing community, easy to set up, etc.

  • What ultimately will the code in this repo do?

  • Have periodic GitHub actions to select new LLVM revisions and schedule daily/snapshot releases.

  • Have manual actions for triggering official, numbered releases.

  • Provide facilities for building Python wheels for PyPi and house any additional metadata/automation needed for anaconda.

  • Build releases for all supported operating systems (currently Linux/CentOS7/manylinux2014, macOS, and Windows) and supported Python versions (currently 3.6, 3.7, 3.8, 3.9).

  • Publish release artifacts on the Releases tab for daily/snapshot releases.

  • Provide a stable reference point for downstream projects that extend MLIR-Python and need to produce version-matched artifacts of their own.

  • Could this graduate to be more than “MLIR” python? Maybe. I chose the name because that is what I am focused on and didn’t want to grab too much land. But there is nothing stopping this from becoming automation for general LLVM monorepo+incubator Python releasing.

  • What if we don’t do this?

  • Option A: We keep running this in a private repo with the disclaimer that is currently at the top: “Note that this is a prototype of a real MLIR release process being run by a member of the community. These are not official releases of the LLVM Foundation in any way, and they are likely only going to be useful to people actually working on LLVM/MLIR until we get things productionized.” We would miss opportunities for convergence with other projects and would cause things to fragment.

  • Option B: We only publish Python bindings in official LLVM release packages, and only for the Python version they are built with. We don’t release Python binaries through normal package management channels.

Opinions?

- Stella

Thanks for working on this Stella!

I’m wondering about the versioning: these packages that we’d publish continuously are necessarily “unstable” to some extent. How do we handle the versioning and the alignment with LLVM releases?

Otherwise I’m fine with a new repo for this, in particular since it involves GitHub Actions, and these can’t be isolated (in terms of notifications and such) other than at the repository boundary.

For the LLVM releases, would we need to branch in this repo as well?

Thanks,

Thanks for working on this Stella!

I’m wondering about the versioning: these packages that we’d publish continuously are necessarily “unstable” to some extent. How do we handle the versioning and the alignment with LLVM releases?

The way I envision this is that daily/snapshot releases are just at a floating llvm-project head. Currently, automation bumps that head twice per day by committing a new commit hash to the llvm-project.version file, allocating a new/unique/monotonic snapshot-YYYYMMDD.NN tag, and scheduling the build of a new release at that tag (GitHub releases are 1:1 with a tag). The source revision and tag information is published in the built wheels themselves.

I had it set up so that a downstream project that was able to do an “import mlir” could also track back to the installed headers and shared libraries that produced it, which could be used to build against it, up to and including its own wheels that depend on the exact version published. I ended up backing that last part out, not because it was a bad idea per se, but because I decided it would be more effective to straighten out the shared library situation first, before letting the current arrangement spider (currently the released mlir wheel only contains enough to build/link against the C API).

These snapshot releases are unstable, but we do retain them as fixed reference points that are tested/available so that close-to-head downstreams can stay current.
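The tag-allocation step described above can be sketched in a few lines of Python. This is purely illustrative: the `snapshot-YYYYMMDD.NN` format comes from the text, but the function name and the way previously allocated tags are enumerated are my own assumptions, not the actual automation:

```python
from datetime import datetime, timezone

def next_snapshot_tag(existing_tags, today=None):
    """Allocate a new, unique, monotonic snapshot-YYYYMMDD.NN tag.

    existing_tags: iterable of tag names already allocated in the repo.
    """
    if today is None:
        today = datetime.now(timezone.utc).date()
    prefix = f"snapshot-{today:%Y%m%d}."
    # Find the highest NN already used for today's date and bump it,
    # so two runs on the same day still produce distinct, ordered tags.
    used = [int(t[len(prefix):]) for t in existing_tags if t.startswith(prefix)]
    return f"{prefix}{max(used, default=0) + 1}"
```

So the second run on a given day would yield, e.g., `next_snapshot_tag(["snapshot-20210215.1"])` → `snapshot-20210215.2`.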

I expect that once these things mature to the point of being ready for inclusion in official, numbered LLVM releases, we would create a branch in this repo for the release, commit the llvm-project.version file to pin it to the upstream release HEAD, and then schedule an “official” release (something like “llvm-12.0rc1.NN” vs the current “snapshot-YYYYMMDD.NN”), build and deploy it to PyPi. Most of that would be completely shared with the snapshot release pipeline, except there would be a different (or even manual) trigger job, and these would be uploaded to PyPi (snapshot releases just accumulate on the GitHub releases page and can be installed directly from there).

Otherwise I’m fine with a new repo for this, in particular since it involves GitHub Actions, and these can’t be isolated (in terms of notifications and such) other than at the repository boundary.

For the LLVM releases, would we need to branch in this repo as well?

A branch would be needed to capture the commit relationship, and then each actual build would be a tag to a commit on that branch.
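A hypothetical sketch of that branch-and-tag relationship, using the pinned llvm-project.version file and the tag scheme from the examples above (the helper function and its arguments are illustrative assumptions, not the actual release scripts):

```python
import os
import subprocess

def cut_release(repo_dir, llvm_commit, release_branch, release_tag):
    """Pin an upstream llvm-project revision on a release branch and tag it.

    e.g. release_branch="release/12.x", release_tag="llvm-12.0rc1.1".
    """
    def git(*args):
        subprocess.run(["git", "-C", repo_dir, *args], check=True)

    # The branch captures the commit relationship via the pinned version file...
    git("checkout", "-B", release_branch)
    with open(os.path.join(repo_dir, "llvm-project.version"), "w") as f:
        f.write(llvm_commit + "\n")
    git("add", "llvm-project.version")
    git("commit", "-m", f"Pin llvm-project for {release_tag}")
    # ...and each actual build is a tag pointing at a commit on that branch.
    git("tag", release_tag)
```

Subsequent release candidates would then just be further commits and tags (llvm-12.0rc1.2, …) on the same branch.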

Hi folks, I would like to propose that we create a new top-level repo in the LLVM organization for organizing the Python MLIR Releases (both daily and official numbered releases, whenever we are ready for such a thing) and corresponding pushes to package repositories, etc.

For those of us that are unfamiliar, can you explain what the "Python MLIR Releases" are?

I have prototyped such a release process in a personal repo: https://github.com/stellaraccident/mlir-py-release

Additional development on that release process is currently blocked on more work on the shared library organization in LLVM (discussed here https://lists.llvm.org/pipermail/llvm-dev/2021-January/147567.html and being worked on independently) but it is useful as is and a reasonable starting point for further work.

I would propose that we just fork my current repo into a new one in the LLVM organization and then take the necessary steps to get credentials/permissions/secrets set up in the new context.

Some answers to questions that may come up:

  * **Why should this be a repo separate from llvm-project?** These kinds
    of automation repos tend to have a lot of "garbage" commits that I
    think is best if they do not pollute the main repo (and also don't
    face contention on automatic jobs bumping things, etc). They also
    tend to require special permissions and secrets that we will want to
    more tightly control. They also make use of other GitHub features
    that it seems like we would like not polluting the main development
    flow ("Releases" tab, Actions, etc). Also, this is the kind of thing
    that tends to get revised en-masse periodically, and again, it would
    be good to not pollute the monorepo.

There really aren't many files in this repo, do you anticipate it growing significantly?

  * **Why not land this in llvm-zorg?** llvm-zorg claims to be for "LLVM
    Testing Infrastructure" and seems well scoped to that statement.
    What I am managing above is periodic, automated release tooling
    based on open-source CI systems (currently GitHub Actions), which
    are fairly standardized across the Python releasing community, easy
    to set up, etc.

llvm-zorg also handles generating the websites. My personal opinion is that it would be OK to try to do this in llvm-zorg, but you're probably better off asking Galina about that. I guess the downside of using llvm-zorg is you don't get the releases tab.

Why did you choose to write the checkout_repo.py script in python rather than using the GitHub checkout action, or writing your own custom action?

  * **What ultimately will the code in this repo do?**
      o Have periodic GitHub actions to select new LLVM revisions and
        schedule daily/snapshot releases.

Do you have any idea how much GitHub Actions resource this would use? e.g. how many hours per week per operating system?

      o Have manual actions for triggering official, numbered releases.
      o Facilities for building Python wheels for PyPi and house any
        additional metadata/automation needed for anaconda.
      o Builds releases for all supported operating systems (currently
        Linux/CentOS7/manylinux2014, MacOS, and Windows) and supported
        Python versions (currently 3.6, 3.7, 3.8, 3.9).
      o Publish release artifacts on the Releases tab for daily/snapshot
        releases.
      o Provide a stable reference point for downstream projects that
        extend MLIR-Python and need to produce version-matched artifacts
        of their own.
  * **Could this graduate to be more than "MLIR" python?** Maybe. I chose
    the name because that is what I am focused on and didn't want to
    grab too much land. But there is nothing stopping this from becoming
    automation for general LLVM monorepo+incubator Python releasing.

I think it would be great to generalize this. I would also like to automate parts of the main LLVM release, and there seems to be some overlap with what you are doing.

-Tom

Hi folks, I would like to propose that we create a new top-level repo in
the LLVM organization for organizing the Python MLIR Releases (both
daily and official numbered releases, whenever we are ready for such a
thing) and corresponding pushes to package repositories, etc.

For those of us that are unfamiliar, can you explain what the “Python MLIR Releases” are?

Sure: They are the python wheels and source distributions for the MLIR Python Bindings. The key is that we do them in concordance with how Python packages get released and push them through standard channels for deployment, and this involves some gymnastics (of which, what I have will grow in some complexity as we do this, based on the experience of other projects). They basically include everything such that if you do a “pip install mlir” you get a working package that is able to build and compile MLIR based IR in a variety of forms. An ancillary function of them is to enable downstream Python based projects to extend the system, so it entails distributing enough headers and libraries to make this feasible.

I have prototyped such a release process in a personal repo:
https://github.com/stellaraccident/mlir-py-release

Additional development on that release process is currently blocked on
more work on the shared library organization in LLVM (discussed here
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147567.html and
being worked on independently) but it is useful as is and a reasonable
starting point for further work.

I would propose that we just fork my current repo into a new one in the
LLVM organization and then take the necessary steps to get
credentials/permissions/secrets set up in the new context.

Some answers to questions that may come up:

  • **Why should this be a repo separate from llvm-project?** These kinds
    of automation repos tend to have a lot of “garbage” commits that I
    think is best if they do not pollute the main repo (and also don’t
    face contention on automatic jobs bumping things, etc). They also
    tend to require special permissions and secrets that we will want to
    more tightly control. They also make use of other GitHub features
    that it seems like we would like not polluting the main development
    flow (“Releases” tab, Actions, etc). Also, this is the kind of thing
    that tends to get revised en-masse periodically, and again, it would
    be good to not pollute the monorepo.

There really aren’t many files in this repo, do you anticipate it
growing significantly?

Not terribly so. Just from some personal experience, the way things are done for Python packaging is somewhat… esoteric… compared to a normal C++ build flow, and necessitates certain directory layouts and such that I felt were better left to their own thing (it is something that you want to do exactly as everyone else does it).

  • **Why not land this in llvm-zorg?** llvm-zorg claims to be for “LLVM
    Testing Infrastructure” and seems well scoped to that statement.
    What I am managing above is periodic, automated release tooling
    based on open-source CI systems (currently GitHub Actions), which
    are fairly standardized across the Python releasing community, easy
    to set up, etc.

llvm-zorg also handles generating the websites. My personal opinion is
that it would be OK to try to do this in llvm-zorg, but you’re probably
better off asking Galina about that. I guess the downside of using
llvm-zorg is you don’t get the releases tab.

That is a good reason to put it there. One of the actions that is not implemented yet is for generating API docs (which is done post build/install for the Python side, because it introspects a running system).

The releases page is actually pretty important. For snapshot builds, python’s pip can just scrape it directly for published, installable artifacts and without it, we would need to roll our own place to stash such things.

Why did you choose to write the checkout_repo.py script in python rather
than using the GitHub checkout action, or writing your own custom action?

Good question - that was a limitation in my knowledge at the time (need to source the version from a file). Consider that a TODO to eliminate.
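For what it’s worth, the stock checkout action can take a ref sourced from a file via a prior step’s output; a hypothetical workflow fragment along those lines (step ids and paths are illustrative, and `::set-output` is the Actions syntax current as of this writing):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Check out this repo to get llvm-project.version.
      - uses: actions/checkout@v2
      # Read the pinned revision into a step output.
      - id: version
        run: echo "::set-output name=ref::$(cat llvm-project.version)"
      # Let the stock action fetch llvm-project at that exact revision.
      - uses: actions/checkout@v2
        with:
          repository: llvm/llvm-project
          ref: ${{ steps.version.outputs.ref }}
          path: llvm-project
```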

  • What ultimately will the code in this repo do?
    o Have periodic GitHub actions to select new LLVM revisions and
    schedule daily/snapshot releases.

Do you have any idea how much GitHub Actions resource this would
use? e.g. how many hours per week per operating system?

Currently, each snapshot builds for about 30m on the free 2-core setups per OS. However, this isn’t presently compiling as much of LLVM as will ultimately be needed. I have automation for another project where we do build more/most of the backends as well, and that builds for 1.25-1.5 hours per snapshot (and builds a fair bit more unrelated to LLVM, so this is just an upper-bound estimate). On my other project, I found that each minor Python version added (of which there are probably ~4 LTS at any given time) added about 1min to each build.

So if we are doing 2 snapshots a day and being conservative (2 builds/day × 7 days × ~2 hours each), 28 hours/week/OS?

I’m not running tests yet, so that will come with some costs. We will probably choose to run just the python bindings tests per python version (which are really cheap) and then run the full regression suite once per OS.

o Have manual actions for triggering official, numbered releases.
o Facilities for building Python wheels for PyPi and house any
additional metadata/automation needed for anaconda.
o Builds releases for all supported operating systems (currently
Linux/CentOS7/manylinux2014, MacOS, and Windows) and supported
Python versions (currently 3.6, 3.7, 3.8, 3.9).
o Publish release artifacts on the Releases tab for daily/snapshot
releases.
o Provide a stable reference point for downstream projects that
extend MLIR-Python and need to produce version-matched artifacts
of their own.

  • Could this graduate to be more than “MLIR” python? Maybe. I chose
    the name because that is what I am focused on and didn’t want to
    grab too much land. But there is nothing stopping this from becoming
    automation for general LLVM monorepo+incubator Python releasing.

I think it would be great to generalize this. I would also like to
automate parts of the main LLVM release, and there seems to be some overlap
with what you are doing.

Agreed. I actually found this quite easy to prototype. I think I spent a grand total of about a day on what is there (which isn’t done yet, but isn’t super far off). It then took me ~3 days to adapt it to IREE (https://github.com/google/iree), which is much more complicated (as it has to build LLVM, a bunch of deps and TensorFlow).

     > Hi folks, I would like to propose that we create a new top-level
    repo in
     > the LLVM organization for organizing the Python MLIR Releases (both
     > daily and official numbered releases, whenever we are ready for
    such a
     > thing) and corresponding pushes to package repositories, etc.
     >

    For those of us that are unfamiliar, can you explain what the "Python
    MLIR Releases" are?

Sure: They are the python wheels and source distributions for the MLIR Python Bindings. The key is that we do them in concordance with how Python packages get released and push them through standard channels for deployment, and this involves some gymnastics (of which, what I have will grow in some complexity as we do this, based on the experience of other projects). They basically include everything such that if you do a "pip install mlir" you get a working package that is able to build and compile MLIR based IR in a variety of forms. An ancillary function of them is to enable downstream Python based projects to extend the system, so it entails distributing enough headers and libraries to make this feasible.

Ok, so it's this python code: llvm-project/mlir/lib/Bindings/Python ?

     > I have prototyped such a release process in a personal repo:
     > https://github.com/stellaraccident/mlir-py-release
     >
     > Additional development on that release process is currently
    blocked on
     > more work on the shared library organization in LLVM (discussed here
     >
     > https://lists.llvm.org/pipermail/llvm-dev/2021-January/147567.html and
     > being worked on independently) but it is useful as is and a
    reasonable
     > starting point for further work.
     >
     > I would propose that we just fork my current repo into a new one
    in the
     > LLVM organization and then take the necessary steps to get
     > credentials/permissions/secrets set up in the new context.
     >
     > Some answers to questions that may come up:
     >
     > * **Why should this be a repo separate from llvm-project?** These kinds
     > of automation repos tend to have a lot of "garbage" commits
    that I
     > think is best if they do not pollute the main repo (and also
    don't
     > face contention on automatic jobs bumping things, etc). They also
     > tend to require special permissions and secrets that we will
    want to
     > more tightly control. They also make use of other GitHub features
     > that it seems like we would like not polluting the main
    development
     > flow ("Releases" tab, Actions, etc). Also, this is the kind
    of thing
     > that tends to get revised en-masse periodically, and again,
    it would
     > be good to not pollute the monorepo.

    There really aren't many files in this repo, do you anticipate it
    growing significantly?

Not terribly so. Just from some personal experience, the ways things are done for Python packaging are somewhat... esoteric... from a normal C++ build flow and necessitate certain directory layouts and such that I felt were better left to their own thing (it is something that you want to do exactly as everyone else does it).

     > * **Why not land this in llvm-zorg?** llvm-zorg claims to be for
    "LLVM
     > Testing Infrastructure" and seems well scoped to that statement.
     > What I am managing above is periodic, automated release tooling
     > based on open-source CI systems (currently GitHub Actions), which
     > are fairly standardized across the Python releasing
    community, easy
     > to set up, etc.

    llvm-zorg also handles generating the websites. My personal opinion is
    that it would be OK to try to do this in llvm-zorg, but you're probably
    better off asking Galina about that. I guess the downside of using
    llvm-zorg is you don't get the releases tab.

That is a good reason to put it there. One of the actions that is not implemented yet is for generating API docs (which is done post build/install for the Python side, because it introspects a running system).

The releases page is actually pretty important. For snapshot builds, python's pip can just scrape it directly for published, installable artifacts and without it, we would need to roll our own place to stash such things.

Could you have the GitHub action directly submit the package to PyPi rather than having pip scrape the release page? If we could, would there be any reason to have a release page? Would users be downloading from the release page or from PyPi?

    Why did you choose to write the checkout_repo.py script in python
    rather
    than using the GitHub checkout action, or writing your own custom
    action?

Good question - that was a limitation in my knowledge at the time (need to source the version from a file). Consider that a TODO to eliminate.

If you need anything more complicated than some of the builtin actions, you can add them to the llvm/actions repo.

-Tom

Hi folks, I would like to propose that we create a new top-level
repo in
the LLVM organization for organizing the Python MLIR Releases (both
daily and official numbered releases, whenever we are ready for
such a
thing) and corresponding pushes to package repositories, etc.

For those of us that are unfamiliar, can you explain what the “Python
MLIR Releases” are?

Sure: They are the python wheels and source distributions for the MLIR
Python Bindings. The key
is that we do them in concordance with how Python packages get released
and push them through standard channels for deployment, and this
involves some gymnastics (of which, what I have will grow in some
complexity as we do this, based on the experience of other projects).
They basically include everything such that if you do a “pip install
mlir” you get a working package that is able to build and compile MLIR
based IR in a variety of forms. An ancillary function of them is to
enable downstream Python based projects to extend the system, so it
entails distributing enough headers and libraries to make this feasible.

Ok, so it’s this python code: llvm-project/mlir/lib/Bindings/Python ?

I have prototyped such a release process in a personal repo:
https://github.com/stellaraccident/mlir-py-release

Additional development on that release process is currently
blocked on
more work on the shared library organization in LLVM (discussed here

https://lists.llvm.org/pipermail/llvm-dev/2021-January/147567.html and

being worked on independently) but it is useful as is and a
reasonable
starting point for further work.

I would propose that we just fork my current repo into a new one
in the
LLVM organization and then take the necessary steps to get
credentials/permissions/secrets set up in the new context.

Some answers to questions that may come up:

  • **Why should this be a repo separate from llvm-project?** These
    kinds
    of automation repos tend to have a lot of “garbage” commits
    that I
    think is best if they do not pollute the main repo (and also
    don’t
    face contention on automatic jobs bumping things, etc). They also
    tend to require special permissions and secrets that we will
    want to
    more tightly control. They also make use of other GitHub features
    that it seems like we would like not polluting the main
    development
    flow (“Releases” tab, Actions, etc). Also, this is the kind
    of thing
    that tends to get revised en-masse periodically, and again,
    it would
    be good to not pollute the monorepo.

There really aren’t many files in this repo, do you anticipate it
growing significantly?

Not terribly so. Just from some personal experience, the ways things are
done for Python packaging are somewhat… esoteric… from a normal C++
build flow and necessitate certain directory layouts and such that I
felt were better left to their own thing (it is something that you want
to do exactly as everyone else does it).

  • **Why not land this in llvm-zorg?** llvm-zorg claims to be for
    “LLVM
    Testing Infrastructure” and seems well scoped to that statement.
    What I am managing above is periodic, automated release tooling
    based on open-source CI systems (currently GitHub Actions), which
    are fairly standardized across the Python releasing
    community, easy
    to set up, etc.

llvm-zorg also handles generating the websites. My personal opinion is
that it would be OK to try to do this in llvm-zorg, but you’re probably
better off asking Galina about that. I guess the downside of using
llvm-zorg is you don’t get the releases tab.

That is a good reason to put it there. One of the actions that is not
implemented yet is for generating API docs (which is done post
build/install for the Python side, because it introspects a running system).

The releases page is actually pretty important. For snapshot builds,
python’s pip can just scrape it directly for published, installable
artifacts and without it, we would need to roll our own place to stash
such things.

Could you have the GitHub action directly submit the package to PyPi
rather than having pip scrape the release page? If we could, would there
be any reason to have a release page? Would users be downloading from
the release page or from PyPi?

My team’s preference, while we are as pre-release as we are, is to not pollute the pip namespace until we’re sure we have what we want. Deploying to the local project’s release page is a good way to let some people use it earlier while still keeping an appropriate barrier to entry that matches where it’s at in the life-cycle.

Personal preference.

Some projects end up always deploying from their release page because they can’t comply with PyPi policies (usually around distro version, dependencies, etc), but I’ve charted this out and think we will stay compliant.

Why did you choose to write the checkout_repo.py script in python
rather
than using the GitHub checkout action, or writing your own custom
action?

Good question - that was a limitation in my knowledge at the time (need
to source the version from a file). Consider that a TODO to eliminate.

If you need anything more complicated than some of the builtin actions,
you can add them to the llvm/actions repo.

Nice, thanks.

I forgot to include an update on this: A month ago, Tom and I discussed on Discord and thought that it would be fine to implement this support in the monorepo with GitHub Actions (vs in a new repo).