RFC: Staging area proposal for new backends

Hi,

I would like to follow up on the recent discussion on the mailing list
about requirements for new backends[1] by submitting the following
proposal for a staging area for new LLVM backends. This proposal
incorporates ideas from Owen, Chandler, and others who chimed in on
the original thread, and I hope the LLVM developers will be able to
come to a consensus on this proposal or a modified version, so the
project is able to accept new backends.

The goals of the staging area will be:
  1. Facilitate communication between the LLVM project and backend
     developers
  2. Ensure that new backends meet LLVM standards
  3. Give the backend more exposure to users and prospective developers

++ Staging Area:

Similar to the Linux kernel, the staging area for new backends will
be in the main LLVM tree, with building of the backend being disabled
by default. There will also be a TODO file in the backend's root
directory, that contains a list of improvements that are required to
promote the backend out of the staging area. The backend will be
assigned a steward whose role will be to guide the backend through
the staging process and help solicit feedback from other developers.

There are several advantages to having the staging area be in the main
tree as opposed to a separate branch:

1. It will be easier for LLVM developers to become familiar with the
   new backend and identify areas for improvement.

  If the new backend is in the main tree, LLVM developers are more
  likely to encounter it in their day to day development. Imagine a
  scenario where a developer makes a change to LLVM core that impacts
  several backends. The developer may grep the code looking for
  backends that make use of the feature that they have added or
  changed. If the new backend is in tree and uses that feature,
  the developer will see the code and might take a few moments to
  read through. While doing this the developer may notice an area for
  improvement for the backend and can update the backend's TODO file.
  The end result of this is that the LLVM developer has been able to
  provide some feedback with a minimal time commitment on their part.

  If the backend were staged in a separate tree, this kind of
  simple review would not be possible, and I would be concerned that
  developers would be too busy to ever get around to checking out
  the staging tree.

2. It will allow the backend developers to always develop against TOT.

  Developing against TOT is the recommended development procedure for
  anyone working on LLVM, and this is regularly reiterated on the
  mailing list. If the new backend is included in the main tree,
  the backend developers will have no choice but to work against TOT.

3. It will make it easier for end users and distributions to test and
   also make it easier for new contributors.

  New backends will be more visible to the public if they are in the
  main tree. This will mean more users, an expanded testing base, and
  more potential developers which will lead to a higher quality backend.

++ Promotion/Demotion from staging area:

  After a period of time, or when the tasks in the TODO file have been
  completed, the backend developers or the steward can initiate the
  review process. The review process will be conducted by either the
  steward, a committee, or some select developers, who will decide
  (maybe by vote in the case of a committee) whether the backend
  should be:

  - Promoted = Build of backend will be enabled by default.
  - Extended = Backend remains in the staging area.
  - Demoted = Removed from the main tree
    (I can't really think of any disadvantages to having a backend be
    in the main tree as long as it's not being built by default, so maybe
    demotion would be reserved for cases of long term absence
    of maintainership)

  The Promoted/Extended/Demoted decision will be made using the
  following criteria (These won't necessarily all be absolutely
  required, they merely serve as a way for a backend's progress to
  be measured) :

  - Progress towards completion of TODO tasks
  - Active maintainership
  - Use of incremental development techniques
  - Adherence to LLVM coding style
  - Usage of modern LLVM features
  - Quality and quantity of regression tests
  - Availability of buildbots
  - Size of user base
  - Other criteria deemed important by LLVM developers

  - Contributions to core LLVM
    In the previous mailing list discussions there were differing
    opinions of how important contributing to the core LLVM code is
    for having a backend accepted. It seems like a good middle ground
    would be that backends should be free of code that works around
    bugs or deficiencies in core LLVM and instead fix the problem in
    shared code, and also should make an effort to push optimization
    passes that may be useful to other targets into the shared parts
    of the code.

++ What is needed from the LLVM developers:

  In order to make this staging program successful, the LLVM project
  will need to appoint a "code owner" for the staging process, who
  backend developers can contact when they are interested in getting
  the backend included in the main tree. An LLVM developer will also
  be needed to act as a steward for the new backend and help guide
  the backend developers through the process.

Looking forward to comments on this proposal.

Thanks,
Tom Stellard

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120716/146560.html

I hope the official git repos could become staging-ready.
I guess Tom would prefer git ;)

Unfortunately, for now, most of our buildbots are not git-ready.

...Takumi

Hi,

For this proposal it shouldn't matter what VCS is used, but my
assumption was that the staging area would be in the SVN tree.
The git mirrors should still be able to pick up staged backends, right?

-Tom

The goals of the staging area will be:
1. Facilitate communication between the LLVM project and backend
    developers
2. Ensure that new backends meet LLVM standards
3. Give the backend more exposure to users and prospective developers

FWIW, I really like this idea or concept, but we have to be careful for it to be done right. This is also more general than just backends: experimental optimizers and runtime libraries can also benefit from something like this.

++ Staging Area:

Similar to the Linux kernel, the staging area for new backends will
be in the main LLVM tree, with building of the backend being disabled
by default.

Makes sense. Them being in the main tree isn't a problem as long as they aren't built by default, tested in buildbots, etc.

1. It will be easier for LLVM developers to become familiar with the
  new backend and identify areas for improvement.

If the new backend is in the main tree, LLVM developers are more
likely to encounter it in their day to day development. Imagine a
scenario where a developer makes a change to LLVM core that impacts
several backends. The developer may grep the code looking for
backends that make use of the feature that they have added or
changed.

This makes sense, but it should not be a *requirement* that API changes don't break experimental backends. This would be the responsibility of the contributors/owner of that backend and/or steward to make sure it keeps building. Of course it's great for someone to update all the experimental backends if they want, but it shouldn't be a requirement.

If the backend were staged in a separate tree, this kind of
simple review would not be possible, and I would be concerned that
developers would be too busy to ever get around to checking out
the staging tree.

Yep.

2. It will allow the backend developers to always develop against TOT.

Developing against TOT is the recommended development procedure for
anyone working on LLVM, and this is regularly reiterated on the
mailing list. If the new backend is included in the main tree,
the backend developers will have no choice but to work against TOT.

+1!!!

3. It will make it easier for end users and distributions to test and
  also make it easier for new contributors.

New backends will be more visible to the public if they are in the
main tree. This will mean more users, an expanded testing base, and
more potential developers which will lead to a higher quality backend.

I'd also add:

4. Infrastructure enhancements that are only required for an experimental backend can be implemented in the main tree, even if that infrastructure isn't needed by other targets. Of course, these changes need to meet the standard quality bar for general code in the compiler.

In the past, we've had some general infrastructure features get denied because they didn't relate to any targets in-tree.

++ What is needed from the LLVM developers:

In order to make this staging program successful, the LLVM project
will need to appoint a "code owner" for the staging process, who
backend developers can contact when they are interested in getting
the backend included in the main tree. An LLVM developer will also
be needed to act as a steward for the new backend and help guide
the backend developers through the process.

I think we also need to define the minimum quality bar for a backend to be included.

Also, if we do this, can we demote CellSPU? :)

-Chris

The goals of the staging area will be:

  1. Facilitate communication between the LLVM project and backend
    developers
  2. Ensure that new backends meet LLVM standards
  3. Give the backend more exposure to users and prospective developers

FWIW, I really like this idea or concept, but we have to be careful for it to be done right. This is also more general than just backends: experimental optimizers and runtime libraries can also benefit from something like this.

++ Staging Area:

Similar to the Linux kernel, the staging area for new backends will
be in the main LLVM tree, with building of the backend being disabled
by default.

Makes sense. Them being in the main tree isn’t a problem as long as they aren’t built by default, tested in buildbots, etc.

So will new back-ends be allowed to begin development in the “staging area?” Let’s say I want to develop a new back-end. Would I be able to start writing it in the LLVM main tree, or does it need to progress to a certain point in an external repository first?

  1. It will be easier for LLVM developers to become familiar with the
    new backend and identify areas for improvement.

If the new backend is in the main tree, LLVM developers are more
likely to encounter it in their day to day development. Imagine a
scenario where a developer makes a change to LLVM core that impacts
several backends. The developer may grep the code looking for
backends that make use of the feature that they have added or
changed.

This makes sense, but it should not be a requirement that API changes don’t break experimental backends. This would be the responsibility of the contributors/owner of that backend and/or steward to make sure it keeps building. Of course it’s great for someone to update all the experimental backends if they want, but it shouldn’t be a requirement.

If the backend were staged in a separate tree, this kind of
simple review would not be possible, and I would be concerned that
developers would be too busy to ever get around to checking out
the staging tree.

Yep.

  2. It will allow the backend developers to always develop against TOT.

Developing against TOT is the recommended development procedure for
anyone working on LLVM, and this is regularly reiterated on the
mailing list. If the new backend is included in the main tree,
the backend developers will have no choice but to work against TOT.

+1!!!

  3. It will make it easier for end users and distributions to test and
    also make it easier for new contributors.

New backends will be more visible to the public if they are in the
main tree. This will mean more users, an expanded testing base, and
more potential developers which will lead to a higher quality backend.

I’d also add:

  4. Infrastructure enhancements that are only required for an experimental backend can be implemented in the main tree, even if that infrastructure isn’t needed by other targets. Of course, these changes need to meet the standard quality bar for general code in the compiler.

In the past, we’ve had some general infrastructure features get denied because they didn’t relate to any targets in-tree.

This is definitely a great point. Part of the problem we had with the NVPTX back-end was that we needed some LLVM core changes, but only the NVPTX back-end would use these changes. It was a chicken-and-egg problem; this would solve that problem.

++ What is needed from the LLVM developers:

In order to make this staging program successful, the LLVM project
will need to appoint a “code owner” for the staging process, who
backend developers can contact when they are interested in getting
the backend included in the main tree. An LLVM developer will also
be needed to act as a steward for the new backend and help guide
the backend developers through the process.

I think we also need to define the minimum quality bar for a backend to be included.

This could be the trickiest part overall. In this sense, X86 and ARM are easy because reference assemblers/hardware/simulators are so easily available. These back-ends can also be verified through the LLVM test suite. My understanding of Tom’s R600 back-end is that it only works in conjunction with Mesa/Gallium, so verification on real hardware cannot easily be done (please correct me if I am wrong here).

On one hand, we could establish a minimum set of LLVM IR or SDAG that must be handled by the back-end and verified through unit test cases. Having 100% LLVM IR/SDAG coverage is not reasonable, simply because some concepts (e.g. exception handling, all intrinsics) are not present in the target hardware, VM, etc.

My feeling is that this should remain somewhat generic and be established on a back-end by back-end basis, based on the target hardware and what can feasibly be supported.

Assuming you have supported hardware (AMD HD2XXX - HD6XXX GPUs), all you
need to do to verify the R600 compiler is download the latest development
version of Mesa[1], configure with --enable-r600-llvm-compiler (if you
want to use it for 3D apps) and/or --enable-opencl (if you want to use
it for OpenCL programs; this requires a patched clang/llvm at the moment,
see the build instructions[2]), and then build and install.

We have a test suite called piglit[3] that we use for regression testing.
Not all the piglit tests stress the compiler, but a majority of them do,
so that would probably be the best way to verify the compiler.

A more fun way to verify the compiler would be to try playing your
favorite 3D game and check to see if the rendering matches a reference
OpenGL implementation.

While this isn't really a substitute for tests in the LLVM test suite,
if you have the hardware and are comfortable compiling your Mesa from
source then it is relatively easy to verify on real hardware.

[1] http://cgit.freedesktop.org/mesa/mesa/
[2] http://dri.freedesktop.org/wiki/GalliumCompute#How_to_Install
[3] http://cgit.freedesktop.org/piglit

-Tom

Hi Justin,

Chris Lattner <clattner@apple.com> writes:

FWIW, I really like this idea or concept, but we have to be careful
for it to be done right. This is also more general than just backends:
experimental optimizers and runtime libraries can also benefit from
something like this.

Absolutely. In particular:

In the past, we've had some general infrastructure features get denied
because they didn't relate to any targets in-tree.

The same has been true for some proposed analysis and optimization
passes. I would love to see a way to work on "experimental" passes
against trunk with the ability to move infrastructure pieces to
production state while still working on the pass.

Right now, there's a chicken-and-egg problem. Can I get infrastructure
changes through review if the pass that uses them isn't ready and
therefore isn't visible to other developers? Given the desire for
incremental development, how do I get a pass approved without the needed
infrastructure already in place?

A staging process for passes analogous to the proposal for backends
would solve that problem, I think. It also gives the pass writers a
more concrete idea of the requirements to move into production.

                            -Dave

<dag@cray.com> writes:

Chris Lattner <clattner@apple.com> writes:

FWIW, I really like this idea or concept, but we have to be careful
for it to be done right. This is also more general than just backends:
experimental optimizers and runtime libraries can also benefit from
something like this.

Absolutely. In particular:

In the past, we've had some general infrastructure features get denied
because they didn't relate to any targets in-tree.

(...)
A staging process for passes analogous to the proposal for backends
would solve that problem, I think. It also gives the pass writers a
more concrete idea of the requirements to move into production.

FWIW, another (significant) compiler project (QuickC-- [1]) used a
grading system for the very same purpose, based on three values
[quality, importance, urgency] with descriptive levels. This might give
an overview of what is wrong with a piece of code or what would be
needed to make it better, how important the piece of code is, and how
urgent it would be to improve it.
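
To make the idea a bit more concrete, here is a rough sketch of what such a three-value grade could look like as a data structure. Everything in it is an assumption for illustration: the `Level` scale, the `Grade` record, the `review_order` helper, and the component names are all hypothetical and are not taken from QuickC-- or from LLVM.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical three-step scale; the real descriptive levels are not
    # spelled out in this thread.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Grade:
    component: str
    quality: Level      # how good the code currently is
    importance: Level   # how much the project depends on it
    urgency: Level      # how soon it should be improved


def review_order(grades):
    """Sort so the most urgent, most important, lowest-quality code comes first."""
    return sorted(grades, key=lambda g: (-g.urgency, -g.importance, g.quality))


if __name__ == "__main__":
    # Made-up components, only to show how the ordering behaves.
    grades = [
        Grade("ExamplePassA", Level.HIGH, Level.MEDIUM, Level.LOW),
        Grade("ExampleBackendB", Level.LOW, Level.HIGH, Level.HIGH),
    ]
    for g in review_order(grades):
        print(g.component, g.quality.name, g.importance.name, g.urgency.name)
```

A record like this could sit next to each staged component, so a reviewer can see at a glance which pieces need attention first.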

Two cents,

Hi,

I would like to try to keep the staging area discussion going. There
seems to be a general consensus that a staging area for backends and also
new features would be acceptable for the LLVM project. What actions
are required to make the staging area a reality? Is more discussion
needed? Is anyone willing to volunteer to be the "Code Owner" for the
staging area, to help move the process forward?

Thanks,
Tom

The main issue I see is defining exactly how to incorporate the “staged” back-ends into the build system. I see a couple of possibilities.

  1. Fully integrate the “staged” back-ends into the CMake/Autotools build systems, but exclude them from the “all” pseudo-target.
  2. Maintain a separate list of “staged” back-ends, and create an additional option that must be set in order to build them (with an appropriate warning).

We also need to come up with a plan regarding cutting releases. When 3.2 is branched, will all “staged” back-ends be removed? Or will they be left in the distribution so interested parties can build them?

Beyond that, I don’t believe we have established the criteria for back-end promotion. Then again, it may make more sense to use R600 as a guinea pig and address issues as they come up.

>Hi,
>
>I would like to try to keep the staging area discussion going. There
>seems to be a general consensus that a staging area for backends and also
>new features would be acceptable for the LLVM project. What actions
>are required to make the staging area a reality? Is more discussion
>needed? Is anyone willing to volunteer to be the "Code Owner" for the
>staging area, to help move the process forward?
>
>Thanks,
>Tom

The main issue I see is defining exactly how to incorporate the
"staged" back-ends into the build system. I see a couple of
possibilities.

1. Fully integrate the "staged" back-ends into the CMake/Autotools
   build systems, but exclude them from the "all" pseudo-target.
2. Maintain a separate list of "staged" back-ends, and create an
   additional option that must be set in order to build them (with an
   appropriate warning).

I think it would be good if each backend / feature had its own enable
flag rather than lumping them all together into one. This way if one feature
or backend in the staging area was being neglected, then it wouldn't
affect all the others. Do you have any particular preference?

We also need to come up with a plan regarding cutting releases. When
3.2 is branched, will all "staged" back-ends be removed? Or will
they be left in the distribution so interested parties can build
them?

I can't really think of any disadvantages to keeping staged backends
in releases. Being in a release would help a backend get more exposure
and increase the number of users/testers it would get.

Beyond that, I don't believe we have established the criteria for
back-end promotion. Then again, it may make more sense to use R600
as a guinea pig and address issues as they come up.

Here are a few objective criteria that have been mentioned to me.
- Test Cases
- Assembly Printer
- No pre-MC AsmPrinter or CodeEmitter

There are several subjective ones as well, like coding style and
robustness, but those are a little harder to define. Can you think
of any others?

-Tom

I agree, there is no reason to remove it from the source drop of the release. The binaries produced for each release shouldn’t include them though.

-Chris

We also need to come up with a plan regarding cutting releases. When
3.2 is branched, will all “staged” back-ends be removed? Or will
they be left in the distribution so interested parties can build
them?

I can’t really think of any disadvantages to keeping staged backends
in releases. Being in a release would help a backend get more exposure
and increase the number of users/testers it would get.

I agree, there is no reason to remove it from the source drop of the release. The binaries produced for each release shouldn’t include them though.

Sounds good to me, I agree that more exposure is best.

In terms of build system integration, I think it makes sense to do the following:

  1. Add an ENABLE_EXPERIMENTAL or ENABLE_STAGING flag that allows experimental features to be built (default: OFF)
  2. Add an LLVM_STAGING_TARGETS list that contains all of the staging back-ends
  3. Allow LLVM_TARGETS_TO_BUILD to contain a back-end from LLVM_STAGING_TARGETS only if ENABLE_STAGING is ON

This will allow the default configuration to not only skip the staging back-ends, but prohibit them from being built without explicitly setting ENABLE_STAGING. Further, this will allow picking-and-choosing which staging back-ends to build.
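
As a rough illustration of the three rules above, the check could behave like the following sketch. It is written in Python only to show the logic; it is not the build-system change itself, and the `resolve_targets` function and `StagingDisabledError` class are hypothetical names. The parameters simply mirror LLVM_TARGETS_TO_BUILD, LLVM_STAGING_TARGETS and ENABLE_STAGING.

```python
class StagingDisabledError(ValueError):
    """Raised when a staging back-end is requested without ENABLE_STAGING."""


def resolve_targets(targets_to_build, staging_targets, enable_staging=False):
    """Return the back-ends to build, enforcing the staging rule.

    targets_to_build mirrors LLVM_TARGETS_TO_BUILD, staging_targets mirrors
    LLVM_STAGING_TARGETS, and enable_staging mirrors ENABLE_STAGING.
    """
    staging = set(staging_targets)
    # Collect every requested back-end that is still in the staging list.
    blocked = [t for t in targets_to_build if t in staging]
    if blocked and not enable_staging:
        # Default configuration: staging back-ends are rejected outright.
        raise StagingDisabledError(
            "staging back-ends requested without ENABLE_STAGING: "
            + ", ".join(blocked)
        )
    # Only the back-ends that were explicitly requested are built, so
    # individual staging back-ends can be picked one at a time.
    return list(targets_to_build)


if __name__ == "__main__":
    staging = ["R600"]
    print(resolve_targets(["X86", "ARM"], staging))
    print(resolve_targets(["X86", "R600"], staging, enable_staging=True))
    try:
        resolve_targets(["X86", "R600"], staging)
    except StagingDisabledError as err:
        print("error:", err)
```

The point of failing loudly instead of silently dropping the staging back-end is that a user who asks for one finds out immediately that the extra flag is required.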

Is there any reason not to try this out with the R600 target?

Hi,

I've submitted a patch[1] that adds this option.

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120730/147282.html

Could we perhaps call these LLVM_ENABLE_EXPERIMENTAL and LLVM_EXPERIMENTAL_TARGETS, instead of STAGING? I don't know how clear the meaning of "staging" will be to someone who hasn't been following this discussion. (Someone who isn't a compiler guru might wonder whether "staging" has some special meaning in the field of compilers.)

The only places that "staging" is currently mentioned in the LLVM sources are:

* the comments in lib/MC/WinCOFFObjectWriter.cpp, which mention "staging data for a COFF relocation entry".

* the comments in utils/llvm-compilers-check, where "staging" means something slightly different from this proposal.

Cheers,
Dave.

>> Add a ENABLE_EXPERIMENTAL or ENABLE_STAGING flag that allows experimental features to be built (default: OFF)
>> Add an LLVM_STAGING_TARGETS list that contains all of the staging back-ends
>> Allow LLVM_TARGETS_TO_BUILD to contain a back-end from LLVM_STAGING_TARGETS *only* if ENABLE_STAGING is ON
>
> I've submitted a patch[1] that adds this option.
>
> [1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120730/147282.html

Could we perhaps call these LLVM_ENABLE_EXPERIMENTAL and LLVM_EXPERIMENTAL_TARGETS, instead of STAGING? I don't know how clear the meaning of "staging" will be to someone who hasn't been following this discussion. (Someone who isn't a compiler guru might wonder whether "staging" has some special meaning in the field of compilers.)

I think this makes sense. I'll update the patches.

-Tom

That's a fair point. I'm okay with calling them EXPERIMENTAL instead of STAGED.

Hi,

Now that the --enable-experimental-targets build flags have been added to
the build systems, what needs to be done in order to get the R600
backend added as an experimental target? I've posted an updated
version of the backend to llvm-commits[1], that addresses many of the
criticisms of the backend, but I haven't received any feedback, and I
feel like the submission process has stalled. It seems like the
problem might be that there is not really an established process for
adding an experimental target to the tree. So, I'd like to try to
re-open this discussion: what steps need to be taken to add an
experimental target to the tree?

-Tom

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120827/149491.html

Dear Tom,

It looks like setting LCOMMDirectiveType in AMDGPUMCAsmInfo.cpp is not
needed anymore? I commented it out, and LLVM still compiled fine.

- D.