RFC: Add bitcode tests to test-suite

Hi all,

TL;DR: Add *.bc to test-suite; llc *.bc; run some.

We would like to propose adding bitcode tests to the llvm test-suite.

Recent LLVM bugs [2-4] prompted us to look into upstreaming a subset of the tests the Halide library [1] is running and we’d like the community’s feedback on moving forward with this.

Halide uses LLVM and can generate bitcode, but we cannot add C++ tests to test-suite without including the library itself.
This proposal is also potentially useful for other cases where there is no C++ front-end.

As a first step, we are interested in adding a set of correctness tests that exercise the IR without executing the tests. Since these tests are generated, they are not instrumented like the .ll files in trunk; however, we believe that checking that llc runs without errors is still useful.
The bitcode files for Halide may also be large, so including them as regression tests is not an option. If the smaller tests are found to be valuable, or to cover cases no other tests cover, we can instrument them and move them into LLVM trunk further along, but that is not the goal of this proposal.
In addition, we're not sure whether the format for the tests should be .ll or .bc; we're open to either.

After this first step, we’re interested in upstreaming bitcode tests and also running them.
We are very interested in tests for multiple architectures, aarch64 in particular, since this is where we have seen things break. This may motivate adding .ll files rather than .bc in order to include the “RUN:” target.
Where would these tests reside and with what directory structure? (similar to test/CodeGen?)
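If the .ll form is chosen, each test could carry its target in a lit-style "RUN:" line, as the regression tests in trunk do. A hypothetical minimal sketch (the triple and the function are illustrative, not taken from Halide; the pass/fail criterion is only that llc exits cleanly, not what assembly it emits):

```llvm
; RUN: llc -mtriple=aarch64-linux-gnu < %s
; Minimal correctness test: llc must compile this without
; triggering an assertion or crashing.
define <4 x i32> @add_v4i32(<4 x i32> %a, <4 x i32> %b) {
entry:
  %sum = add <4 x i32> %a, %b
  ret <4 x i32> %sum
}
```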

Suggestions on the best approach for extending the test-suite framework for this proposal are more than welcome.

This is just the high-level overview to start off the discussion, I’m sure there are many more aspects to touch on. Looking forward to your feedback!

Thanks,
Alina

[1] http://halide-lang.org/
[2] Broken: r259800 => Fixed: r260131
[3] Broken: r260569 => Fixed: r260701
[4] https://llvm.org/bugs/show_bug.cgi?id=26642

From: "Alina Sbirlea via llvm-dev" <llvm-dev@lists.llvm.org>
To: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Wednesday, February 17, 2016 7:25:17 PM
Subject: [llvm-dev] RFC: Add bitcode tests to test-suite


We already have architecture-specific tests in the test suite (e.g. SingleSource/UnitTests/Vector/{SSE,Altivec,etc.}), and Clang can deal with IR inputs. I suppose you need to compile some corresponding runtime library, but this does not seem like a big deal either. Mechanically, I don't see this as particularly complicated. I think the real question is: Is this the best way to have a kind of 'halide buildbot' that can inform the LLVM developer community?

-Hal

Some perhaps relevant aspects that make testing users of LLVM like Halide challenging:

Halide uses the LLVM C++ APIs, but there isn’t a good way to lock-step update it. So if we were to directly test Halide, it wouldn’t link against the new LLVM.

Practically speaking though, the LLVM IR generated by Halide should continue to work with newer LLVM optimizations and code generation. So the idea would be to snapshot the IR in bitcode (which is at least reasonably stable) so that we could replay the tests as LLVM changes. We can freshen the bitcode by re-generating it periodically so it doesn’t drift too far from what Halide actually uses.

The interesting questions IMO are:

  1. Are folks happy using bitcode as the format here? I agree with Hal that it should be easy since Clang will actually Do The Right Thing if given a bitcode input.

  2. Are folks happy with non-execution tests in some cases? I think Alina is looking at whether we can get a runtime library that will allow some of these to actually execute, but at least some of the tests are just snapshots of a JIT, and would need the full Halide libraries (and introspection) to execute usefully.

-Chandler

I'm a bit confused as to what's being proposed for immediate action. Is the proposal essentially to add a set of binary bitcode files and ensure that running each of them through llc does not trigger any assertions? If so, arranging that as an external buildbot would seem entirely reasonable. If, on the other hand, we were testing for equivalence of the output assembly, that would probably NOT be okay, just because of the false-positive rate.

(Running with the assumption I rephrased the proposal correctly… if not, this will be totally off topic.)

Beyond Halide, this sounds like a potentially useful general mechanism for integration testing (not unit testing). We (upstream llvm) have something sorta similar today in the “self host clang, see what breaks” workflow. We (my downstream team) have a similar mechanism where we’ve collected a corpus of IR files from key benchmarks (regenerated weekly), that we use to stress test tricky commits before submission.

Standardizing such a mechanism and policy around its usage seems useful. My major concerns are: a) keeping the corpus small enough to be useful, b) not having them intermixed with unit tests, and c) making sure that the frontend authors pay the primary cost to maintain them.

Philip
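The IR-corpus workflow Philip describes could be driven by a small script that replays each snapshot through the compiler and reports which inputs fail. This is only an illustrative sketch (the `replay_corpus` helper and its arguments are assumptions, not anything from the thread; a real setup would pass a command like `["llc", "-o", "/dev/null"]`):

```python
import subprocess
from pathlib import Path

def replay_corpus(corpus_dir, compile_cmd):
    """Run every .bc snapshot in corpus_dir through compile_cmd and
    return the names of the files on which the command fails
    (a crash or any non-zero exit code)."""
    failures = []
    for snapshot in sorted(Path(corpus_dir).glob("*.bc")):
        # Only the exit status matters; the output assembly is discarded,
        # matching the "no equivalence checking" scope of the proposal.
        result = subprocess.run(
            compile_cmd + [str(snapshot)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append(snapshot.name)
    return failures
```

A periodic job could regenerate the corpus from the frontend and run this against trunk, flagging only the snapshots that newly stopped compiling.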

From: "Chandler Carruth" <chandlerc@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "Alina Sbirlea" <alina.sbirlea@gmail.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Wednesday, February 17, 2016 9:34:24 PM
Subject: Re: [llvm-dev] RFC: Add bitcode tests to test-suite


As far as I can tell, Halide is < 100K LOC and has no external dependencies other than LLVM itself. I think we should just add it to the test suite. I realize that means the community updating it for API changes, but if the additional test coverage is as significant as I suspect, and the project authors will help and are responsive, that seems worthwhile. It is a JIT and a heavy generator of vector code, two areas in which our story on regular upstream testing coverage is not great.

-Hal

> I think the real question is: Is this the best way to have a kind of 'halide buildbot' that can inform the LLVM developer community?

Halide already has a buildbot running every few hours which is being used to inform the LLVM developer community when something breaks. It would be a lot more useful, however, to have the tests in an LLVM repository so that LLVM devs learn right away which test broke. You're right that the underlying reason is that Halide has test coverage of areas currently not covered.

> As far as I can tell, Halide is < 100K LOC and has no external dependencies other than LLVM itself. I think we should just add it to the test suite. I realize that means the community updating it for API changes, but if the additional test coverage is as significant as I suspect, and the project authors will help and are responsive, that seems worthwhile. It is a JIT and a heavy generator of vector code, two areas in which our story on regular upstream testing coverage is not great.

Halide can do both JIT and AOT compilation. Would the community be happy to have non-execution tests for the JITted tests and execution tests for the AOT ones? This would in theory use a small subset of Halide rather than the entire library, which is what we are trying to avoid here.
The approach is meant not to clutter test-suite with a sizable amount of code while still getting the test coverage offered by Halide.

> > I think the real question is: Is this the best way to have a kind of 'halide buildbot' that can inform the LLVM developer community?

> Halide already has a buildbot running every few hours which is being used to inform LLVM developer community when something breaks. It would be a lot more useful however to have the tests in an LLVM repository to inform LLVM devs which test broke right away. You're right that the underlying reason is the fact that Halide has test coverage of areas currently not covered.

It is not clear to me why your “Halide buildbot” is not enough?

> > As far as I can tell, Halide is < 100K LOC and has no external dependencies other than LLVM itself. I think we should just add it to the test suite.

Which suite are we talking about? Is it https://llvm.org/svn/llvm-project/test-suite/trunk/ ?

> > I realize that means the community updating it for API changes,

The only project we are maintaining on top of LLVM is clang.
There is nothing in the test-suite repository that links to LLVM (AFAIK); changing this would add a requirement to build the test-suite when changing any API in LLVM (even for "NFC" changes).

> > but if the additional test coverage is as significant as I suspect, and the project authors will help and are responsive, that seems worthwhile. It is a JIT and a heavy generator of vector code, two areas in which our story on regular upstream testing coverage is not great.

Keeping Halide and the Halide test-suite in a totally separate repository, with continuous integration against LLVM trunk, provides the same coverage in practice.

Also, the test-suite has a mechanism for "External" suites, which could be used here.

> I'm a bit confused as to what's being proposed for immediate action. Is the proposal essentially to add a set of binary bitcode files and ensure that running each of them through llc does not trigger any assertions? If so, arranging that as an external build bot would seem entirely reasonable. If on the other hand, we were testing for equivalence of the output assembly, that would probably NOT be okay, just because of false positive rate.

No, we are not planning for equivalence of output assembly.

> (Running with the assumption I rephrased the proposal correctly... if not, this will be totally off topic.)

> Beyond Halide, this sounds like a potentially useful general mechanism for integration testing (not unit testing). We (upstream llvm) have something sorta similar today in the "self host clang, see what breaks" workflow. We (my downstream team) have a similar mechanism where we've collected a corpus of IR files from key benchmarks (regenerated weekly), that we use to stress test tricky commits before submission.

> Standardizing such a mechanism and policy around its usage seems useful. My major concerns are a) keeping the corpus small enough to be useful, b) not having them intermixed with unit tests and c) making sure that the *frontend* authors pay the primary cost to maintain them.

From my side I would like to have this standardized. As I said, I think this proposal's benefits are not just for Halide.
a) I believe the frontend authors need to ensure this. For Halide we would like to include a larger set of unit tests and a small set of large applications. The unit tests are meant to make sure no assertions are triggered. The applications are meant to run and be compared with reference results. Do you consider such a corpus of IR files reasonable?
b) I agree. How/where should these be integrated into test-suite?
c) Primarily the tests would be maintained by the frontend authors, but they may be updated by llvm devs. For example, in the event of a change in LLVM IR, the easiest would be to have the tests as bitcode files.

> The only project we are maintaining on top of LLVM is clang.
> There is nothing in the test-suite repository that links to LLVM (AFAIK); changing this means adding a requirement to build the test-suite when changing some API in LLVM (even for "NFC" changes).

Yeah, it is out of the question for us to wholesale import Halide and for
us to take on any kind of maintenance burden for updating it.

-- Sean Silva

From: "Sean Silva via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Mehdi Amini" <mehdi.amini@apple.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Thursday, February 18, 2016 12:13:06 AM
Subject: Re: [llvm-dev] RFC: Add bitcode tests to test-suite

> Yeah, it is out of the question for us to wholesale import Halide and for us to take on any kind of maintenance burden for updating it.

I disagree. Under the circumstances, the benefits of the added testing coverage might easily outweigh the maintenance burden of updating it (especially, as I said, if the project developers are active in assisting). It might not, but I raised the point because it is certainly worth considering.

-Hal

From: "Mehdi Amini" <mehdi.amini@apple.com>
To: "Alina Sbirlea" <alina.sbirlea@gmail.com>
Cc: "Hal Finkel" <hfinkel@anl.gov>, "llvm-dev"
<llvm-dev@lists.llvm.org>
Sent: Wednesday, February 17, 2016 11:55:46 PM
Subject: Re: [llvm-dev] RFC: Add bitcode tests to test-suite

> Also the test-suite has a mechanism for "External" suites, which could be used here.

I think this is a good idea.

-Hal

From: "Hal Finkel via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Mehdi Amini" <mehdi.amini@apple.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Thursday, February 18, 2016 6:51:33 AM
Subject: Re: [llvm-dev] RFC: Add bitcode tests to test-suite

> > Also the test-suite has a mechanism for "External" suites, which could be used here.

> I think this is a good idea.

However, we'd need to ignore build failures (instead of treating them as actual problems) because we can't block in-tree API updates on out-of-tree projects.

-Hal

Hi Chandler, et al.,

While this proposal to put IR into the test suite is technically non-problematic, I've convinced myself that this is a suboptimal direction for the LLVM project. Here's what I think would be better:

- We create a test-suite/Frontends directory, and open this directory to actively-maintained external frontends, subject to the following restrictions:

   - The frontend must be actively maintained, and the project must agree to actively maintain the test-suite version
   - The frontend must use the LLVM API (either C or C++) - no printing textual IR
   - The frontend must have no significant (non-optional) dependencies outside of LLVM itself, or things on which LLVM itself depends
   - The frontend must have regression tests and benchmarks/correctness tests providing significant coverage of the frontend and its associated code generation

Here's the quid pro quo:

   - The LLVM community gains additional testing coverage (which we definitely need)
   - The LLVM community gains extra insight into how its APIs are being used (hopefully allowing us to make more-informed decisions about how to update them)

   - The frontend gains free API updates
   - The frontend's use of LLVM will be more stable

This involves extra work for everybody, but will help us all deliver higher-quality products. Plus, given the constant discussions about the difficulty for external projects to follow API updates, etc., this is a good way to help address those difficulties.

The fact that Halide will provide extra coverage of our vector code generation (aside from whatever we happen to produce from our autovectorizers), and our JIT infrastructure, makes it a good candidate for this. Intel's ispc, POCL (maybe whatever bit of Mesa uses LLVM), etc. would also be natural candidates should the projects be interested.

Thanks again,
Hal

> While this proposal to put IR into the test suite is technically non-problematic, I've convinced myself that this is a suboptimal direction for the LLVM project. Here's what I think would be better:
>
> - We create a test-suite/Frontends directory, and open this directory to actively-maintained external frontends, subject to the following restrictions: [...]

Given the extra cost it incurs on LLVM developers changing APIs, this seems
like a problematic tradeoff/not necessarily good. LLVM moves quickly
because it can/it is beneficial, this causes pain/cost to out-of-tree
projects. Moving that cost to LLVM would simply make LLVM move more
slowly/API changes would be made less frequently. I'm not sure that's the
right tradeoff - the LLVM project would end up paying the cost of the
external project but not gaining the advantages of the project being part
of the LLVM Project umbrella (the developers wouldn't be contributing back
to the LLVM community/codebase, etc as they would if it were more of a
Clang, or LLD, etc).

> While this proposal to put IR into the test suite is technically non-problematic, I've convinced myself that this is a suboptimal direction for the LLVM project. Here's what I think would be better:
>
> - We create a test-suite/Frontends directory, and open this directory to actively-maintained external frontends, subject to the following restrictions: [...]

I think this is a really bad tradeoff and am strongly opposed to this proposal.

If we want to focus on improving test coverage, that's reasonable, but doing so at the cost of requiring LLVM contributors to maintain everyone's frontend is not a reasonable approach.

A couple of alternate approaches which might be worth considering:
1) The IR corpus approach mentioned previously. So long as external teams are willing to update the corpus regularly (weekly), this gives most of the backend coverage with none of the maintenance burden.
2) Use coverage information to determine which code paths Halide covers that are not covered by existing unit tests. Work to improve those unit tests. Using something along the lines of mutation testing (i.e. change the source code and see what breaks), combined with test reduction (bugpoint), could greatly improve our test coverage in tree fairly quickly. This would require a lot of work from a single contributor, but that's much better than requiring a lot of work from all contributors.


> I think this is a really bad tradeoff and am strongly opposed to this proposal.
>
> If we want to focus on improving test coverage, that's reasonable, but doing so at the cost of requiring LLVM contributors to maintain everyone's frontend is not a reasonable approach.
>
> A couple of alternate approaches which might be worth considering:
> 1) The IR corpus approach mentioned previously. So long as external teams are willing to update the corpus regularly (weekly), this gives most of the backend coverage with none of the maintenance burden.

Why weekly? And why not bitcode, which would be longer-lasting? (Still, updating it regularly would be helpful, but in theory we should keep working on the same bitcode for a fairly long timeframe, and it means that when I go and make breaking IR changes I don't have to add the test-suite to the list of things I need to fix :))

> 2) Use coverage information to determine which code paths Halide covers which are not covered by existing unit tests. Work to improve those unit tests. Using something along the lines of mutation testing (i.e. change the source code and see what breaks), combined with test reduction (bugpoint), could greatly improve our test coverage in tree fairly quickly. This would require a lot of work from a single contributor, but that's much better than requiring a lot of work from all contributors.

While this would be awesome (and I'd love to see some LLVM/Clang-based mutation testing tools, and to improve our test coverage using them), that seems like a pretty big investment that I'm not sure anyone is signing up for just now.

I should have written bitcode to start with. :slight_smile: All of your points are sound. I said “weekly” mostly as a placeholder for requiring active involvement from the frontend and as a means to keep the two projects roughly in sync. If Halide started generating radically different IR all of a sudden, we want the bitcode tests to reflect that. Fair point. However, before we ask the entire project to sign up for a lot of work, asking some particular motivated person to do so seems reasonable. :slight_smile: I’ll also note that I was thinking of a very simple version initially. Something on the order of “replace all untested lines with llvm_unreachable, reduce one test, rerun coverage, repeat”. This could be done mostly manually and would yield a lot of improvement.

Hi Chandler, et al.,

While this proposal to put IR into the test suite is technically
non-problematic, I've convinced myself that this is a suboptimal direction
for the LLVM project. Here's what I think would be better:

  - We create a test-suite/Frontends directory, and open this directory
to actively-maintained external frontends, subject to the following
restrictions:

    - The frontend must be actively maintained, and the project must
agree to actively maintain the test-suite version
    - The frontend must use the LLVM API (either C or C++) - no printing
textual IR
    - The frontend must have no significant (non-optional) dependencies
outside of LLVM itself, or things on which LLVM itself depends
    - The frontend must have regression tests and benchmarks/correctness
tests providing significant coverage of the frontend and its associated
code generation

Here's the quid pro quo:

    - The LLVM community gains additional testing coverage (which we
definitely need)
    - The LLVM community gains extra insight into how its APIs are being
used (hopefully allowing us to make more-informed decisions about how to
update them)

    - The frontend gains free API updates
    - The frontend's use of LLVM will be more stable

This involves extra work for everybody, but will help us all deliver
higher-quality products. Plus, given the constant discussions about the
difficulty for external projects to follow API updates, etc., this is a
good way to help address those difficulties.

The fact that Halide will provide extra coverage of our vector code
generation (aside from whatever we happen to produce from our
autovectorizers), and our JIT infrastructure, makes it a good candidate for
this. Intel's ispc, POCL, (maybe whatever bit of Mesa uses LLVM), etc.
would also be natural candidates should the projects be interested.

I think this is a really bad tradeoff and am strongly opposed to this
proposal.

If we want to focus on improving test coverage, that's reasonable, but
doing so at the cost of requiring LLVM contributors to maintain everyone's
frontend is not a reasonable approach.

A couple of alternate approaches which might be worth considering:
1) The IR corpus approach mentioned previously. So long as external
teams are willing to update the corpus regularly (weekly), this gives most
of the backend coverage with none of the maintenance burden.

Why weekly? And why not bitcode, which would be longer-lasting? (Still,
updating it regularly would be helpful, but in theory we should keep
working on the same bitcode for a fairly long timeframe, and it means that
when I go and make breaking IR changes I don't have to add the test-suite
to the list of things I need to fix :))

I should have written bitcode to start with. :) All of your points are
sound.

I said "weekly" mostly as a placeholder for requiring active involvement
from the frontend and as a means to keep the two projects roughly in sync.
If Halide started generating radically different IR all of a sudden, we
want the bitcode tests to reflect that.

Fair enough - I imagine this'd look a lot like retiring old backends. If
someone's not updating it, it's mostly their loss, but once it's enough of
a burden on the LLVM project, we just remove it.

2) Use coverage information to determine which code paths Halide covers
which are not covered by existing unit tests. Work to improve those unit
tests. Using something along the lines of mutation testing (i.e.
change the source code and see what breaks), combined with test reduction
(bugpoint), could greatly improve our test coverage in tree fairly
quickly. This would require a lot of work from a single contributor, but
that's much better than requiring a lot of work from all contributors.

While this would be awesome (& I'd love to see some LLVM/Clang-based
mutation testing tools, and to improve our test coverage using them) that
seems like a pretty big investment that I'm not sure anyone is signing up
for just now.

Fair point. However, before we ask the entire project to sign up for a
lot of work, asking some particular motivated person to do so seems
reasonable. :)

Sure - I suspect, realistically, that neither of the expensive options is
really the way to go, though.

I'll also note that I was thinking of a very simple version initially.
Something on the order of "replace all untested lines with
llvm_unreachable, reduce one test, rerun coverage, repeat". This could be
done mostly manually and would yield a lot of improvement.
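That simple version is essentially a greedy coverage-gap loop. A minimal sketch of the idea, with the coverage query and the reduction step stubbed out (in practice the coverage data would come from llvm-cov and the reduction from bugpoint; everything below is illustrative, not actual tooling):

```python
# Illustrative sketch of the manual loop described above: find a line the
# external corpus covers but the in-tree tests don't, reduce one corpus
# test down to it, re-measure, and repeat until the gap is closed.

def covered_lines(tests):
    """Union of lines covered by a list of tests (tests stubbed as line sets)."""
    return set().union(*tests) if tests else set()

def reduce_test(tests, needed_line):
    """Stub for bugpoint-style reduction: shrink some test to the needed line."""
    return frozenset({needed_line})

corpus = [frozenset({1, 2, 3}), frozenset({4, 5})]  # e.g. Halide bitcode tests
unit_tests = [frozenset({0, 1})]                    # in-tree regression tests

while True:
    gap = covered_lines(corpus) - covered_lines(unit_tests)
    if not gap:
        break
    unit_tests.append(reduce_test(corpus, min(gap)))

print(sorted(covered_lines(unit_tests)))  # corpus coverage is now subsumed
```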

That sounds more or less like coverage-based fuzzing, which we have (in
asan/libFuzzer). Mutation testing's a bit more involved, but would be fun
to have.

- Dave

I support the view of Philip on this topic.
I think that having the bitcode, without the burden of the API and linking, is the best trade-off for LLVM.

About the bitcode, I'm not convinced that it needs to be updated that often: we're interested in coverage for LLVM, not in validating that we correctly support frontend X or Y in a particular version. I'd even argue that if Halide version 13 generates very different IR from Halide 12, then we should keep the Halide 12-generated bitcode in a separate directory, because it is likely to stress LLVM differently.

I have more questions for Alina. What kind of tests do you have:

- "the compiler takes the bitcode and generates code without crashing"
- "the compiled test runs without crashing"
- "the compiled test will produce an output that can be checked against a reference"
- "the compiled test is meaningful as a benchmark"

All these different aspects of testing can be interesting, but knowing which of them we're talking about may influence the way forward.
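For the third category, the test-suite-style check compares the program's captured output against a checked-in reference, with some tolerance for floating-point noise (the test-suite's fpcmp tool serves this purpose; the function below is only an illustration in that spirit, not fpcmp's actual algorithm):

```python
# Sketch of a reference-output check: outputs match if non-numeric tokens
# are identical and numeric tokens agree up to a small relative tolerance.

def outputs_match(actual: str, reference: str, rel_tol: float = 1e-6) -> bool:
    a_toks, r_toks = actual.split(), reference.split()
    if len(a_toks) != len(r_toks):
        return False
    for a, r in zip(a_toks, r_toks):
        try:
            af, rf = float(a), float(r)
            if abs(af - rf) > rel_tol * max(abs(af), abs(rf), 1.0):
                return False
        except ValueError:
            if a != r:  # non-numeric tokens must match exactly
                return False
    return True

print(outputs_match("result: 0.3333333", "result: 0.3333334"))  # tolerant match
print(outputs_match("result: 0.3333333", "result: 0.4444444"))  # real mismatch
```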

As mentioned before, the test-suite has a mechanism for "external" suites; it may be limited right now but could probably be expanded. Ideally there could be multiple repositories that would just be checked out independently (think of the way we build LLVM alone or LLVM+clang+libcxx+compiler-rt+...).
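For reference, in the CMake version of the test-suite that externals mechanism is driven at configure time, roughly along these lines (the `TEST_SUITE_EXTERNALS_DIR` option comes from the test-suite's CMake build; the paths below are placeholders):

```shell
cmake -DCMAKE_C_COMPILER=/path/to/clang \
      -DTEST_SUITE_EXTERNALS_DIR=/path/to/test-suite-externals \
      /path/to/test-suite
make
llvm-lit -sv .
```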


Fair enough - I imagine this'd look a lot like retiring old backends. If
someone's not updating it, it's mostly their loss, but once it's enough of
a burden on the LLVM project, we just remove it.

Sure, that's more than reasonable!

To clarify, I don't want the discussion to drift in the direction of
whether to add Halide or other front-ends to LLVM. That's a very involved
decision that would affect the whole community, and it's not a burden we
want to add. Halide is maintained separately.

The proposal at hand is about the best way to get test coverage when the
C++ front-end is missing, independent of where the tests come from. That
is: what is the best test format, and is it OK to have non-runnable tests
that just check the IR?
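As a concrete illustration of the non-runnable, .ll-with-RUN-line shape under discussion: a test like the hypothetical one below only checks that llc succeeds for the given target, with no CHECK lines on the generated code (the triple and function are made up for illustration):

```llvm
; RUN: llc -mtriple=aarch64 -o /dev/null %s
; A "codegen succeeds" test: no FileCheck, just a clean llc exit.
define <4 x i32> @add4(<4 x i32> %a, <4 x i32> %b) {
  %r = add <4 x i32> %a, %b
  ret <4 x i32> %r
}
```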


That sounds more or less like coverage-based fuzzing, which we have (in
asan/libFuzzer). Mutation testing's a bit more involved, but would be fun
to have.

Agreed on both your points. While these options would be nice, they are not
the immediate solution.
If at some point someone signs up to make this happen, we can revisit the
decision of including the corpus of IR tests.