RFC: Adding Fortran tests to the LLVM Test Suite

Hello,

In support of ongoing efforts with the new Flang compiler that was recently added to the LLVM Project, we plan to expand the LLVM Test Suite to include additional Fortran tests. This will require some infrastructure work, e.g., a way to specify a Fortran compiler and flags, which will then enable the Fortran tests.
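
As a very rough sketch, the plumbing could look something like the following (the option name and directory layout here are placeholders, not a final design):

    # Top-level CMakeLists.txt (sketch): gate the Fortran tests behind an
    # option so that existing C/C++ configurations are unaffected.
    option(TEST_SUITE_FORTRAN "Enable Fortran test-suite subdirectories" OFF)

    if(TEST_SUITE_FORTRAN)
      # Honors -DCMAKE_Fortran_COMPILER=flang (or gfortran) and fails the
      # configure step if no working Fortran compiler is found.
      enable_language(Fortran)
      add_subdirectory(Fortran)
    endif()

Enabling the Fortran tests would then be, e.g., cmake -DTEST_SUITE_FORTRAN=ON -DCMAKE_Fortran_COMPILER=flang <path-to-test-suite>.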

We are focusing on tests in the following areas:

  • “smaller” language-centric tests,

  • high-performance computing proxy-apps (particularly from Department of Energy projects), similar to the C/C++ proxy apps we already have,

  • OpenMP tests: multi-threaded and GPU offload (once the OpenMP/parallelism testing support is merged into the LLVM Test Suite; see the separate thread).

As a first step we’ll include the necessary CMake glue for the Fortran SPEC benchmarks, similar to the CMake files we use to run the C/C++ ones.

Comments are welcome, particularly from LLVM developers interested in collaborating on this effort, as are ideas for Fortran test suites.

Thanks,

Nick

In support of ongoing efforts with the new Flang compiler that was recently added to the LLVM Project, we plan to expand the LLVM Test Suite to include additional Fortran tests. This will require some infrastructure work, e.g., a way to specify a Fortran compiler and flags, which will then enable the Fortran tests.

Hi Nick,

This sounds like a no-brainer to me (i.e., an obviously good thing to have).

We are focusing on tests in the following areas:

  • “smaller” language-centric tests,

We already have some for C and C++; it would be nice to reuse the same infrastructure (or make it better while you’re at it). It’s a good proof of concept to the rest of the infrastructure that building Fortran programs works in the test-suite. I’d start here and only add those in the first patch.
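
For illustration, a Fortran test could hook into the existing single-source machinery along these lines. This is a sketch only: it assumes llvm_singlesource() is taught to handle Fortran files (today it only knows C/C++ sources) and that an explicit source list overrides its default glob:

    # SingleSource/UnitTests/Fortran/CMakeLists.txt (hypothetical path):
    # reuse the helper the C/C++ unit tests already use, with an explicit
    # Fortran source list instead of the default C/C++ glob.
    set(Source hello.f90)
    llvm_singlesource()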

  • high-performance computing proxy-apps (particularly from Department of Energy projects), similar to the C/C++ proxy apps we already have,

We had some trouble with the results of those programs in the past, when checking them (i.e., diff) against the “gold standard”. You should be able to change the driver program (the main function) to output some values or aggregations that aren’t sensitive to precision differences across architectures and OSs, but can still detect when the numbers change too much. It’s a fine-tuning process that takes a while, but it makes for robust checks.

Some programs used to compute a hash of the output and compare that, but it was really hard to find out what’s wrong by just looking at hashes. It also doesn’t help with natural floating-point variations between targets, so best to avoid those.
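
The test-suite already has a mechanism that fits here: compare the output against the reference with fpcmp and explicit tolerances, instead of a plain diff or a hash. A sketch, with illustrative values:

    # In a test's CMakeLists.txt: these variables are forwarded to the
    # fpcmp verifier, which accepts numeric differences within the given
    # tolerances instead of requiring byte-identical output.
    set(FP_TOLERANCE    0.00001)   # relative tolerance (fpcmp -r)
    set(FP_ABSTOLERANCE 0.0001)    # absolute tolerance (fpcmp -a)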

  • OpenMP tests: multi-threaded and GPU offload (once the OpenMP/parallelism testing support is merged into the LLVM Test Suite; see the separate thread).

Sounds great! Two birds. :slight_smile:

As a first step we’ll include the necessary CMake glue for the Fortran SPEC benchmarks, similar to the CMake files we use to run the C/C++ ones.

I imagined the first step would be to make it work in the test-suite itself, not in the SPEC integration. I believe these are two different objectives. One is to make the test-suite validate Fortran code generation at a higher level than just the Lit tests, and to make sure the upstream code stays on par by having public buildbots on it; the other is to improve downstream testing on non-public benchmarks, which will not have a direct effect on commits and general progress validation.

Both are important, but the upstream part is more important for the upstream community. I’d favour that to gain community support and do the SPEC part on the side.

cheers,
–renato

FWIW, adding any Fortran benchmark will only make sense while also adding test-suite capabilities. So the first step is compiling Fortran via the test suite, but the first real tests/benchmarks were supposed to be the SPEC ones because they are the easiest to add: it literally just takes a modification of the existing CMake file (I hope :wink: ).
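
To illustrate the size of the change I have in mind (a sketch only; the macro names follow the existing C/C++ SPEC CPU 2017 glue, and 548.exchange2_r is SPEC's Fortran-only benchmark):

    # External/SPEC/CINT2017rate/548.exchange2_r/CMakeLists.txt (sketch):
    # the per-benchmark glue mirrors the C/C++ ones; the real work is
    # teaching the shared harness macros to understand Fortran sources.
    speccpu2017_benchmark(RATE)
    speccpu2017_add_executable()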

Including some micro benchmarks with the Fortran plumbing is probably a good idea to make sure "it works". Then we'll work on adding benchmarks as described.

Does that sound reasonable?

~ Johannes

FWIW, adding any Fortran benchmark will only make sense while also adding test-suite capabilities. So the first step is compiling Fortran via the test suite, but the first real tests/benchmarks were supposed to be the SPEC ones because they are the easiest to add: it literally just takes a modification of the existing CMake file (I hope :wink: ).

Not everyone can run SPEC, regardless of how easy it is to integrate it with the test-suite.

If you only add infrastructure to build Fortran programs inside SPEC, then your change would be biased towards an external benchmark that is private to some companies.

To make sure the infrastructure you create builds Fortran programs across all areas of the test-suite, you need to “show your work” to the upstream community first by making it work for everyone.

Public build-bots will start building those tests and benchmarks (remember, it’s not just benchmarks in there), and you’ll need some time to adjust strategy until it all works across the board.

Then adding your side would be a straightforward step. But adding it as the first thing could mean other parts take a lot longer to come, get broken upstream and disabled, or never come at all.

We had a similar situation in the test-suite for a long time regarding cross-builds. There was some qemu support for a while that in the end only worked in a convoluted way for some downstream project.

We don’t want that to be the case for Fortran. If Fortran is truly upstream, then its benchmarks and (at the very least) tests need to be upstream as well. SPEC is “in addition” to that, because it’s private.

If you want support from the community, you have to give something that is beneficial to the community, however important your part is to your project.

Including some micro benchmarks with the Fortran plumbing is probably a good idea to make sure “it works”. Then we’ll work on adding benchmarks as described.

I fear “it works” will just end up “it works for SPEC and a few dummy examples”, which will bitrot like other examples in the past.

cheers,
–renato

FWIW, adding any Fortran benchmark will only make sense while also adding test-suite capabilities. So the first step is compiling Fortran via the test suite, but the first real tests/benchmarks were supposed to be the SPEC ones because they are the easiest to add: it literally just takes a modification of the existing CMake file (I hope :wink: ).

Not everyone can run SPEC, regardless of how easy it is to integrate it
with the test-suite.

Agreed, unclear if this was ever a question.

If you only add infrastructure to build Fortran programs inside SPEC, then
your change would be biased towards an external benchmark that is private
to some companies.

That doesn't make any sense to me. Nobody suggested changing anything "inside SPEC". That is not how the external tests work. External tests integrate into the LLVM Test Suite harness just as other tests do. Getting any Fortran SPEC code to work would mean we had added all the infrastructure in the LLVM Test Suite harness to deal with Fortran files.

As I mentioned in the last email, there will likely be some micro benchmarks included as part of the Fortran plumbing so we can make sure it works even if you do not have SPEC lying around. This is no different from any other addition to the test suite or LLVM in general.

To make sure the infrastructure you create builds Fortran programs across all areas of the test-suite, you need to "show your work" to the upstream community first by making it work for everyone.

Sure. That is what I said in the last email explicitly.

Public build-bots will start building those tests and benchmarks (remember,
it's not just benchmarks in there), and you'll need some time to adjust
strategy until it all works across the board.

Strategy: If you don't set it up to run Fortran codes, it won't.

Then adding your side would be a straightforward step. But adding it as the first thing could mean other parts take a lot longer to come, get broken upstream and disabled, or never come at all.

We had a similar situation in the test-suite for a long time regarding
cross-builds. There was some qemu support for a while that in the end only
worked in a convoluted way for some downstream project.

We don't want that to be the case for Fortran. If Fortran is
truly upstream, then its benchmarks and (at the very least) tests need to
be upstream as well. SPEC is "in addition" to that, because it's private.

If you want support from the community, you have to give something that is
beneficial to the community, however important your part is to your project.

Fortran benchmark support in the LLVM Test Suite, and literally
everything else mentioned in the initial RFC, is beneficial to the
community. SPEC support is not something harmful.

Including some micro benchmarks with the Fortran plumbing is probably a good idea to make sure "it works". Then we'll work on adding benchmarks as described.

I fear "it works" will just end up "it works for SPEC and a few dummy
examples", which will bitrot like other examples in the past.

How did you come to that conclusion, after the initial RFC explicitly listed other benchmarks and apps we want to include in the test suite?

~ Johannes

If you only add infrastructure to build Fortran programs inside SPEC, then
your change would be biased towards an external benchmark that is private
to some companies.

That doesn’t make any sense to me. Nobody suggested changing anything “inside SPEC”.

A good part of your reply assumes I meant what you say above. I didn’t.

We’re talking past each other. Let me try again.

As I said in my original reply, I’m very supportive of the initiative to add Fortran to the test-suite: tests, benchmarks, and OpenMP. This is very good news.

But the test-suite doesn’t have core ownership: a group with a plan that implements all the parts of a bigger design goal. For many years we have tried to unify tests and benchmarks; Kristof did a great job rallying people around, and many other people contributed, but once it “works”, people stop paying attention.

I just want to make sure that the overall support for Fortran in the test-suite is focused on building tests, benchmarks and other tools that are available upstream to all users.

If adding Fortran support to the existing SPEC scripts is orthogonal, then it shouldn’t be part of this discussion. If it’s not, then it shouldn’t be the main driver for the rest of the infrastructure.

Public build-bots will start building those tests and benchmarks (remember,
it’s not just benchmarks in there), and you’ll need some time to adjust
strategy until it all works across the board.

Strategy: If you don’t set it up to run Fortran codes, it won’t.

I’m going to take this as a tongue-in-cheek comment. The reductionism here isn’t really helpful.

Fortran is just the language, but there are architectures and operating systems that need adjusting, too.

Fortran benchmark support in the LLVM Test Suite, and literally
everything else mentioned in the initial RFC, is beneficial to the
community. SPEC support is not something harmful.

We definitely agree on that.

How did you come to that conclusion, after the initial RFC explicitly listed other benchmarks and apps we want to include in the test suite?

The original RFC was very clear. Your response was less so.

In my reply to the RFC, I said I worry that we’re focusing on SPEC too early. I’d rather make sure it works upstream before adding SPEC to the mix.

The point I tried to convey (and clearly failed to) is that the test-suite isn’t a robust, well-designed infrastructure, but a patchwork of different approaches over the years that seems to “work fine” with what we have.

I may have read that wrong, but it sounded to me as if you were defending the prioritisation of SPEC “and some micro benchmarks” over the rest of the proposal.

I think that’s a mistake, because it risks being the main thing that gets added and then not much else comes later (priorities change, etc).

If my interpretation is wrong, I apologise and we can ignore our past exchange. I’m still very supportive of this RFC. :slight_smile:

cheers,
–renato

The way I understand your emails is that you argue against the roadmap because it lists SPEC as the first proper benchmark/app. This is actually on purpose:

SPEC is a well-tested external benchmark suite with existing support in the LLVM test suite, and it allows for stable results with existing compilers. We know which compilers work with SPEC, we know the expected outputs, we know how to select different input sizes, we know how to glue it to the test suite, etc.
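
For reference, the existing C/C++ SPEC glue already exposes those knobs, and the Fortran side could inherit them unchanged (path illustrative):

    # Existing options from the SPEC CPU 2017 integration:
    set(TEST_SUITE_SPEC2017_ROOT "/opt/cpu2017" CACHE PATH "SPEC installation")
    set(TEST_SUITE_RUN_TYPE "train" CACHE STRING "Input size: test, train or ref")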

The alternative is to bring in new benchmarks/apps, which come with multiple other challenges, as you noted before. For us to test the Fortran plumbing with non-trivial programs, SPEC seems like an ideal candidate.

I say this because Nick has been working on compiling existing benchmarks and apps with Flang (sema only), and that often entails dealing with complex, undocumented, and unmaintained build systems. That is on top of potential issues w.r.t. numerical stability, non-standard-compliant code, ...

Don't get me wrong: adding other benchmarks is already part of the roadmap we are committed to. We recently added the C/C++ proxy apps, and we are working on parallelism/OpenMP (+offloading) support. This is not a one-off effort.

Please also note that we asked in the mail for benchmark/app ideas so we know what to look at next. We are certainly committed to working on this well past SPEC support. I know that is true for the ANL and DOE people, and I'm very certain it is for the wider Flang community as well.

~ Johannes

P.S. We're heading into a long Thanksgiving weekend, so it's unclear how responsive I'll be over the next two days. I hope you'll also have a nice and relaxing weekend :slight_smile:

I don’t disagree with your roadmap. If I’m reading correctly, SPEC is only the first benchmark, not the first program.

My point was to add the language tests, and perhaps one small program as a benchmark, to test the infrastructure. SPEC could come in the same batch, to show that the CMake glue works for all parts.

I wouldn’t add CMake glue with SPEC only, as a first step. That’s all I’m saying.

Cheers,
Renato

Renato,

Sorry for not replying right away. After the Thanksgiving break, I was in meetings for most of this week and am only now catching up on e-mail.

Thanks for raising your concerns; point taken.

I am new to the llvm-test-suite and spent most of the day looking through it.

As I was planning this out in my head, I do think that in the first differential we would add the CMake plumbing and a simple open-source program or two (e.g., a hand-coded GEMM or something along those lines). SPEC would come in the next batch.

On a related note, are there any buildbots running the llvm-test-suite or are folks just running llvm-test-suite manually?

As I was planning this out in my head, I do think that in the first differential we would add the CMake plumbing and a simple open-source program or two (e.g., a hand-coded GEMM or something along those lines). SPEC would come in the next batch.

That sounds like a good initial plan. Do you also plan to add more OSS applications, benchmarks and tests right after SPEC?

The main reason for the test-suite is not benchmarking, but end-to-end regression testing on multiple architectures. We can only claim to support languages and targets if we can compile entire applications, run them, and get the expected results.

So having a set of language tests and some real-world applications in the test-suite will be required for us to claim Fortran support of any kind.

Those are also your best friends in making sure all the work you’ve done on the front-end doesn’t regress on any target, from release to release, or during normal development.

That’s why I’m pushing to have those tests and applications as soon as possible. It’s for Flang’s own benefit.

On a related note, are there any buildbots running the llvm-test-suite or are folks just running llvm-test-suite manually?

Plenty, in both testing and benchmark modes. It’s also part of the release process on all architectures.

You should add support for both testing and benchmark modes, so that the benchmarks (not SPEC) are regression-tested and can also report and compare performance numbers.
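
As a sketch of what that looks like from the configuration side, a CMake cache file could select the mode. TEST_SUITE_BENCHMARKING_ONLY is the existing flag used for the C/C++ benchmarks; TEST_SUITE_FORTRAN is the hypothetical option from this thread:

    # Fortran-bench.cmake (hypothetical), used as:
    #   cmake -C Fortran-bench.cmake <path-to-test-suite>
    set(TEST_SUITE_FORTRAN ON CACHE BOOL "")            # hypothetical new option
    set(TEST_SUITE_BENCHMARKING_ONLY ON CACHE BOOL "")  # existing: benchmarks only
    set(CMAKE_Fortran_COMPILER flang CACHE STRING "")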

cheers,
–renato

Renato, see replies below.

Yes, I am working on identifying some OSS applications and benchmarks.

Awesome!

Are they in some public dashboard like this one? http://lab.llvm.org:8011/#/console

Yes, though it’s not really obvious (or even easy) to find that out. :frowning:

You can look at the configuration file:
https://github.com/llvm/llvm-zorg/blob/master/buildbot/osuosl/master/config/builders.py

If the builder is using ClangBuilder, just check the ones that have the option: runTestSuite=True

I can see Arm, AArch64, PPC, IBM-Z, X86.

The ones with testsuite_flags=['--benchmarking-only'] will be running in benchmark mode and submitting the results to the LNT server.

Other builder types may be doing it, but I don’t know about those.

The goal is to have a set-up analogous to what is done now for C/C++. I am just getting up to speed on it.

To be clear, I’m not trying to push you; I’m just passing on the runes that aren’t obvious when you first start working on it.

We have had a number of people (including me, many times) excited to add tests or infrastructure to the test-suite, only to give up half-way because of the mess it is.

Folks have done a lot of good work on it in the past few years (CMake support, statistical analysis, benchmark mode), so it’s probably a lot easier now than when I had to work on it.

Good luck and thanks for taking on such a thankless task. :slight_smile:

cheers,
–renato

I can see Arm, AArch64, PPC, IBM-Z, X86.

Note, however, that many of those aren’t building flang :slight_smile: You might find this builder of more interest, since it’s both building flang and running the test-suite. That’s pretty slow and heavy, though, so if you start adding Fortran support to the test-suite I’d be happy to add it to some of our lighter, Fortran-focused builders (such as this one). Feel free to contact me on the flang slack or via email if there’s anything I can help with.


+1, I was considering adding FCVS to the test-suite, but you might get to it before I find the time :wink: