Testing Best Practices/Goals (in the context of compiler-rt)

dblaikie · February 10, 2016, 7:06pm

Recently had a bit of a digression in a review thread related to some tests going in to compiler-rt ( http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160208/330759.html ) and there seems to be some disconnect at least between my expectations and reality. So I figured I’d have a bit of a discussion out here on the dev lists where there’s a bit more visibility.

My basic expectation is that the lit tests in any LLVM project except the test-suite are targeted tests intended to test only the functionality in the project. This seems like a pretty well accepted doctrine across most LLVM projects - most visibly in Clang, where we make a concerted effort not to have tests that execute LLVM optimizations, etc.

There are exceptions/middle ground to this - DIBuilder is in LLVM, but essentially tested in Clang rather than writing LLVM unit tests. It’s somewhat unavoidable that any of the IR building code (IRBuilder, DIBuilder, IR asm printing, etc) is ‘tested’ incidentally in Clang in process of testing Clang’s IR generation. But these are seen as incidental, not intentionally trying to cover LLVM with Clang tests (we don’t add a Clang test if we add a new feature to IRBuilder just to test the IRBuilder).

Another case with some middle ground are things like linker tests and objdump, dwarfdump, etc - in theory to isolate the test we would checkin binaries (or the textual object representation lld had for a while, etc) to test those tools. Some tests instead checkin assembly and assemble it with llvm-mc. Again, not to cover llvm-mc, but on the assumption that llvm-mc is tested, and just using it as a tool to make tests easier to maintain.

So I was surprised to find that the compiler-rt lit tests seem to diverge from this philosophy & contain more intentional end-to-end tests (eg: adding a test there when making a fix to Clang to add a counter to a function that was otherwise missing a counter - I’d expect that to be tested in Clang and that there would already be coverage in compiler-rt for “if a function has a counter, does compiler-rt do the right thing with that counter” (testing whatever code in compiler-rt needs to be tested)).

Am I off base here? Are compiler-rt’s tests fundamentally different to the rest of the LLVM project? Why? Should they continue to be?

Richard_Smith · February 10, 2016, 8:57pm

If a change is made to Clang to make it emit different IR, there
should be a test for that in Clang's test suite. Adding a test to
compiler-rt should generally not be considered sufficient, especially
as plenty of people hack on Clang without compiler-rt checked out /
without running the compiler-rt tests.

On the compiler-rt side, taking the sanitizer tests as an example: I
would view testing that the compiler produces the right machine code
as being an incidental part of the unit test coverage in the same way
that an objdump test covering llvm-mc is incidental coverage -- it
should not be the point of the test, but it may be the most direct way
to demonstrate that the runtime library behaves in the right way in
the presence of the expected set of calls to it.

If the compiler-rt maintainers want to go beyond that and include
end-to-end tests alongside their unit tests, and doing so doesn't make
their tests fragile, I think that's their call, but it might make
organizational sense to separate those tests out into a different
directory at least.

bogner · February 10, 2016, 10:11pm

David Blaikie via cfe-dev <cfe-dev@lists.llvm.org> writes:

Recently had a bit of a digression in a review thread related to some tests
going in to compiler-rt (
[compiler-rt] r260041 - Add coverage tests (defaulted constructors/destructor)
) and there seems to be some disconnect at least between my expectations
and reality. So I figured I'd have a bit of a discussion out here on the
dev lists where there's a bit more visibility.

My basic expectation is that the lit tests in any LLVM project except the
test-suite are targeted tests intended to test only the functionality in
the project. This seems like a pretty well accepted doctrine across most
LLVM projects - most visibly in Clang, where we make a concerted effort not
to have tests that execute LLVM optimizations, etc.

There are exceptions/middle ground to this - DIBuilder is in LLVM, but
essentially tested in Clang rather than writing LLVM unit tests. It's
somewhat unavoidable that any of the IR building code (IRBuilder,
DIBuilder, IR asm printing, etc) is 'tested' incidentally in Clang in
process of testing Clang's IR generation. But these are seen as incidental,
not intentionally trying to cover LLVM with Clang tests (we don't add a
Clang test if we add a new feature to IRBuilder just to test the IRBuilder).

Another case with some middle ground are things like linker tests and
objdump, dwarfdump, etc - in theory to isolate the test we would checkin
binaries (or the textual object representation lld had for a while, etc) to
test those tools. Some tests instead checkin assembly and assemble it with
llvm-mc. Again, not to cover llvm-mc, but on the assumption that llvm-mc is
tested, and just using it as a tool to make tests easier to maintain.

So I was surprised to find that the compiler-rt lit tests seem to diverge
from this philosophy & contain more intentional end-to-end tests (eg:
adding a test there when making a fix to Clang to add a counter to a
function that was otherwise missing a counter - I'd expect that to be
tested in Clang and that there would already be coverage in compiler-rt for
"if a function has a counter, does compiler-rt do the right thing with that
counter" (testing whatever code in compiler-rt needs to be tested)).

Am I off base here? Are compiler-rt's tests fundamentally different to the
rest of the LLVM project? Why? Should they continue to be?

I think there's a bit of grey area in terms testing the runtime -
generally it's pretty hard to use the runtime without a fairly
end-to-end test, so tests of the runtime often end up looking pretty
close to an end-to-end test.

That said, I don't think that should be used as an excuse to sneak
arbitrary end-to-end tests into compiler-rt. We should absolutely write
tests in clang and llvm that we're inputting what we expect to the
runtime and try to keep the tests in compiler-rt as focused on just
exercising the runtime code as possible.

IIUC, the correct place for integration tests in general is somewhere
like test-suite, but I think it's a bit intimidating to some people to
add new tests there (Are there docs on this?). I suspect some of the
profiling related tests in compiler-rt are doing a bit much and should
graduate to a spot in the test-suite (but I don't have time to volunteer
to do the work, unfortunately).

Xinliang_David_Li · February 10, 2016, 10:33pm

David Blaikie via cfe-dev <cfe-dev@lists.llvm.org> writes:
> Recently had a bit of a digression in a review thread related to some
tests
> going in to compiler-rt (
>
[compiler-rt] r260041 - Add coverage tests (defaulted constructors/destructor)
> ) and there seems to be some disconnect at least between my expectations
> and reality. So I figured I'd have a bit of a discussion out here on the
> dev lists where there's a bit more visibility.
>
> My basic expectation is that the lit tests in any LLVM project except the
> test-suite are targeted tests intended to test only the functionality in
> the project. This seems like a pretty well accepted doctrine across most
> LLVM projects - most visibly in Clang, where we make a concerted effort
not
> to have tests that execute LLVM optimizations, etc.
>
> There are exceptions/middle ground to this - DIBuilder is in LLVM, but
> essentially tested in Clang rather than writing LLVM unit tests. It's
> somewhat unavoidable that any of the IR building code (IRBuilder,
> DIBuilder, IR asm printing, etc) is 'tested' incidentally in Clang in
> process of testing Clang's IR generation. But these are seen as
incidental,
> not intentionally trying to cover LLVM with Clang tests (we don't add a
> Clang test if we add a new feature to IRBuilder just to test the
IRBuilder).
>
> Another case with some middle ground are things like linker tests and
> objdump, dwarfdump, etc - in theory to isolate the test we would checkin
> binaries (or the textual object representation lld had for a while, etc)
to
> test those tools. Some tests instead checkin assembly and assemble it
with
> llvm-mc. Again, not to cover llvm-mc, but on the assumption that llvm-mc
is
> tested, and just using it as a tool to make tests easier to maintain.
>
> So I was surprised to find that the compiler-rt lit tests seem to diverge
> from this philosophy & contain more intentional end-to-end tests (eg:
> adding a test there when making a fix to Clang to add a counter to a
> function that was otherwise missing a counter - I'd expect that to be
> tested in Clang and that there would already be coverage in compiler-rt
for
> "if a function has a counter, does compiler-rt do the right thing with
that
> counter" (testing whatever code in compiler-rt needs to be tested)).
>
> Am I off base here? Are compiler-rt's tests fundamentally different to
the
> rest of the LLVM project? Why? Should they continue to be?

I think there's a bit of grey area in terms testing the runtime -
generally it's pretty hard to use the runtime without a fairly
end-to-end test, so tests of the runtime often end up looking pretty
close to an end-to-end test.

That said, I don't think that should be used as an excuse to sneak
arbitrary end-to-end tests into compiler-rt. We should absolutely write
tests in clang and llvm that we're inputting what we expect to the
runtime and try to keep the tests in compiler-rt as focused on just
exercising the runtime code as possible.

Yes, we should not use compiler-rt tests as an excuse of not adding
clang/LLVM test. The latter should always be added if possible -- they are
platform independent and is the first level of defense. runtime test's
focus is also more on the runtime lib itself and interaction between
runtime, compiler, binutils and other tools.

David

Alexey_Samsonov1 · February 10, 2016, 11:55pm

I mostly agree with what Richard and Justin said. Adding a few notes about
the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient for
testing runtime libraries than unit tests. We do have
the latter, and use them to provide test coverage for utility functions,
but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang driver,
frontend, LLVM passes, etc, but it's not the intent of the test.
These tests are sometimes platform-specific and poorly portable, but they
are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we change
Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM or
Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a test to
UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

(4) True, we're intimidated by test-suite I feel that current use of
compiler-rt test suite to check compiler-rt libs better follows
the doctrine described by David. Also, there's significant complexity in
compiler-rt test suite that narrows the tests executed
to those supported by current host.

dblaikie · February 11, 2016, 9:44pm

David Blaikie via cfe-dev <cfe-dev@lists.llvm.org> writes:
> Recently had a bit of a digression in a review thread related to some
tests
> going in to compiler-rt (
>
[compiler-rt] r260041 - Add coverage tests (defaulted constructors/destructor)
> ) and there seems to be some disconnect at least between my expectations
> and reality. So I figured I'd have a bit of a discussion out here on the
> dev lists where there's a bit more visibility.
>
> My basic expectation is that the lit tests in any LLVM project except
the
> test-suite are targeted tests intended to test only the functionality in
> the project. This seems like a pretty well accepted doctrine across most
> LLVM projects - most visibly in Clang, where we make a concerted effort
not
> to have tests that execute LLVM optimizations, etc.
>
> There are exceptions/middle ground to this - DIBuilder is in LLVM, but
> essentially tested in Clang rather than writing LLVM unit tests. It's
> somewhat unavoidable that any of the IR building code (IRBuilder,
> DIBuilder, IR asm printing, etc) is 'tested' incidentally in Clang in
> process of testing Clang's IR generation. But these are seen as
incidental,
> not intentionally trying to cover LLVM with Clang tests (we don't add a
> Clang test if we add a new feature to IRBuilder just to test the
IRBuilder).
>
> Another case with some middle ground are things like linker tests and
> objdump, dwarfdump, etc - in theory to isolate the test we would checkin
> binaries (or the textual object representation lld had for a while,
etc) to
> test those tools. Some tests instead checkin assembly and assemble it
with
> llvm-mc. Again, not to cover llvm-mc, but on the assumption that
llvm-mc is
> tested, and just using it as a tool to make tests easier to maintain.
>
> So I was surprised to find that the compiler-rt lit tests seem to
diverge
> from this philosophy & contain more intentional end-to-end tests (eg:
> adding a test there when making a fix to Clang to add a counter to a
> function that was otherwise missing a counter - I'd expect that to be
> tested in Clang and that there would already be coverage in compiler-rt
for
> "if a function has a counter, does compiler-rt do the right thing with
that
> counter" (testing whatever code in compiler-rt needs to be tested)).
>
> Am I off base here? Are compiler-rt's tests fundamentally different to
the
> rest of the LLVM project? Why? Should they continue to be?

I think there's a bit of grey area in terms testing the runtime -
generally it's pretty hard to use the runtime without a fairly
end-to-end test, so tests of the runtime often end up looking pretty
close to an end-to-end test.

That said, I don't think that should be used as an excuse to sneak
arbitrary end-to-end tests into compiler-rt. We should absolutely write
tests in clang and llvm that we're inputting what we expect to the
runtime and try to keep the tests in compiler-rt as focused on just
exercising the runtime code as possible.

Yes, we should not use compiler-rt tests as an excuse of not adding
clang/LLVM test. The latter should always be added if possible -- they are
platform independent and is the first level of defense. runtime test's
focus is also more on the runtime lib itself and interaction between
runtime, compiler, binutils and other tools.

This latter bit is where there's some disagreement.

The LLVM project generally expects the "make check" style tests to be fast
and narrow in scope - LLVM tests test LLVM, Clang tests Clang, and it would
seem compiler-rt tests should be just about testing the runtime lib,
specifically. Sure, that means making sure it can handle the diversity of
input/output, etc. But if two inputs to Clang produce the same output we
only write one LLVM test for that output and I would expect the same of
compiler-rt - if two inputs to Clang produce the same sort of output (a
function with a counter, in the example we started with) I'd expect only
one test for compiler-rt, to check that it can handle a function with a
counter.

(generally: if a change wasn't made to compiler-rt, I wouldn't expect a
test to be added (except to makeup for cases of missing compiler-rt
coverage))

dblaikie · February 11, 2016, 9:50pm

I mostly agree with what Richard and Justin said. Adding a few notes about
the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient for
testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility functions,
but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang driver,
frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc to
produce the inputs rather than checking in object files. That area is open
to some discussion as to just how many tools we should rope in/how isolated
we should make tests (eg: maybe building the json object file format was
going too far towards isolation? Not clear - opinions differ). But the
point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

These tests are sometimes platform-specific and poorly portable, but they
are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM or
Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a test to
UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

(4) True, we're intimidated by test-suite I feel that current use of
compiler-rt test suite to check compiler-rt libs better follows
the doctrine described by David.

Which David? (I guess David Li, not me)

I think maybe what could be worth doing would be separating out the
broader/intentionally "end to end" sort of tests from the ones intended to
test compiler-rt in relative isolation.

Most importantly, I'd expect only the latter to run in a "make check-all"
run, as we do for Clang/LLVM, etc.

Alexey_Samsonov1 · February 13, 2016, 1:43am

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient
for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility functions,
but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc to
produce the inputs rather than checking in object files. That area is open
to some discussion as to just how many tools we should rope in/how isolated
we should make tests (eg: maybe building the json object file format was
going too far towards isolation? Not clear - opinions differ). But the
point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but they
are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM or
Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a test
to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to compiler-rt
test). Now, it's a cheat because I'm fixing test, not adding it. However, I
would have definitely added it, if it was missing. In this case, a change
to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved the
user-facing part of the tool, and compiler-rt test suite is a good place to
verify that.

You may argue that Clang test would have been enough (I disagree with
that), or that it qualifies as "adding coverage" (maybe).

(4) True, we're intimidated by test-suite I feel that current use of
compiler-rt test suite to check compiler-rt libs better follows
the doctrine described by David.

Which David? (I guess David Li, not me)

Nope, paragraph 2 from your original email.

I think maybe what could be worth doing would be separating out the
broader/intentionally "end to end" sort of tests from the ones intended to
test compiler-rt in relative isolation.

It's really hard to draw the line here, even some of compiler-rt unit tests
require instrumentation, therefore depend on new features of Clang/LLVM.
Unlike builtins, which are
trivial to test in isolation, testing sanitizer runtimes in isolation (w/o
compiler) is often hard to implement (we tried to do so for TSan, but found
unit tests extremely hard to write),
and is barely useful - compiler-rt runtimes don't consist of modules (like
LLVMCodeGen and LLVMMC for instance), and are never used w/o the compiler
anyway.

Most importantly, I'd expect only the latter to run in a "make check-all"
run, as we do for Clang/LLVM, etc.

And now we're getting to the goals Why would such a change be good? Do
you worry about the time it takes to execute the test suite?

_sean_silva · February 13, 2016, 1:48am

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient
for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility functions,
but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc to
produce the inputs rather than checking in object files. That area is open
to some discussion as to just how many tools we should rope in/how isolated
we should make tests (eg: maybe building the json object file format was
going too far towards isolation? Not clear - opinions differ). But the
point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM or
Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a test
to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to compiler-rt
test). Now, it's a cheat because I'm fixing test, not adding it. However, I
would have definitely added it, if it was missing. In this case, a change
to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved the
user-facing part of the tool, and compiler-rt test suite is a good place to
verify that.

You may argue that Clang test would have been enough (I disagree with
that), or that it qualifies as "adding coverage" (maybe).

Yeah, verifying the intended end-user experience is important (for changes
that are done primarily for the purpose of changing the end-user
experience).

-- Sean Silva

dblaikie · February 17, 2016, 4:45pm

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient
for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility functions,
but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc to
produce the inputs rather than checking in object files. That area is open
to some discussion as to just how many tools we should rope in/how isolated
we should make tests (eg: maybe building the json object file format was
going too far towards isolation? Not clear - opinions differ). But the
point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM or
Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a test
to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to compiler-rt
test). Now, it's a cheat because I'm fixing test, not adding it. However, I
would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test case
itself seems like it could be a reasonable one (are there other tests of
the same compiler-rt functionality? (I assume the compiler-rt functionality
is the implementation of sadd/ssub?))

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved the
user-facing part of the tool, and compiler-rt test suite is a good place to
verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

dblaikie · February 17, 2016, 4:46pm

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient
for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc
to produce the inputs rather than checking in object files. That area is
open to some discussion as to just how many tools we should rope in/how
isolated we should make tests (eg: maybe building the json object file
format was going too far towards isolation? Not clear - opinions differ).
But the point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM or
Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a test
to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to compiler-rt
test). Now, it's a cheat because I'm fixing test, not adding it. However, I
would have definitely added it, if it was missing. In this case, a change
to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved
the user-facing part of the tool, and compiler-rt test suite is a good
place to verify that.

You may argue that Clang test would have been enough (I disagree with
that), or that it qualifies as "adding coverage" (maybe).

Yeah, verifying the intended end-user experience is important (for changes
that are done primarily for the purpose of changing the end-user
experience).

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang (& Swift and any other LLVM user), but we don't
add end-to-end tests of that, as a general rule. I'm trying to understand
why the difference between that and compiler-rt.

dblaikie · February 26, 2016, 8:31pm

Ping

_sean_silva · February 26, 2016, 9:07pm

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient
for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc
to produce the inputs rather than checking in object files. That area is
open to some discussion as to just how many tools we should rope in/how
isolated we should make tests (eg: maybe building the json object file
format was going too far towards isolation? Not clear - opinions differ).
But the point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM or
Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a test
to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to compiler-rt
test). Now, it's a cheat because I'm fixing test, not adding it. However, I
would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test case
itself seems like it could be a reasonable one (are there other tests of
the same compiler-rt functionality? (I assume the compiler-rt functionality
is the implementation of sadd/ssub?))

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved
the user-facing part of the tool, and compiler-rt test suite is a good
place to verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

In what way do changes in LLVM change the user-facing part of Clang?
It obviously depends on how broadly one defines user-facing. Is a 1%
performance improvement from a particular optimization user-facing? Is
better debug info accuracy user-facing? I'm not sure. But it seems clear
that "the user sees a diagnostic or not" definitely is.

Also, I think part of this is that in compiler-rt there are usually more
moving parts we don't control. E.g. it isn't just the interface between
LLVM and clang. The information needs to pass through archivers, linkers,
runtime loaders, etc. that all may have issues that affect whether the user
sees the final result. In general the interface between LLVM and clang has
no middlemen so there really isn't anything to check.

-- Sean Silva

dblaikie · February 26, 2016, 9:11pm

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient
for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc
to produce the inputs rather than checking in object files. That area is
open to some discussion as to just how many tools we should rope in/how
isolated we should make tests (eg: maybe building the json object file
format was going too far towards isolation? Not clear - opinions differ).
But the point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM
or Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a
test to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to compiler-rt
test). Now, it's a cheat because I'm fixing test, not adding it. However, I
would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test case
itself seems like it could be a reasonable one (are there other tests of
the same compiler-rt functionality? (I assume the compiler-rt functionality
is the implementation of sadd/ssub?))

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved
the user-facing part of the tool, and compiler-rt test suite is a good
place to verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

In what way do changes in LLVM change the user-facing part of Clang?
It obviously depends on how broadly one defines user-facing. Is a 1%
performance improvement from a particular optimization user-facing? Is
better debug info accuracy user-facing? I'm not sure. But it seems clear
that "the user sees a diagnostic or not" definitely is.

There's more than just performance in LLVM - ABI features, and yes, I'd
argue some pieces of debug info are pretty user facing (as are some
optimizations). We also have the remarks system in place now. (also the
compiler crashing (or not) is pretty user facing).

Also, I think part of this is that in compiler-rt there are usually more
moving parts we don't control. E.g. it isn't just the interface between
LLVM and clang. The information needs to pass through archivers, linkers,
runtime loaders, etc. that all may have issues that affect whether the user
sees the final result. In general the interface between LLVM and clang has
no middlemen so there really isn't anything to check.

Correctness/behavior of the compiler depends on those things too (linkers,
loaders, etc) to produce the final working product the user requested. If
we emitted symbols with the wrong linkage we could produce linker errors,
drop important entities, etc. But we don't generally test that the output
of LLVM/Clang produces the right binary when linked, we test that it
produces the right linkages on the resulting entities.

- David

Alexey_Samsonov1 · February 26, 2016, 9:21pm

Sorry, lost track of this thread.

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient
for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc
to produce the inputs rather than checking in object files. That area is
open to some discussion as to just how many tools we should rope in/how
isolated we should make tests (eg: maybe building the json object file
format was going too far towards isolation? Not clear - opinions differ).
But the point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM or
Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a test
to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to compiler-rt
test). Now, it's a cheat because I'm fixing test, not adding it. However, I
would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test case
itself seems like it could be a reasonable one (are there other tests of
the same compiler-rt functionality? (I assume the compiler-rt functionality
is the implementation of sadd/ssub?))

No, that's the point - sadd/ssub is handled solely in Clang/LLVM, and UBSan
runtime is only responsible for reporting an already-detected issue. From
this POV you can actually consider majority of UBSan lit tests
"end-to-end": they check that driver understands -fsanitize= flags,
necessary instrumentation is generated, callbacks into UBSan runtime are
invoked at runtime, and they print properly formatted message.

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved
the user-facing part of the tool, and compiler-rt test suite is a good
place to verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

I would say the difference is the information we print to the screen
Normally, Clang prints nothing to stdout or stderr. If it prints compiler
warnings/errors, header search paths, etc., this behavior is coded in the
Clang itself, and doesn't depend on the LLVM layer.
The major goal of sanitizers (not all of compiler-rt, for sure) is to
provide information in a user-friendly way, and it's actively influenced by
Clang and/or LLVM layers.

These printed reports is something we can check against really
conveniently. True, Clang can be "improved" by, say, a change in LLVM
inliner, but how would you write a robust test for that doesn't inspect
implementation details (which the whole LLVM IR is, to some extent)? OTOH,
if a change to LLVM gained ASan new capabilities, we can easily verify that
with a snippet of C++ code.

Alexey_Samsonov1 · February 26, 2016, 9:27pm

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more
convenient for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc
to produce the inputs rather than checking in object files. That area is
open to some discussion as to just how many tools we should rope in/how
isolated we should make tests (eg: maybe building the json object file
format was going too far towards isolation? Not clear - opinions differ).
But the point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging
the fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM
or Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a
test to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to
compiler-rt test). Now, it's a cheat because I'm fixing test, not adding
it. However, I would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test
case itself seems like it could be a reasonable one (are there other tests
of the same compiler-rt functionality? (I assume the compiler-rt
functionality is the implementation of sadd/ssub?))

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved
the user-facing part of the tool, and compiler-rt test suite is a good
place to verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

In what way do changes in LLVM change the user-facing part of Clang?
It obviously depends on how broadly one defines user-facing. Is a 1%
performance improvement from a particular optimization user-facing? Is
better debug info accuracy user-facing? I'm not sure. But it seems clear
that "the user sees a diagnostic or not" definitely is.

There's more than just performance in LLVM - ABI features, and yes, I'd
argue some pieces of debug info are pretty user facing (as are some
optimizations). We also have the remarks system in place now. (also the
compiler crashing (or not) is pretty user facing).

Also, I think part of this is that in compiler-rt there are usually more
moving parts we don't control. E.g. it isn't just the interface between
LLVM and clang. The information needs to pass through archivers, linkers,
runtime loaders, etc. that all may have issues that affect whether the user
sees the final result. In general the interface between LLVM and clang has
no middlemen so there really isn't anything to check.

Correctness/behavior of the compiler depends on those things too (linkers,
loaders, etc) to produce the final working product the user requested. If
we emitted symbols with the wrong linkage we could produce linker errors,
drop important entities, etc. But we don't generally test that the output
of LLVM/Clang produces the right binary when linked, we test that it
produces the right linkages on the resulting entities.

Sure, but verifying the Clang compiles/links a certain source file
successfully can easily live in a dedicated test-suite, even after it's
moved to GitHub, because probably it would be compiler-agnostic, and
wouldn't change frequently.

That said, I wouldn't mind having more Clang end-to-end tests, but the
major issue with actually invoking linkers and using system libraries is
portability.
These portability requirements are a lesser issue for sanitizers, because
we support only a handful of platforms.

_sean_silva · February 26, 2016, 9:31pm

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more
convenient for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc
to produce the inputs rather than checking in object files. That area is
open to some discussion as to just how many tools we should rope in/how
isolated we should make tests (eg: maybe building the json object file
format was going too far towards isolation? Not clear - opinions differ).
But the point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging
the fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM
or Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a
test to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to
compiler-rt test). Now, it's a cheat because I'm fixing test, not adding
it. However, I would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test
case itself seems like it could be a reasonable one (are there other tests
of the same compiler-rt functionality? (I assume the compiler-rt
functionality is the implementation of sadd/ssub?))

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved
the user-facing part of the tool, and compiler-rt test suite is a good
place to verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

In what way do changes in LLVM change the user-facing part of Clang?
It obviously depends on how broadly one defines user-facing. Is a 1%
performance improvement from a particular optimization user-facing? Is
better debug info accuracy user-facing? I'm not sure. But it seems clear
that "the user sees a diagnostic or not" definitely is.

There's more than just performance in LLVM - ABI features, and yes, I'd
argue some pieces of debug info are pretty user facing (as are some
optimizations). We also have the remarks system in place now. (also the
compiler crashing (or not) is pretty user facing).

I'd argue that we probably should have some sort of integration tests for
ABI features. I think at the moment we're getting by thanks to self-hosting
and regularly building lots of real-world programs with ToT-ish compilers.

-- Sean Silva

dblaikie · February 26, 2016, 9:32pm

Sorry, lost track of this thread.

No worries

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more convenient
for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code that
exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use llvm-mc
to produce the inputs rather than checking in object files. That area is
open to some discussion as to just how many tools we should rope in/how
isolated we should make tests (eg: maybe building the json object file
format was going too far towards isolation? Not clear - opinions differ).
But the point of the test is to test the compiler-rt functionality that was
added/removed/modified.

I think most people are in agreement with that, while acknowledging the
fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM
or Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a
test to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the UBSan
runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to compiler-rt
test). Now, it's a cheat because I'm fixing test, not adding it. However, I
would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test case
itself seems like it could be a reasonable one (are there other tests of
the same compiler-rt functionality? (I assume the compiler-rt functionality
is the implementation of sadd/ssub?))

No, that's the point - sadd/ssub is handled solely in Clang/LLVM, and
UBSan runtime is only responsible for reporting an already-detected issue.
From this POV you can actually consider majority of UBSan lit tests
"end-to-end": they check that driver understands -fsanitize= flags,
necessary instrumentation is generated, callbacks into UBSan runtime are
invoked at runtime, and they print properly formatted message.

OK, then I'm not sure of the value in having multiple compiler-rt tests
here, for each different place where Clang/LLVM call into UBSan to print a
different diagnostic.

I would expect to see a single test, that demonstrates that UBSan can print
diagnostics, and exercise whatever options there are (that it supports
parameters, location information, etc) but not one for each diagnostic
Clang uses that infrastructure for. That seems repetitious/low value, no?

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved
the user-facing part of the tool, and compiler-rt test suite is a good
place to verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

I would say the difference is the information we print to the screen
Normally, Clang prints nothing to stdout or stderr. If it prints compiler
warnings/errors, header search paths, etc., this behavior is coded in the
Clang itself, and doesn't depend on the LLVM layer.
The major goal of sanitizers (not all of compiler-rt, for sure) is to
provide information in a user-friendly way, and it's actively influenced by
Clang and/or LLVM layers.

These printed reports is something we can check against really
conveniently. True, Clang can be "improved" by, say, a change in LLVM
inliner, but how would you write a robust test for that doesn't inspect
implementation details (which the whole LLVM IR is, to some extent)?

Would be quite easy - indeed lots of tests in Clang in the early days (some
of which still remain) do things like this - testing all the way through
Clang and LLVM to observe certain assembly output. A simple way to test the
inliner would be to nm the symbols at the other end and demonstrate that
the inlined function is no longer emitted. This wouldn't require linkers,
etc, even. But even though that's possible, it's something we go out of our
way not to do when testing Clang and LLVM.

dblaikie · February 26, 2016, 9:34pm

I mostly agree with what Richard and Justin said. Adding a few notes
about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more
convenient for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code
that exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use
llvm-mc to produce the inputs rather than checking in object files. That
area is open to some discussion as to just how many tools we should rope
in/how isolated we should make tests (eg: maybe building the json object
file format was going too far towards isolation? Not clear - opinions
differ). But the point of the test is to test the compiler-rt functionality
that was added/removed/modified.

I think most people are in agreement with that, while acknowledging
the fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable, but
they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If we
change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in LLVM
or Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a
test to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the
UBSan runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to
compiler-rt test). Now, it's a cheat because I'm fixing test, not adding
it. However, I would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test
case itself seems like it could be a reasonable one (are there other tests
of the same compiler-rt functionality? (I assume the compiler-rt
functionality is the implementation of sadd/ssub?))

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks) improved
the user-facing part of the tool, and compiler-rt test suite is a good
place to verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

In what way do changes in LLVM change the user-facing part of Clang?
It obviously depends on how broadly one defines user-facing. Is a 1%
performance improvement from a particular optimization user-facing? Is
better debug info accuracy user-facing? I'm not sure. But it seems clear
that "the user sees a diagnostic or not" definitely is.

There's more than just performance in LLVM - ABI features, and yes, I'd
argue some pieces of debug info are pretty user facing (as are some
optimizations). We also have the remarks system in place now. (also the
compiler crashing (or not) is pretty user facing).

I'd argue that we probably should have some sort of integration tests for
ABI features. I think at the moment we're getting by thanks to self-hosting
and regularly building lots of real-world programs with ToT-ish compilers.

Perhaps so, but I'd argue that they shouldn't be run as part of "make
check" & should be in a separate test grouping (probably mostly run by
buildbots) for the purpose of integration testing. We've made a pretty
conscious, deliberate, and consistent effort to not do integration testing
across the LLVM projects in "make check"-like testing, and to fix it where
we do find it. It seems to me that compiler-rt diverged from that approach
and I'm not really in favor of that divergence.

- David

Alexey_Samsonov1 · February 26, 2016, 10:10pm

I mostly agree with what Richard and Justin said. Adding a few
notes about the general strategy we use:

(1) lit tests which look "end-to-end" proved to be way more
convenient for testing runtime libraries than unit tests.

We do have

the latter, and use them to provide test coverage for utility
functions, but we quite often accompany fix to the runtime library with
"end-to-end" small reproducer extracted from the real-world code
that exposed the issue.
Incidentally, this tests a whole lot of other functionality: Clang
driver, frontend, LLVM passes, etc, but it's not the intent of the test.

Indeed - this is analogous to the tests for, say, LLD that use
llvm-mc to produce the inputs rather than checking in object files. That
area is open to some discussion as to just how many tools we should rope
in/how isolated we should make tests (eg: maybe building the json object
file format was going too far towards isolation? Not clear - opinions
differ). But the point of the test is to test the compiler-rt functionality
that was added/removed/modified.

I think most people are in agreement with that, while acknowledging
the fuzzy line about how isolated we might be.

Yes.

These tests are sometimes platform-specific and poorly portable,
but they are more reliable (we make the same steps as the
user of the compiler), and serve the purpose of documentation.

(2) If we change LLVM instrumentation - we add a test to LLVM. If
we change Clang code generation or driver behavior - we add
a test to Clang. No excuses here.

(3) Sometimes we still add a compiler-rt test for the change in
LLVM or Clang: e.g. if we enhance Clang frontend to teach UBSan
to detecting yet another kind of overflow, it makes sense to add a
test to UBSan test-suite that demonstrates it, in addition to
Clang test verifying that we emit a call to UBSan runtime. Also,
compiler-rt test would allow us to verify that the actual error report
we present to the user is sane.

This bit ^ is a bit unclear to me. If there was no change to the
UBSan runtime, and the code generated by Clang is equivalent/similar to an
existing use of the UBSan runtime - what is it that the new compiler-rt
test is providing? (perhaps you could give a concrete example you had in
mind to look at?)

See r235568 (change to Clang) followed by r235569 (change to
compiler-rt test). Now, it's a cheat because I'm fixing test, not adding
it. However, I would have definitely added it, if it was missing.

Right, I think the difference here is "if it was missing" - the test
case itself seems like it could be a reasonable one (are there other tests
of the same compiler-rt functionality? (I assume the compiler-rt
functionality is the implementation of sadd/ssub?))

In this case, a change to Clang
instrumentation (arguments passed to UBSan runtime callbacks)
improved the user-facing part of the tool, and compiler-rt test suite is a
good place to verify that.

This seems like the problematic part - changes to LLVM improve the
user-facing part of Clang, but we don't add end-to-end tests of that, as a
general rule. I'm trying to understand why the difference between that and
compiler-rt

In what way do changes in LLVM change the user-facing part of Clang?
It obviously depends on how broadly one defines user-facing. Is a 1%
performance improvement from a particular optimization user-facing? Is
better debug info accuracy user-facing? I'm not sure. But it seems clear
that "the user sees a diagnostic or not" definitely is.

There's more than just performance in LLVM - ABI features, and yes, I'd
argue some pieces of debug info are pretty user facing (as are some
optimizations). We also have the remarks system in place now. (also the
compiler crashing (or not) is pretty user facing).

I'd argue that we probably should have some sort of integration tests for
ABI features. I think at the moment we're getting by thanks to self-hosting
and regularly building lots of real-world programs with ToT-ish compilers.

Perhaps so, but I'd argue that they shouldn't be run as part of "make
check" & should be in a separate test grouping (probably mostly run by
buildbots) for the purpose of integration testing.

If you have llvm/clang/compiler-rt/libc++/libc++abi checkout, they are not
run as a part of "make check", only "make check-all", which kind of makes
sense (run *all* the tests!). You're free to run "make check-clang", "make
check-asan" etc.
if you're sure your changes are limited in scope. Just to be clear - do you
suggest that compiler-rt tests are too heavy for this configuration, and
want to introduce extra level - i.e. extract "make check-compiler-rt" out
of "make check-all", and introduce "make check-absolutely-everything", that
would encompass them?

We've made a pretty conscious, deliberate, and consistent effort to not do
integration testing across the LLVM projects in "make check"-like testing,
and to fix it where we do find it. It seems to me that compiler-rt diverged
from that approach and I'm not really in favor of that divergence.

I don't see why consistency by itself is a good thing. As a sanitizer
developer, current situation is convenient for me, but if it harms / slows
down / complicates workflow for other developers or LLVM as a whole - sure,
let's fix it.

Topic		Replies	Views
compiler-rt tests in cmake? LLVM Dev List Archives	32	124	June 3, 2013
[RFC] Cross-project lit test suite LLVM Dev List Archives	27	152	May 4, 2021
Regression tests LLVM Dev List Archives	2	89	January 3, 2018
Testing the compiler LLVM Dev List Archives	2	77	December 14, 2020
Unit testing generated IR LLVM Dev List Archives	0	71	June 25, 2016

Testing Best Practices/Goals (in the context of compiler-rt)

Related Topics