separate repository for test suite?

I wanted to raise the question of whether clang needs a separate repository for its test suite (as we have for LLVM).

Lately we have been adding new test cases to the test suite that came from bug reports of this-or-that project not being parsed/analyzed correctly by clang. Often these test cases are reduced versions of preprocessed files. My concern is whether or not those test cases, which could be considered "derived work", fall under the original license of that source code. If that is the case, does this pose issues for clang?

The case I'm most concerned about is test cases that come from GPL'ed projects, since that license differs considerably in its legal terms from the UIUC license. Checking a mix of GPL and non-GPL code into the clang repository may be problematic.

This is a complex issue with no perfect answers. We might have to consult the LLVM lawyer for guidance.

Basically, from an engineering point of view: strip the testcase down and then hope for the best. In practice, this has worked fine for the gcc testsuite. Once you get a testcase down to be small enough, the copyright claim of the original ceases to be much of a concern. If the original came from a commercial testsuite, great care (a clean-room technique) should probably be used.

A lawyer type might propose that we include explicit wording saying basically that by submitting the bug report, the user agrees to the general process by which we strip down the testcase and include it for redistribution in the testsuite. If nothing else, this sets the users' expectations better, which should reduce the number of unhappy campers. In the great FSF testsuite review, I think we wound up just rm -rf'ing all the larger testcases and going on with life.

In my opinion, the kinds of tests that go in at the clang test level should
almost always be small and minimized enough that this is a non-issue.
For test cases I personally reduce, I usually minimize aggressively
and do some canonicalization of names (f1, f2, f3 or foo, bar), largely
out of personal preference, but it covers the license issue as well. Small
test cases for regression or feature issues benefit everyone by running
faster and highlighting the issue.
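
The name canonicalization mentioned above could be scripted. What follows is only a rough sketch in Python, not an existing clang tool: the names `TOKEN` and `canonicalize` are made up for illustration, and the caller is assumed to pass in the list of identifiers to rename. It works on tokens matched by a regular expression rather than on a real parse, so it does not understand scoping or macros, and it does not check whether a generated name such as f1 already exists in the file.

```python
import re

# Comments, string/character literals and numbers are matched first so that
# anything inside them is left alone; the last group matches identifiers.
TOKEN = re.compile(
    r'''
      (?P<skip>
          /\*.*?\*/                 # block comment
        | //[^\n]*                  # line comment
        | "(?:\\.|[^"\\\n])*"       # string literal
        | '(?:\\.|[^'\\\n])*'       # character literal
        | [0-9][A-Za-z0-9_.]*       # numeric literal such as 0x1f or 10UL
      )
    | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
    ''',
    re.VERBOSE | re.DOTALL,
)


def canonicalize(source, names, prefix="f"):
    """Rename each identifier in `names` to prefix1, prefix2, ...

    Numbers are assigned in order of first appearance in `source`.
    Returns the rewritten text and the old-name -> new-name mapping.
    """
    wanted = set(names)
    mapping = {}

    def replace(match):
        ident = match.group("ident")
        if ident is None or ident not in wanted:
            return match.group(0)  # comment, literal, or unrelated identifier
        if ident not in mapping:
            mapping[ident] = f"{prefix}{len(mapping) + 1}"
        return mapping[ident]

    return TOKEN.sub(replace, source), mapping


if __name__ == "__main__":
    # Hypothetical reduced test case; the project_* names are invented.
    reduced = (
        "/* from project_parse_header */\n"
        "static int project_parse_header(const char *buf);\n"
        'int project_main(void) { return project_parse_header("project_main"); }\n'
    )
    out, mapping = canonicalize(reduced, ["project_parse_header", "project_main"])
    print(out)
    print(mapping)
```

On this input, project_parse_header becomes f1 and project_main becomes f2, while the comment and the string literal are left untouched. A tool built on clang's own lexer would be more reliable than a regular expression, but that is beyond this sketch.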

For larger tests and for executable tests, I think the LLVM-style approach
is that they should be in a separate repository regardless of license issues.
It's not clear to me that this repo needs to be different from the LLVM test
suite one, however. I think we just need to provide better hooks for running
the LLVM test suite using clang.

- Daniel

> In my opinion, the kinds of tests that go in at the clang test level should
> almost always be small and minimized enough that this is a non-issue.
> For test cases I personally reduce, I usually minimize aggressively
> and do some canonicalization of names (f1, f2, f3 or foo, bar),

I also do the same most of the time. A rewriter tool would be handy here!

> largely
> out of personal preference, but it covers the license issue as well. Small
> test cases for regression or feature issues benefit everyone by running
> faster and highlighting the issue.
>
> For larger tests and for executable tests, I think the LLVM-style approach
> is that they should be in a separate repository regardless of license issues.

Note, LLVM also keeps small regression tests inside the llvm module.

> It's not clear to me that this repo needs to be different from the LLVM test
> suite one, however. I think we just need to provide better hooks for running
> the LLVM test suite using clang.

Yes, it makes sense to use one test suite, a collection of applications and benchmarks, for both compilers (llvm-gcc and clang+llvm).