[RFC] Require PRs for XFAILing tests

Hello LLVM-Dev,

The other day as I was digging through lldb’s test suite I noticed they support something kinda neat. In their Python test harness, the attribute they use to denote expected failures takes a parameter specifying the bug number. This got me thinking.

I believe that any test that is marked XFAIL is a bug, and we can use LIT to enforce that. So I wrote a patch (https://reviews.llvm.org/D25035) adding a feature to LIT that supports mapping XFAILs to PRs, along with a flag to turn XFAILs without PRs into failures.

My proposal is to add this feature to LIT (after proper code review, and I still need to write tests for it), and then to annotate all our XFAILs with PRs. Once all the XFAILs are annotated I think we should enable this behavior by default and require PRs tracking all XFAILs.

Thoughts?
-Chris

I think that’s a great idea!

I wonder, though, if there won’t be annoying cases that don’t fit well, where a fake PR gets filed just to please the system.
Have you surveyed the existing XFAIL tests to see whether that would be the case? (I can’t imagine why, but…)

Thanks,

I think one of the good things that would come out of this policy change is that we would have to audit the existing XFAILs. The most common XFAIL directive from my grep is "XFAIL: *", the universal failure. I cannot think of any situation where a universally failing test should be in-tree unless it is a bug that someone is expecting to fix.
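For reference, the tally came from something along these lines (shown here against two throwaway sample files rather than the real test/ tree):

```shell
# Tally XFAIL directives by spelling; the sample files below stand in
# for the real test/ tree.
mkdir -p /tmp/xfail_demo
printf '; XFAIL: *\n' > /tmp/xfail_demo/a.ll
printf '; XFAIL: hexagon\n; XFAIL: *\n' > /tmp/xfail_demo/b.ll
grep -rh "XFAIL:" /tmp/xfail_demo | sort | uniq -c | sort -rn
```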

The only situation that I’ve come up with where filing a PR for an XFAIL isn’t reasonable would be for out-of-tree tests, although it is likely I’m not thinking of some edge case. To support out-of-tree code I think that we could adapt my patch to support multiple prefixes that denote different bug trackers. That would allow downstream users to annotate their tests for their bug trackers.

-Chris

I don't know if that's true.

test/CodeGen/Generic/MachineBranchProb.ll:

; ARM & AArch64 run an extra SimplifyCFG which disrupts this test.
; XFAIL: arm,aarch64

; Hexagon runs passes that renumber the basic blocks, causing this test
; to fail.
; XFAIL: hexagon

-Krzysztof

This may be an unpopular opinion (and I don’t have the full context on those specific issues), but I believe that these are an abuse of XFAIL, and should probably be written in terms of REQUIRES instead of XFAIL.

I believe XFAIL tests actually execute, and are just marked as expected failure. If a test is not expected to ever succeed, we shouldn’t bother running it, which is what the REQUIRES directives are for.
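For the MachineBranchProb.ll case quoted above, that might look something like the following (purely illustrative; whether REQUIRES or UNSUPPORTED fits better depends on how the feature lists are defined for these targets):

```
; Instead of running the test and expecting failure:
; XFAIL: arm,aarch64
; XFAIL: hexagon

; ...skip it entirely on targets where it can never pass:
; UNSUPPORTED: arm, aarch64, hexagon
```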

-Chris

Agreed.

We already have an unwritten rule to create PRs for XFAILs, and we
normally don't XFAIL lightly (I don't, at least). But creating one PR
for every existing XFAIL may end up as a long list of never-looked-at
PRs. :)

This could be one of the commit hooks that were proposed a while ago,
to demand PR numbers for commits that have XFAIL in their change logs
(only if the line is changed). But I'm always sceptical as to the
value of such hooks...

cheers,
--renato

And the directive would require what specifically?

If anything, UNSUPPORTED may be better for this.

-Krzysztof

I cannot think of any situation where a universally failing test
should be in-tree unless it is a bug that someone is expecting to fix.

It seems moderately common to mark something XFAIL temporarily to get
the bots green while then going ahead to fix the issue. Your proposal
would add extra overhead to that flow by requiring a PR as well. This
has value when it turns out that fix can't happen in the short term for
any reason. I don't have a feel for how common that is, although I'm
sure it does happen.
I think the overhead is worth the added value, but then I'm a process
kind of guy.

I think there are a few interesting things that could follow from solidifying a policy requiring PRs for XFAILs.

First and foremost, bugs can carry far more context than you would often find in a test case comment. That would make it a lot easier to audit XFAILs in the future and help keep the number of XFAILs to a minimum. I think this is important because many of our XFAILs are really old, and I’m not convinced that we shouldn’t just be deleting some of these tests.

For example, 2008-12-14-StrideAndSigned.ll and 7 other tests were marked with “XFAIL: *” in 2009, and the commit message doesn’t really explain what was going on:

commit 789558db70d9513a017c11c5be30945839fdff1c
Author: Nick Lewycky <nicholas@mxc.ca>
Date: Tue Jan 13 09:18:58 2009 +0000

    Wind SCEV back in time, to Nov 18th. This 'fixes' PR3275, PR3294, PR3295,
    PR3296 and PR3302.
    
    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62160 91177308-0d34-0410-b5e6-96231b3b80d8

Requiring a PR doesn’t necessarily fix this problem because you could list the wrong PR, or even just a generic stub PR that didn’t add meaningful value, but I think our community is pretty good at keeping people honest via code reviews.

I also think that eventually we could extend LIT to optionally query the bug database, so it can perform automated audits of the referenced bugs and produce reports on their status.
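As a rough sketch of what such an audit could check, assuming a Bugzilla-style REST response (the JSON shape and status names here are assumptions for illustration, not a spec of the actual llvm.org tracker):

```python
import json

# Hypothetical audit helper: given the JSON body a Bugzilla-style
# /rest/bug/<id> endpoint might return for a bug referenced by an XFAIL,
# report whether the XFAIL still looks justified.
def is_bug_open(response_text):
    data = json.loads(response_text)
    status = data["bugs"][0]["status"]
    return status not in ("RESOLVED", "VERIFIED", "CLOSED")

sample = '{"bugs": [{"id": 3275, "status": "RESOLVED"}]}'
is_bug_open(sample)  # a closed bug -> the XFAIL is a candidate for removal
```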

I also think instituting this policy would require an audit of existing XFAILs, which would let us evaluate the value our current XFAILing tests provide and consider better ways to handle some of these cases. That could mean removing tests that universally XFAIL with no fix expected, or migrating tests to REQUIRES or UNSUPPORTED instead of XFAIL.

-Chris

Other important aspects:
- We can immediately start a specific and meaningful conversation about the problem
- Progress/blockers in solving the problem can be documented if necessary
- It is actually easy to find the corresponding discussion to a problem (the alternative of looking up the commit adding the XFAIL and then searching the commits mailing list for threads on that commit is more trouble)

- Matthias

This looks very useful for LLVM itself, but please consider downstream consumers with this feature. We have a few XFAIL’d tests in our downstream fork and run llvm-lit in our CI setup. We obviously can’t open an llvm.org bug for each one, because they don’t relate to bugs in upstream LLVM. Please ensure that we have a mechanism for disabling this (or a different spelling of XFAIL for local XFAILs).

David

I cannot think of any situation where a universally failing test
should be in-tree unless it is a bug that someone is expecting to fix.

It seems moderately common to mark something XFAIL temporarily to get
the bots green while then going ahead to fix the issue. Your proposal
would add extra overhead to that flow by requiring a PR as well. This
has value when it turns out that fix can’t happen in the short term for
any reason. I don’t have a feel for how common that is, although I’m
sure it does happen.
I think the overhead is worth the added value, but then I’m a process
kind of guy.

We already have an unwritten rule to create PRs for XFAILs, and we
normally don’t XFAIL lightly (I don’t, at least). But creating one PR
for every existing XFAIL may end up as a long list of never-looked-at
PRs. :)

As opposed to the other ~9000 open PRs? At least they would be tracked.

I’d be inclined to agree with Renato here (or at least voice the same concern) - as a project we don’t really have very good bug hygiene, so adding more bug filing doesn’t seem to me like it’d give us much value.

Auditing existing XFAILs can be done today without filing bugs for them.

And we still always have the option to (& in many cases do) file bugs for XFAILs to discuss them, etc.

But I don’t feel strongly about it either way, so happy to leave the folks who do to make the call/do the work.

- Dave

My proposal is to add this feature to LIT (after proper code review, and I still need to write tests for it), and then to annotate all our XFAILS with PRs. Once all the PRs are annotated I think we should enable this behavior by default and require PRs tracking all XFAILs.

This looks very useful for LLVM itself, but please consider downstream consumers with this feature. We have a few XFAIL’d tests in our downstream fork and run llvm-lit in our CI setup. We obviously can’t open an llvm.org bug for each one, because they don’t relate to bugs in upstream LLVM. Please ensure that we have a mechanism for disabling this (or a different spelling of XFAIL for local XFAILs).

Absolutely. I expect to have two mechanisms for downstream users.

(1) The ability to specify your own bug tracker prefixes so that you can mark out-of-tree XFAILs with your own bugs if you’d like
(2) The ability to disable forcing failure. Additionally, Matthias Braun suggested that I add the ability to disable or add bug tracker URLs per test suite, which I’ll be doing as well.
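As a sketch, those per-suite knobs might look something like this in a lit.cfg (the option names here are placeholders, not the final API):

```python
# Hypothetical lit.cfg settings for a downstream suite:
config.xfail_bug_prefixes = ['PR', 'MYBUG']  # (1) accept downstream tracker IDs
config.require_xfail_bugs = False            # (2) don't turn unannotated XFAILs into failures
```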

-Chris

I cannot think of any situation where a universally failing test
should be in-tree unless it is a bug that someone is expecting to fix.

It seems moderately common to mark something XFAIL temporarily to get
the bots green while then going ahead to fix the issue. Your proposal
would add extra overhead to that flow by requiring a PR as well. This
has value when it turns out that fix can’t happen in the short term for
any reason. I don’t have a feel for how common that is, although I’m
sure it does happen.
I think the overhead is worth the added value, but then I’m a process
kind of guy.

We already have an unwritten rule to create PRs for XFAILs, and we
normally don’t XFAIL lightly (I don’t, at least). But creating one PR
for every existing XFAIL may end up as a long list of never-looked-at
PRs. :)

As opposed to the other ~9000 open PRs? At least they would be tracked.

I’d be inclined to agree (or at least voice the same concern) as Renato here - as a project we don’t really have very good bug hygiene, so adding more bug filing doesn’t seem to me like it’d give us much value.

I haven’t done a full audit, but we have 257 XFAILs in LLVM.

* 44 of those are vg_leak failures on TableGen tests, which should be UNSUPPORTED because we allow TableGen to leak for performance the same way we allow clang to leak.
* 15 of them are vg_leak failures in the OCaml bindings. Someone familiar with OCaml should chime in, but I suspect those too should be UNSUPPORTED.
* 125 of them are universal failures (XFAIL: *). Many of these have been marked this way for years. I suspect that if we take the time to go through them we will find that many of these test cases either should be tracked by bugs or should be removed from the tree.

From there, many of the test cases are XFAIL on features where they really should be UNSUPPORTED. I suspect that if we do a full audit of the XFAILs we would likely find <100 which should actually be XFAIL, and tracking those seems valuable to me.

Auditing existing XFAILs can be done today without filing bugs for them.

Yes, we can audit them today. Making bugs required for them makes it easier to audit them because there will (in theory) be a bug describing the justification for the XFAIL and what the underlying issue is. Digging up the reason why an XFAIL was added in 2009 is a little bit challenging today if there isn’t a PR associated with it (or a really good comment or commit message).

-Chris

I haven’t done a full audit, but we have 257 XFAILs in LLVM.

* 44 of those are vg_leak failures on TableGen tests which should be
UNSUPPORTED because we allow TableGen to leak for performance the same way
we allow clang to leak.
* 15 of them are vg_leak in the OCaml bindings. Someone familiar with
OCaml should chime in on it, but I suspect those too should be UNSUPPORTED

Do we still support running the test suite with Valgrind? I don't think we
run it continuously anymore. Maybe these are just noise now and we should
remove them. Users can still run Valgrind on LLVM, but LLVM's test suite is
an implementation detail, and we don't have to guarantee that it is leak
free.

* 125 of them are universal failure (XFAIL: *). Many of these have been
marked this way for years. I suspect that if we take the time to go through
these we will likely find that many of these test cases either should be
tracked by bugs, or should be removed from the tree

From there, many of the test cases are XFAIL on features where they really
should be UNSUPPORTED. I suspect that if we do a full audit of the XFAILs
we would likely find <100 which should actually be XFAIL, and tracking
those seems valuable to me.

Bummer.

Auditing existing XFAILs can be done today without filing bugs for them.

Yes, we can audit them today. Making bugs required for them makes it
easier to audit them because there will (in theory) be a bug describing the
justification for the XFAIL and what the underlying issue is. Digging up
the reason why an XFAIL was added in 2009 is a little bit challenging today
if there isn’t a PR associated with it (or a really good comment or commit
message).

Can we require a comment for every XFAIL? That seems like a clear guideline
that can be enforced by code review, and it's less heavyweight.

Requiring comments for XFAIL is much harder to check via tools, and would have to be enforced in code review. One advantage I see to this process is that the tooling support is a relatively simple patch to LIT.
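To give a feel for how simple that check is, here is a rough sketch of the kind of scan the tooling would do (the names and the annotation syntax are illustrative, not the actual D25035 code):

```python
import re

# Illustrative check: does an XFAIL line carry a recognized bug-tracker ID?
# The prefix list would be configurable so downstream users can add their own.
BUG_PREFIXES = ("PR",)

def missing_bug_annotation(xfail_line, prefixes=BUG_PREFIXES):
    # Build a pattern like r"PR\d+|MYBUG\d+" from the configured prefixes.
    pattern = "|".join(re.escape(p) + r"\d+" for p in prefixes)
    return re.search(pattern, xfail_line) is None

missing_bug_annotation("; XFAIL: arm,aarch64")         # True  -> would be flagged
missing_bug_annotation("; XFAIL: hexagon  # PR12345")  # False -> tracked
```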

-Chris

It would be easy to enforce by changing the syntax of XFAIL to XFAIL <msg> and having lit report an error if tests didn’t contain a message for the XFAIL. lit -v could also then report the reasons for the XFAILing tests.
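Under that scheme, the Hexagon line quoted earlier might read something like (syntax purely illustrative):

```
; XFAIL: hexagon  Hexagon runs passes that renumber the basic blocks
```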

David

I'm not saying I _like_ this solution, but if that were an issue we
could always have an open issue e.g. "PRNNNN: Some tests are marked
XFAIL but only have this generic PR listed as the reason", for use in
these "quick fix" cases. It would also be easy to track if these
"quick fixes" didn't happen shortly.

Alex

From: Alex Bradbury [mailto:asb@asbradbury.org]
Sent: Saturday, October 01, 2016 1:06 PM
To: Robinson, Paul
Cc: Renato Golin; Chris Bieneman; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [RFC] Require PRs for XFAILing tests

>> I cannot think of any situation where a universally failing test
>> should be in-tree unless it is a bug that someone is expecting to fix.
>
> It seems moderately common to mark something XFAIL temporarily to get
> the bots green while then going ahead to fix the issue. Your proposal
> would add extra overhead to that flow by requiring a PR as well. This
> has value when it turns out that fix can't happen in the short term for
> any reason. I don't have a feel for how common that is, although I'm
> sure it does happen.
> I think the overhead is worth the added value, but then I'm a process
> kind of guy.

I'm not saying I _like_ this solution, but if that were an issue we
could always have an open issue e.g. "PRNNNN: Some tests are marked
XFAIL but only have this generic PR listed as the reason", for use in
these "quick fix" cases. It would also be easy to track if these
"quick fixes" didn't happen shortly.

As David Blaikie mentioned, our bug hygiene is not really that good.
It would be easy to find the set of tests citing the generic PR, but
somebody would have to take it upon themselves to go looking for them.
By the time that happened, the kinds of details we'd want to see in a
bug would be just as missing as if we had no XFAIL-to-PR link at all.

Conversely, requiring short-term XFAILs to have their own PR means
that if somebody fixed the test and forgot to close the PR, that
dangling PR would be easy to recognize as something that could be
summarily closed if anybody decided to go look at all the XFAIL-linked
PRs. This scenario leaves an open PR kicking around, O the horror,
but we have not lost any useful information.

Now, I think it would be a great and useful thing for somebody to take
on the role of PR Czar, to do that kind of sanitization of the bugs,
but I don't see it happening as an ongoing role. Therefore I prefer
a process that is a bit more tedious but doesn't lose information
over a simpler but lossy process.
--paulr

Adding a process to create more PRs is not going to change that. Until we have a plan to address that, forcing more bug reports is not really going to accomplish much.

-Krzysztof