BasicResultsFormatter - new test results summary

Todd, I've had to disable the new result formatter as it was not
working with the expected timeout logic we have for the old one. The
old XTIMEOUT code is a massive hack and I will be extremely glad when
we get rid of it, but we can't keep our buildbot red until then, so
I've switched it off.

I am ready to start working on this, but I wanted to run this idea
here first. I thought we could have a test annotation like:
@expectedTimeout(oslist=["linux"], ...)

Then, when the child runner would encounter this annotation, it would
set a flag in the "test is starting" message indicating that this test
may time out. Then if the test really times out, the parent would know
about this, and it could avoid flagging the test as an error.
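
To make the idea concrete, here is a rough sketch of what such an
annotation could look like. All names here (expectedTimeout, the
__expected_timeout__ attribute) are made up for illustration and are
not existing test-suite API:

    import platform
    from functools import wraps

    def expectedTimeout(oslist=None, bugnumber=None):
        """Mark a test method as allowed to time out on the listed OSes."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs)
            # Only honor the marking when the host OS is in the list.
            applies = oslist is None or platform.system().lower() in oslist
            wrapper.__expected_timeout__ = applies
            return wrapper
        return decorator

    # The child runner would then read the attribute off the test method
    # and set a "may time out" flag in the "test is starting" message it
    # sends to the parent.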

Alternatively, if we want to avoid the proliferation of test result
states, we could key this off the standard @expectedFailure
annotation; then a "time out" would become just another way in which a
test can fail, and XTIMEOUT would become XFAIL.

What do you think?

pl

PS: I am pretty new to this part of the code, so any pointers you have
towards implementing this would be extremely helpful.

Todd, I've had to disable the new result formatter as it was not
working with the expected timeout logic we have for the old one. The
old XTIMEOUT code is a massive hack and I will be extremely glad when
we get rid of it, but we can't keep our buildbot red until then, so
I've switched it off.

Ah, sorry, my comments on the check-in preceded my reading of this. Glad
you see this as a hack :)

No worries on shutting it off. I can get the expected timeout as currently
written working with the updated summary results.

I am ready to start working on this, but I wanted to run this idea
here first. I thought we could have a test annotation like:
@expectedTimeout(oslist=["linux"], ...)

Then, when the child runner would encounter this annotation, it would
set a flag in the "test is starting" message indicating that this test
may time out. Then if the test really times out, the parent would know
about this, and it could avoid flagging the test as an error.

Yes, the idea seems reasonable. The actual implementation will end up
being slightly different as the ResultsFormatter will receive the test
start event (which is where the "timeout is expected" information
comes from), whereas the
reporter of the timeout (the test worker) will not know anything about that
data. It will still generate the timeout, but then the ResultsFormatter
can deal with transforming this into the right event when a timeout is
"okay".

Alternatively, if we want to avoid the proliferation of test result
states, we could key this off the standard @expectedFailure
annotation; then a "time out" would become just another way in which a
test can fail, and XTIMEOUT would become XFAIL.

What do you think?

Even though the above would work, if the issue here ultimately is that a
larger timeout is needed, we can avoid all this by increasing the timeout.
Probably more effective, though, is going to be running it in the
follow-up, low-load, single worker pass, where presumably we would not hit
the timeout. If you think that would work, I'd say:

(1) short term (like in the next hour or so), I get the expected timeout
working in the summary results.

(2) longer term (like by end of weekend or maybe Monday at worst), we have
the second pass test run at lower load (i.e. single worker thread), which
should prevent these things from timing out in the first place.

If the analysis of the cause of the timeout is incorrect, then really we'll
want to do your initial proposal in the earlier paragraphs, though.

What do you think about any of that?

Merging threads.

The concept is not there to protect against timeouts that are caused
by processes being too slow; for those we have been increasing
timeouts where necessary.

Okay, I see. If that’s the intent, then expected timeout sounds reasonable. (My abhorrence was against the idea of using that as a replacement for increasing a timeout that was too short under load).

I would go with your original approach (the marking as expected timeout). We can either have that generate a new event (much like a change I’m about to put in that has flakey tests send an event indicating that they are eligible for rerun) or annotate the start message. FWIW, the startTest() call on the LLDBTestResults gets called before decorators have a chance to execute, which is why I’m going with the ‘send an enabling event’ approach. (I’ll be checking that in shortly here, like when I’m done writing this email, so you’ll see what I did there).
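
To sketch the shape of that approach (post_event and expectedFlakey
below are stand-ins for illustration, not the actual code being checked
in):

    import json
    import sys

    def post_event(event):
        # Placeholder for however the worker really forwards test events
        # to the parent side; here we just serialize to stdout.
        sys.stdout.write(json.dumps(event) + "\n")

    def expectedFlakey(func):
        def wrapper(self, *args, **kwargs):
            # startTest() has already fired by the time the decorated body
            # runs, so the marking travels as its own event rather than as
            # a flag on the start message.
            post_event({"event": "test_marked_flakey", "test_id": self.id()})
            return func(self, *args, **kwargs)
        return wrapper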

(btw, I haven’t checked: is it possible to XFAIL crashes now?)

This currently doesn’t work. We’d need a smarter rerun mechanism (something I do intend to do at some point), where we (1) know all the tests that should run from a given test file before any are run, and (2) when a timeout or exceptional exit takes out the whole file’s worth of tests, if there are any other tests that didn’t get to run, we rerun just those.

That is out of scope for the second pass low-load, single worker work I’m doing now. But it is something I’d like to have in the future. The first phase of that would be to list all the tests that didn’t run because of a timeout or exceptional exit. The second phase is to enable only running a named set of tests from a test file, and then rerunning ones that never had a chance to run. (That I could imagine rolling into the implementation of the second test run pass fairly nicely).
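
For what it's worth, the "rerun only what never ran" part could be
sketched with plain unittest (list_tests and rerun_unfinished are
invented names; the real thing would hook into the LLDB test driver
instead):

    import unittest

    def list_tests(module_name):
        """Enumerate the test ids in a file before any of them run."""
        ids = []

        def collect(suite):
            for item in suite:
                if isinstance(item, unittest.TestSuite):
                    collect(item)
                else:
                    ids.append(item.id())

        collect(unittest.TestLoader().loadTestsFromName(module_name))
        return ids

    def rerun_unfinished(module_name, completed_ids):
        """After a timeout or exceptional exit, rerun only the tests that
        never produced a result."""
        pending = [t for t in list_tests(module_name)
                   if t not in completed_ids]
        if pending:
            suite = unittest.TestLoader().loadTestsFromNames(pending)
            unittest.TextTestRunner().run(suite)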

Hi,

thanks a lot for fixing the timeout issue on such short notice. I
didn't think I'd find myself defending them, as I remember being quite
upset when they went in, but they have proven useful in stabilising
the build bots, and I think it's likely you may need them as well.
I'll try to now add a nicer way to expect timeouts so that we don't
have the hack in the new runner as well. I'll add a new message, like
you did for the flakey decorator.

I'm a bit uneasy about adding another kind of decorator, though. What
would you (and anyone else reading this) think about adding this
behavior to the existing XFAIL decorators?
This way, "timeout" would become just another way in which a test can
"fail", and any test marked with an XFAIL decorator would be eligible
for this treatment.
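
In formatter terms that would amount to a small status-mapping
decision, e.g. (the status strings here are placeholders, not the real
result categories):

    def classify(status, marked_xfail):
        """Treat a timeout on an XFAIL-marked test as just another
        expected failure; an unmarked timeout stays an error."""
        if marked_xfail and status in ("failure", "timeout"):
            return "expected_failure"   # reported as XFAIL
        if status == "timeout":
            return "error"              # unexpected hang
        return status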

We would lose the ability to individually expect "failures" and
"timeouts", but I don't think that is really necessary, and I think it
will be worth the extra maintainability we get from having fewer test
decorators.

What do you think?

pl

Hi,

thanks a lot for fixing the timeout issue on such short notice.

Sure thing. I didn't mind at all once I understood the context in which
they were being used.

I
didn't think I'd find myself defending them, as I remember being quite
upset when they went in, but they have proven useful in stabilising
the build bots,

Yes, I understand. I am definitely not a fan of tests that sometimes give
useful signals (tests that don't lock up), but sometimes lock up (and
therefore don't give useful signals). But since the lock-up is catchable
via the timeout, and since we can figure that out definitively, this isn't
quite so bad as it seems at first blush.

and I think it's likely you may need them as well.

I am hopeful the test rerun mechanism generally handles it, but that'll
only be the case if the timeout is triggered by heavy load. Otherwise, the
rerun phase will be no different. Time will tell :)

I'll try to now add a nicer way to expect timeouts so that we don't
have the hack in the new runner as well. I'll add a new message, like
you did for the flakey decorator.

Feel free to mimic what I did with the flakey decorator sending a new test
event that can additively apply the "expected timeout" marking to the test
method run. In fact, if you get the decorator set up and put in a

# TODO @tfiala add test event code here

I can get the rest of it in.
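
(For concreteness, that skeleton might be as little as the following,
with the decorator name and arguments just placeholders:)

    def expectedTimeout(oslist=None, bugnumber=None):
        def decorator(func):
            def wrapper(self, *args, **kwargs):
                # TODO @tfiala add test event code here
                return func(self, *args, **kwargs)
            return wrapper
        return decorator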

I'm a bit uneasy about adding another kind of decorator, though. What
would you (and anyone else reading this) think about adding this
behavior to the existing XFAIL decorators?
This way, "timeout" would become just another way in which a test can
"fail", and any test marked with an XFAIL decorator would be eligible
for this treatment.

I think I'd be okay with that. I like reducing the number of ways we
support failure decoration, so this seems reasonable.

We would lose the ability to individually expect "failures" and
"timeouts", but I don't think that is really necessary, and I think it
will be worth the extra maintainability we get from having fewer test
decorators.

OTOH, the piece we then lose is the ability to have an XFAIL mean "Hey this
test really should fail, we haven't implemented feature XYZ (correctly or
otherwise), so this better fail." In that semantic meaning, an unexpected
success would truly be an actionable signal --- either the test is now
passing because the feature now works (actionable signal option A: the
XFAIL should come off after verifying), or the test is passing because
it is not testing what it thought it was, and the test needs to be modified
to more tightly bound the expected fail condition (actionable item option
B).

So it eliminates the definiteness of an XFAIL ideally meaning "this really
should fail," turning it into "it is permissible for this to fail."

All that said, our Python test suite is so far away from that ideal right
now. The highest level output of our test suite that I care about is "if
tests run green, this is a good build", and if "tests run red, this is a
bad build." I don't see the timeout being rolled into XFAIL as hurting
that. It seems reasonable to roll them together at this time. And the
test output will list and count the timeouts.

So I'd be okay with that at this time for the sake of simplifying markup for
tests.

What do you think?

We would lose the ability to individually expect "failures" and
"timeouts", but I don't think that is really necessary, and I think it
will be worth the extra maintainability we get from having fewer test
decorators.

OTOH, the piece we then lose is the ability to have an XFAIL mean "Hey this
test really should fail, we haven't implemented feature XYZ (correctly or
otherwise), so this better fail." In that semantic meaning, an unexpected
success would truly be an actionable signal --- either the test is now
passing because the feature now works (actionable signal option A: the XFAIL
should come off after verifying), or the test is passing because it is
not testing what it thought it was, and the test needs to be modified to
more tightly bound the expected fail condition (actionable item option B).

So it eliminates the definiteness of an XFAIL ideally meaning "this really
should fail," turning it into "it is permissible for this to fail."

All that said, our Python test suite is so far away from that ideal right
now. The highest level output of our test suite that I care about is "if
tests run green, this is a good build", and if "tests run red, this is a bad
build." I don't see the timeout being rolled into XFAIL as hurting that.
It seems reasonable to roll them together at this time. And the test output
will list and count the timeouts.

I'd say that the root cause here is something different, namely the
fact that our tests do not behave deterministically. If they were
always ending with the same result, then everything you said above would
be true, regardless of whether that result was "failure" or "timeout"
- having a test consistently failing would give the same kind of
signal as a test consistently timing out (although I hope we never
have the latter kind). I am really only interested in hanging tests
here; using this to handle tests that were just slightly too slow is
quite a bad idea. In fact, I think using a uniform decorator would
discourage this, as then you will not have the option of saying "I
want this test to succeed, but if it takes too long, then don't worry
about that" (which is actually what we do right now, and we needed to
do that as we had a lot of tests hanging in the past, but I think
that's gotten better now).

So I'd be okay with that at this time for the sake of simplifying markup for
tests.

Ok, I'll get on it then.

pl