Who sets the 10-minute timeouts?

A recent change is causing several LLDB tests on Windows to fail and several more to time out, which I intend to look into.

It appears the timeout period is set to 600 seconds (10 minutes), which seems excessive and causes the Windows build bot to spend a lot of time waiting (e.g., http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/7819/steps/test/logs/stdio).

Is there a reason why the timeouts are set that long? What would be a reasonable value?

Adrian.

It's just a number picked after some observation of the testsuite behaviour.
Historically there were some tests taking 3-5 minutes to run, so I think
the rationale was 2 * [max_duration_single_test], but a lot has
changed since then, so it may be worth revisiting.
Also, I'm under the impression that, since the test suite uses lit,
individual bots can override the value.
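The rationale above can be written down as a one-line rule. This is a minimal sketch, assuming the rule really is "twice the longest observed single-test duration"; the helper name `pick_timeout` and the sample durations are hypothetical, picked only to match the 3-5 minute figure:

```python
def pick_timeout(durations_s, factor=2):
    """Hypothetical helper: per-test timeout = factor * longest observed duration (seconds)."""
    if not durations_s:
        raise ValueError("need at least one observed duration")
    return factor * max(durations_s)

# Illustrative values for tests taking roughly 3-5 minutes (not real measurements).
observed = [180, 240, 300]
print(pick_timeout(observed))  # 600, i.e. the 10-minute value in question
```

Feeding current per-test durations into the same rule would be one way to revisit the number.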

I recently increased/unified several internal timeouts throughout LLDB (https://reviews.llvm.org/D60340) in reaction to bots failing randomly on heavily used machines, particularly when ASAN is enabled, which can cause surprisingly long delays.

Since the normal operation should be that no tests fail, waiting an extra 10 minutes in the exceptional case that a test does fail seems more desirable than the chance of a working test failing because of a too-small timeout. Therefore, I’d rather pick an excessively large per-test timeout to be safe.

– adrian

This is a little pedantic, but tests that fail an assert also won't trigger the timeout. Only tests that fail by stalling (for instance, you expected to hit a breakpoint but never did) trigger the timeout. That should be even less frequent than ordinary test failures.
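The distinction can be demonstrated with Python's `subprocess.run`, used here only as a stand-in for a test runner (this is not how LLDB's harness is implemented). The two one-line "tests" are hypothetical: one fails an assert, the other stalls.

```python
import subprocess
import sys

def run_with_timeout(code, timeout_s):
    """Run a Python snippet in a child process and classify how it ended."""
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # The child stalled; only this path waits out the full timeout.
        return "timeout"
    return "pass" if proc.returncode == 0 else "fail"

# A failing assert exits almost immediately, long before the timeout.
print(run_with_timeout("assert 1 == 2", timeout_s=5))  # fail
# A stalled "test" (like waiting for a breakpoint that never hits) burns the whole timeout.
print(run_with_timeout("import time; time.sleep(10)", timeout_s=1))  # timeout
```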

Jim

FYI: There are six tests stalling on Windows. They’ve been stalling long enough that the bot history no longer shows the last good build, and the grid view never shows anything other than “building” because it can no longer keep up with the rate of submissions.

There are also many tests actually failing on Windows. It’s time-consuming to bisect when the timeouts add 10 minutes to every step.
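A back-of-the-envelope sketch of that bisection cost, assuming one 600-second timeout per bisection step and a hypothetical range of 100 commits (the function name is illustrative):

```python
import math

def bisect_timeout_overhead_s(num_commits, timeout_s=600):
    """Rough extra wall-clock time a bisection spends waiting on one timeout per step."""
    # Bisecting N commits takes about ceil(log2(N)) test runs to isolate one commit.
    steps = math.ceil(math.log2(num_commits))
    return steps * timeout_s

overhead = bisect_timeout_overhead_s(100)
print(overhead, "seconds =", overhead // 60, "minutes")  # 4200 seconds = 70 minutes
```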

As a workaround, assuming you're running the tests with lit, you
should be able to override the timeout with --max-time.
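For reference, lit has two separate limits: `--max-time` caps the total time spent on the whole run, while `--timeout` sets the limit for each individual test (both in seconds). Whether a command-line value wins over one assigned in a suite's lit config depends on that config, so it's worth checking. A minimal sketch of how a bot script might build such a command line; the `llvm-lit` path, test directory, and 120-second value are placeholders, not the bot's actual configuration:

```python
def lit_command(lit_path, test_dir, per_test_timeout_s=None, max_total_s=None, verbose=True):
    """Build an argv list for lit (paths and values are placeholders)."""
    argv = [lit_path]
    if verbose:
        argv.append("-v")
    if per_test_timeout_s is not None:
        argv.append(f"--timeout={per_test_timeout_s}")  # limit for each individual test
    if max_total_s is not None:
        argv.append(f"--max-time={max_total_s}")  # limit for the entire run
    argv.append(test_dir)
    return argv

cmd = lit_command("./bin/llvm-lit", "tools/lldb/test", per_test_timeout_s=120)
print(" ".join(cmd))  # ./bin/llvm-lit -v --timeout=120 tools/lldb/test
```

On the bot, the resulting list could then be passed to `subprocess.run(cmd)`.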

I don't know where the Windows bot output is...

Which tests are timing out?

Maybe you ought to skip those tests on Windows for now to reduce the noise?
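A sketch of what skipping on Windows looks like, written with the standard `unittest.skipIf` so it runs standalone. LLDB's own Python test suite has its own decorators for this (e.g. `skipIfWindows`), and lit-based tests can mark themselves with `UNSUPPORTED:` lines, so treat this as an illustration of the idea rather than the exact mechanism. The test class and method names are hypothetical:

```python
import sys
import unittest

class StallsOnWindowsTest(unittest.TestCase):
    """Hypothetical stand-in for one of the tests that stall on Windows."""

    @unittest.skipIf(sys.platform == "win32", "stalls on Windows; skipped until fixed")
    def test_hits_breakpoint(self):
        # Placeholder body; a real test would wait for a breakpoint to be hit.
        self.assertTrue(True)

# Run the case explicitly; on Windows it is reported as skipped instead of stalling.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StallsOnWindowsTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```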

Jim