[LNT] new server instance http://lnt.llvm.org seems unstable

Hi,

The new LNT server instance http://lnt.llvm.org seems to fail in many cases.

Any attempt to open a ‘Run page’ (e.g. http://lnt.llvm.org/db_default/v4/nts/62475) and, lately, also many perf bot result submissions (e.g. http://lab.llvm.org:8014/builders/clang-native-arm-lnt-perf/builds/2262/steps/test-suite/logs/stdio) fail with:

“500 Internal Server Error”.

Any ideas?

Thanks, Elad

The run page problem was triggered by one of my commits (sorry) and should be mitigated now; see the thread at http://lists.llvm.org/pipermail/llvm-dev/2017-July/115971.html

I don't know about the submission problems. Could they just be an occasional network problem, or are they a common phenomenon? Chris made some improvements to LNT so that the server reports back the actual problem instead of a generic "500 internal server error". These should become active the next time lnt.llvm.org is updated.
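If the failed submissions turn out to be only occasional network or server hiccups, a bot could retry before giving up. Below is a minimal Python sketch of retrying with exponential backoff; `submit_with_retry` and the `submit` callable it wraps are hypothetical and not part of LNT's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def submit_with_retry(submit: Callable[[], T], attempts: int = 3,
                      base_delay: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `submit` until it succeeds, waiting base_delay * 2**i between tries.

    Hypothetical helper: `submit` stands in for whatever performs the actual
    upload to the LNT server and raises an exception on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        try:
            return submit()
        except Exception:  # e.g. a 500 response surfaced as an exception
            if i == attempts - 1:
                raise  # out of retries: propagate the last error
            sleep(base_delay * 2 ** i)  # back off: 5s, 10s, 20s, ...
    raise AssertionError("unreachable")
```

Retrying is only safe if a failed attempt did not already store the run on the server; otherwise it may create duplicate runs.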

- Matthias

Just FYI, I still see such errors on the most recent builds:

http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/builds/1205/steps/lnt.nightly-test/logs/stdio

2017-08-02 14:23:23 CRITICAL: Results were not obtained from submission.
nt
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-0/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-1/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-2/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-3/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-4/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-5/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-6/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-7/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-8/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-before-vectorizer-detect-only/tests/nt/build/sample-9/report.simple.csv
error: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete
your request. Either the server is overloaded or there is an error in
the application.</p>

2017-08-02 14:23:23 INFO: Running process cleanup.

Best,
Tobias

It seems to be something else, then. Someone needs to inspect the server logs to find out more about what is going on.

- Matthias

OK, who is in charge?

Best,
Tobias

Chris is on vacation this week. Not sure if someone else has access to the logs...

- Matthias

It's not super urgent, I just wanted to make sure it does not get lost.
I am CCing Chris to make sure he has a full inbox when he comes back from
vacation. :wink:

@Chris, hope you have a good and relaxed stay.

Best,
Tobias

Actually, I just remembered that LNT has a page for viewing the log:

http://lnt.llvm.org/log

...
OperationalError: (OperationalError) (2006, 'MySQL server has gone away') 'SELECT `NT_Run`.`ID` AS `NT_Run_ID`, `NT_Run`.`MachineID` AS `NT_Run_MachineID`, `NT_Run`.`OrderID` AS `NT_Run_OrderID`, `NT_Run`.`ImportedFrom` AS `NT_Run_ImportedFrom`, `NT_Run`.`StartTime` AS `NT_Run_StartTime`, `NT_Run`.`EndTime` AS `NT_Run_EndTime`, `NT_Run`.`SimpleRunID` AS `NT_Run_SimpleRunID`, `NT_Run`.`Parameters` AS `NT_Run_Parameters` \nFROM `NT_Run` \nWHERE `NT_Run`.`ID` = %s' (62996L,)
...

Not sure what is causing that or what to do about it though.
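For anyone else looking, the relevant entries can be pulled out of that page with a small script. A rough Python sketch, assuming the /log page returns the log text (possibly HTML-escaped) with one entry per line; `fetch_log` and `gone_away_lines` are hypothetical helpers:

```python
import html
import urllib.request
from typing import List

LOG_URL = "http://lnt.llvm.org/log"


def fetch_log(url: str = LOG_URL, timeout: float = 30.0) -> str:
    """Download the log page and undo HTML escaping such as &#39; for quotes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return html.unescape(resp.read().decode("utf-8", errors="replace"))


def gone_away_lines(log_text: str) -> List[str]:
    """Return the lines reporting MySQL error 2006 ("server has gone away")."""
    return [
        line
        for line in log_text.splitlines()
        if "(2006," in line and "MySQL server has gone away" in line
    ]


if __name__ == "__main__":
    for line in gone_away_lines(fetch_log()):
        print(line)
```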

- Matthias

This started about 1-2 weeks ago, I guess. I am not sure about it
either, but it seems unfortunate. Maybe we are running into some kind of timeout?
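A timeout is a plausible explanation: MySQL drops connections that stay idle longer than its `wait_timeout` setting (28800 seconds by default), and the next query on such a pooled connection fails with error 2006. SQLAlchemy, which produced the error above, addresses this with the `pool_recycle` argument to `create_engine` (set below the server's `wait_timeout`), and newer versions also offer `pool_pre_ping`. The stdlib-only sketch below just illustrates the recycling idea; `RecyclingConnection` is hypothetical, and `sqlite3` is used purely as a stand-in database.

```python
import sqlite3
import time
from typing import Callable, Optional


class RecyclingConnection:
    """Hands out a DB-API connection, reopening it once it is older than
    `recycle` seconds so it is replaced before a server-side idle timeout
    (such as MySQL's wait_timeout) can silently kill it."""

    def __init__(self, connect: Callable[[], sqlite3.Connection],
                 recycle: float, clock: Callable[[], float] = time.monotonic):
        self._connect = connect
        self._recycle = recycle
        self._clock = clock
        self._conn: Optional[sqlite3.Connection] = None
        self._opened_at = 0.0
        self.reconnects = 0  # number of times a fresh connection was opened

    def get(self) -> sqlite3.Connection:
        now = self._clock()
        if self._conn is None or now - self._opened_at >= self._recycle:
            if self._conn is not None:
                self._conn.close()  # discard the possibly stale connection
            self._conn = self._connect()
            self._opened_at = now
            self.reconnects += 1
        return self._conn


if __name__ == "__main__":
    pool = RecyclingConnection(lambda: sqlite3.connect(":memory:"), recycle=3600)
    print(pool.get().execute("SELECT 1").fetchone())
```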

Best,
Tobias

And another one :frowning:

http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly/builds/1986/steps/lnt.nightly-test/logs/stdio

2017-08-03 00:24:44 CRITICAL: Results were not obtained from submission.
nt
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-0/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-1/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-2/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-3/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-4/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-5/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-6/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-7/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-8/report.simple.csv
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly/tests/nt/build/sample-9/report.simple.csv
error: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete
your request. Either the server is overloaded or there is an error in
the application.</p>

2017-08-03 00:24:44 INFO: Running process cleanup.
program finished with exit code 1
elapsedTime=20725.395111

Best,
Tobias

I noticed some strange behavior in the background workers caused by the eventlets, very much like what is discussed here:

https://stackoverflow.com/questions/14736766/why-does-gevent-socket-break-multiprocessing-connections-auth

I have disabled that feature in Gunicorn. It could also affect socket handling at the database level.
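For reference, moving Gunicorn away from eventlet workers amounts to changing the worker class. A hypothetical `gunicorn.conf.py` sketch; the bind address, worker count and timeout are assumed values, not the actual lnt.llvm.org configuration:

```python
# gunicorn.conf.py -- hypothetical sketch, not the actual lnt.llvm.org setup.
# Gunicorn reads these module-level settings when started with
# `gunicorn -c gunicorn.conf.py <app module>`.

bind = "0.0.0.0:8000"  # assumed listen address
workers = 4            # assumed number of worker processes

# Use plain synchronous workers instead of "eventlet" (or "gevent"). Those
# async worker classes monkey-patch the standard library's socket module,
# which can interfere with multiprocessing connections and database
# driver sockets.
worker_class = "sync"

# A sync worker that stays silent longer than this many seconds is killed
# and restarted, so slow report submissions need a generous value.
timeout = 300
```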

Please let me know if you continue to see issues now.

Thank you, I will keep an eye on the servers.

Best,
Tobias