Buildbot insights

kwk · August 17, 2022, 9:09pm

Hi,

I am maintaining the daily LLVM snapshots for Fedora. To get a better understanding of the status of the builds, I wanted to visualize “things” in Grafana using Postgres as a data source. I had never done anything like that before, but I was able to feed the JSON data from our build system into our database as SQL statements.

Then a colleague asked if I could do a similar thing for the LLVM buildbots. That turned out to be straightforward and analogous to my previous work. All in all, I’ve collected and stored more than 2,500,000 rows of denormalized build information for the staging and production buildbots.

This is a visualization of the build failures of each buildbot builder over the last few days.

The SQL query for this is plain and simple:

SELECT
  $__timeGroup("build_started_at", '24h'),
  count(*) as "value",
  builder_name as "metric"
FROM buildbot_build_logs
WHERE
  $__timeFilter(build_started_at)
  AND build_complete = true
  AND buildbot_instance in (${buildbot_instance:sqlstring})
  AND build_results <> 0
GROUP BY "time", builder_name
ORDER BY 1
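
For readers unfamiliar with the Grafana macros: `$__timeGroup` buckets rows into 24-hour intervals and `$__timeFilter` restricts them to the time range selected in the dashboard. The Python sketch below approximates the same aggregation against an in-memory SQLite table, so it can be tried without Grafana or Postgres. The sample rows, the `daily_failures` helper, and the use of SQLite's `date()` for daily bucketing are assumptions for illustration; only the table and column names come from the query above.

```python
import sqlite3

# Made-up sample rows; only the column names come from the query above.
# Columns: builder_name, buildbot_instance, build_started_at,
#          build_complete, build_results (0 is treated as success).
ROWS = [
    ("builder-a", "production", "2022-08-15 01:00:00", 1, 0),
    ("builder-a", "production", "2022-08-15 09:00:00", 1, 2),
    ("builder-a", "production", "2022-08-15 17:00:00", 1, 2),
    ("builder-a", "production", "2022-08-16 03:00:00", 1, 2),
    ("builder-b", "production", "2022-08-15 04:00:00", 1, 2),
    ("builder-b", "production", "2022-08-15 05:00:00", 0, 2),  # not complete
    ("builder-b", "staging", "2022-08-16 06:00:00", 1, 2),     # other instance
]


def daily_failures(rows, instances, start, end):
    """Count non-successful, completed builds per day and builder."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE buildbot_build_logs ("
        "builder_name TEXT, buildbot_instance TEXT, "
        "build_started_at TEXT, build_complete INTEGER, build_results INTEGER)"
    )
    conn.executemany(
        "INSERT INTO buildbot_build_logs VALUES (?, ?, ?, ?, ?)", rows
    )
    # One "?" per selected instance, standing in for ${buildbot_instance:sqlstring}.
    placeholders = ", ".join("?" for _ in instances)
    query = f"""
        SELECT date(build_started_at) AS day,   -- stands in for $__timeGroup(..., '24h')
               count(*) AS value,
               builder_name AS metric
        FROM buildbot_build_logs
        WHERE build_started_at >= ? AND build_started_at < ?  -- stands in for $__timeFilter
          AND build_complete = 1
          AND buildbot_instance IN ({placeholders})
          AND build_results <> 0
        GROUP BY day, builder_name
        ORDER BY day, builder_name
    """
    result = conn.execute(query, [start, end, *instances]).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in daily_failures(
        ROWS, ["production"], "2022-08-15 00:00:00", "2022-08-17 00:00:00"
    ):
        print(row)
```

Note that `build_results <> 0` counts every result other than success, so anything the table stores as a non-zero result code is included in the failure count.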

I would like to get some feedback on this to find out if there’s more insight that you guys are interested in.

To try all of this out on your own Linux machine, run the following (note: this is a work in progress):

git clone https://github.com/kwk/llvm-snapshot-monitoring.git
cd llvm-snapshot-monitoring
# Build the container images
make build
# Start the services (opens ports: 3000 for grafana, 5342 for postgres, and 8080 for adminer)
make start
# Hit ctrl-c once you see the logs flying through.
# Now pre-fill the database with buildbot data already prepared in the git repo.
make load-buildbot-logs 
# Open grafana and use admin/admin as credentials to view a dashboard.
xdg-open http://localhost:3000

I look forward to reading your suggestions or requests for insights.

Regards,
Konrad

tstellar · August 19, 2022, 12:41pm

This is really cool. It seems like we could use this to identify buildbots that fail often and need to be disabled. I’m trying to figure out the best visualization to help do that. I think maybe showing the failure count over the course of a week or a month rather than a day might help.

The raw failure count may skew the data towards faster buildbots, so maybe showing failure percentage (i.e. number of fails / number of runs) would be better too.
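
A failure percentage of this kind could be computed roughly as follows. This is only a sketch in Python against an in-memory SQLite table: the `failure_rates` helper and the sample rows are made up for illustration, and the column names are taken from the query in the first post. The time window passed in would be the week or month of interest.

```python
import sqlite3


def failure_rates(rows, start, end):
    """Return (builder_name, runs, failures, failure_pct) per builder.

    Only completed builds that started in [start, end) are considered.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE buildbot_build_logs ("
        "builder_name TEXT, build_started_at TEXT, "
        "build_complete INTEGER, build_results INTEGER)"
    )
    conn.executemany("INSERT INTO buildbot_build_logs VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT builder_name,
               count(*) AS runs,
               sum(CASE WHEN build_results <> 0 THEN 1 ELSE 0 END) AS failures,
               round(100.0 * sum(CASE WHEN build_results <> 0 THEN 1 ELSE 0 END)
                     / count(*), 1) AS failure_pct
        FROM buildbot_build_logs
        WHERE build_complete = 1
          AND build_started_at >= ? AND build_started_at < ?
        GROUP BY builder_name
        ORDER BY failure_pct DESC, builder_name
    """
    result = conn.execute(query, (start, end)).fetchall()
    conn.close()
    return result


# Made-up data: a fast builder with 10 runs (2 failed) and a slow builder
# with 2 runs (1 failed) within the same week.
FAST = [
    ("fast-builder", f"2022-08-15 {hour:02d}:00:00", 1, 2 if hour < 2 else 0)
    for hour in range(10)
]
SLOW = [
    ("slow-builder", "2022-08-15 00:00:00", 1, 2),
    ("slow-builder", "2022-08-15 12:00:00", 1, 0),
]

if __name__ == "__main__":
    for row in failure_rates(FAST + SLOW, "2022-08-15 00:00:00", "2022-08-22 00:00:00"):
        print(row)
```

With this sample data the fast builder has more failures in absolute terms (2 versus 1), but the slow builder has the higher failure percentage (50% versus 20%), which is the skew described above.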

mehdi_amini · August 19, 2022, 3:10pm

I’d be interested in other kinds of metrics as well, for example:

How long after the commit was the build completed?
How many commits are bundled together in a single build?

kwk · August 30, 2022, 3:06pm

@mehdi_amini Thank you for this input. I’m actively working on this and will let you know when it’s done.
