hydra/src
Maximilian Bosch 99afff03b0
hydra-queue-runner: drop broken connections from pool
Closes #1336

When PostgreSQL is restarted, `hydra-queue-runner` keeps reusing the
old connections, causing errors like this:

    main thread: Lost connection to the database server.
    queue monitor: Lost connection to the database server.

and no further builds are processed.

`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.

If the DB is still unreachable 10s later, the process stops with a
non-zero exit code because it has no DB connection. This, however,
isn't a big deal because it is restarted immediately afterwards. With
the current configuration, Hydra never gives up but restarts (and
retries) indefinitely. Retrying DB connections in a long-running
process seems reasonable to me. If this doesn't work out, monitoring
should fire anyway because the queue fills up, but I'm open to
discussing that.
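
The retry behaviour described above can be sketched as follows. This is a
simplified illustration under stated assumptions, not Hydra's actual main
loop: `runWithSingleRetry` is a hypothetical helper that tries once, waits
a fixed delay (10s in the scenario above, configurable here), tries once
more, and otherwise returns a non-zero exit code. Restarting the process
after that exit is assumed to be the service manager's job (e.g. systemd's
`Restart=`).

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical helper: runs `attempt` (e.g. "connect to the DB and process
// the queue"); on failure, waits `delay` and tries exactly once more.
// Returns a process exit code; a non-zero code relies on the service
// manager to start the process again.
int runWithSingleRetry(const std::function<void()> & attempt,
                       std::chrono::milliseconds delay)
{
    assert(delay.count() >= 0);
    try {
        attempt();
        return EXIT_SUCCESS;
    } catch (const std::exception & e) {
        std::fprintf(stderr, "main thread: %s\n", e.what());
    }

    // Probably a DB problem, so don't retry right away.
    std::this_thread::sleep_for(delay);

    try {
        attempt();
        return EXIT_SUCCESS;
    } catch (const std::exception & e) {
        std::fprintf(stderr, "giving up: %s\n", e.what());
        return EXIT_FAILURE;
    }
}
```

Keeping the in-process retry short and bounded, and leaving "retry
forever" to the supervisor, means a permanently unreachable DB shows up as
repeated service restarts rather than a silently stuck process.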

Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.

Internally, Hydra uses Nix's pool data structure: it has N slots (here,
DB connections), and whenever one is requested, an idle slot is handed
out or a new one is created (when all N slots are in use, the caller
waits until one is freed). The problem is that whenever an error is
encountered, the slot is released back to the pool, so the same broken
connection is reused the next time. `Pool::Handle::markBad` tells Nix to
drop a broken slot instead, and this is now done whenever
`pqxx::broken_connection` is caught.
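
To illustrate the mechanism, here is a minimal, self-contained pool
modeled loosely on the semantics described above. It is a sketch, not
Nix's actual `Pool` implementation: `SimplePool`, `FakeConnection`,
`BrokenConnection` (a stand-in for `pqxx::broken_connection`) and
`withConnection` are hypothetical names. Only `markBad` mirrors the name
of the real `Pool::Handle::markBad`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical pool with at most `max` slots (not Nix's real Pool class).
template <typename R>
class SimplePool {
public:
    using Factory = std::function<std::shared_ptr<R>()>;
    SimplePool(std::size_t max, Factory factory)
        : max_(max), factory_(std::move(factory)) {}

    // RAII handle: gives the slot back on destruction unless marked bad.
    class Handle {
    public:
        Handle(SimplePool & pool, std::shared_ptr<R> r)
            : pool_(&pool), r_(std::move(r)) {}
        Handle(Handle && other) noexcept
            : pool_(other.pool_), r_(std::move(other.r_)), bad_(other.bad_)
        { other.pool_ = nullptr; }
        ~Handle() { if (pool_) pool_->release(std::move(r_), bad_); }
        R * operator->() { return r_.get(); }
        R & operator*() { return *r_; }
        void markBad() { bad_ = true; }
    private:
        SimplePool * pool_;
        std::shared_ptr<R> r_;
        bool bad_ = false;
    };

    // Hands out an idle slot, creates a new one, or waits for a free slot.
    Handle get()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return inUse_ < max_; });
        ++inUse_;
        if (!idle_.empty()) {
            auto r = std::move(idle_.back());
            idle_.pop_back();
            return Handle(*this, std::move(r));
        }
        lock.unlock();
        try {
            return Handle(*this, factory_());
        } catch (...) {
            release(nullptr, true); // creation failed: free the slot again
            throw;
        }
    }

    std::size_t idleCount() { std::lock_guard<std::mutex> guard(mutex_); return idle_.size(); }

private:
    void release(std::shared_ptr<R> r, bool bad)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(inUse_ > 0);
        --inUse_;
        if (!bad) idle_.push_back(std::move(r)); // bad slots are simply dropped
        cv_.notify_one();
    }

    std::size_t max_;
    Factory factory_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::shared_ptr<R>> idle_;
    std::size_t inUse_ = 0;
};

// Hypothetical stand-in for pqxx::broken_connection.
struct BrokenConnection : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct FakeConnection {
    int id;
    bool broken = false;
    void exec() const { if (broken) throw BrokenConnection("Lost connection to the database server."); }
};

// The fix in miniature: mark a broken slot bad so the pool drops it.
template <typename Work>
void withConnection(SimplePool<FakeConnection> & pool, Work work)
{
    auto conn = pool.get();
    try {
        work(*conn);
    } catch (const BrokenConnection &) {
        conn.markBad();
        throw;
    }
}
```

Without the `markBad()` call, the broken connection would go back onto the
idle list and be handed out again on every subsequent `get()`. With it, the
failing request still throws, but the next request gets a freshly created
connection.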
2024-03-15 14:09:31 +01:00
hydra-eval-jobs Merge branch 'nix-next' into nix-2.20 2024-01-30 13:26:45 -05:00
hydra-evaluator Update to Nix 2.19 2023-11-30 15:26:46 -05:00
hydra-queue-runner hydra-queue-runner: drop broken connections from pool 2024-03-15 14:09:31 +01:00
lib Merge pull request #1361 from Ma27/fix-gitea-test 2024-03-08 15:28:07 +01:00
libhydra Update to Nix 2.19 2023-11-30 15:26:46 -05:00
root web: disable Sign in with Google popup 2024-01-25 09:27:46 +01:00
script More CA derivations prep 2024-01-25 21:32:22 -05:00
sql Add migration to drop non-null constraints 2024-01-26 11:53:58 -05:00
ttf Add font for the captcha 2013-03-04 12:16:13 +01:00
Makefile.am Revert "hydra-eval-jobs -> nix eval-hydra-jobs" 2020-02-19 20:36:52 +01:00
Makefile.PL perlcritic: use strict, use warnings 2021-09-06 22:13:33 -04:00