In NixOS, the user generation script was changed to set the permissions `0700`
to a home-directory that's specified in the `users.users`-submodule with
`createHome` being set to `true`[1].
However, the home-directory of `hydra` is also the base directory of other services using
other users (e.g. `hydra-queue-runner`). With permissions being `0700`, processes with
such a user cannot traverse into `/var/lib/hydra` and thus not into subdirectories.
I guess that this issue was kind of hidden because `hydra-init.service` ensures
proper permissions[2]. However, if `hydra-init.service` is not restarted on a
system-activation, the permissions of `/var/lib/hydra` will be set back to `0700`
by the activation script that runs on each activation.
This has lead to errors like this in `hydra-queue-runner` on my Hydra:
```
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/7h/dssz03gazrkqzfmlr5cprd0dvkg4db-squashfs.img.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/b9/350vd8jpv1f86i312c9pkdcd2z56aw-squashfs.img.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/kz/vlq4v9a1rylcp4fsqqav3lcjgskky4-squashfs.img.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/xd/hkjnbbr9jp7364pkn8zpk9v8xapj2c-nix-2.4pre20210917_37cc50f.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/zn/9df7225fl8p7iavqqfvlyay4rf0msw-nix-2.4pre20210917_37cc50f.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: possibly transient failure building ‘/nix/store/7hdssz03gazrkqzfmlr5cprd0dvkg4db-squashfs.img.drv’ on ‘roflmayr’: error: creating directory '/var/lib/hydra/build-logs': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: will retry ‘/nix/store/7hdssz03gazrkqzfmlr5cprd0dvkg4db-squashfs.img.drv’ after 543s
```
Because of that, I decided to remove the `createHome = true;` setting and instead used
`systemd-tmpfiles`[3] which can not only ensure that certain directories
exist, but also proper permissions.
With this change, we can also get rid of the manual setup in
`hydra-init.service` since `systemd-tmpfiles` will be executed by
`switch-to-configuration` before *any* systemd service gets started. On
startup, `systemd-tmpfiles-setup.service` is invoked within
`sysinit.target` being reached, so when `hydra-init.service` gets called
in `multi-user.target`, the structure already exists.
[1] fa0d499dbf
[2] 3cec908738/hydra-module.nix (L260-L262)
[3] https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html
This will make it easier to track specifically where queries are being
made from (assuming a `log_line_prefix` that includes `%a` in the
postgres configuration).
* Fix issue #614: restart queue/evaluator on sufficient disk space available.
* Only try to stop the service if it is currently running.
* Use named variables and added restarting message.
The creation of the `pg_trgm` extension needs superuser power. So,
this patch makes the extension creation in the Hydra NixOS module when
a local database is used.
If it is not possible to create this extension (remote database for
instance with nosuperuser), the creation of the `pg_trgm` index is
skipped (this index speedup queries on builds.drvpath) and warnings
are emitted:
initialising the Hydra database schema...
WARNING: Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
WARNING: HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
WARNING: The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)
This allows to keep smooth migrations: the migration process doesn't
require a manual step (but this manual step is recommended on big
remote databases).
* The "Jobset" page now shows when evaluations are in progress (rather
than just pending).
* Restored the ability to do a single evaluation from the command line
by doing "hydra-evaluator <project> <jobset>".
* Fix some consistency issues between jobset status in PostgreSQL and
in hydra-evaluator. In particular, "lastCheckedTime" was never
updated internally.
Without this I got the following error in my journal:
Oct 25 22:42:29 mymachine hydra-evaluator[4085]: starting evaluation of jobset ‘myproject:.jobsets’
Oct 25 22:42:29 mymachine hydra-evaluator[4085]: timeout: failed to run command ‘hydra-eval-jobset’: No such file or directory
Oct 25 22:42:29 mymachine hydra-evaluator[4085]: evaluation of jobset ‘myproject:.jobsets’ finished with status 32512
The uid split a while back caused the web interface to create GC roots
in /nix/var/nix/gcroots/per-user/hydra-www, where they wouldn't be
purged by hydra-update-gc-roots. Thus restarted builds would
accumulate forever. The fix is to keep the roots in a shared directory
with gid=hydra.