The previous query
select count(*) from builds b left join buildsteps s on s.build = b.id where busy = 1 and finished = 0
is suddenly taking several minutes. Probably PostgreSQL decided to use
a suboptimal query plan.
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.
max-output-size = 1000000000
The default is 2 GiB.
Also refactored the build error / status handling a bit.
When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.
The fix is to use a TokenServer to prevent the builder threads to
store more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that cause the
limit to be exceeded will block until other threads have finished.
The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
The old page didn't scale very well if you have 150K builds in the
queue, in fact it tended to make browsers hang. The new one just
shows, for each jobset, the number of queued builds. The actual builds
can be seen by going to the corresponding jobset page and looking at
the evals.
Same problem as d744362e4a.
at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/predefined_ops.h:166
__last@entry=..., __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:1827
__comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:4717
Respects <slack> blocks in the hydra config, with attributes:
* jobs: a regexp matching the job name (in the format project:jobset:job)
* url: The URL to a slack incoming webhook
* force: If true, always send messages. Otherwise, only when the build status changes
Multiple <slack> blocks are allowed
To use the local Nix store (default):
store_mode = direct
To use a local binary cache:
store_mode = local-binary-cache
binary_cache_dir = /var/lib/hydra/binary-cache
To use an S3 bucket:
store_mode = s3-binary-cache
binary_cache_s3_bucket = my-nix-bucket
Also, respect binary_cache_{secret,public}_key_file for signing the
binary cache.
The queue runner no longer uses this field, and it doesn't provide
very interesting historical data (mostly SSH failures), but it takes
up a lot of space. Also, it contained some bad UTF-8 which was
preventing an upgrade to Postgres 9.5, so a good occasion to get rid
of it.
The required configuration in hydra.conf:
enable_google_login = 1
google_client_id = 238429sdjkds....apps.googleusercontent.com
and optionally persona_allowed_domains to restrict to one or more
domains.
This is necessary given the current size of the Nixpkgs/NixOS
jobsets. Once we have a Nix store + Postgres on SSD, we can reduce
this again.
Should really make this configurable...
The uid split a while back caused the web interface to create GC roots
in /nix/var/nix/gcroots/per-user/hydra-www, where they wouldn't be
purged by hydra-update-gc-roots. Thus restarted builds would
accumulate forever. The fix is to keep the roots in a shared directory
with gid=hydra.