These are build steps that remain "busy" in the database even though
they have finished, because they couldn't be updated (e.g. due to a
PostgreSQL connection problem). To prevent them from showing up as
busy in the "Machine status" page, we now periodically purge them.
Previously, if the queue monitor thread encounters a build that Hydra
has previously built, it downloaded the output paths from the binary
cache, just to determine the build products and metrics. This is very
inefficient. In particular, when doing something like merging
nixpkgs:staging into nixpkgs:master, the queue monitor thread will be
locked up for a long time fetching files from S3, causing the build
farm to be mostly idle.
Of course this is entirely unnecessary, since the build
products/metrics are already in the Hydra database. So now we just
look up a previous build with the same output path, and copy the
products/metrics.
Mutliple <githubstatus> sections are possible:
* jobs: regexp for jobs to match
* inputs: the input which corresponds to the github repo/rev whose
status we want to report. Can be repeated
* authorization: Verbatim contents of the Authorization header. See
https://developer.github.com/v3/#authentication.
Otherwise, the browser may mix up HTML and JSON responses if it has
requested both. For example, hitting the back button to return to a
job metric page will show a JSON response, because that was the last
thing the browser fetched for that URL.
This requires Catalyst::Action::Rest >= 1.20.
The previous query
select count(*) from builds b left join buildsteps s on s.build = b.id where busy = 1 and finished = 0
is suddenly taking several minutes. Probably PostgreSQL decided to use
a suboptimal query plan.
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.
max-output-size = 1000000000
The default is 2 GiB.
Also refactored the build error / status handling a bit.
When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.
The fix is to use a TokenServer to prevent the builder threads to
store more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that cause the
limit to be exceeded will block until other threads have finished.
The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
The old page didn't scale very well if you have 150K builds in the
queue, in fact it tended to make browsers hang. The new one just
shows, for each jobset, the number of queued builds. The actual builds
can be seen by going to the corresponding jobset page and looking at
the evals.
Same problem as d744362e4a.
at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/predefined_ops.h:166
__last@entry=..., __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:1827
__comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:4717
Respects <slack> blocks in the hydra config, with attributes:
* jobs: a regexp matching the job name (in the format project:jobset:job)
* url: The URL to a slack incoming webhook
* force: If true, always send messages. Otherwise, only when the build status changes
Multiple <slack> blocks are allowed
To use the local Nix store (default):
store_mode = direct
To use a local binary cache:
store_mode = local-binary-cache
binary_cache_dir = /var/lib/hydra/binary-cache
To use an S3 bucket:
store_mode = s3-binary-cache
binary_cache_s3_bucket = my-nix-bucket
Also, respect binary_cache_{secret,public}_key_file for signing the
binary cache.
The queue runner no longer uses this field, and it doesn't provide
very interesting historical data (mostly SSH failures), but it takes
up a lot of space. Also, it contained some bad UTF-8 which was
preventing an upgrade to Postgres 9.5, so a good occasion to get rid
of it.
The required configuration in hydra.conf:
enable_google_login = 1
google_client_id = 238429sdjkds....apps.googleusercontent.com
and optionally persona_allowed_domains to restrict to one or more
domains.
This is necessary given the current size of the Nixpkgs/NixOS
jobsets. Once we have a Nix store + Postgres on SSD, we can reduce
this again.
Should really make this configurable...
The uid split a while back caused the web interface to create GC roots
in /nix/var/nix/gcroots/per-user/hydra-www, where they wouldn't be
purged by hydra-update-gc-roots. Thus restarted builds would
accumulate forever. The fix is to keep the roots in a shared directory
with gid=hydra.
Regression introduced by 1fdc258de0.
The commit introduced a channel/custom PathPart which uses the new
custom channel expressions, but I forgot to remove CaptureArgs, so the
URL really is channel/latest/ignored-value.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Reported-by: Peter Simons <simons@cryp.to>
This removes the "busy", "locker" and "logfile" columns, which are no
longer used by the queue runner. The "Running builds" page now only
shows builds that have an active build step.
Previously, priority bumps could take a long time to get noticed if
getQueuedBuilds() was busy processing zillions of queue
additions. (This was made worse by the reintroduction of substitute
checking.)
We have this set in upgrade-42.sql, so it's better to stay consistent
with the basic SQL file to avoid problems with new Hydra installations.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Reported-by: Eelco Dolstra <eelco.dolstra@logicblox.com>
There is still a tiny window between the calls to nix-prefetch-* and
addTempRoot. This could be eliminated by adding a "-o" option to
nix-prefetch-*, or by not using those scripts at all (and use
addToStore directly).
This allows Hydra to use binaries from available binary caches. It
makes the queue monitor thread quite a bit slower, so if you don't
want to use binary caches, it's better to add "--option
build-use-substitutes false" to the hydra-queue-runner invocation.
Fixed#243.
The last paragraph states about package installation of the "following"
jobs, but it only applies to generic channels, so let's only display it
there.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
So this is the final part which is needed in order to be able to deliver
custom channels, everything else is now just polishing.
We do this by simply redirecting to the build product download URL and
we use binary_cache_url the same way as in NixChannel.pm.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We should now get an overview and help text on how to add a particular
channel and also a bit of information about the builds that are required
for a channel to get upgraded.
Right now we only select the latest successful build in the latest
successful evaluation, so if someone wants to have more information about
which channel has failed, (s)he still has to look at the "Channels" tab
of the jobset.
We can make this more fancy at some later point if this is really
needed, because right now we're only interested in the latest build,
because it's the only thing necessary to deliver the channel contents.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
It's actually lower-case _despite_ the spelling in the SQL file(s),
because the schema auto-generator from DBIx::Class doesn't take it into
account because it's working on SQLite and the latter seems to ignore
case.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We want to have contents and detauls of channel expressions as well and
we already have that in product.type == file, so why not reuse the same
for the channel expression?
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now have a searchBuildsAndEvalsForJobset, which creates such a
mapping for us, so we don't need to duplicate code in jobs_tab and
channels_tab.
Also, we're going to use this for the overview of a particular channel
as well, so it makes sense to put it in CatalystUtils instead of
directly in Jobset.pm.
Instead of eval->jobs, it's now eval->builds, because it's really an
aggregate over the builds schema, rather than the job schema.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We only allow channel/latest anyway, so it really doesn't make sense to
explicitly specify this in the PathPart and provide other dispatcher
once we have more than just "latest", which greatly simplifies the
dispatch tree.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now have a column for that, so no need for counting rows which was a
bit inefficient anyway, because we only would have needed the first row
in the result.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>