Pick the jobset that has used the smallest fraction of its share,
rather than the jobset furthest below its share in absolute terms.
This gives jobsets with a small share a quicker start (but they
will also run out of their share quicker).
Each jobset now has a "scheduling share" that determines how much of
the build farm's time it is entitled to. For instance, if a jobset
has 100 shares and the total number of shares of all jobsets is 1000,
it's entitled to 10% of the build farm's time. When there is a free
build slot for a given system type, the queue runner will select the
jobset that is furthest below its scheduling share over a certain time
window (currently, the last day). Withing that jobset, it will pick
the build with the highest priority.
So meta.schedulingPriority now only determines the order of builds
within a jobset, not between jobsets. This makes it much easier to
prioritise one jobset over another (e.g. nixpkgs:trunk over
nixpkgs:stdenv).
In your hydra config, you can add an arbitrary number of <s3config>
sections, with the following options:
* name (required): Bucket name
* jobs (required): A regex to match job names (in project:jobset:job
format) that should be backed up to this bucket
* compression_type: bzip2 (default), xz, or none
* prefix: String to prepend to all hydra-created s3 keys (if this is
meant to represent a directory, you should include the trailing slash,
e.g. "cache/"). Default "".
After each build with an output (i.e. successful or failed-with-output
builds), the output path and its closure are uploaded to the bucket as
.nar files, with corresponding .narinfos to enable use as a binary
cache.
This plugin requires that s3 credentials be available. It uses
Net::Amazon::S3, which as of this commit the nixpkgs version can
retrieve s3 credentials from the AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY environment variables, or from ec2 instance
metadata when using an IAM role.
This commit also adds a hydra-s3-backup-collect-garbage program, which
uses hydra's gc roots directory to determine which paths are live, and
then deletes all files except nix-cache-info and any .nar or .narinfo
files corresponding to live paths. hydra-s3-backup-collect-garbage
respects the prefix configuration option, so it won't delete anything
outside of the hierarchy you give it, and it has the same credential
requirements as the plugin. Probably a timer unit running the garbage
collection periodically should be added to hydra-module.nix
Note that two of the added tests fail, due to a bug in the interaction
between Net::Amazon::S3 and fake-s3. Those behaviors work against real
s3 though, so I'm committing this even with the broken tests.
Signed-off-by: Shea Levy <shea@shealevy.com>
We now keep *all* unfinished evaluations of a jobset, in addition to
the <keepnr> most recent finished evaluations.
The main motivation is to ensure that mirror-{nixos,nixpkgs} work
properly: if building an evaluation takes too long, some of its builds
may already have been garbage-collected by the time the others finish.
We had Postgres barfing with this error:
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: stack depth limit exceeded
because the ‘drvpath => [ @dependentDrvs ]’ in failDependents can
cause a query of unbounded size. (In this specific case there was a
failure of Bison, which has > 10000 dependent derivations.) So now we
just get all scheduled builds from the DB.
Aggregate constituents are derivations. However there can be multiple
builds in an evaluation that have the same derivation, i.e. they can
alias each other (e.g. "emacs", "emacs24" and "emacs24Packages.emacs"
in Nixpkgs). Previously we picked a build arbitrarily for the
AggregateConstituents table. Now we pick the one with the shortest
name (e.g. "emacs").
We now keep all builds in the N most recent evaluations of a jobset,
rather than the N most recent builds of every job. Note that this
means that typically fewer builds will be kept (since jobs may be
unchanged across evaluations).
For presentation purposes, we need to know what builds are part of an
aggregate build. So at evaluation time, look at the "members"
attribute, find the corresponding builds in the eval, and create a
mapping in the AggregateMembers table.
The NrBuilds table tracks the value of ‘select count(*) from Builds
where finished = 0’, keeping it up to date via a trigger. This is
necessary to make the /all page fast, since otherwise it needs to do a
sequential scan on the Builds table.
HipChat notification messages now say which committers were
responsible, e.g.
Job patchelf:trunk:tarball: Failed, probably due to 2 commits by Eelco Dolstra
Restarted builds whose derivation has been garbage-collected in the
meantime caused hydra-queue-runner to get stuck in a loop saying:
Jun 14 11:54:25 lucifer hydra-queue-runner[31844]: system type `x86_64-darwin': 0 active, 2 allowed, started 2 builds
Jun 14 11:54:25 lucifer hydra-queue-runner[31844]: {UNKNOWN}: path `/nix/store/wcizsch2garjlvs4pswrar47i1hwjaia-inconsolata.drv' is not valid at
/nix/store/ypkdm4v13yrk941rvp8h0y425a5ww6nm-hydra-0.1pre1353-40debf1/bin/.hydra-queue-runner-wrapped line 51. at
/nix/store/kjpsc2zdaxnd44azxyw60f2px839m1cd-hydra-perl-deps/lib/perl5/site_perl/5.16.2/Catalyst/Model/DBIC/Schema.pm line 501
This happens if the previous iteration took more than 60 seconds.
Then the queue runner may think that builds failed to start properly
and unlock them, e.g.
build 5264936 pid 19248 died, unlocking
build 5264951 pid 19248 died, unlocking
build 5257073 pid 19248 died, unlocking
...
Because we don't start a build if a dependency is already building,
it's possible that some or all of the $extraAllowed highest-priority
builds in the queue are not eligible. E.g. with $extraAllowed = 32,
we might start only 3 builds even though there are thousands in the
queue. The fix is to try all queued builds until $extraAllowed have
been started.
Issue #99.
Previously, for scheduled builds, "timestamp" contained the time the
build was added to the queue, while for finished builds, it was the
time the build finished. Now it's always the former.
See e.g. http://hydra.nixos.org/build/4915744.
P.S. existing active build steps of finished builds can be marked as
aborted by running:
update buildsteps set busy = 0, status = 4
where (build, stepnr) in
(select s.build, s.stepnr from buildsteps s join builds b on s.build = b.id where b.finished = 1 and s.busy = 1);
This is mostly so we don't have to pass around common parameters like
"db" and "config", and we don't have to check for the existence of
methods.
A plugin now looks like this:
package Hydra::Plugin::TwitterNotification;
use parent 'Hydra::Plugin';
sub buildFinished {
my ($self, $build, $dependents) = @_;
print STDERR "tweeting about build ", $build->id, "\n";
# Send tweet...
# Hydra database is $self->{db}.
}
You can now add plugins to Hydra by writing a module called
Hydra::Plugin::<whatever> and putting it in Perl's search path. The
only plugin operation currently supported in buildFinished, called
when hydra-build has finished doing a build.
For instance, a Twitter notification plugin would look like this:
package Hydra::Plugin::TwitterNotification;
sub buildFinished {
my ($self, $db, $config, $build, $dependents) = @_;
print STDERR "tweeting about build ", $build->id, "\n";
# send tweet...
}
1;
Previously this function didn't actually have a lot of effect. If a
build A had a dependency B, Hydra would start B first. But on the
next scan through the queue, it would start A anyway, because of the
"busy => 0" restriction.
Now the queue runner won't start a build if a dependency is already
running. (This is not necessarily optimal, since the build may have
other dependencies that don't correspond to a build in the queue but
could run. One day we'll start all Hydra builds in parallel...)
Also, for performance, use computeFSClosure instead of "nix-store
-qR". And don't bother with topological sorting because it didn't
have an effect anyway since the database returns dependencies in
arbitrary order.
This allows checking a jobset (say) at most once a day. It's also
possible to disable polling by setting the interval to 0. This is
useful for jobsets that use push notification or are manually
evaluated.
Avoid the frequently printed
hydra-queue-runner[10293]: system type `x86_64-linux': 2 active, 2 allowed, starting 0 builds
message. That information is only interesting when some build are
actually started.
This is a followup to commit
10882a1ffd ("Add multiple output
support").
* src/script/hydra-eval-guile-jobs.in (job-evaluations->sxml): Return
several `output' tags in the body, and remove the `outPath' attribute
of `job'.
Note that on machines that support multiple system types, EACH system type gets the full number of build slots, which is almost certainly not what we want.
External machines can now notify Hydra that it should check a
repository by sending a GET or PUSH request to /api/push, providing a
list of jobsets to be checked and/or a list of repository URLs. In
the latter case, all jobsets that have any of the specified
repositories as an input will be checked.
For instance, you can configure GitHub or BitBucket to send a request
to the URL
http://hydra.example.org/api/push?repos=git://github.com/NixOS/nixpkgs.git
to trigger evaluation of all jobsets that have
git://github.com/NixOS/nixpkgs.git as an input, or to the URL
http://hydra.example.org/api/push?jobsets=patchelf:trunk,nixpkgs:trunk
to trigger evaluation of just the specified jobsets.
It's pointless to store these, since Nix knows where the logs are.
Also handle (in fact require) Nix's new log storage scheme. Also some
cleanups in the build page.
If a build has ‘preferLocalBuilds = true’ (or we're not using remote
building), and the build has a non-permanent failure, then the build
status should be "Aborted" rather than "Failed". This is denoted by
an exit status of 100 from nix-store.
This happened in a pathological case in Nixpkgs: the "grub" job is
evaluated for i686-linux and x86_64-linux, but in the latter case it
returns the same derivation as in the former case. So only one build
should be added.
This gets rid of the openHydraDB function and ensures that we
open the database in a consistent way.
Also drop the PostgreSQL sequence hacks. They don't seem to be
necessary anymore.
When checking whether the jobset is unchanged, we need to compare with
the previous JobsetEval regardless of whether it had new builds.
Otherwise we'll keep adding new JobsetEval rows.
This isn't perfect because it doesn't handle the case where a
previous build hasn't finished yet. But at least it won't send mail
for old builds that fail while a newer build has already succeeded.
* Don't use isCurrent anymore; instead look up builds in the previous
jobset evaluation. (The isCurrent field is still maintained because
it's still used in some other places.)
* To determine whether to perform an evaluation, compare the hash of
the current inputs with the inputs of the previous jobset
evaluation, rather than checking if there was ever an evaluation
with those inputs. This way, if the inputs of an evaluation change
back to a previous state, we get a new jobset evaluation in the
database (and thus the latest jobset evaluation correctly represents
the latest state of the jobset).
* Improve performance by removing some unnecessary operations and
adding an index.
Since it read the actual roots after determining the set of desired
roots, there was a possibility that it would delete roots added by
hydra-evaluator or hydra-build while hydra-update-gc-roots was
running. This could cause a derivation to be garbage-collected before
the build was performed, for instance. Now the actual roots are read
first, so any root added after that time is not deleted.
The hydra-update-gc-roots script is taking around 95 minutes on our
Hydra instance (though a lot of that is I/O wait). This patch
significantly reduces the number of database queries. In particular,
the N most recent successful builds for each job in a jobset are now
determined in a single query. Also, it removes the calls to
readlink().
Prepared statements are sometimes much slower than unprepared
statements, because the planner doesn't have access to the query
parameters. This is the case for the active build steps query (in
/status), where a prepared statement is three orders of magnitude
slower. So disable the use of prepared statements in this case.
(Since the query parameters are constant here, it would be nicer if we
could tell DBIx::Class to prepare a statement with those parameters
fixed. But I don't know an easy way to do so.)