Commit graph

248 commits

Author SHA1 Message Date
Shea Levy 26470f1656 Check all inputs for blame but only email selected inputs
Signed-off-by: Shea Levy <shea@shealevy.com>
2013-10-08 14:47:24 -04:00
Shea Levy 3e4a4e3761 Propagate checkresponsible from JobsetInput to BuildInput
Signed-off-by: Shea Levy <shea@shealevy.com>
2013-10-08 13:24:49 -04:00
Eelco Dolstra 720c3892a3 Use delete instead of delete_all
DBIC's delete_all method fetches all rows separately, which is slow.
2013-10-03 19:42:44 +02:00
Eelco Dolstra b1f7096935 Restore old findBuildDependencyInQueue behaviour 2013-10-03 13:08:32 +02:00
Eelco Dolstra b1a26e6caa Revert "Add a dependency_lookup configuration option to enable (slow) dependency lookup in queue. This behaviour was disabled temporarily in accefbb79 due to slowness in very large queues, but some people might be dependent on it, so it is configurable until the previous behaviour is implemented more efficiently."
This reverts commit 24f5a6b15f.
2013-10-03 13:07:32 +02:00
Rob Vermaas 24f5a6b15f Add a dependency_lookup configuration option to enable (slow) dependency lookup in queue. This behaviour was disabled temporarily in accefbb79 due to slowness in very large queues, but some people might be dependent on it, so it is configurable until the previous behaviour is implemented more efficiently. 2013-10-03 09:09:18 +00:00
Eelco Dolstra 4dd1197d89 Fix uninitialized value warning 2013-09-30 10:01:09 +00:00
Eelco Dolstra af2b0c8bad Remove dead code 2013-09-30 11:57:38 +02:00
Eelco Dolstra d46ebeea99 Distinguish between permanent evaluation errors and transient input errors
Fixes #112.
2013-09-25 16:21:16 +02:00
Eelco Dolstra e1c9e28589 Handle UTF-8 characters in eval error messages 2013-09-25 15:51:03 +02:00
Eelco Dolstra a8db329839 Warn against multiple jobs with the same name 2013-09-25 15:30:59 +02:00
Eelco Dolstra a2491f76a4 Use the same start/stop time for the build steps as for the build 2013-09-25 01:00:20 +02:00
Eelco Dolstra f037a318e3 *headdesk*
DBIC::Class helpfully doesn't warn you when you're matching against
unselected columns.  So this query actually returned all builds...
2013-09-25 01:00:20 +02:00
Rob Vermaas b1e29e50a7 Only send email notification of evaluation error when the evaluation error has changed. Fixes #121. 2013-09-24 12:01:57 -04:00
Shea Levy 6d5a3d0580 Derivations with multiple outputs break the 'link name is store path' assumption
Signed-off-by: Shea Levy <shea@shealevy.com>
2013-09-22 21:26:59 -04:00
Eelco Dolstra 77dbf55abb hydra-queue-runner: Tweaked the selection method
Pick the jobset that has used the smallest fraction of its share,
rather than the jobset furthest below its share in absolute terms.
This gives jobsets with a small share a quicker start (but they
will also run out of their share quicker).
2013-09-21 19:54:58 +00:00
Eelco Dolstra cf43c605cd hydra-queue-runner: Cache the lookup of time spent per jobset 2013-09-21 19:54:46 +00:00
Eelco Dolstra 4cdf1a270d hydra-queue-runner: Set the start time properly 2013-09-21 19:38:02 +00:00
Eelco Dolstra 52ce662710 hydra-queue-runner: Don't kill builds we just started 2013-09-21 20:51:43 +02:00
Eelco Dolstra accefbb798 hydra-queue-runner: Disable findBuildDependencyInQueue for now
It's way too slow.
2013-09-21 20:35:02 +02:00
Eelco Dolstra 9602499c1c hydra-evaluator: Do the actual work in a subprocess
This should get rid of the slow memory leaks exhibited by
hydra-evaluator.
2013-09-21 15:49:27 +00:00
Eelco Dolstra 4ed877360b hydra-queue-runner: Improved scheduling
Each jobset now has a "scheduling share" that determines how much of
the build farm's time it is entitled to.  For instance, if a jobset
has 100 shares and the total number of shares of all jobsets is 1000,
it's entitled to 10% of the build farm's time.  When there is a free
build slot for a given system type, the queue runner will select the
jobset that is furthest below its scheduling share over a certain time
window (currently, the last day).  Withing that jobset, it will pick
the build with the highest priority.

So meta.schedulingPriority now only determines the order of builds
within a jobset, not between jobsets.  This makes it much easier to
prioritise one jobset over another (e.g. nixpkgs:trunk over
nixpkgs:stdenv).
2013-09-21 14:57:01 +00:00
Shea Levy 74388353b5 Add a plugin for backing up builds in s3
In your hydra config, you can add an arbitrary number of <s3config>
sections, with the following options:

* name (required): Bucket name
* jobs (required): A regex to match job names (in project:jobset:job
  format) that should be backed up to this bucket
* compression_type: bzip2 (default), xz, or none
* prefix: String to prepend to all hydra-created s3 keys (if this is
  meant to represent a directory, you should include the trailing slash,
  e.g. "cache/"). Default "".

After each build with an output (i.e. successful or failed-with-output
builds), the output path and its closure are uploaded to the bucket as
.nar files, with corresponding .narinfos to enable use as a binary
cache.

This plugin requires that s3 credentials be available. It uses
Net::Amazon::S3, which as of this commit the nixpkgs version can
retrieve s3 credentials from the AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY environment variables, or from ec2 instance
metadata when using an IAM role.

This commit also adds a hydra-s3-backup-collect-garbage program, which
uses hydra's gc roots directory to determine which paths are live, and
then deletes all files except nix-cache-info and any .nar or .narinfo
files corresponding to live paths. hydra-s3-backup-collect-garbage
respects the prefix configuration option, so it won't delete anything
outside of the hierarchy you give it, and it has the same credential
requirements as the plugin. Probably a timer unit running the garbage
collection periodically should be added to hydra-module.nix

Note that two of the added tests fail, due to a bug in the interaction
between Net::Amazon::S3 and fake-s3. Those behaviors work against real
s3 though, so I'm committing this even with the broken tests.

Signed-off-by: Shea Levy <shea@shealevy.com>
2013-09-18 18:32:58 +02:00
Eelco Dolstra 4705af48b8 hydra-build: Hack to handle timeouts 2013-09-18 13:06:35 +00:00
Eelco Dolstra e54b536bb7 hydra-update-gc-roots: Don't keep the most recent successful view result
Views are deprecated.
2013-09-18 11:12:33 +00:00
Eelco Dolstra 2845d46d21 hydra-update-gc-roots: Keep more evals
We now keep *all* unfinished evaluations of a jobset, in addition to
the <keepnr> most recent finished evaluations.

The main motivation is to ensure that mirror-{nixos,nixpkgs} work
properly: if building an evaluation takes too long, some of its builds
may already have been garbage-collected by the time the others finish.
2013-09-18 11:10:10 +00:00
Eelco Dolstra 3f68076577 hydra-build: Don't send a giant query to the database
We had Postgres barfing with this error:

  DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: stack depth limit exceeded

because the ‘drvpath => [ @dependentDrvs ]’ in failDependents can
cause a query of unbounded size.  (In this specific case there was a
failure of Bison, which has > 10000 dependent derivations.)  So now we
just get all scheduled builds from the DB.
2013-09-10 11:01:29 +00:00
Eelco Dolstra 35aad40692 Kill builds that produce more than 64 MiB of log output 2013-09-10 10:33:55 +00:00
Rob Vermaas bf42392fe4 Fix typo. 2013-08-27 15:12:41 +02:00
Eelco Dolstra a57957df84 Handle job aliases in AggregateConstituents
Aggregate constituents are derivations.  However there can be multiple
builds in an evaluation that have the same derivation, i.e. they can
alias each other (e.g. "emacs", "emacs24" and "emacs24Packages.emacs"
in Nixpkgs).  Previously we picked a build arbitrarily for the
AggregateConstituents table.  Now we pick the one with the shortest
name (e.g. "emacs").
2013-08-27 11:48:02 +02:00
Eelco Dolstra 46f8b25c1f Keep builds that failed with output
The user may want to look at the output, so they shouldn't be
GC'ed right away.
2013-08-16 16:36:06 +02:00
Eelco Dolstra d16738e130 hydra-update-gc-roots: Keep the most recent evaluations
We now keep all builds in the N most recent evaluations of a jobset,
rather than the N most recent builds of every job.  Note that this
means that typically fewer builds will be kept (since jobs may be
unchanged across evaluations).
2013-08-16 16:21:30 +02:00
Eelco Dolstra 1776d9118f Rename aggregate members to constituents 2013-08-15 02:33:10 +02:00
Eelco Dolstra d58142b3f0 Store aggregate members in the database
For presentation purposes, we need to know what builds are part of an
aggregate build.  So at evaluation time, look at the "members"
attribute, find the corresponding builds in the eval, and create a
mapping in the AggregateMembers table.
2013-08-14 01:59:29 +02:00
Eelco Dolstra 452c8e36d1 Materialize the number of finished builds
The NrBuilds table tracks the value of ‘select count(*) from Builds
where finished = 0’, keeping it up to date via a trigger.  This is
necessary to make the /all page fast, since otherwise it needs to do a
sequential scan on the Builds table.
2013-08-12 20:19:10 +02:00
Shea Levy 166d56088f Call buildFinished when a cached build is added
Signed-off-by: Shea Levy <shea@shealevy.com>
2013-07-08 13:35:34 -04:00
Eelco Dolstra d18fc4fc38 Include names of committers in HipChat notifications
HipChat notification messages now say which committers were
responsible, e.g.

  Job patchelf:trunk:tarball: Failed, probably due to 2 commits by Eelco Dolstra
2013-07-02 13:54:18 +02:00
Eelco Dolstra 7e11d01abf Remove tabs 2013-07-02 11:37:16 +02:00
Eelco Dolstra 98a105fe69 hydra-build: Give a nicer error message if the derivation is gone 2013-06-14 11:01:53 +00:00
Eelco Dolstra cceab7308b hydra-queue-runner: Handle restarted builds whose derivation is gone
Restarted builds whose derivation has been garbage-collected in the
meantime caused hydra-queue-runner to get stuck in a loop saying:

Jun 14 11:54:25 lucifer hydra-queue-runner[31844]: system type `x86_64-darwin': 0 active, 2 allowed, started 2 builds
Jun 14 11:54:25 lucifer hydra-queue-runner[31844]: {UNKNOWN}: path `/nix/store/wcizsch2garjlvs4pswrar47i1hwjaia-inconsolata.drv' is not valid at
/nix/store/ypkdm4v13yrk941rvp8h0y425a5ww6nm-hydra-0.1pre1353-40debf1/bin/.hydra-queue-runner-wrapped line 51. at
/nix/store/kjpsc2zdaxnd44azxyw60f2px839m1cd-hydra-perl-deps/lib/perl5/site_perl/5.16.2/Catalyst/Model/DBIC/Schema.pm line 501
2013-06-14 11:00:05 +00:00
Eelco Dolstra 40debf1515 hydra-queue-runner: Don't unlock builds we just started
This happens if the previous iteration took more than 60 seconds.
Then the queue runner may think that builds failed to start properly
and unlock them, e.g.

build 5264936 pid 19248 died, unlocking
build 5264951 pid 19248 died, unlocking
build 5257073 pid 19248 died, unlocking
...
2013-06-07 20:15:37 +00:00
Eelco Dolstra 5d9b7c6ab2 Speed up findBuildDependencyInQueue
This was taking a long time due to the giant SQL query.

Issue #99.
2013-06-07 20:15:32 +00:00
Eelco Dolstra 8e36343b62 hydra-queue-runner: Start as many builds as possible on each iteration
Because we don't start a build if a dependency is already building,
it's possible that some or all of the $extraAllowed highest-priority
builds in the queue are not eligible.  E.g. with $extraAllowed = 32,
we might start only 3 builds even though there are thousands in the
queue.  The fix is to try all queued builds until $extraAllowed have
been started.

Issue #99.
2013-06-07 20:15:20 +00:00
Eelco Dolstra 1f1615e80b Support revision control systems via plugins 2013-05-25 15:36:58 -04:00
Eelco Dolstra 9ac363d32a Fill in starttime/stoptime for cached builds 2013-05-24 12:43:02 -04:00
Eelco Dolstra 57b2bb0674 Let Builds.timestamp refer to the time the build was added
Previously, for scheduled builds, "timestamp" contained the time the
build was added to the queue, while for finished builds, it was the
time the build finished.  Now it's always the former.
2013-05-23 10:45:49 -04:00
Rob Vermaas 43785dfca9 Merge pull request #85 from peti/dont-clutter-system-log-with-debug-messages
hydra-queue-runner: don't clutter the system log with debug message
2013-05-10 14:52:13 -07:00
Eelco Dolstra 3939974df8 Set build status to 1 if the primary build failed 2013-05-10 00:51:45 +02:00
Eelco Dolstra 102359bf44 Add separate build step status codes for cached failures and timeouts 2013-05-09 22:13:01 +00:00
Eelco Dolstra a6d8566faf If a build aborts, mark any remaining active build steps as aborted
See e.g. http://hydra.nixos.org/build/4915744.

P.S. existing active build steps of finished builds can be marked as
aborted by running:

update buildsteps set busy = 0, status = 4
  where (build, stepnr) in
    (select s.build, s.stepnr from buildsteps s join builds b on s.build = b.id where b.finished = 1 and s.busy = 1);
2013-05-09 18:03:34 +02:00