Commit graph

278 commits

Author SHA1 Message Date
Eelco Dolstra 744cee134e hydra-queue-runner: Compress binary cache NARs using xz 2016-02-15 21:56:53 +01:00
Eelco Dolstra 2d0dd7fb49 hydra-queue-runner: Write directly to a binary cache 2016-02-15 21:10:29 +01:00
Eelco Dolstra 92d8b59361 Process Nix API changes 2016-02-11 15:59:47 +01:00
Eelco Dolstra 97f8c61928 Fix hydra-queue-runner --build-one 2015-12-29 17:53:33 +01:00
Eelco Dolstra c087472c71 Remove superfluous "has" function 2015-11-02 14:29:12 +01:00
Eelco Dolstra 2d128d2a6f Don't show redundant "removing machine..." messages 2015-10-30 18:22:43 +01:00
Eelco Dolstra d8d188301d Fix division-by-zero crash
Not clear why step_->jobsets was empty...
2015-10-30 18:01:48 +01:00
Eelco Dolstra 1ff48da3d3 int2String -> std::to_string 2015-10-30 18:01:38 +01:00
Eelco Dolstra 4d1816b152 Remove obsolete Builds columns and provide accurate "Running builds"
This removes the "busy", "locker" and "logfile" columns, which are no
longer used by the queue runner. The "Running builds" page now only
shows builds that have an active build step.
2015-10-27 15:37:17 +01:00
Eelco Dolstra 53c80d9526 getQueuedBuilds(): Periodically stop to handle priority bumps
Previously, priority bumps could take a long time to get noticed if
getQueuedBuilds() was busy processing zillions of queue
additions. (This was made worse by the reintroduction of substitute
checking.)
2015-10-22 17:00:46 +02:00
Eelco Dolstra 71bf7e02d5 Use nix::willBuildLocally() 2015-10-21 15:44:29 +02:00
Eelco Dolstra 8e8e31ce86 Re-implement log size limits
The old queue runner already had this. However, we now store "log
limit exceeded" as a separate status code in the database.
2015-10-06 17:35:08 +02:00
Eelco Dolstra 82504fe010 hydra-queue-runner: Use substitutes
This allows Hydra to use binaries from available binary caches. It
makes the queue monitor thread quite a bit slower, so if you don't
want to use binary caches, it's better to add "--option
build-use-substitutes false" to the hydra-queue-runner invocation.

Fixed #243.
2015-10-05 14:57:44 +02:00
Eelco Dolstra f8141fdc98 Set propagatedFrom for cached failed build steps 2015-09-11 15:55:26 +02:00
Eelco Dolstra 6075ac6fed Remove localhost hack 2015-09-09 16:50:59 +02:00
Eelco Dolstra ee9bf7ace7 Account steps with preferLocalBuild as a separate system type
They will show up in machineTypes as (e.g.) x86_64-linux:local instead
of x86_64-linux. This is to prevent the Hydra provisioner from
creating machines for steps that are supposed to be executed locally.
2015-09-02 13:42:25 +02:00
Eelco Dolstra 7e954aff03 Keep machine stats even when a machine is removed from the machines file
This is important for the Hydra provisioner, since it needs to be able
to see whether a disabled machine still has jobs running on it.
2015-09-02 13:31:47 +02:00
Eelco Dolstra 2a7fbd57cc Allow the machines file to specify host public keys
It's easier for the Hydra provisioner to put host public keys in the
machines file than to separately manage the known_hosts file
(especially when the provisioner runs on a different machine).
2015-08-26 13:43:02 +02:00
Eelco Dolstra 7aa52517e9 Support multiple machines files
This is primarily useful for the Hydra provisioner, which can write
its machines to another file than /etc/nix/machines.
2015-08-25 15:34:53 +02:00
Eelco Dolstra 7a654259ff Wake the dispatcher when the machines file has changed 2015-08-17 15:48:10 +02:00
Eelco Dolstra 092d60735b Keep track of wait time per system type
I.e., how much time the currently runnable steps per system type have
been waiting. This is useful for deciding whether to provision more
machines.
2015-08-17 15:45:44 +02:00
Eelco Dolstra 99bfc37764 Don't abort steps that have an unsupported system type
This is necessary because the required system type can become
available later (e.g. by being provisioned by the
auto-scaler). However, in the future, we may want to fail steps if
they have been unsupported for more than a certain amount of time.
2015-08-17 15:10:41 +02:00
Eelco Dolstra ea1eb2e3fb Keep track of requiredSystemFeatures in the machine stats
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
2015-08-17 14:37:57 +02:00
Eelco Dolstra d571e44b86 Keep stats for the Hydra auto scaler
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
2015-08-17 13:50:41 +02:00
Eelco Dolstra d4759c1da2 hydra-queue-runner: Detect changes to the scheduling shares 2015-08-12 13:17:56 +02:00
Eelco Dolstra 576dc0c120 For completeness, re-implement meta.schedulingPriority 2015-08-12 12:05:43 +02:00
Eelco Dolstra b7965df928 Load the queue in order of global priority 2015-08-11 02:14:34 +02:00
Eelco Dolstra 97f11baa8d Revive jobset scheduling
(I.e. taking the jobset scheduling share into account.)
2015-08-11 01:31:56 +02:00
Eelco Dolstra eb13007fe6 Allow build to be bumped to the front of the queue via the web interface
Builds now have a "Bump up" action. This will cause the queue runner
to prioritise the steps of the build above all other steps.
2015-08-10 16:19:47 +02:00
Eelco Dolstra 27182c7c1d Start steps in order of ascending build ID 2015-08-10 16:19:47 +02:00
Eelco Dolstra 593850b956 Fix potential race in dispatcher wakeup 2015-08-10 12:54:55 +02:00
Eelco Dolstra 6a1c950e94 Unindent 2015-08-10 11:33:22 +02:00
Eelco Dolstra f21b88e388 Remove superfluous check 2015-08-07 04:20:34 +02:00
Eelco Dolstra f1fbf8c605 Fix race in finishing builds that have been cancelled 2015-08-07 04:18:48 +02:00
Eelco Dolstra ff3f5eb4d8 Fix remote building on Nix 1.10 2015-07-31 03:41:55 +02:00
Eelco Dolstra 5b9a288123 Workaround for RemoteStore not supporting cmdBuildDerivation yet 2015-07-31 03:39:20 +02:00
Eelco Dolstra 4d26546d3c Add support for tracking custom metrics
Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.

For example, a coverage build can emit the coverage percentage as
follows:

  echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics

Graphs of all metrics for a job can be seen at

  http://.../job/<project>/<jobset>/<job>#tabs-charts

Specific metrics are also visible at

  http://.../job/<project>/<jobset>/<job>/metric/<metric>

The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
2015-07-31 00:57:30 +02:00
Eelco Dolstra c18fb0ad74 Temporarily disable machines after a connection failure 2015-07-21 15:58:47 +02:00
Eelco Dolstra 7e026d35f7 Split hydra-queue-runner.cc more 2015-07-21 15:14:17 +02:00
Eelco Dolstra 5370be9f52 hydra-queue-runner: Use cmdBuildDerivation
See 1511aa9f48 and eda2f36c2a.
2015-07-21 01:54:24 +02:00
Eelco Dolstra 3ded87329d Keep track of how many threads are waiting 2015-07-10 19:10:14 +02:00
Eelco Dolstra 89fb723ace Notify the queue runner when a build is deleted 2015-07-08 11:43:35 +02:00
Eelco Dolstra 35b7c4f82b Allow only 1 thread to send a closure to a given machine at the same time
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).

A more refined scheme would be to have per machine, per path locks.
2015-07-07 14:06:48 +02:00
Eelco Dolstra 16696a4aee Namespace cleanup 2015-07-07 10:29:43 +02:00
Eelco Dolstra 63745b8e25 Move buildRemote() into State 2015-07-07 10:25:33 +02:00
Eelco Dolstra df29527531 Refactor 2015-07-07 10:17:21 +02:00
Eelco Dolstra dffb629b8a Unify Hydra's NixOS module with the one used for hydra.nixos.org
In particular, the queue runner and web server now run under different
UIDs.
2015-07-02 01:01:44 +02:00
Eelco Dolstra 2ece42b2b9 Support preferLocalBuild
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandary system
features field to include "local". For example:

  localhost x86_64-linux,i686-linux - 10 100 - local

says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
2015-06-30 00:20:19 +02:00
Eelco Dolstra 008d610467 getQueuedBuilds(): Don't catch errors while loading a build from the queue
Otherwise we never recover from reset daemon connections, e.g.

  hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
  hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
  ...

The error is now handled queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
2015-06-26 21:06:35 +02:00
Eelco Dolstra 2f4676bd97 JSONObject doesn't handle 64-bit integers 2015-06-25 16:59:48 +02:00
Eelco Dolstra c6fcce3b3b Moar stats 2015-06-25 16:47:39 +02:00
Eelco Dolstra 18a3c3ff1c Update "make check" for the new queue runner
Also, if the machines file contains an entry for localhost, then run
"nix-store --serve" directly, without going through SSH.
2015-06-25 16:47:39 +02:00
Eelco Dolstra 32210905d8 Automatically reload $NIX_REMOTE_SYSTEMS when it changes
Otherwise, you'd have to restart the queue runner to add or remove
machines.
2015-06-25 16:47:25 +02:00
Eelco Dolstra 1a0e1eb5a0 More stats 2015-06-24 13:19:27 +02:00
Eelco Dolstra 3f8891b6ff Fix incorrect debug message 2015-06-23 17:53:15 +02:00
Eelco Dolstra af5cbe97aa createStep(): Cache finished derivations
This gets rid of a lot of redundant calls to readDerivation().
2015-06-23 03:25:31 +02:00
Eelco Dolstra 681f63a382 Typo 2015-06-23 02:15:11 +02:00
Eelco Dolstra 524ee295e0 Fix sending notifications in the successful case 2015-06-23 02:13:06 +02:00
Eelco Dolstra 4db7c51b5c Rate-limit the number of threads copying closures at the same time
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
2015-06-23 01:49:14 +02:00
Eelco Dolstra a317d24b29 hydra-queue-runner: Send build notifications
Since our notification plugins are written in Perl, sending
notification from C++ requires a small Perl helper named
‘hydra-notify’.
2015-06-23 00:14:49 +02:00
Eelco Dolstra 5312e1209b Keep per-machine stats 2015-06-22 17:11:17 +02:00
Eelco Dolstra d06366e7cf Remove obsolete comment 2015-06-22 16:59:50 +02:00
Eelco Dolstra e069ee960e Doh 2015-06-22 16:58:40 +02:00
Eelco Dolstra 41ba7418e2 hydra-queue-runner: More stats 2015-06-22 15:34:33 +02:00
Eelco Dolstra 62b53a0a47 Guard against concurrent invocations of hydra-queue-runner 2015-06-22 14:24:03 +02:00
Eelco Dolstra fbd7c02217 Periodically dump/log status 2015-06-22 14:15:43 +02:00
Eelco Dolstra 4f4141e1db Add command ‘hydra-queue-runner --status’ to show current status 2015-06-22 14:06:44 +02:00
Eelco Dolstra 44a2b74f5a Keep track of the number of build steps that are being built
(As opposed to being in the closure copying stage.)
2015-06-22 11:23:00 +02:00
Eelco Dolstra fed71d3fe9 Move "created" field into Step::State 2015-06-22 11:07:52 +02:00
Eelco Dolstra 90a08db241 hydra-queue-runner: Fix assertion failure 2015-06-22 10:59:07 +02:00
Eelco Dolstra d744362e4a hydra-queue-runner: Fix segfault sorting machines by load
While sorting machines by load, the load of a machine
(machine->currentJobs) can be changed by other threads. If that
happens, the comparator is no longer a proper ordering, in which case
std::sort() can segfault. So we now make a copy of currentJobs before
sorting.
2015-06-21 16:21:42 +02:00
Eelco Dolstra a0eff6fc15 Fix machine selection 2015-06-19 17:45:26 +02:00
Eelco Dolstra 81abb6e166 Improve parsing of hydra-build-products 2015-06-19 17:20:20 +02:00
Eelco Dolstra e13477bdf2 Robustness 2015-06-19 16:35:49 +02:00
Eelco Dolstra f196967c43 Don't create a propagated build step to the same build 2015-06-19 15:33:37 +02:00
Eelco Dolstra 7afc61691b Doh 2015-06-19 15:27:49 +02:00
Eelco Dolstra 133d298e26 Asynchronously compress build logs 2015-06-19 15:06:12 +02:00
Eelco Dolstra 8e408048e2 Create build step for non-top-level cached failures
This fixes the missing build step on failures like

  http://hydra.nixos.org/build/23222231
2015-06-19 11:33:15 +02:00
Eelco Dolstra 77c8bfd392 Improve logging for aborts 2015-06-19 10:37:22 +02:00
Eelco Dolstra 8db1ae2855 Less verbosity 2015-06-18 17:43:13 +02:00
Eelco Dolstra 89b629eeb1 Fix finishing steps that are not top-level of any build 2015-06-18 17:37:35 +02:00
Eelco Dolstra 9cdbff2fdf Handle concurrent finishing of the same build
There is a slight possibility that the queue monitor and a builder
thread simultaneously decide to mark a build as finished. That's fine,
as long as we ensure the DB update is idempotent (as ensured by doing
"update Builds set finished = 1 ... where finished = 0").
2015-06-18 17:12:51 +02:00
Eelco Dolstra 948473c909 Fix race between the queue monitor and the builder threads 2015-06-18 16:30:28 +02:00
Eelco Dolstra 9c03b11ca8 Simplify retry handling 2015-06-18 14:51:50 +02:00
Eelco Dolstra e039f5f840 Create failed build steps for cached failures 2015-06-18 04:35:37 +02:00
Eelco Dolstra 92ea800cfb Set finishedInDB in a few more places 2015-06-18 04:19:21 +02:00
Eelco Dolstra 47367451c7 hydra-queue-runner: Set isCachedBuild 2015-06-18 03:28:58 +02:00
Eelco Dolstra 8257812d0a Acquire exclusive table lock earlier 2015-06-18 02:44:29 +02:00
Eelco Dolstra 69be3cfe93 hydra-queue-runner: Handle status queries on the main thread
Doing it on the queue monitor thread was problematic because
processing the queue can take a while.
2015-06-18 01:57:01 +02:00
Eelco Dolstra a40ca6b76e hydra-queue-runner: Improve dispatcher
We now take the machine speed factor into account, just like
build-remote.pl.
2015-06-18 01:52:20 +02:00
Eelco Dolstra 3855131185 hydra-queue-runner: Improve SSH flags 2015-06-18 00:50:48 +02:00
Eelco Dolstra f57d0b0c54 hydra-queue-runner: Maintain count of active build steps 2015-06-18 00:24:56 +02:00
Eelco Dolstra 59dae60558 hydra-queue-runner: More stats 2015-06-17 22:38:12 +02:00
Eelco Dolstra ec8e8edc86 hydra-queue-runner: Handle $HYDRA_DBI 2015-06-17 22:11:01 +02:00
Eelco Dolstra ce9e859a9c hydra-queue-runner: Implement --unlock 2015-06-17 21:35:20 +02:00
Eelco Dolstra ca48818b30 Fix remote building 2015-06-17 17:28:59 +02:00
Eelco Dolstra 11be780948 Handle failure with output 2015-06-17 17:11:42 +02:00
Eelco Dolstra b1a75c7f63 getQueuedBuilds(): Handle dependent builds first
If a build A depends on a derivation that is the top-level derivation
of some build B, then we should process B before A (meaning we
shouldn't make the derivation runnable before B has been
added). Otherwise, the derivation will be "accounted" to A rather than
B (so the build step will show up in the wrong build).
2015-06-17 14:46:02 +02:00
Eelco Dolstra 745efce828 hydra-queue-runner: Implement timeouts
Also, keep track of timeouts in the database as a distinct build
status.
2015-06-17 13:32:33 +02:00
Eelco Dolstra 2da4987bc2 Don't lock the CPU 2015-06-17 11:48:38 +02:00
Eelco Dolstra b91a616520 Automatically retry aborted builds
Aborted builds are now put back on the runnable queue and retried
after a certain time interval (currently 60 seconds for the first
retry, then tripled on each subsequent retry).
2015-06-17 11:45:20 +02:00
Eelco Dolstra e02654b3a0 Prefer cached failure over unsupported system type 2015-06-16 18:00:39 +02:00
Eelco Dolstra 42e7301c08 Add status dump facility
Doing

  $ psql hydra -c 'notify dump_status'

will cause hydra-queue-runner to dump some internal status info on
stderr.
2015-06-15 18:20:14 +02:00
Eelco Dolstra dd104f14ea Make the queue monitor more robust, and better debug output 2015-06-15 16:54:52 +02:00
Eelco Dolstra 147eb4fd15 Support requiredSystemFeatures 2015-06-15 16:33:50 +02:00
Eelco Dolstra 508ab7f8a2 Tweak build steps 2015-06-15 15:48:05 +02:00
Eelco Dolstra 21aaa0596b Fail builds with previously failed steps early 2015-06-15 15:31:42 +02:00
Eelco Dolstra c00bf7cd1a Check non-runnable steps for unsupported system type 2015-06-15 15:13:03 +02:00
Eelco Dolstra 5019fceb20 Add a error type for "unsupported system type" 2015-06-15 15:07:04 +02:00
Eelco Dolstra 541fbd62cc Immediately abort builds that require an unsupported system type 2015-06-15 14:51:49 +02:00
Eelco Dolstra f9cd5adae8 Queue monitor: Get only the fields we need 2015-06-11 18:09:50 +02:00
Eelco Dolstra c974fb893b Support cancelling builds 2015-06-11 18:07:45 +02:00
Eelco Dolstra c08883966c Use PostgreSQL notifications for queue events
Hydra-queue-runner now no longer polls the queue periodically, but
instead sleeps until it receives a notification from PostgreSQL about
a change to the queue (build added, build cancelled or build
restarted).

Also, for the "build added" case, we now only check for builds with an
ID greater than the previous greatest ID. This is much more efficient
if the queue is large.
2015-06-11 17:41:59 +02:00
Eelco Dolstra d72a88b562 Don't try to handle SIGINT
It just makes things unnecessarily complicated. We can just exit
without cleaning anything up, since the only thing to do is unmark
builds and build steps as busy. But we can do that by having systemd
call "hydra-queue-runner --unlock" from ExecStopPost.
2015-06-10 15:55:46 +02:00
Eelco Dolstra a4fb93c119 Lock builds for a shorter amount of time 2015-06-10 15:36:21 +02:00
Eelco Dolstra 6d738a31bf Keep track of failed paths in the Hydra database
I.e. don't use Nix's failed paths feature anymore. Easier to keep
everything in one place.
2015-06-10 14:57:16 +02:00
Eelco Dolstra c68036f8b0 Pass ssh key 2015-06-10 14:57:07 +02:00
Eelco Dolstra 7dd1f0097e Finish copyClosure 2015-06-09 16:03:41 +02:00
Eelco Dolstra c93aa92563 Create BuildSteps race-free
If multiple threads create a step for the same build, they could get
the same "max(stepnr)" and allocate conflicting new step numbers. So
lock the BuildSteps table while doing this. We could use a different
isolation level, but this is easier.
2015-06-09 15:03:20 +02:00
Eelco Dolstra 61d4060522 Record the machine used for a build step 2015-06-09 14:57:49 +02:00
Eelco Dolstra ca1fbdd058 Mark builds as busy 2015-06-09 14:31:43 +02:00
Eelco Dolstra 8b12ac1f6d Basic remote building
This removes the need for Nix's build-remote.pl.

Build logs are now written to $HYDRA_DATA/build-logs because
hydra-queue-runner doesn't have write permission to /nix/var/log.
2015-06-09 14:21:21 +02:00
Eelco Dolstra 3a6cb2f270 Implement a database connection pool 2015-05-29 20:55:13 +02:00
Eelco Dolstra 214b95706c On SIGINT, shut down the builder threads
Note that they don't get interrupted at the moment (so on SIGINT, any
running builds will need to finish first).
2015-05-29 20:02:15 +02:00
Eelco Dolstra e778821940 Make concurrency more robust 2015-05-29 17:14:20 +02:00
Eelco Dolstra 8640e30787 Very basic multi-threaded queue runner 2015-05-29 01:31:12 +02:00
Eelco Dolstra 604fdb908f Pass null values to libpqxx properly 2015-05-28 19:06:17 +02:00
Eelco Dolstra dc446c3980 Start of single-process hydra-queue-runner 2015-05-28 17:39:29 +02:00