Commit graph

114 commits

Author SHA1 Message Date
Eelco Dolstra
6075ac6fed Remove localhost hack 2015-09-09 16:50:59 +02:00
Eelco Dolstra
ee9bf7ace7 Account steps with preferLocalBuild as a separate system type
They will show up in machineTypes as (e.g.) x86_64-linux:local instead
of x86_64-linux. This is to prevent the Hydra provisioner from
creating machines for steps that are supposed to be executed locally.
2015-09-02 13:42:25 +02:00
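The accounting described in ee9bf7ace7 might look roughly like the sketch below. It is written in Python rather than the queue runner's C++, and the `Step` record and both helper functions are hypothetical names. Only the `x86_64-linux:local` key format comes from the commit message.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical stand-in for a build step; not Hydra's actual type."""
    system: str
    prefer_local_build: bool = False


def machine_type(step: Step) -> str:
    # Steps with preferLocalBuild get a ":local" suffix so they are
    # counted apart from ordinary steps of the same system type.
    return f"{step.system}:local" if step.prefer_local_build else step.system


def count_by_machine_type(steps):
    # Tally steps per machine type key.
    return Counter(machine_type(s) for s in steps)
```

A provisioner reading such a tally could then ignore every key ending in `:local` when deciding how many machines to create.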
Eelco Dolstra
7e954aff03 Keep machine stats even when a machine is removed from the machines file
This is important for the Hydra provisioner, since it needs to be able
to see whether a disabled machine still has jobs running on it.
2015-09-02 13:31:47 +02:00
Eelco Dolstra
2a7fbd57cc Allow the machines file to specify host public keys
It's easier for the Hydra provisioner to put host public keys in the
machines file than to separately manage the known_hosts file
(especially when the provisioner runs on a different machine).
2015-08-26 13:43:02 +02:00
Eelco Dolstra
7aa52517e9 Support multiple machines files
This is primarily useful for the Hydra provisioner, which can write
its machines to a file other than /etc/nix/machines.
2015-08-25 15:34:53 +02:00
Eelco Dolstra
7a654259ff Wake the dispatcher when the machines file has changed 2015-08-17 15:48:10 +02:00
Eelco Dolstra
092d60735b Keep track of wait time per system type
I.e., how much time the currently runnable steps per system type have
been waiting. This is useful for deciding whether to provision more
machines.
2015-08-17 15:45:44 +02:00
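One way to read the statistic from 092d60735b is as a per-system sum of how long each currently runnable step has been waiting. The sketch below assumes exactly that; the `(system, runnable_since)` pair layout and the function name are hypothetical, and the real queue runner may aggregate differently.

```python
from collections import defaultdict


def wait_time_per_system(runnable_steps, now):
    """Sum, per system type, how long each runnable step has been waiting.

    `runnable_steps` is an iterable of (system, runnable_since) pairs with
    times in seconds. Both the layout and the summing are assumptions.
    """
    totals = defaultdict(float)
    for system, runnable_since in runnable_steps:
        # Clamp at zero in case a timestamp lies slightly in the future.
        totals[system] += max(0.0, now - runnable_since)
    return dict(totals)
```

A large total for one system type would suggest that provisioning more machines of that type is worthwhile.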
Eelco Dolstra
99bfc37764 Don't abort steps that have an unsupported system type
This is necessary because the required system type can become
available later (e.g. by being provisioned by the
auto-scaler). However, in the future, we may want to fail steps if
they have been unsupported for more than a certain amount of time.
2015-08-17 15:10:41 +02:00
Eelco Dolstra
ea1eb2e3fb Keep track of requiredSystemFeatures in the machine stats
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
2015-08-17 14:37:57 +02:00
Eelco Dolstra
d571e44b86 Keep stats for the Hydra auto scaler
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
2015-08-17 13:50:41 +02:00
Eelco Dolstra
d4759c1da2 hydra-queue-runner: Detect changes to the scheduling shares 2015-08-12 13:17:56 +02:00
Eelco Dolstra
576dc0c120 For completeness, re-implement meta.schedulingPriority 2015-08-12 12:05:43 +02:00
Eelco Dolstra
b7965df928 Load the queue in order of global priority 2015-08-11 02:14:34 +02:00
Eelco Dolstra
97f11baa8d Revive jobset scheduling
(I.e. taking the jobset scheduling share into account.)
2015-08-11 01:31:56 +02:00
Eelco Dolstra
eb13007fe6 Allow builds to be bumped to the front of the queue via the web interface
Builds now have a "Bump up" action. This will cause the queue runner
to prioritise the steps of the build above all other steps.
2015-08-10 16:19:47 +02:00
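Taken together with the ascending build ID ordering from 27182c7c1d, the effect of "Bump up" can be illustrated with a sort key. This is only a sketch: the `(build_id, bumped)` tuple layout and the function name are hypothetical, and the real dispatcher considers more than these two fields.

```python
def dispatch_order(steps):
    """Order (build_id, bumped) pairs: bumped builds first, then by build ID."""
    # False sorts before True, so `not bumped` puts bumped builds in front.
    return sorted(steps, key=lambda s: (not s[1], s[0]))
```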
Eelco Dolstra
27182c7c1d Start steps in order of ascending build ID 2015-08-10 16:19:47 +02:00
Eelco Dolstra
593850b956 Fix potential race in dispatcher wakeup 2015-08-10 12:54:55 +02:00
Eelco Dolstra
6a1c950e94 Unindent 2015-08-10 11:33:22 +02:00
Eelco Dolstra
f21b88e388 Remove superfluous check 2015-08-07 04:20:34 +02:00
Eelco Dolstra
f1fbf8c605 Fix race in finishing builds that have been cancelled 2015-08-07 04:18:48 +02:00
Eelco Dolstra
ff3f5eb4d8 Fix remote building on Nix 1.10 2015-07-31 03:41:55 +02:00
Eelco Dolstra
5b9a288123 Workaround for RemoteStore not supporting cmdBuildDerivation yet 2015-07-31 03:39:20 +02:00
Eelco Dolstra
4d26546d3c Add support for tracking custom metrics
Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.

For example, a coverage build can emit the coverage percentage as
follows:

  echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics

Graphs of all metrics for a job can be seen at

  http://.../job/<project>/<jobset>/<job>#tabs-charts

Specific metrics are also visible at

  http://.../job/<project>/<jobset>/<job>/metric/<metric>

The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
2015-07-31 00:57:30 +02:00
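For illustration, the `hydra-metrics` file written in 4d26546d3c could be read back with a small parser like the one below. The example in the commit message shows a name, a value and a unit per line; treating the unit as optional, and the function name itself, are assumptions.

```python
def parse_metrics(text):
    """Parse lines of the form "<name> <value> [<unit>]" into a dict."""
    metrics = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        name, value = fields[0], float(fields[1])
        # Assumption: the unit may be omitted.
        unit = fields[2] if len(fields) > 2 else None
        metrics[name] = (value, unit)
    return metrics
```

Here `closureSize` in any test input would be a made-up metric name; only `lineCoverage` appears in the commit message.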
Eelco Dolstra
c18fb0ad74 Temporarily disable machines after a connection failure 2015-07-21 15:58:47 +02:00
Eelco Dolstra
7e026d35f7 Split hydra-queue-runner.cc more 2015-07-21 15:14:17 +02:00
Eelco Dolstra
5370be9f52 hydra-queue-runner: Use cmdBuildDerivation
See 1511aa9f48 and eda2f36c2a.
2015-07-21 01:54:24 +02:00
Eelco Dolstra
3ded87329d Keep track of how many threads are waiting 2015-07-10 19:10:14 +02:00
Eelco Dolstra
89fb723ace Notify the queue runner when a build is deleted 2015-07-08 11:43:35 +02:00
Eelco Dolstra
35b7c4f82b Allow only 1 thread to send a closure to a given machine at the same time
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).

A more refined scheme would be to have per machine, per path locks.
2015-07-07 14:06:48 +02:00
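The scheme in 35b7c4f82b amounts to one lock per machine. A minimal Python sketch of that idea follows; the class, the function and the `send` callback are hypothetical, and the queue runner itself is C++.

```python
import threading
from collections import defaultdict


class MachineSendLocks:
    """One lock per machine, so only one thread at a time sends to it."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dict below
        self._locks = defaultdict(threading.Lock)

    def for_machine(self, name):
        # Create the machine's lock on first use, under the guard.
        with self._guard:
            return self._locks[name]


def send_closure(locks, machine, path, send):
    # Serialise sends per machine; different machines proceed in parallel.
    with locks.for_machine(machine):
        send(machine, path)
```

The finer-grained scheme mentioned at the end of the commit message would key the dictionary on `(machine, path)` instead of the machine name alone.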
Eelco Dolstra
16696a4aee Namespace cleanup 2015-07-07 10:29:43 +02:00
Eelco Dolstra
63745b8e25 Move buildRemote() into State 2015-07-07 10:25:33 +02:00
Eelco Dolstra
df29527531 Refactor 2015-07-07 10:17:21 +02:00
Eelco Dolstra
dffb629b8a Unify Hydra's NixOS module with the one used for hydra.nixos.org
In particular, the queue runner and web server now run under different
UIDs.
2015-07-02 01:01:44 +02:00
Eelco Dolstra
2ece42b2b9 Support preferLocalBuild
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandatory system
features field to include "local". For example:

  localhost x86_64-linux,i686-linux - 10 100 - local

says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
2015-06-30 00:20:19 +02:00
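To make the machines-file line in 2ece42b2b9 concrete, the sketch below parses it and checks whether a machine may take a step. The commit message only identifies the speed factor and the mandatory features field; the remaining field names, the defaults, and the idea that a preferLocalBuild step carries "local" among its required features are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    host: str
    systems: list
    ssh_key: str
    max_jobs: int
    speed_factor: int
    supported_features: set
    mandatory_features: set


def _list(field):
    # "-" denotes an empty field.
    return [] if field == "-" else field.split(",")


def parse_machine_line(line):
    f = line.split()
    f += ["-"] * (7 - len(f))  # pad omitted trailing fields
    mandatory = set(_list(f[6]))
    return Machine(
        host=f[0],
        systems=_list(f[1]),
        ssh_key="" if f[2] == "-" else f[2],
        max_jobs=int(f[3]) if f[3] != "-" else 1,      # assumed default
        speed_factor=int(f[4]) if f[4] != "-" else 1,  # assumed default
        # Assumption: mandatory features also count as supported.
        supported_features=set(_list(f[5])) | mandatory,
        mandatory_features=mandatory,
    )


def can_build(machine, system, required_features):
    required = set(required_features)
    return (system in machine.systems
            and required <= machine.supported_features
            and machine.mandatory_features <= required)
```

With the example line, `can_build` accepts a step that requires "local" and rejects one that does not, which matches the "can *only* do builds with preferLocalBuild" reading.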
Eelco Dolstra
008d610467 getQueuedBuilds(): Don't catch errors while loading a build from the queue
Otherwise we never recover from reset daemon connections, e.g.

  hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
  hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
  ...

The error is now handled in queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
2015-06-26 21:06:35 +02:00
Eelco Dolstra
2f4676bd97 JSONObject doesn't handle 64-bit integers 2015-06-25 16:59:48 +02:00
Eelco Dolstra
c6fcce3b3b Moar stats 2015-06-25 16:47:39 +02:00
Eelco Dolstra
18a3c3ff1c Update "make check" for the new queue runner
Also, if the machines file contains an entry for localhost, then run
"nix-store --serve" directly, without going through SSH.
2015-06-25 16:47:39 +02:00
Eelco Dolstra
32210905d8 Automatically reload $NIX_REMOTE_SYSTEMS when it changes
Otherwise, you'd have to restart the queue runner to add or remove
machines.
2015-06-25 16:47:25 +02:00
Eelco Dolstra
1a0e1eb5a0 More stats 2015-06-24 13:19:27 +02:00
Eelco Dolstra
3f8891b6ff Fix incorrect debug message 2015-06-23 17:53:15 +02:00
Eelco Dolstra
af5cbe97aa createStep(): Cache finished derivations
This gets rid of a lot of redundant calls to readDerivation().
2015-06-23 03:25:31 +02:00
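The caching in af5cbe97aa can be pictured as remembering which derivation paths are already known to be finished, so that later lookups skip the expensive read. The sketch below is a loose Python illustration; `StepFactory`, its callbacks and the `reads` counter are hypothetical and do not mirror the C++ code.

```python
class StepFactory:
    def __init__(self, read_derivation, is_finished):
        self._read = read_derivation     # expensive: parses the derivation
        self._is_finished = is_finished  # takes the parsed derivation
        self._finished = set()           # paths known to be finished
        self.reads = 0                   # counts calls to read_derivation

    def create_step(self, drv_path):
        # Cache hit: no need to read the derivation again.
        if drv_path in self._finished:
            return None
        self.reads += 1
        drv = self._read(drv_path)
        if self._is_finished(drv):
            self._finished.add(drv_path)
            return None
        return drv
```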
Eelco Dolstra
681f63a382 Typo 2015-06-23 02:15:11 +02:00
Eelco Dolstra
524ee295e0 Fix sending notifications in the successful case 2015-06-23 02:13:06 +02:00
Eelco Dolstra
4db7c51b5c Rate-limit the number of threads copying closures at the same time
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
2015-06-23 01:49:14 +02:00
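The rate limit in 4db7c51b5c behaves like a counting semaphore with four slots. A minimal Python sketch, assuming a hypothetical `copy` callable that stands in for the actual closure copy:

```python
import threading

MAX_PARALLEL_COPIES = 4  # value taken from the commit message
copy_slots = threading.BoundedSemaphore(MAX_PARALLEL_COPIES)


def copy_closure_limited(copy, *args):
    # Block until one of the four slots is free, then run the copy.
    with copy_slots:
        return copy(*args)
```

Threads beyond the fourth wait at the `with` statement instead of competing for the disk.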
Eelco Dolstra
a317d24b29 hydra-queue-runner: Send build notifications
Since our notification plugins are written in Perl, sending
notifications from C++ requires a small Perl helper named
‘hydra-notify’.
2015-06-23 00:14:49 +02:00
Eelco Dolstra
5312e1209b Keep per-machine stats 2015-06-22 17:11:17 +02:00
Eelco Dolstra
d06366e7cf Remove obsolete comment 2015-06-22 16:59:50 +02:00
Eelco Dolstra
e069ee960e Doh 2015-06-22 16:58:40 +02:00
Eelco Dolstra
41ba7418e2 hydra-queue-runner: More stats 2015-06-22 15:34:33 +02:00