Commit graph

323 commits

Author SHA1 Message Date
Eelco Dolstra f6081668dc Allow determinism checking for entire jobsets
Setting

  xxx-jobset-repeats = patchelf:master:2

will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
2016-12-07 15:57:13 +01:00
Eelco Dolstra 8bb36e79bd Support testing build determinism
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.

Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).
2016-12-06 17:46:06 +01:00
Eelco Dolstra afb8765ae4 hydra-queue-runner: Bump memory limit to reflect more accurate accounting 2016-11-16 17:51:18 +01:00
Eelco Dolstra b4d32a3085 hydra-queue-runner: More accurate memory accounting
We now take into account the memory necessary for compressing the NAR
being exported to the binary cache, plus xz compression overhead.

Also, we now release the memory tokens for the NAR accessor *after*
releasing the NAR accessor. Previously the memory for the NAR accessor
might still be in use while another thread does an allocation, causing
the maximum to be exceeded temporarily.

Also, use notify_all instead of notify_one to wake up memory token
waiters. This is not very nice, but not every waiter is requesting the
same number of tokens, so some might be able to proceed.
2016-11-16 17:48:50 +01:00
Eelco Dolstra cb5e438a08 Bump Nix
Fixes #398.
2016-11-09 19:15:13 +01:00
Eelco Dolstra 1ecc8a4f40 hydra-queue-runner: Fix a race keeping cancelled steps alive
If a step is cancelled just as its builder step is starting,
doBuildStep() will return sRetry. This causes builder() to make the
step runnable again, since the queue monitor may have added new builds
referencing it. The idea is that if the latter condition is not true,
the step's reference count will drop to zero and it will be
deleted. However, if the dispatcher thread sees and locks the step
before the reference count can drop to zero in the builder thread, the
dispatcher thread will start a new builder thread for the step. Thus
the step can be kept alive for an indefinite amount of time.

The fix is for State::builder() to use a weak pointer to the step, to
ensure that the step's reference count can drop to zero *before* it's
added to the runnable queue.
2016-11-08 11:47:49 +01:00
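The weak-pointer idea from 1ecc8a4f40 can be illustrated with a minimal sketch. The names Step, RunnableQueue and makeRunnableIfReferenced are hypothetical stand-ins, not Hydra's actual types or functions; the sketch only shows the ordering the commit message describes: drop the builder's strong reference first, and re-queue the step only if something else still keeps it alive.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a build step (not Hydra's real Step type).
struct Step { int id; };

// Hypothetical runnable queue that holds only weak references.
using RunnableQueue = std::vector<std::weak_ptr<Step>>;

// Called by a builder after a retry. Takes over the builder's strong
// reference, drops it, and re-queues the step only if another owner
// (e.g. a build that still references it) keeps it alive.
bool makeRunnableIfReferenced(std::shared_ptr<Step> && step, RunnableQueue & runnable)
{
    assert(step);
    std::weak_ptr<Step> weak = step;
    step.reset();                      // reference count may drop to zero here
    if (weak.expired()) return false;  // nobody else wanted the step: it is gone
    runnable.push_back(weak);          // the queue itself does not extend its lifetime
    return true;
}
```

In this sketch a cancelled step with no remaining owners is destroyed before it can reach the queue, so a dispatcher scanning the queue never gets a chance to revive it.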
Eelco Dolstra de9d7bcf25 hydra-queue-runner: Handle exceptions in the dispatcher thread
E.g. "resource unavailable" when creating new threads.
2016-11-08 11:25:43 +01:00
Eelco Dolstra 7863d2e1da Step cancellation: Don't use pthread_cancel()
This was a bad idea because pthread_cancel() is unsalvageably broken
in C++. Destructors are not allowed to throw exceptions (especially in
C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind
exception inside any destructor that invokes a cancellation
point. (This exception can be caught but *must* be rethrown.) So let's
just kill the builder process instead.
2016-11-07 19:38:24 +01:00
Eelco Dolstra d7453bd8be hydra-queue-runner: Fix message 2016-11-02 12:44:18 +01:00
Eelco Dolstra 4f08c85c69 hydra-queue-runner: Fix assertion failure
It was hitting

    assert(reservation.unique());

Since we do want the machine reservation to be released before calling
wakeDispatcher(), let's use a different object for keeping track of
active steps.
2016-11-02 12:41:00 +01:00
Eelco Dolstra b3169ce438 Kill active build steps when builds are cancelled
We now kill active build steps when there are no more referring
builds. This is useful e.g. for preventing cancelled multi-hour TPC-H
benchmark runs from hogging build machines.
2016-10-31 14:58:29 +01:00
Eelco Dolstra a816ef873d Warn against empty machines file 2016-10-31 11:40:36 +01:00
Eelco Dolstra 3b84d4711b Bump Nix 2016-10-26 15:10:56 +02:00
Eelco Dolstra 0b00d51baf Prevent orphaned build steps
If two active steps of the same build failed, then the first would be
marked as "failed", but the second would end up as "orphaned", causing
it to be marked as "aborted" later on. Now it's correctly marked as
"failed".
2016-10-26 14:42:28 +02:00
Eelco Dolstra 8e1d791d0c Truncate the log just before starting the remote build
This gets rid of all those remote substitution messages that were
polluting the build logs.
2016-10-26 13:41:51 +02:00
Eelco Dolstra 3fcfa20d1a Fix regression caused by ee2e9f53
‘basicDrv.inputSrcs’ also contains the outputs of inputDrvs. These
don't necessarily exist in the local store, so copying them may cause
an exception. We should only copy the real inputSrcs.
2016-10-24 16:49:11 +02:00
Eelco Dolstra a3efdcdfd9 Use std::regex 2016-10-21 18:06:26 +02:00
Eelco Dolstra e0b2921ff2 Concurrent hydra-evaluator
This rewrites the top-level loop of hydra-evaluator in C++. The Perl
stuff is moved into hydra-eval-jobset. (Rewriting the entire evaluator
would be nice but is a bit too much work.) The new version has some
advantages:

* It can run multiple jobset evaluations in parallel.

* It uses PostgreSQL notifications so it doesn't have to poll the
  database. So if a jobset is triggered via the web interface or from
  a GitHub / Bitbucket webhook, evaluation of the jobset will start
  almost instantaneously (assuming the evaluator is not at its
  concurrency limit).

* It imposes a timeout on evaluations. So if e.g. hydra-eval-jobset
  hangs connecting to a Mercurial server, it will eventually be
  killed.
2016-10-14 14:22:12 +02:00
Eelco Dolstra 16feddd5d4 Drop obsolete -laws-cpp-sdk-s3 2016-10-14 14:22:12 +02:00
Eelco Dolstra dd5af7637d Remove finally.hh 2016-10-14 14:22:12 +02:00
Eelco Dolstra ee2e9f5335 Update to reflect BinaryCacheStore changes
BinaryCacheStore no longer implements buildPaths() and ensurePath(),
so we need to use copyPath() / copyClosure().
2016-10-07 20:23:05 +02:00
Eelco Dolstra 6a313c691b hydra-queue-runner: Fix build 2016-10-06 16:58:54 +02:00
Alexander Ried 7089142fdc Add error/warnings for deprecated store specification 2016-10-06 15:10:14 +02:00
Alexander Ried a73f211bf2 Use store-api for binary cache instantiation 2016-10-06 15:09:44 +02:00
Alexander Ried 1c2f6281b9 Remove signing parameter (nix#f435f82) 2016-10-06 15:09:12 +02:00
Alexander Ried 232e6e8556 Replace buildVerbosity with verboseBuild (nix#5761827) 2016-10-06 15:08:02 +02:00
Alexander Ried 492d16074c Remove s3binarystore (moved to nix in d155d80) 2016-10-06 15:07:21 +02:00
Eelco Dolstra b1512a152a Fix build failure on GCC 5.4 2016-09-30 17:05:07 +02:00
Shea Levy 5962367ffc Send BuildFinished notifications on cached build results.
Fixes #342
2016-08-17 06:40:12 -04:00
Eelco Dolstra a55942603a Provide a plugin hook for when build steps finish
Fixes #318.
2016-05-27 14:35:32 +02:00
Eelco Dolstra b50a105ca7 S3BinaryCacheStore: Use disk cache 2016-04-20 15:29:40 +02:00
Eelco Dolstra afb86638cd Updates for negative .narinfo caching 2016-04-15 15:39:20 +02:00
Eelco Dolstra 177bf25d64 Queue monitor: Bail out earlier if a step has failed previously
Currently, the hydra.nixos.org queue contains 1000s of Darwin builds
that all depend on a stdenv-darwin that previously failed. However,
before, first createStep() would construct a dependency graph for each
build, then getQueuedBuilds() would discover that one of the steps had
failed previously and discard all those steps. Since the graph
construction involves a lot of uncached calls to isValidPath(), this
took several seconds per build.

Now createStep() detects the previous failure right away and bails
out.
2016-04-15 14:32:16 +02:00
Eelco Dolstra ef72569cc3 Merge pull request #280 from shlevy/github-status-api
Add a plugin to interact with the github status API.
2016-04-14 20:03:45 +02:00
Eelco Dolstra d6f188a01a Typo 2016-04-13 16:45:40 +02:00
Eelco Dolstra b1e36b550c max-output-size -> max_output_size
To be consistent with other Catalyst/Hydra config option names.
2016-04-13 16:30:52 +02:00
Eelco Dolstra 077ed3f571 Periodically clear orphaned build steps
These are build steps that remain "busy" in the database even though
they have finished, because they couldn't be updated (e.g. due to a
PostgreSQL connection problem). To prevent them from showing up as
busy in the "Machine status" page, we now periodically purge them.
2016-04-13 16:30:52 +02:00
Eelco Dolstra f3f661bac1 Reuse build products / metrics stored in the database
Previously, if the queue monitor thread encountered a build that Hydra
had already built, it downloaded the output paths from the binary
cache, just to determine the build products and metrics. This is very
inefficient. In particular, when doing something like merging
nixpkgs:staging into nixpkgs:master, the queue monitor thread will be
locked up for a long time fetching files from S3, causing the build
farm to be mostly idle.

Of course this is entirely unnecessary, since the build
products/metrics are already in the Hydra database. So now we just
look up a previous build with the same output path, and copy the
products/metrics.
2016-04-13 16:30:52 +02:00
Eelco Dolstra 8c7edb1005 Fix narrowing conversion 2016-04-13 16:30:52 +02:00
Eelco Dolstra 00c78440b1 Disambiguate "marking build as succeeded" message 2016-04-13 16:30:52 +02:00
Eelco Dolstra ad834343b5 Fix build against current Nix master 2016-04-13 16:30:52 +02:00
Shea Levy 9b37cb89ae Add buildStarted plugin hook 2016-04-12 14:42:01 -04:00
Eelco Dolstra ddc9f3cc6a Temporarily disable machines on any exception, not just connection failures 2016-03-22 16:54:40 +01:00
Eelco Dolstra 0aecd65e59 /queue-runner-status: Include info about temporarily disabled machines 2016-03-22 16:54:06 +01:00
Eelco Dolstra 5535bc28ca Tweak 2016-03-10 16:46:15 +01:00
Eelco Dolstra 60e7930d2b Bump memory limit a bit 2016-03-10 16:46:01 +01:00
Eelco Dolstra 75e7b35477 Fix retry of transient failures 2016-03-10 16:44:26 +01:00
Eelco Dolstra 33da40f272 Doh 2016-03-09 17:31:57 +01:00
Eelco Dolstra 4b9c76e502 hydra-queue-runner: Ensure regular status dumps 2016-03-09 17:11:34 +01:00
Eelco Dolstra 4151be7e69 Make the output size limit configurable
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.

  max-output-size = 1000000000

The default is 2 GiB.

Also refactored the build error / status handling a bit.
2016-03-09 17:00:09 +01:00
Eelco Dolstra dc790c5f7e Fix bad format string 2016-03-09 16:59:35 +01:00
Eelco Dolstra 80ff78b1b6 Unify build and step status codes
Also remove the obsolete status code 5 from the database.
2016-03-09 15:30:43 +01:00
Eelco Dolstra 9127f5bbc3 hydra-queue-runner: Limit memory usage
When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.

The fix is to use a TokenServer to prevent the builder threads from
storing more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that cause the
limit to be exceeded will block until other threads have finished.

The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
2016-03-09 14:30:13 +01:00
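The token scheme described in 9127f5bbc3 can be sketched roughly as follows. This is not Hydra's TokenServer; the class name TokenBudget and its methods are assumptions made for illustration. Waking all waiters on release follows the reasoning given later in b4d32a3085: waiters request different amounts, so any of them might fit.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical memory budget: threads acquire tokens (e.g. one per byte
// of NAR) before allocating, and block while the budget is exhausted.
class TokenBudget {
    std::mutex mutex;
    std::condition_variable wakeup;
    const std::size_t capacity;
    std::size_t inUse = 0;

public:
    explicit TokenBudget(std::size_t capacity) : capacity(capacity) {}

    // Block until `tokens` fit into the budget, then reserve them.
    void acquire(std::size_t tokens)
    {
        assert(tokens <= capacity); // a larger request could never succeed
        std::unique_lock<std::mutex> lock(mutex);
        wakeup.wait(lock, [&] { return inUse + tokens <= capacity; });
        inUse += tokens;
    }

    // Non-blocking variant: reserve the tokens only if they fit right now.
    bool tryAcquire(std::size_t tokens)
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (inUse + tokens > capacity) return false;
        inUse += tokens;
        return true;
    }

    // Return tokens and wake every waiter, since waiters ask for
    // different amounts and any of them might now be able to proceed.
    void release(std::size_t tokens)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            assert(tokens <= inUse);
            inUse -= tokens;
        }
        wakeup.notify_all();
    }

    std::size_t used()
    {
        std::lock_guard<std::mutex> lock(mutex);
        return inUse;
    }
};
```

A builder thread would call acquire() with the NAR size before buffering it and release() once the buffer has actually been freed, so the accounted total never lags behind real memory use.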
Eelco Dolstra b77a43b83d Get rid of "will retry" messages after "maybe cancelling..." 2016-03-08 13:09:39 +01:00
Eelco Dolstra 718fef29ef Keep track of time required to load builds 2016-03-08 13:09:29 +01:00
Eelco Dolstra 2feb17c681 Some more logging 2016-03-08 13:08:07 +01:00
Eelco Dolstra 45b237453a hydra-queue-runner: Recycle finishedDrvs
This should prevent the queue monitor thread from looking up the same
derivations over and over again.
2016-03-08 11:52:13 +01:00
Eelco Dolstra 2ab8e9a1e0 hydra-queue-runner: Fix handling of missing derivations
This barfed with 'queue monitor: ERROR: column "errormsg" of relation
"builds" does not exist' due to the removal of the errorMsg column.
2016-03-07 19:05:24 +01:00
Eelco Dolstra e7ce225558 Fix build 2016-03-04 17:51:32 +01:00
Eelco Dolstra 86a2d6471c Fix a boost format string abort 2016-03-02 20:06:48 +01:00
Eelco Dolstra 232ca8fea2 Fix build 2016-03-02 17:05:07 +01:00
Eelco Dolstra b98a061c24 Add some instrumentation to keep track of dispatcher cost 2016-03-02 14:18:39 +01:00
Eelco Dolstra 6beee0ab49 Fix segfault sorting runnable steps
Same problem as d744362e4a.

    at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/predefined_ops.h:166
    __last@entry=..., __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:1827
    __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:4717
2016-03-02 13:59:24 +01:00
Eelco Dolstra 7cd08c7c46 Warn if PostgreSQL appears stalled 2016-02-29 15:10:30 +01:00
Eelco Dolstra 922dc541c2 Add log message 2016-02-29 11:58:06 +01:00
Eelco Dolstra 610a8d67ae Better AWS error messages 2016-02-26 22:40:27 +01:00
Eelco Dolstra 1a055e7e9e Reduce severity level of some message 2016-02-26 21:31:08 +01:00
Eelco Dolstra 6bb860fd6e Add FIXME 2016-02-26 21:15:05 +01:00
Eelco Dolstra 53ca41ef9f Use US standard S3 region 2016-02-26 20:57:47 +01:00
Eelco Dolstra c635f5d0ea Fix Makefile.am 2016-02-26 19:54:55 +01:00
Eelco Dolstra b9afaadfb3 Keep better bytesReceived/bytesSent stats 2016-02-26 16:17:05 +01:00
Eelco Dolstra 6d741d2ffa Prevent download of NARs we just uploaded 2016-02-26 15:21:44 +01:00
Eelco Dolstra 02190b0fef Support hydra-build-products on binary cache stores 2016-02-26 14:45:03 +01:00
Eelco Dolstra 8e24ad6f0d Sync with Nix 2016-02-25 10:58:31 +01:00
Eelco Dolstra 8321a3eb27 Sync with Nix 2016-02-24 14:04:31 +01:00
Eelco Dolstra 7b509237cd Bleh Automake 2016-02-22 18:05:15 +01:00
Eelco Dolstra 6c3ae36648 hydra-queue-runner: Get store mode configuration from hydra.conf
To use the local Nix store (default):

  store_mode = direct

To use a local binary cache:

  store_mode = local-binary-cache
  binary_cache_dir = /var/lib/hydra/binary-cache

To use an S3 bucket:

  store_mode = s3-binary-cache
  binary_cache_s3_bucket = my-nix-bucket

Also, respect binary_cache_{secret,public}_key_file for signing the
binary cache.
2016-02-22 17:23:06 +01:00
Eelco Dolstra 94817d77d9 BinaryCacheStore: Respect build-use-substitutes 2016-02-22 17:21:39 +01:00
Eelco Dolstra 5668aa5f71 After uploading a .narinfo, add it to the LRU cache 2016-02-20 10:35:16 +01:00
Eelco Dolstra 88a05763cc Pool local store connections 2016-02-20 00:04:08 +01:00
Eelco Dolstra 1cefd6cac8 Fix log message 2016-02-20 00:02:37 +01:00
Eelco Dolstra 2b76094a23 S3BinaryCacheStore::isValidPath(): Do a GET instead of HEAD 2016-02-19 17:41:11 +01:00
Eelco Dolstra bd76f9120a Cache .narinfo lookups 2016-02-19 16:19:40 +01:00
Eelco Dolstra a0f74047da Keep some statistics for the binary cache stores 2016-02-19 14:24:23 +01:00
Eelco Dolstra dc4a00347d Use a single BinaryCacheStore for all threads
This will make it easier to do caching / keep stats. Also, we won't
have S3Client's connection pooling if we create multiple S3Client
instances.
2016-02-18 17:31:19 +01:00
Eelco Dolstra 00a7be13a2 Make queue runner internal status available under /queue-runner-status 2016-02-18 17:11:46 +01:00
Eelco Dolstra 8c9fc677c1 Typo 2016-02-18 16:43:24 +01:00
Eelco Dolstra db3fcc0f5e Enable substitution on the build machines
If properly configured, this allows them to get store paths directly
from S3, rather than having to receive them from the queue runner.
2016-02-18 16:42:05 +01:00
Eelco Dolstra 2d40888e2e Add an S3-backed binary cache store 2016-02-18 16:18:50 +01:00
Eelco Dolstra 0e254ca66d Refactor local binary cache code into a subclass 2016-02-18 14:06:17 +01:00
Eelco Dolstra a992f688d1 Rename class 2016-02-18 13:02:20 +01:00
Eelco Dolstra de77cc2910 Rename file 2016-02-18 13:02:20 +01:00
Eelco Dolstra ce5790285a Merge remote-tracking branch 'origin/master' into binary-cache 2016-02-17 11:54:59 +01:00
Eelco Dolstra d7a123fcd4 Keep track of the time we spend copying to/from build machines 2016-02-17 10:30:23 +01:00
Eelco Dolstra 25022bf5fd hydra-queue-runner: Support generating a signed binary cache 2016-02-16 16:41:42 +01:00
Eelco Dolstra 744cee134e hydra-queue-runner: Compress binary cache NARs using xz 2016-02-15 21:56:53 +01:00
Eelco Dolstra 2d0dd7fb49 hydra-queue-runner: Write directly to a binary cache 2016-02-15 21:10:29 +01:00
Eelco Dolstra 92d8b59361 Process Nix API changes 2016-02-11 15:59:47 +01:00
Eelco Dolstra 97f8c61928 Fix hydra-queue-runner --build-one 2015-12-29 17:53:33 +01:00
Eelco Dolstra c087472c71 Remove superfluous "has" function 2015-11-02 14:29:12 +01:00
Eelco Dolstra 2d128d2a6f Don't show redundant "removing machine..." messages 2015-10-30 18:22:43 +01:00
Eelco Dolstra d8d188301d Fix division-by-zero crash
Not clear why step_->jobsets was empty...
2015-10-30 18:01:48 +01:00
Eelco Dolstra 1ff48da3d3 int2String -> std::to_string 2015-10-30 18:01:38 +01:00
Eelco Dolstra 4d1816b152 Remove obsolete Builds columns and provide accurate "Running builds"
This removes the "busy", "locker" and "logfile" columns, which are no
longer used by the queue runner. The "Running builds" page now only
shows builds that have an active build step.
2015-10-27 15:37:17 +01:00
Eelco Dolstra 53c80d9526 getQueuedBuilds(): Periodically stop to handle priority bumps
Previously, priority bumps could take a long time to get noticed if
getQueuedBuilds() was busy processing zillions of queue
additions. (This was made worse by the reintroduction of substitute
checking.)
2015-10-22 17:00:46 +02:00
Eelco Dolstra 71bf7e02d5 Use nix::willBuildLocally() 2015-10-21 15:44:29 +02:00
Eelco Dolstra 8e8e31ce86 Re-implement log size limits
The old queue runner already had this. However, we now store "log
limit exceeded" as a separate status code in the database.
2015-10-06 17:35:08 +02:00
Eelco Dolstra 82504fe010 hydra-queue-runner: Use substitutes
This allows Hydra to use binaries from available binary caches. It
makes the queue monitor thread quite a bit slower, so if you don't
want to use binary caches, it's better to add "--option
build-use-substitutes false" to the hydra-queue-runner invocation.

Fixed #243.
2015-10-05 14:57:44 +02:00
Eelco Dolstra f8141fdc98 Set propagatedFrom for cached failed build steps 2015-09-11 15:55:26 +02:00
Eelco Dolstra 6075ac6fed Remove localhost hack 2015-09-09 16:50:59 +02:00
Eelco Dolstra ee9bf7ace7 Account steps with preferLocalBuild as a separate system type
They will show up in machineTypes as (e.g.) x86_64-linux:local instead
of x86_64-linux. This is to prevent the Hydra provisioner from
creating machines for steps that are supposed to be executed locally.
2015-09-02 13:42:25 +02:00
Eelco Dolstra 7e954aff03 Keep machine stats even when a machine is removed from the machines file
This is important for the Hydra provisioner, since it needs to be able
to see whether a disabled machine still has jobs running on it.
2015-09-02 13:31:47 +02:00
Eelco Dolstra 2a7fbd57cc Allow the machines file to specify host public keys
It's easier for the Hydra provisioner to put host public keys in the
machines file than to separately manage the known_hosts file
(especially when the provisioner runs on a different machine).
2015-08-26 13:43:02 +02:00
Eelco Dolstra 7aa52517e9 Support multiple machines files
This is primarily useful for the Hydra provisioner, which can write
its machines to another file than /etc/nix/machines.
2015-08-25 15:34:53 +02:00
Eelco Dolstra 7a654259ff Wake the dispatcher when the machines file has changed 2015-08-17 15:48:10 +02:00
Eelco Dolstra 092d60735b Keep track of wait time per system type
I.e., how much time the currently runnable steps per system type have
been waiting. This is useful for deciding whether to provision more
machines.
2015-08-17 15:45:44 +02:00
Eelco Dolstra 99bfc37764 Don't abort steps that have an unsupported system type
This is necessary because the required system type can become
available later (e.g. by being provisioned by the
auto-scaler). However, in the future, we may want to fail steps if
they have been unsupported for more than a certain amount of time.
2015-08-17 15:10:41 +02:00
Eelco Dolstra ea1eb2e3fb Keep track of requiredSystemFeatures in the machine stats
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
2015-08-17 14:37:57 +02:00
Eelco Dolstra d571e44b86 Keep stats for the Hydra auto scaler
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
2015-08-17 13:50:41 +02:00
Eelco Dolstra d4759c1da2 hydra-queue-runner: Detect changes to the scheduling shares 2015-08-12 13:17:56 +02:00
Eelco Dolstra 576dc0c120 For completeness, re-implement meta.schedulingPriority 2015-08-12 12:05:43 +02:00
Eelco Dolstra b7965df928 Load the queue in order of global priority 2015-08-11 02:14:34 +02:00
Eelco Dolstra 97f11baa8d Revive jobset scheduling
(I.e. taking the jobset scheduling share into account.)
2015-08-11 01:31:56 +02:00
Eelco Dolstra eb13007fe6 Allow build to be bumped to the front of the queue via the web interface
Builds now have a "Bump up" action. This will cause the queue runner
to prioritise the steps of the build above all other steps.
2015-08-10 16:19:47 +02:00
Eelco Dolstra 27182c7c1d Start steps in order of ascending build ID 2015-08-10 16:19:47 +02:00
Eelco Dolstra 593850b956 Fix potential race in dispatcher wakeup 2015-08-10 12:54:55 +02:00
Eelco Dolstra 6a1c950e94 Unindent 2015-08-10 11:33:22 +02:00
Eelco Dolstra f21b88e388 Remove superfluous check 2015-08-07 04:20:34 +02:00
Eelco Dolstra f1fbf8c605 Fix race in finishing builds that have been cancelled 2015-08-07 04:18:48 +02:00
Eelco Dolstra ff3f5eb4d8 Fix remote building on Nix 1.10 2015-07-31 03:41:55 +02:00
Eelco Dolstra 5b9a288123 Workaround for RemoteStore not supporting cmdBuildDerivation yet 2015-07-31 03:39:20 +02:00
Eelco Dolstra 4d26546d3c Add support for tracking custom metrics
Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.

For example, a coverage build can emit the coverage percentage as
follows:

  echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics

Graphs of all metrics for a job can be seen at

  http://.../job/<project>/<jobset>/<job>#tabs-charts

Specific metrics are also visible at

  http://.../job/<project>/<jobset>/<job>/metric/<metric>

The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
2015-07-31 00:57:30 +02:00
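For 4d26546d3c, the example line above suggests a simple "name value unit" layout for $out/nix-support/hydra-metrics. The following sketch parses such a file under that assumption; the Metric struct and parseMetrics function are hypothetical, the unit is treated as optional, and malformed lines are simply skipped, none of which is confirmed by the commit message.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one metrics line.
struct Metric {
    std::string name;
    double value;
    std::string unit; // empty when the line has no third field
};

// Parse whitespace-separated "name value [unit]" lines.
// Blank lines and lines without a numeric value are skipped.
std::vector<Metric> parseMetrics(const std::string & contents)
{
    std::vector<Metric> metrics;
    std::istringstream lines(contents);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        Metric m;
        if (!(fields >> m.name >> m.value)) continue;
        assert(!m.name.empty()); // a successful extraction never yields an empty token
        fields >> m.unit;        // optional; stays empty if absent
        metrics.push_back(m);
    }
    return metrics;
}
```

Given the contents "lineCoverage 83.5 %", this yields a single metric named lineCoverage with value 83.5 and unit "%".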
Eelco Dolstra c18fb0ad74 Temporarily disable machines after a connection failure 2015-07-21 15:58:47 +02:00
Eelco Dolstra 7e026d35f7 Split hydra-queue-runner.cc more 2015-07-21 15:14:17 +02:00
Eelco Dolstra 5370be9f52 hydra-queue-runner: Use cmdBuildDerivation
See 1511aa9f48 and eda2f36c2a.
2015-07-21 01:54:24 +02:00
Eelco Dolstra 3ded87329d Keep track of how many threads are waiting 2015-07-10 19:10:14 +02:00
Eelco Dolstra 89fb723ace Notify the queue runner when a build is deleted 2015-07-08 11:43:35 +02:00
Eelco Dolstra 35b7c4f82b Allow only 1 thread to send a closure to a given machine at the same time
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).

A more refined scheme would be to have per machine, per path locks.
2015-07-07 14:06:48 +02:00
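The per-machine serialisation described in 35b7c4f82b could look roughly like the sketch below. SendLocks and forMachine are hypothetical names; the real queue runner may attach the lock to its machine object instead. The point is only that all threads sending to the same machine share one mutex, while different machines do not block each other.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical registry handing out one mutex per machine name.
class SendLocks {
    std::mutex mapMutex; // protects the map itself
    std::map<std::string, std::shared_ptr<std::mutex>> locks;

public:
    // Return the mutex for `machine`, creating it on first use.
    std::shared_ptr<std::mutex> forMachine(const std::string & machine)
    {
        assert(!machine.empty());
        std::lock_guard<std::mutex> guard(mapMutex);
        auto & slot = locks[machine];
        if (!slot) slot = std::make_shared<std::mutex>();
        return slot;
    }
};
```

A thread would take std::lock_guard<std::mutex> on *locks.forMachine(name) around the closure copy. The finer scheme mentioned in the commit message would key the map on (machine, path) pairs instead.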
Eelco Dolstra 16696a4aee Namespace cleanup 2015-07-07 10:29:43 +02:00
Eelco Dolstra 63745b8e25 Move buildRemote() into State 2015-07-07 10:25:33 +02:00
Eelco Dolstra df29527531 Refactor 2015-07-07 10:17:21 +02:00
Eelco Dolstra dffb629b8a Unify Hydra's NixOS module with the one used for hydra.nixos.org
In particular, the queue runner and web server now run under different
UIDs.
2015-07-02 01:01:44 +02:00
Eelco Dolstra 2ece42b2b9 Support preferLocalBuild
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandatory system
features field to include "local". For example:

  localhost x86_64-linux,i686-linux - 10 100 - local

says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
2015-06-30 00:20:19 +02:00
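The matching rule implied by 2ece42b2b9 can be sketched as follows. Machine, StepRequirements and machineSupportsStep are illustrative names rather than Hydra's actual code. The sketch assumes that every mandatory feature of a machine must be requested by the step, with "local" counting as requested when preferLocalBuild is set, and that such a step may still run on machines without the "local" feature; the latter is an assumption the commit message does not settle.

```cpp
#include <cassert>
#include <set>
#include <string>

using Features = std::set<std::string>;

// Hypothetical view of one machines-file entry.
struct Machine {
    Features systemTypes;
    Features supportedFeatures;
    Features mandatoryFeatures;
};

// Hypothetical view of what a build step asks for.
struct StepRequirements {
    std::string systemType;
    Features requiredFeatures;
    bool preferLocalBuild;
};

bool machineSupportsStep(const Machine & machine, const StepRequirements & step)
{
    assert(!step.systemType.empty());
    if (!machine.systemTypes.count(step.systemType)) return false;

    // A machine with mandatory features only accepts steps that ask for
    // all of them; preferLocalBuild counts as asking for "local".
    for (const auto & f : machine.mandatoryFeatures) {
        bool requested = step.requiredFeatures.count(f) > 0
            || (f == "local" && step.preferLocalBuild);
        if (!requested) return false;
    }

    // Everything the step requires must be offered by the machine.
    for (const auto & f : step.requiredFeatures)
        if (!machine.supportedFeatures.count(f) && !machine.mandatoryFeatures.count(f))
            return false;

    return true;
}
```

With the machines-file line above (mandatory feature "local", no other supported features), an ordinary x86_64-linux step is rejected by localhost, while the same step with preferLocalBuild set is accepted.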
Eelco Dolstra 008d610467 getQueuedBuilds(): Don't catch errors while loading a build from the queue
Otherwise we never recover from reset daemon connections, e.g.

  hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
  hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
  ...

The error is now handled by queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
2015-06-26 21:06:35 +02:00
Eelco Dolstra 2f4676bd97 JSONObject doesn't handle 64-bit integers 2015-06-25 16:59:48 +02:00
Eelco Dolstra c6fcce3b3b Moar stats 2015-06-25 16:47:39 +02:00
Eelco Dolstra 18a3c3ff1c Update "make check" for the new queue runner
Also, if the machines file contains an entry for localhost, then run
"nix-store --serve" directly, without going through SSH.
2015-06-25 16:47:39 +02:00
Eelco Dolstra 32210905d8 Automatically reload $NIX_REMOTE_SYSTEMS when it changes
Otherwise, you'd have to restart the queue runner to add or remove
machines.
2015-06-25 16:47:25 +02:00
Eelco Dolstra 1a0e1eb5a0 More stats 2015-06-24 13:19:27 +02:00
Eelco Dolstra 3f8891b6ff Fix incorrect debug message 2015-06-23 17:53:15 +02:00