Commit graph

65 commits

Author SHA1 Message Date
Eelco Dolstra
6517446c34
Update to latest nixUnstable 2017-09-14 17:22:48 +02:00
Eelco Dolstra
50ab80caf2
Don't wait forever to acquire the send lock 2017-09-01 15:29:06 +02:00
Eelco Dolstra
bba383bf1b
hydra-queue-runner: Keep some notification statistics 2017-07-24 16:26:44 +02:00
Eelco Dolstra
f46a21e16e
Slight cleanup 2017-07-21 17:22:11 +02:00
Eelco Dolstra
150228d7de
Upload build logs to the binary cache 2017-03-15 16:59:57 +01:00
Eelco Dolstra
7e6486e694
Move log compression to a plugin 2017-03-15 16:59:57 +01:00
Eelco Dolstra
500c27e4d5
Add hydra.conf option "nar_buffer_size" to configure memoryTokens limit
It defaults to half the physical RAM.
2017-03-03 12:37:27 +01:00
Eelco Dolstra
f6081668dc
Allow determinism checking for entire jobsets
Setting

  xxx-jobset-repeats = patchelf:master:2

will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
2016-12-07 15:57:13 +01:00
Eelco Dolstra
8bb36e79bd
Support testing build determinism
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.

Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).
2016-12-06 17:46:06 +01:00
Eelco Dolstra
b4d32a3085
hydra-queue-runner: More accurate memory accounting
We now take into account the memory necessary for compressing the NAR
being exported to the binary cache, plus xz compression overhead.

Also, we now release the memory tokens for the NAR accessor *after*
releasing the NAR accessor. Previously the memory for the NAR accessor
might still be in use while another thread does an allocation, causing
the maximum to be exceeded temporarily.

Also, use notify_all instead of notify_one to wake up memory token
waiters. This is not very nice, but not every waiter is requesting the
same number of tokens, so some might be able to proceed.
2016-11-16 17:48:50 +01:00
Eelco Dolstra
7863d2e1da Step cancellation: Don't use pthread_cancel()
This was a bad idea because pthread_cancel() is unsalvageable broken
in C++. Destructors are not allowed to throw exceptions (especially in
C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind
exception inside any destructor that invokes a cancellation
point. (This exception can be caught but *must* be rethrown.) So let's
just kill the builder process instead.
2016-11-07 19:38:24 +01:00
Eelco Dolstra
4f08c85c69 hydra-queue-runner: Fix assertion failure
It was hitting

    assert(reservation.unique());

Since we do want the machine reservation to be released before calling
wakeDispatcher(), let's use a different object for keeping track of
active steps.
2016-11-02 12:41:00 +01:00
Eelco Dolstra
b3169ce438 Kill active build steps when builds are cancelled
We now kill active build steps when there are no more referring
builds. This is useful e.g. for preventing cancelled multi-hour TPC-H
benchmark runs from hogging build machines.
2016-10-31 14:58:29 +01:00
Eelco Dolstra
ee2e9f5335 Update to reflect BinaryCacheStore changes
BinaryCacheStore no longer implements buildPaths() and ensurePath(),
so we need to use copyPath() / copyClosure().
2016-10-07 20:23:05 +02:00
Eelco Dolstra
a55942603a Provide a plugin hook for when build steps finish
Fixes #318.
2016-05-27 14:35:32 +02:00
Eelco Dolstra
ef72569cc3 Merge pull request #280 from shlevy/github-status-api
Add a plugin to interact with the github status API.
2016-04-14 20:03:45 +02:00
Eelco Dolstra
077ed3f571 Periodically clear orphaned build steps
These are build steps that remain "busy" in the database even though
they have finished, because they couldn't be updated (e.g. due to a
PostgreSQL connection problem). To prevent them from showing up as
busy in the "Machine status" page, we now periodically purge them.
2016-04-13 16:30:52 +02:00
Eelco Dolstra
f3f661bac1 Reuse build products / metrics stored in the database
Previously, if the queue monitor thread encounters a build that Hydra
has previously built, it downloaded the output paths from the binary
cache, just to determine the build products and metrics. This is very
inefficient. In particular, when doing something like merging
nixpkgs:staging into nixpkgs:master, the queue monitor thread will be
locked up for a long time fetching files from S3, causing the build
farm to be mostly idle.

Of course this is entirely unnecessary, since the build
products/metrics are already in the Hydra database. So now we just
look up a previous build with the same output path, and copy the
products/metrics.
2016-04-13 16:30:52 +02:00
Shea Levy
9b37cb89ae Add buildStarted plugin hook 2016-04-12 14:42:01 -04:00
Eelco Dolstra
33da40f272 Doh 2016-03-09 17:31:57 +01:00
Eelco Dolstra
4b9c76e502 hydra-queue-runner: Ensure regular status dumps 2016-03-09 17:11:34 +01:00
Eelco Dolstra
4151be7e69 Make the output size limit configurable
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.

  max-output-size = 1000000000

The default is 2 GiB.

Also refactored the build error / status handling a bit.
2016-03-09 17:00:09 +01:00
Eelco Dolstra
80ff78b1b6 Unify build and step status codes
Also remove the obsolete status code 5 from the database.
2016-03-09 15:30:43 +01:00
Eelco Dolstra
9127f5bbc3 hydra-queue-runner: Limit memory usage
When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.

The fix is to use a TokenServer to prevent the builder threads to
store more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that cause the
limit to be exceeded will block until other threads have finished.

The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
2016-03-09 14:30:13 +01:00
Eelco Dolstra
b77a43b83d Get rid of "will retry" messages after "maybe cancelling..." 2016-03-08 13:09:39 +01:00
Eelco Dolstra
718fef29ef Keep track of time required to load builds 2016-03-08 13:09:29 +01:00
Eelco Dolstra
b98a061c24 Add some instrumentation to keep track of dispatcher cost 2016-03-02 14:18:39 +01:00
Eelco Dolstra
6beee0ab49 Fix segfault sorting runnable steps
Same problem as d744362e4a.

    at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/predefined_ops.h:166
    __last@entry=..., __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:1827
    __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:4717
2016-03-02 13:59:24 +01:00
Eelco Dolstra
7cd08c7c46 Warn if PostgreSQL appears stalled 2016-02-29 15:10:30 +01:00
Eelco Dolstra
6d741d2ffa Prevent download of NARs we just uploaded 2016-02-26 15:21:44 +01:00
Eelco Dolstra
8321a3eb27 Sync with Nix 2016-02-24 14:04:31 +01:00
Eelco Dolstra
6c3ae36648 hydra-queue-runner: Get store mode configuration from hydra.conf
To use the local Nix store (default):

  store_mode = direct

To use a local binary cache:

  store_mode = local-binary-cache
  binary_cache_dir = /var/lib/hydra/binary-cache

To use an S3 bucket:

  store_mode = s3-binary-cache
  binary_cache_s3_bucket = my-nix-bucket

Also, respect binary_cache_{secret,public}_key_file for signing the
binary cache.
2016-02-22 17:23:06 +01:00
Eelco Dolstra
88a05763cc Pool local store connections 2016-02-20 00:04:08 +01:00
Eelco Dolstra
dc4a00347d Use a single BinaryCacheStore for all threads
This will make it easier to do caching / keep stats. Also, we won't
have S3Client's connection pooling if we create multiple S3Client
instances.
2016-02-18 17:31:19 +01:00
Eelco Dolstra
ce5790285a Merge remote-tracking branch 'origin/master' into binary-cache 2016-02-17 11:54:59 +01:00
Eelco Dolstra
d7a123fcd4 Keep track of the time we spend copying to/from build machines 2016-02-17 10:30:23 +01:00
Eelco Dolstra
2d0dd7fb49 hydra-queue-runner: Write directly to a binary cache 2016-02-15 21:10:29 +01:00
Eelco Dolstra
92d8b59361 Process Nix API changes 2016-02-11 15:59:47 +01:00
Eelco Dolstra
c087472c71 Remove superfluous "has" function 2015-11-02 14:29:12 +01:00
Eelco Dolstra
53c80d9526 getQueuedBuilds(): Periodically stop to handle priority bumps
Previously, priority bumps could take a long time to get noticed if
getQueuedBuilds() was busy processing zillions of queue
additions. (This was made worse by the reintroduction of substitute
checking.)
2015-10-22 17:00:46 +02:00
Eelco Dolstra
71bf7e02d5 Use nix::willBuildLocally() 2015-10-21 15:44:29 +02:00
Eelco Dolstra
8e8e31ce86 Re-implement log size limits
The old queue runner already had this. However, we now store "log
limit exceeded" as a separate status code in the database.
2015-10-06 17:35:08 +02:00
Eelco Dolstra
82504fe010 hydra-queue-runner: Use substitutes
This allows Hydra to use binaries from available binary caches. It
makes the queue monitor thread quite a bit slower, so if you don't
want to use binary caches, it's better to add "--option
build-use-substitutes false" to the hydra-queue-runner invocation.

Fixed #243.
2015-10-05 14:57:44 +02:00
Eelco Dolstra
7e954aff03 Keep machine stats even when a machine is removed from the machines file
This is important for the Hydra provisioner, since it needs to be able
to see whether a disabled machine still has jobs running on it.
2015-09-02 13:31:47 +02:00
Eelco Dolstra
2a7fbd57cc Allow the machines file to specify host public keys
It's easier for the Hydra provisioner to put host public keys in the
machines file than to separately manage the known_hosts file
(especially when the provisioner runs on a different machine).
2015-08-26 13:43:02 +02:00
Eelco Dolstra
7aa52517e9 Support multiple machines files
This is primarily useful for the Hydra provisioner, which can write
its machines to another file than /etc/nix/machines.
2015-08-25 15:34:53 +02:00
Eelco Dolstra
092d60735b Keep track of wait time per system type
I.e., how much time the currently runnable steps per system type have
been waiting. This is useful for deciding whether to provision more
machines.
2015-08-17 15:45:44 +02:00
Eelco Dolstra
ea1eb2e3fb Keep track of requiredSystemFeatures in the machine stats
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
2015-08-17 14:37:57 +02:00
Eelco Dolstra
d571e44b86 Keep stats for the Hydra auto scaler
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
2015-08-17 13:50:41 +02:00
Eelco Dolstra
d4759c1da2 hydra-queue-runner: Detect changes to the scheduling shares 2015-08-12 13:17:56 +02:00