hydra

Author	SHA1	Message	Date
Eelco Dolstra	2946899504	Turn hydra-notify into a daemon It now receives notifications about started/finished builds/steps via PostgreSQL. This gets rid of the (substantial) overhead of starting hydra-notify for every event. It also allows other programs (even on other machines) to listen to Hydra notifications.	2019-08-13 18:18:21 +02:00
Michael Bishop	3ad091faf3	allow using a shorter context and increase hydra-notify debug (cherry picked from commit 1c76ad393669af2f728fd519a050f417319412a6)	2019-03-20 15:22:24 -04:00
Antoine Eiche	9a73ec6455	hydra-queue-runner: better error message if nix-store can not be started The hydra-queue-runner opens a connection to the builder. If the builder is 'localhost' it starts `nix-store`, otherwise it starts 'ssh'. Currently, if the hydra-queue-runner can not start `nix-store` (not in the PATH for instance), the error message is: cannot connect to ‘localhost’: error: cannot start ssh: No such file or directory This is not useful since ssh is actually not started:/ With this patch the error message is now: cannot connect to ‘localhost’: error: cannot start nix-store: No such file or directory	2019-01-23 10:42:47 +01:00
Eelco Dolstra	423c0440ea	Typo	2018-12-20 12:07:02 +01:00
Eelco Dolstra	8d26144121	Fix building against nix master	2018-10-30 14:41:21 +01:00
Eelco Dolstra	4e27796eba	Allow setting GC_INITIAL_HEAP_SIZE for hydra-eval-jobs This cannot be done in the hydra-evaluator systemd unit, since then every other Nix process (e.g. hydra-evaluator and nix-prefetch-*) will also allocate the specified heap size, probably leading to OOM.	2018-05-16 14:14:53 +02:00
Eelco Dolstra	c0fac52872	Add some debug code	2018-03-07 10:23:43 +01:00
Eelco Dolstra	5a1f2a50e5	Handle derivations with system type 'builtin' Fixes #540.	2018-03-07 10:22:35 +01:00
Eelco Dolstra	68afa2bf6f	Dump more system info in /queue-runner-status	2018-03-07 10:06:56 +01:00
Eelco Dolstra	e9670641ec	Distinguish build step states The web interface now shows whether a build step is connecting, copying inputs/outputs, building, etc.	2017-12-07 15:35:31 +01:00
Eelco Dolstra	457483ba0e	Don't lock the BuildSteps table when inserting Instead, optimistically insert a row and retry if there is a conflict.	2017-12-07 14:41:29 +01:00
Eelco Dolstra	eef0d8861b	Remove test line	2017-10-19 13:17:29 +02:00
Eelco Dolstra	b04dc6c76e	Fix root creation when the root already exists but is owned by another user	2017-10-19 12:28:38 +02:00
Eelco Dolstra	cc64e51f75	USER -> LOGNAME for consistency Don't remember why we use LOGNAME. Also ensure that it's set.	2017-10-18 11:23:00 +02:00
Will Dietz	c81594f470	hydra-queue-runner: ensure roots directory exists Fixes #513	2017-10-17 13:04:56 -05:00
Eelco Dolstra	45b138373b	hydra-queue-runner: Write GC roots for outputs paths We lost this behaviour somewhere. So build outputs could be GC'ed when running the collector with --option gc-keep-outputs false.	2017-10-12 18:55:38 +02:00
Eelco Dolstra	27103398c9	Make maxLogSize configurable	2017-09-22 15:23:58 +02:00
Eelco Dolstra	b828224fee	Periodically close RemoteStore connections This prevents an accumulation of temproots. See `89dc62c174`.	2017-09-14 18:16:33 +02:00
Eelco Dolstra	6517446c34	Update to latest nixUnstable	2017-09-14 17:22:48 +02:00
Eelco Dolstra	4af97c57f5	Acquire the send lock only while actually sending Thus, we no longer hold the send lock while substituting missing paths on the build machine. This is a good thing in particular for macOS builders which have a tendency to hang forever in curl downloads.	2017-09-01 16:28:49 +02:00
Eelco Dolstra	50ab80caf2	Don't wait forever to acquire the send lock	2017-09-01 15:29:06 +02:00
Eelco Dolstra	7c976d2aec	hydra-queue-runner: Make build notification more reliable Previously, when hydra-queue-runner was restarted, any pending "build finished" notifications were lost. Now hydra-queue-runner marks finished but unnotified builds in the database and uses that to run pending notifications at startup.	2017-07-26 15:17:51 +02:00
Will Dietz	719df63190	queue-monitor: never move lastBuildId forward without processing jobs.	2017-07-25 20:05:37 -05:00
Eelco Dolstra	e117d85c2a	hydra-queue-runner: Set a thread title for the builder threads This should make debugging slightly easier.	2017-07-25 15:59:41 +02:00
Eelco Dolstra	e78b9fd4ee	hydra-queue-runner: Allow concurrent notifications The queue runner can now run up to ‘max-concurrent-notifications’ in parallel (default is 2). This is useful when some hydra-notify invocations can take a long time to complete (e.g. because they need to compress a giant build log) and we don't want this to block all other notifications.	2017-07-24 16:35:34 +02:00
Eelco Dolstra	bba383bf1b	hydra-queue-runner: Keep some notification statistics	2017-07-24 16:26:44 +02:00
Eelco Dolstra	f46a21e16e	Slight cleanup	2017-07-21 17:22:11 +02:00
Eelco Dolstra	dc5e0b120a	Fix a race that can cause hydra-queue-runner to ignore newly added builds As @dtzWill discovered, with the concurrent hydra-evaluator, there can be multiple active transactions adding builds to the database. As a result, builds can become visible in a non-monotonically increasing order, breaking the queue monitor's assumption that build IDs only go up. The fix is to have hydra-eval-jobset provide the lowest build ID it just added in the builds_added notification, and have the queue monitor check from there. Fixes #496.	2017-07-21 14:34:48 +02:00
Eelco Dolstra	6fc851d376	Improve error message	2017-07-17 14:10:34 +02:00
Eelco Dolstra	66ae66024e	Sync with latest Nix	2017-07-17 11:38:58 +02:00
Eelco Dolstra	1f94f03699	Fix build	2017-04-26 15:11:12 +02:00
Eelco Dolstra	cc85208fe4	Fix build	2017-04-18 20:50:18 +02:00
Eelco Dolstra	426aea1236	hydra-queue-runner: Allow multiple concurrent daemon connections	2017-04-06 18:50:53 +02:00
Eelco Dolstra	5810042a3b	Periodically clear Store's path info cache Otherwise the queue runner can consider paths as valid that have been garbage-collected since the first time it queried them.	2017-04-06 17:20:23 +02:00
Eelco Dolstra	8364f4ec70	Upload log files to the right location We were mixing up builds and steps. So for example https://cache.nixos.org/log/2w66a98iqbjdppc5s2b8qvhi3gprvy45-freecell-solver-4.8.0.drv at the moment contains the log for /nix/store/442r9d5ihbcpgq8q9dhijhvhlmplzp96-perl-namespace-autoclean-0.28.drv because the latter is a step in http://hydra.nixos.org/build/51300420. Oops.	2017-04-06 13:05:30 +02:00
Eelco Dolstra	4f11cf45dc	Fix build cancellation We nowadays ignore SIGINT, so the sshd child process inherited this and ignored SIGINT as well.	2017-04-05 11:01:57 +02:00
Eelco Dolstra	147ba3ca31	Set proper charset on log files	2017-03-31 18:00:08 +02:00
Eelco Dolstra	8771f7f913	Merge pull request #382 from shlevy/cached-build-notifications Send BuildFinished notifications on cached build results.	2017-03-29 18:52:20 +02:00
Eelco Dolstra	57bc0eaead	hydra-queue-runner: Limit concurrent database connections Adding a 96-core aarch64 build machine to the build farm caused the potential number of database connections to increase a lot, so we started hitting the Postgres connection limit.	2017-03-21 11:53:46 +01:00
Eelco Dolstra	150228d7de	Upload build logs to the binary cache	2017-03-15 16:59:57 +01:00
Eelco Dolstra	7e6486e694	Move log compression to a plugin	2017-03-15 16:59:57 +01:00
Eelco Dolstra	d1afb42f12	Suppress debug message	2017-03-15 16:59:56 +01:00
Eelco Dolstra	73900e9f5f	Fix std::stoi exception	2017-03-08 15:07:52 +01:00
Eelco Dolstra	edebdf33f0	hydra-queue-runner: Handle SIGINT	2017-03-03 12:41:00 +01:00
Eelco Dolstra	500c27e4d5	Add hydra.conf option "nar_buffer_size" to configure memoryTokens limit It defaults to half the physical RAM.	2017-03-03 12:37:27 +01:00
Eelco Dolstra	53b1f7da64	Decrease memoryTokens	2017-02-03 14:44:52 +01:00
Eelco Dolstra	a366f362e1	Use latest nixUnstable	2017-02-03 14:39:18 +01:00
Eelco Dolstra	8a120006f0	Fix version test	2016-12-08 16:03:50 +01:00
Eelco Dolstra	9989e6c0f4	Get exact build start/stop times from the remote	2016-12-07 16:10:21 +01:00
Eelco Dolstra	f6081668dc	Allow determinism checking for entire jobsets Setting xxx-jobset-repeats = patchelf:master:2 will cause Hydra to perform every build step in the specified jobset 2 additional times (i.e. 3 times in total). Non-determinism is not fatal unless the derivation has the attribute "isDeterministic = true"; we just note the lack of determinism in the Hydra database. This will allow us to get stats about the (lack of) reproducibility of all of Nixpkgs.	2016-12-07 15:57:13 +01:00
Eelco Dolstra	8bb36e79bd	Support testing build determinism Builds can now specify the attribute "isDeterministic = true" to tell Hydra to build with build-repeat > 0. If there is a mismatch between rounds, the step / build fails with a suitable status. Maybe this should be a meta attribute, but that makes it invisible to hydra-queue-runner, and it seems reasonable to make a claim of mandatory determinism part of the derivation (since e.g. enabling this flag should trigger a rebuild).	2016-12-06 17:46:06 +01:00
Eelco Dolstra	afb8765ae4	hydra-queue-runner: Bump memory limit to reflect more accurate accounting	2016-11-16 17:51:18 +01:00
Eelco Dolstra	b4d32a3085	hydra-queue-runner: More accurate memory accounting We now take into account the memory necessary for compressing the NAR being exported to the binary cache, plus xz compression overhead. Also, we now release the memory tokens for the NAR accessor after releasing the NAR accessor. Previously the memory for the NAR accessor might still be in use while another thread does an allocation, causing the maximum to be exceeded temporarily. Also, use notify_all instead of notify_one to wake up memory token waiters. This is not very nice, but not every waiter is requesting the same number of tokens, so some might be able to proceed.	2016-11-16 17:48:50 +01:00
Eelco Dolstra	cb5e438a08	Bump Nix Fixes #398.	2016-11-09 19:15:13 +01:00
Eelco Dolstra	1ecc8a4f40	hydra-queue-runner: Fix a race keeping cancelled steps alive If a step is cancelled just as its builder step is starting, doBuildStep() will return sRetry. This causes builder() to make the step runnable again, since the queue monitor may have added new builds referencing it. The idea is that if the latter condition is not true, the step's reference count will drop to zero and it will be deleted. However, if the dispatcher thread sees and locks the step before the reference count can drop to zero in the builder thread, the dispatcher thread will start a new builder thread for the step. Thus the step can be kept alive for an indefinite amount of time. The fix is for State::builder() to use a weak pointer to the step, to ensure that the step's reference count can drop to zero before it's added to the runnable queue.	2016-11-08 11:47:49 +01:00
Eelco Dolstra	de9d7bcf25	hydra-queue-runner: Handle exceptions in the dispatcher thread E.g. "resource unavailable" when creating new threads.	2016-11-08 11:25:43 +01:00
Eelco Dolstra	7863d2e1da	Step cancellation: Don't use pthread_cancel() This was a bad idea because pthread_cancel() is unsalvageably broken in C++. Destructors are not allowed to throw exceptions (especially in C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind exception inside any destructor that invokes a cancellation point. (This exception can be caught but must be rethrown.) So let's just kill the builder process instead.	2016-11-07 19:38:24 +01:00
Eelco Dolstra	d7453bd8be	hydra-queue-runner: Fix message	2016-11-02 12:44:18 +01:00
Eelco Dolstra	4f08c85c69	hydra-queue-runner: Fix assertion failure It was hitting assert(reservation.unique()); Since we do want the machine reservation to be released before calling wakeDispatcher(), let's use a different object for keeping track of active steps.	2016-11-02 12:41:00 +01:00
Eelco Dolstra	b3169ce438	Kill active build steps when builds are cancelled We now kill active build steps when there are no more referring builds. This is useful e.g. for preventing cancelled multi-hour TPC-H benchmark runs from hogging build machines.	2016-10-31 14:58:29 +01:00
Eelco Dolstra	a816ef873d	Warn against empty machines file	2016-10-31 11:40:36 +01:00
Eelco Dolstra	3b84d4711b	Bump Nix	2016-10-26 15:10:56 +02:00
Eelco Dolstra	0b00d51baf	Prevent orphaned build steps If two active steps of the same build failed, then the first would be marked as "failed", but the second would end up as "orphaned", causing it to be marked as "aborted" later on. Now it's correctly marked as "failed".	2016-10-26 14:42:28 +02:00
Eelco Dolstra	8e1d791d0c	Truncate the log just before starting the remote build This gets rid of all those remote substitution messages that were polluting the build logs.	2016-10-26 13:41:51 +02:00
Eelco Dolstra	3fcfa20d1a	Fix regression caused by `ee2e9f53` ‘basicDrv.inputSrcs’ also contains the outputs of inputDrvs. These don't necessarily exist in the local store, so copying them may cause an exception. We should only copy the real inputSrcs.	2016-10-24 16:49:11 +02:00
Eelco Dolstra	a3efdcdfd9	Use std::regex	2016-10-21 18:06:26 +02:00
Eelco Dolstra	e0b2921ff2	Concurrent hydra-evaluator This rewrites the top-level loop of hydra-evaluator in C++. The Perl stuff is moved into hydra-eval-jobset. (Rewriting the entire evaluator would be nice but is a bit too much work.) The new version has some advantages: * It can run multiple jobset evaluations in parallel. * It uses PostgreSQL notifications so it doesn't have to poll the database. So if a jobset is triggered via the web interface or from a GitHub / Bitbucket webhook, evaluation of the jobset will start almost instantaneously (assuming the evaluator is not at its concurrency limit). * It imposes a timeout on evaluations. So if e.g. hydra-eval-jobset hangs connecting to a Mercurial server, it will eventually be killed.	2016-10-14 14:22:12 +02:00
Eelco Dolstra	16feddd5d4	Drop obsolete -laws-cpp-sdk-s3	2016-10-14 14:22:12 +02:00
Eelco Dolstra	dd5af7637d	Remove finally.hh	2016-10-14 14:22:12 +02:00
Eelco Dolstra	ee2e9f5335	Update to reflect BinaryCacheStore changes BinaryCacheStore no longer implements buildPaths() and ensurePath(), so we need to use copyPath() / copyClosure().	2016-10-07 20:23:05 +02:00
Eelco Dolstra	6a313c691b	hydra-queue-runner: Fix build	2016-10-06 16:58:54 +02:00
Alexander Ried	7089142fdc	Add error/warnings for deprecated store specification	2016-10-06 15:10:14 +02:00
Alexander Ried	a73f211bf2	Use store-api for binary cache instantiation	2016-10-06 15:09:44 +02:00
Alexander Ried	1c2f6281b9	Remove signing parameter (nix#f435f82)	2016-10-06 15:09:12 +02:00
Alexander Ried	232e6e8556	Replace buildVerbosity with verboseBuild (nix#5761827)	2016-10-06 15:08:02 +02:00
Alexander Ried	492d16074c	Remove s3binarystore (moved to nix in d155d80)	2016-10-06 15:07:21 +02:00
Eelco Dolstra	b1512a152a	Fix build failure on GCC 5.4	2016-09-30 17:05:07 +02:00
Shea Levy	5962367ffc	Send BuildFinished notifications on cached build results. Fixes #342	2016-08-17 06:40:12 -04:00
Eelco Dolstra	a55942603a	Provide a plugin hook for when build steps finish Fixes #318.	2016-05-27 14:35:32 +02:00
Eelco Dolstra	b50a105ca7	S3BinaryCacheStore: Use disk cache	2016-04-20 15:29:40 +02:00
Eelco Dolstra	afb86638cd	Updates for negative .narinfo caching	2016-04-15 15:39:20 +02:00
Eelco Dolstra	177bf25d64	Queue monitor: Bail out earlier if a step has failed previously Currently, the hydra.nixos.org queue contains 1000s of Darwin builds that all depend on a stdenv-darwin that previously failed. However, before, first createStep() would construct a dependency graph for each build, then getQueuedBuilds() would discover that one of the steps had failed previously and discard all those steps. Since the graph construction involves a lot of uncached calls to isValidPath(), this took several seconds per build. Now createStep() detects the previous failure right away and bails out.	2016-04-15 14:32:16 +02:00
Eelco Dolstra	ef72569cc3	Merge pull request #280 from shlevy/github-status-api Add a plugin to interact with the github status API.	2016-04-14 20:03:45 +02:00
Eelco Dolstra	d6f188a01a	Typo	2016-04-13 16:45:40 +02:00
Eelco Dolstra	b1e36b550c	max-output-size -> max_output_size To be consistent with other Catalyst/Hydra config option names.	2016-04-13 16:30:52 +02:00
Eelco Dolstra	077ed3f571	Periodically clear orphaned build steps These are build steps that remain "busy" in the database even though they have finished, because they couldn't be updated (e.g. due to a PostgreSQL connection problem). To prevent them from showing up as busy in the "Machine status" page, we now periodically purge them.	2016-04-13 16:30:52 +02:00
Eelco Dolstra	f3f661bac1	Reuse build products / metrics stored in the database Previously, if the queue monitor thread encounters a build that Hydra has previously built, it downloaded the output paths from the binary cache, just to determine the build products and metrics. This is very inefficient. In particular, when doing something like merging nixpkgs:staging into nixpkgs:master, the queue monitor thread will be locked up for a long time fetching files from S3, causing the build farm to be mostly idle. Of course this is entirely unnecessary, since the build products/metrics are already in the Hydra database. So now we just look up a previous build with the same output path, and copy the products/metrics.	2016-04-13 16:30:52 +02:00
Eelco Dolstra	8c7edb1005	Fix narrowing conversion	2016-04-13 16:30:52 +02:00
Eelco Dolstra	00c78440b1	Disambiguate "marking build as succeeded" message	2016-04-13 16:30:52 +02:00
Eelco Dolstra	ad834343b5	Fix build against current Nix master	2016-04-13 16:30:52 +02:00
Shea Levy	9b37cb89ae	Add buildStarted plugin hook	2016-04-12 14:42:01 -04:00
Eelco Dolstra	ddc9f3cc6a	Temporarily disable machines on any exception, not just connection failures	2016-03-22 16:54:40 +01:00
Eelco Dolstra	0aecd65e59	/queue-runner-status: Include info about temporarily disabled machines	2016-03-22 16:54:06 +01:00
Eelco Dolstra	5535bc28ca	Tweak	2016-03-10 16:46:15 +01:00
Eelco Dolstra	60e7930d2b	Bump memory limit a bit	2016-03-10 16:46:01 +01:00
Eelco Dolstra	75e7b35477	Fix retry of transient failures	2016-03-10 16:44:26 +01:00
Eelco Dolstra	33da40f272	Doh	2016-03-09 17:31:57 +01:00
Eelco Dolstra	4b9c76e502	hydra-queue-runner: Ensure regular status dumps	2016-03-09 17:11:34 +01:00
Eelco Dolstra	4151be7e69	Make the output size limit configurable The maximum output size per build step (as the sum of the NARs of each output) can be set via hydra.conf, e.g. max-output-size = 1000000000 The default is 2 GiB. Also refactored the build error / status handling a bit.	2016-03-09 17:00:09 +01:00
Eelco Dolstra	dc790c5f7e	Fix bad format string	2016-03-09 16:59:35 +01:00


322 commits