hydra

ma27/hydra

Author	SHA1	Message	Date
Eelco Dolstra	ea1eb2e3fb	Keep track of requiredSystemFeatures in the machine stats For example, steps that require the "kvm" feature may require a different kind of machine to be provisioned. This can also be used to require performance-sensitive tests to run on a particular kind of machine, e.g., by setting requiredSystemFeatures to something like "ec2-i2.8xlarge".	2015-08-17 14:37:57 +02:00
Eelco Dolstra	d571e44b86	Keep stats for the Hydra auto scaler "hydra-queue-runner --status" now prints how many runnable and running build steps exist for each machine type. This allows additional machines to be provisioned based on the Hydra load.	2015-08-17 13:50:41 +02:00
Eelco Dolstra	d4759c1da2	hydra-queue-runner: Detect changes to the scheduling shares	2015-08-12 13:17:56 +02:00
Eelco Dolstra	576dc0c120	For completeness, re-implement meta.schedulingPriority	2015-08-12 12:05:43 +02:00
Eelco Dolstra	b7965df928	Load the queue in order of global priority	2015-08-11 02:14:34 +02:00
Eelco Dolstra	97f11baa8d	Revive jobset scheduling (I.e. taking the jobset scheduling share into account.)	2015-08-11 01:31:56 +02:00
Eelco Dolstra	eb13007fe6	Allow build to be bumped to the front of the queue via the web interface Builds now have a "Bump up" action. This will cause the queue runner to prioritise the steps of the build above all other steps.	2015-08-10 16:19:47 +02:00
Eelco Dolstra	27182c7c1d	Start steps in order of ascending build ID	2015-08-10 16:19:47 +02:00
Eelco Dolstra	593850b956	Fix potential race in dispatcher wakeup	2015-08-10 12:54:55 +02:00
Eelco Dolstra	6a1c950e94	Unindent	2015-08-10 11:33:22 +02:00
Eelco Dolstra	f21b88e388	Remove superfluous check	2015-08-07 04:20:34 +02:00
Eelco Dolstra	f1fbf8c605	Fix race in finishing builds that have been cancelled	2015-08-07 04:18:48 +02:00
Eelco Dolstra	ff3f5eb4d8	Fix remote building on Nix 1.10	2015-07-31 03:41:55 +02:00
Eelco Dolstra	5b9a288123	Workaround for RemoteStore not supporting cmdBuildDerivation yet	2015-07-31 03:39:20 +02:00
Eelco Dolstra	4d26546d3c	Add support for tracking custom metrics Builds can now emit metrics that Hydra will store in its database and render as time series via flot charts. Typical applications are to keep track of performance indicators, coverage percentages, artifact sizes, and so on. For example, a coverage build can emit the coverage percentage as follows: echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics Graphs of all metrics for a job can be seen at http://.../job/<project>/<jobset>/<job>#tabs-charts Specific metrics are also visible at http://.../job/<project>/<jobset>/<job>/metric/<metric> The latter URL also allows getting the data in JSON format (e.g. via "curl -H 'Accept: application/json'").	2015-07-31 00:57:30 +02:00
Eelco Dolstra	c18fb0ad74	Temporarily disable machines after a connection failure	2015-07-21 15:58:47 +02:00
Eelco Dolstra	7e026d35f7	Split hydra-queue-runner.cc more	2015-07-21 15:14:17 +02:00
Eelco Dolstra	5370be9f52	hydra-queue-runner: Use cmdBuildDerivation See `1511aa9f48` and `eda2f36c2a`.	2015-07-21 01:54:24 +02:00
Eelco Dolstra	3ded87329d	Keep track of how many threads are waiting	2015-07-10 19:10:14 +02:00
Eelco Dolstra	89fb723ace	Notify the queue runner when a build is deleted	2015-07-08 11:43:35 +02:00
Eelco Dolstra	35b7c4f82b	Allow only 1 thread to send a closure to a given machine at the same time This prevents a race where multiple threads see that machine X is missing path P, and start sending it concurrently. Nix handles this correctly, but it's still wasteful (especially for the case where P == GHC). A more refined scheme would be to have per machine, per path locks.	2015-07-07 14:06:48 +02:00
Eelco Dolstra	16696a4aee	Namespace cleanup	2015-07-07 10:29:43 +02:00
Eelco Dolstra	63745b8e25	Move buildRemote() into State	2015-07-07 10:25:33 +02:00
Eelco Dolstra	df29527531	Refactor	2015-07-07 10:17:21 +02:00
Eelco Dolstra	dffb629b8a	Unify Hydra's NixOS module with the one used for hydra.nixos.org In particular, the queue runner and web server now run under different UIDs.	2015-07-02 01:01:44 +02:00
Eelco Dolstra	2ece42b2b9	Support preferLocalBuild Derivations with "preferLocalBuild = true" can now be executed on specific machines (typically localhost) by setting the mandatory system features field to include "local". For example: localhost x86_64-linux,i686-linux - 10 100 - local says that "localhost" can only do builds with "preferLocalBuild = true". The speed factor of 100 will make the machine almost always win over other machines.	2015-06-30 00:20:19 +02:00
Eelco Dolstra	008d610467	getQueuedBuilds(): Don't catch errors while loading a build from the queue Otherwise we never recover from reset daemon connections, e.g. hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe ... The error is now handled in queueMonitor(), causing the next call to queueMonitorLoop() to create a new connection.	2015-06-26 21:06:35 +02:00
Eelco Dolstra	2f4676bd97	JSONObject doesn't handle 64-bit integers	2015-06-25 16:59:48 +02:00
Eelco Dolstra	c6fcce3b3b	Moar stats	2015-06-25 16:47:39 +02:00
Eelco Dolstra	18a3c3ff1c	Update "make check" for the new queue runner Also, if the machines file contains an entry for localhost, then run "nix-store --serve" directly, without going through SSH.	2015-06-25 16:47:39 +02:00
Eelco Dolstra	32210905d8	Automatically reload $NIX_REMOTE_SYSTEMS when it changes Otherwise, you'd have to restart the queue runner to add or remove machines.	2015-06-25 16:47:25 +02:00
Eelco Dolstra	1a0e1eb5a0	More stats	2015-06-24 13:19:27 +02:00
Eelco Dolstra	3f8891b6ff	Fix incorrect debug message	2015-06-23 17:53:15 +02:00
Eelco Dolstra	af5cbe97aa	createStep(): Cache finished derivations This gets rid of a lot of redundant calls to readDerivation().	2015-06-23 03:25:31 +02:00
Eelco Dolstra	681f63a382	Typo	2015-06-23 02:15:11 +02:00
Eelco Dolstra	524ee295e0	Fix sending notifications in the successful case	2015-06-23 02:13:06 +02:00
Eelco Dolstra	4db7c51b5c	Rate-limit the number of threads copying closures at the same time Having a hundred threads doing I/O at the same time is bad on magnetic disks because of the excessive disk seeks. So allow only 4 threads to copy closures in parallel.	2015-06-23 01:49:14 +02:00
Eelco Dolstra	a317d24b29	hydra-queue-runner: Send build notifications Since our notification plugins are written in Perl, sending notifications from C++ requires a small Perl helper named ‘hydra-notify’.	2015-06-23 00:14:49 +02:00
Eelco Dolstra	5312e1209b	Keep per-machine stats	2015-06-22 17:11:17 +02:00
Eelco Dolstra	d06366e7cf	Remove obsolete comment	2015-06-22 16:59:50 +02:00
Eelco Dolstra	e069ee960e	Doh	2015-06-22 16:58:40 +02:00
Eelco Dolstra	41ba7418e2	hydra-queue-runner: More stats	2015-06-22 15:34:33 +02:00
Eelco Dolstra	62b53a0a47	Guard against concurrent invocations of hydra-queue-runner	2015-06-22 14:24:03 +02:00
Eelco Dolstra	fbd7c02217	Periodically dump/log status	2015-06-22 14:15:43 +02:00
Eelco Dolstra	4f4141e1db	Add command ‘hydra-queue-runner --status’ to show current status	2015-06-22 14:06:44 +02:00
Eelco Dolstra	44a2b74f5a	Keep track of the number of build steps that are being built (As opposed to being in the closure copying stage.)	2015-06-22 11:23:00 +02:00
Eelco Dolstra	fed71d3fe9	Move "created" field into Step::State	2015-06-22 11:07:52 +02:00
Eelco Dolstra	90a08db241	hydra-queue-runner: Fix assertion failure	2015-06-22 10:59:07 +02:00
Eelco Dolstra	d744362e4a	hydra-queue-runner: Fix segfault sorting machines by load While sorting machines by load, the load of a machine (machine->currentJobs) can be changed by other threads. If that happens, the comparator is no longer a proper ordering, in which case std::sort() can segfault. So we now make a copy of currentJobs before sorting.	2015-06-21 16:21:42 +02:00
Eelco Dolstra	a0eff6fc15	Fix machine selection	2015-06-19 17:45:26 +02:00
Eelco Dolstra	81abb6e166	Improve parsing of hydra-build-products	2015-06-19 17:20:20 +02:00
Eelco Dolstra	e13477bdf2	Robustness	2015-06-19 16:35:49 +02:00
Eelco Dolstra	f196967c43	Don't create a propagated build step to the same build	2015-06-19 15:33:37 +02:00
Eelco Dolstra	7afc61691b	Doh	2015-06-19 15:27:49 +02:00
Eelco Dolstra	133d298e26	Asynchronously compress build logs	2015-06-19 15:06:12 +02:00
Eelco Dolstra	8e408048e2	Create build step for non-top-level cached failures This fixes the missing build step on failures like http://hydra.nixos.org/build/23222231	2015-06-19 11:33:15 +02:00
Eelco Dolstra	77c8bfd392	Improve logging for aborts	2015-06-19 10:37:22 +02:00
Eelco Dolstra	8db1ae2855	Less verbosity	2015-06-18 17:43:13 +02:00
Eelco Dolstra	89b629eeb1	Fix finishing steps that are not top-level of any build	2015-06-18 17:37:35 +02:00
Eelco Dolstra	9cdbff2fdf	Handle concurrent finishing of the same build There is a slight possibility that the queue monitor and a builder thread simultaneously decide to mark a build as finished. That's fine, as long as we ensure the DB update is idempotent (as ensured by doing "update Builds set finished = 1 ... where finished = 0").	2015-06-18 17:12:51 +02:00
Eelco Dolstra	948473c909	Fix race between the queue monitor and the builder threads	2015-06-18 16:30:28 +02:00
Eelco Dolstra	9c03b11ca8	Simplify retry handling	2015-06-18 14:51:50 +02:00
Eelco Dolstra	e039f5f840	Create failed build steps for cached failures	2015-06-18 04:35:37 +02:00
Eelco Dolstra	92ea800cfb	Set finishedInDB in a few more places	2015-06-18 04:19:21 +02:00
Eelco Dolstra	47367451c7	hydra-queue-runner: Set isCachedBuild	2015-06-18 03:28:58 +02:00
Eelco Dolstra	8257812d0a	Acquire exclusive table lock earlier	2015-06-18 02:44:29 +02:00
Eelco Dolstra	69be3cfe93	hydra-queue-runner: Handle status queries on the main thread Doing it on the queue monitor thread was problematic because processing the queue can take a while.	2015-06-18 01:57:01 +02:00
Eelco Dolstra	a40ca6b76e	hydra-queue-runner: Improve dispatcher We now take the machine speed factor into account, just like build-remote.pl.	2015-06-18 01:52:20 +02:00
Eelco Dolstra	3855131185	hydra-queue-runner: Improve SSH flags	2015-06-18 00:50:48 +02:00
Eelco Dolstra	f57d0b0c54	hydra-queue-runner: Maintain count of active build steps	2015-06-18 00:24:56 +02:00
Eelco Dolstra	59dae60558	hydra-queue-runner: More stats	2015-06-17 22:38:12 +02:00
Eelco Dolstra	ec8e8edc86	hydra-queue-runner: Handle $HYDRA_DBI	2015-06-17 22:11:01 +02:00
Eelco Dolstra	ce9e859a9c	hydra-queue-runner: Implement --unlock	2015-06-17 21:35:20 +02:00
Eelco Dolstra	ca48818b30	Fix remote building	2015-06-17 17:28:59 +02:00
Eelco Dolstra	11be780948	Handle failure with output	2015-06-17 17:11:42 +02:00
Eelco Dolstra	b1a75c7f63	getQueuedBuilds(): Handle dependent builds first If a build A depends on a derivation that is the top-level derivation of some build B, then we should process B before A (meaning we shouldn't make the derivation runnable before B has been added). Otherwise, the derivation will be "accounted" to A rather than B (so the build step will show up in the wrong build).	2015-06-17 14:46:02 +02:00
Eelco Dolstra	745efce828	hydra-queue-runner: Implement timeouts Also, keep track of timeouts in the database as a distinct build status.	2015-06-17 13:32:33 +02:00
Eelco Dolstra	2da4987bc2	Don't lock the CPU	2015-06-17 11:48:38 +02:00
Eelco Dolstra	b91a616520	Automatically retry aborted builds Aborted builds are now put back on the runnable queue and retried after a certain time interval (currently 60 seconds for the first retry, then tripled on each subsequent retry).	2015-06-17 11:45:20 +02:00
Eelco Dolstra	e02654b3a0	Prefer cached failure over unsupported system type	2015-06-16 18:00:39 +02:00
Eelco Dolstra	42e7301c08	Add status dump facility Doing $ psql hydra -c 'notify dump_status' will cause hydra-queue-runner to dump some internal status info on stderr.	2015-06-15 18:20:14 +02:00
Eelco Dolstra	dd104f14ea	Make the queue monitor more robust, and better debug output	2015-06-15 16:54:52 +02:00
Eelco Dolstra	147eb4fd15	Support requiredSystemFeatures	2015-06-15 16:33:50 +02:00
Eelco Dolstra	508ab7f8a2	Tweak build steps	2015-06-15 15:48:05 +02:00
Eelco Dolstra	21aaa0596b	Fail builds with previously failed steps early	2015-06-15 15:31:42 +02:00
Eelco Dolstra	c00bf7cd1a	Check non-runnable steps for unsupported system type	2015-06-15 15:13:03 +02:00
Eelco Dolstra	5019fceb20	Add an error type for "unsupported system type"	2015-06-15 15:07:04 +02:00
Eelco Dolstra	541fbd62cc	Immediately abort builds that require an unsupported system type	2015-06-15 14:51:49 +02:00
Eelco Dolstra	f9cd5adae8	Queue monitor: Get only the fields we need	2015-06-11 18:09:50 +02:00
Eelco Dolstra	c974fb893b	Support cancelling builds	2015-06-11 18:07:45 +02:00
Eelco Dolstra	c08883966c	Use PostgreSQL notifications for queue events hydra-queue-runner no longer polls the queue periodically, but instead sleeps until it receives a notification from PostgreSQL about a change to the queue (build added, build cancelled or build restarted). Also, for the "build added" case, we now only check for builds with an ID greater than the previous greatest ID. This is much more efficient if the queue is large.	2015-06-11 17:41:59 +02:00
Eelco Dolstra	d72a88b562	Don't try to handle SIGINT It just makes things unnecessarily complicated. We can just exit without cleaning anything up, since the only thing to do is unmark builds and build steps as busy. But we can do that by having systemd call "hydra-queue-runner --unlock" from ExecStopPost.	2015-06-10 15:55:46 +02:00
Eelco Dolstra	a4fb93c119	Lock builds for a shorter amount of time	2015-06-10 15:36:21 +02:00
Eelco Dolstra	6d738a31bf	Keep track of failed paths in the Hydra database I.e. don't use Nix's failed paths feature anymore. Easier to keep everything in one place.	2015-06-10 14:57:16 +02:00
Eelco Dolstra	c68036f8b0	Pass ssh key	2015-06-10 14:57:07 +02:00
Eelco Dolstra	7dd1f0097e	Finish copyClosure	2015-06-09 16:03:41 +02:00
Eelco Dolstra	c93aa92563	Create BuildSteps race-free If multiple threads create a step for the same build, they could get the same "max(stepnr)" and allocate conflicting new step numbers. So lock the BuildSteps table while doing this. We could use a different isolation level, but this is easier.	2015-06-09 15:03:20 +02:00
Eelco Dolstra	61d4060522	Record the machine used for a build step	2015-06-09 14:57:49 +02:00
Eelco Dolstra	ca1fbdd058	Mark builds as busy	2015-06-09 14:31:43 +02:00
Eelco Dolstra	8b12ac1f6d	Basic remote building This removes the need for Nix's build-remote.pl. Build logs are now written to $HYDRA_DATA/build-logs because hydra-queue-runner doesn't have write permission to /nix/var/log.	2015-06-09 14:21:21 +02:00
Eelco Dolstra	3a6cb2f270	Implement a database connection pool	2015-05-29 20:55:13 +02:00
Eelco Dolstra	214b95706c	On SIGINT, shut down the builder threads Note that they don't get interrupted at the moment (so on SIGINT, any running builds will need to finish first).	2015-05-29 20:02:15 +02:00
Eelco Dolstra	e778821940	Make concurrency more robust	2015-05-29 17:14:20 +02:00
Eelco Dolstra	8640e30787	Very basic multi-threaded queue runner	2015-05-29 01:31:12 +02:00
Eelco Dolstra	604fdb908f	Pass null values to libpqxx properly	2015-05-28 19:06:17 +02:00
Eelco Dolstra	dc446c3980	Start of single-process hydra-queue-runner	2015-05-28 17:39:29 +02:00


206 commits
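Commit 4d26546d3c has builds write metrics to `$out/nix-support/hydra-metrics`, one per line, as in `echo "lineCoverage $pct %"`. The following is a minimal C++ sketch of parsing one such line, assuming the `<name> <value> [<unit>]` shape implied by that example; `Metric` and `parseMetricLine` are hypothetical names, not Hydra's actual code.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical parsed form of one hydra-metrics line.
struct Metric {
    std::string name;
    double value = 0.0;
    std::string unit;   // empty when the line has no unit field
};

// Parse "<name> <value> [<unit>]", e.g. "lineCoverage 87.5 %".
// Returns std::nullopt if the name or the numeric value is missing.
std::optional<Metric> parseMetricLine(const std::string & line)
{
    std::istringstream in(line);
    Metric m;
    if (!(in >> m.name >> m.value))
        return std::nullopt;
    in >> m.unit;   // optional third field; stays empty at end of line
    return m;
}
```

For example, `parseMetricLine("lineCoverage 87.5 %")` yields name `lineCoverage`, value 87.5 and unit `%`, while a line with only a name is rejected.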
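Commit 2ece42b2b9 relies on a machines-file line such as `localhost x86_64-linux,i686-linux - 10 100 - local`, where `local` is a mandatory feature. Below is a minimal C++ sketch of parsing such a line and matching a step against it. It assumes the field order of Nix's remote-build machines file (host, systems, SSH key, max jobs, speed factor, supported features, mandatory features), that `-` means an empty field, and that a `preferLocalBuild` step is treated as requiring the feature `local`. `MachineSpec`, `parseMachineLine` and `canBuild` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parsed form of one machines-file line (assumed field order).
struct MachineSpec {
    std::string host;
    std::vector<std::string> systems;
    std::string sshKey;                          // empty if "-"
    unsigned int maxJobs = 1;
    double speedFactor = 1.0;
    std::vector<std::string> supportedFeatures;
    std::vector<std::string> mandatoryFeatures;
};

// Split a comma-separated field; "-" yields an empty list.
static std::vector<std::string> splitCommas(const std::string & field)
{
    std::vector<std::string> items;
    if (field == "-") return items;
    std::istringstream in(field);
    std::string item;
    while (std::getline(in, item, ','))
        if (!item.empty()) items.push_back(item);
    return items;
}

MachineSpec parseMachineLine(const std::string & line)
{
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string token;
    while (in >> token) fields.push_back(token);
    assert(fields.size() >= 2);   // host and systems are required
    fields.resize(7, "-");        // absent trailing fields count as "-"

    MachineSpec m;
    m.host = fields[0];
    m.systems = splitCommas(fields[1]);
    if (fields[2] != "-") m.sshKey = fields[2];
    if (fields[3] != "-") m.maxJobs = static_cast<unsigned int>(std::stoul(fields[3]));
    if (fields[4] != "-") m.speedFactor = std::stod(fields[4]);
    m.supportedFeatures = splitCommas(fields[5]);
    m.mandatoryFeatures = splitCommas(fields[6]);
    return m;
}

// A step fits a machine if the machine handles the step's system type,
// offers every feature the step requires (supported or mandatory), and
// every mandatory feature of the machine is required by the step.
bool canBuild(const MachineSpec & m, const std::string & system,
              const std::vector<std::string> & requiredFeatures)
{
    auto contains = [](const std::vector<std::string> & v, const std::string & x) {
        return std::find(v.begin(), v.end(), x) != v.end();
    };
    if (!contains(m.systems, system)) return false;
    for (const auto & f : requiredFeatures)
        if (!contains(m.supportedFeatures, f) && !contains(m.mandatoryFeatures, f))
            return false;
    for (const auto & f : m.mandatoryFeatures)
        if (!contains(requiredFeatures, f)) return false;
    return true;
}
```

Under these assumptions the example machine accepts an `x86_64-linux` step requiring `local` but rejects an ordinary step with no required features, which is the "can only do builds with preferLocalBuild = true" behaviour the commit describes.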
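Commits d744362e4a and a40ca6b76e describe sorting machines by load, taking the speed factor into account, and copying `currentJobs` before sorting so that concurrent updates cannot break the comparator. A minimal C++ sketch of that snapshot-then-sort idea follows. The score `currentJobs / speedFactor` is an assumption about how speed is weighed, and `Machine` and `sortByLoad` are hypothetical stand-ins for the queue runner's own types.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical machine record. currentJobs is changed concurrently by
// builder threads, so two reads of it may return different values.
struct Machine {
    std::string name;
    double speedFactor = 1.0;
    std::atomic<unsigned int> currentJobs{0};
};

using MachinePtr = std::shared_ptr<Machine>;

// Return machines ordered from least to most loaded, relative to speed.
// Each machine's load is read exactly once into a snapshot before sorting:
// if the comparator read currentJobs directly, another thread could change
// it mid-sort, the comparator would no longer be a strict weak ordering,
// and std::sort's behaviour would be undefined.
std::vector<MachinePtr> sortByLoad(const std::vector<MachinePtr> & machines)
{
    struct Entry {
        MachinePtr machine;
        double score;   // assumed score: currentJobs / speedFactor
    };
    std::vector<Entry> entries;
    entries.reserve(machines.size());
    for (const auto & m : machines) {
        assert(m->speedFactor > 0);
        entries.push_back({m, m->currentJobs.load() / m->speedFactor});
    }

    std::sort(entries.begin(), entries.end(),
              [](const Entry & a, const Entry & b) { return a.score < b.score; });

    std::vector<MachinePtr> sorted;
    sorted.reserve(entries.size());
    for (const auto & e : entries) sorted.push_back(e.machine);
    return sorted;
}
```

With this scoring, a machine running 4 jobs at speed factor 4 ranks ahead of one running 2 jobs at speed factor 1, so faster machines absorb proportionally more work.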
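Commit b91a616520 retries aborted builds after 60 seconds for the first retry, tripling the interval on each subsequent retry. A minimal C++ sketch of that schedule, where `retryDelaySeconds` is a hypothetical helper rather than the queue runner's actual function:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for the retry schedule described in b91a616520:
// wait 60 seconds before the first retry, and triple the wait for each
// subsequent retry. `attempt` is 1 for the first retry.
std::uint64_t retryDelaySeconds(unsigned int attempt)
{
    assert(attempt >= 1);
    std::uint64_t delay = 60;
    for (unsigned int i = 1; i < attempt; ++i)
        delay *= 3;   // no overflow guard: fine for realistic retry counts
    return delay;
}
```

So the first four retries wait 60, 180, 540 and 1620 seconds respectively.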