Without an index on (machine, stoptime desc), this requires a
sequential scan. And adding a whole index for this seems
overkill. (Possibly the queue runner could maintain this info more
efficiently.)
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).
A more refined scheme would be to have per machine, per path locks.
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandary system
features field to include "local". For example:
localhost x86_64-linux,i686-linux - 10 100 - local
says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
Otherwise we never recover from reset daemon connections, e.g.
hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
...
The error is now handled queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
This is currently done by a separate program that periodically
calls "hydra-queue-runner --status". Eventually, I'll do this
in the queue runner directly.
Fixes#220.
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
While sorting machines by load, the load of a machine
(machine->currentJobs) can be changed by other threads. If that
happens, the comparator is no longer a proper ordering, in which case
std::sort() can segfault. So we now make a copy of currentJobs before
sorting.
There is a slight possibility that the queue monitor and a builder
thread simultaneously decide to mark a build as finished. That's fine,
as long as we ensure the DB update is idempotent (as ensured by doing
"update Builds set finished = 1 ... where finished = 0").
If a build A depends on a derivation that is the top-level derivation
of some build B, then we should process B before A (meaning we
shouldn't make the derivation runnable before B has been
added). Otherwise, the derivation will be "accounted" to A rather than
B (so the build step will show up in the wrong build).
Aborted builds are now put back on the runnable queue and retried
after a certain time interval (currently 60 seconds for the first
retry, then tripled on each subsequent retry).
Hydra-queue-runner now no longer polls the queue periodically, but
instead sleeps until it receives a notification from PostgreSQL about
a change to the queue (build added, build cancelled or build
restarted).
Also, for the "build added" case, we now only check for builds with an
ID greater than the previous greatest ID. This is much more efficient
if the queue is large.
It just makes things unnecessarily complicated. We can just exit
without cleaning anything up, since the only thing to do is unmark
builds and build steps as busy. But we can do that by having systemd
call "hydra-queue-runner --unlock" from ExecStopPost.
If multiple threads create a step for the same build, they could get
the same "max(stepnr)" and allocate conflicting new step numbers. So
lock the BuildSteps table while doing this. We could use a different
isolation level, but this is easier.
This removes the need for Nix's build-remote.pl.
Build logs are now written to $HYDRA_DATA/build-logs because
hydra-queue-runner doesn't have write permission to /nix/var/log.
When visiting the tail-reload page, for a short amount of time the
"unscrolled" version is shown. To circumvent that, let's scroll down
immediately at the first possibility to fill the gap between the loading
of the document and the first AJAX request coming in.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
There are quite a lot of build outputs which have lines with a length
exceeding the width of the taillog <pre/> and thus visually produce more
lines than 50. This causes the tail "box" to change height frequently
and to get to the bottom you need to scroll down.
We now set a fixed line-height to 120% of the font size and cap the
maximum height based on that value (50 * 1.2 = 60). It's probably not
nice to override the line-height, but max-lines is currently only
available using browser-specific property names. But after all it's just
for the tail output, if people complain about the line-height, we can
still change it :-)
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We're just implicitly escaping the tail content by not using .load() but
explicitly setting the text content using .text(), so that escaping
isn't needed on our side.
This should get rid of a few formatting errors and possibly XSS if
someone manages to place JS code in the tail of a build and manages to
lurk a user to that tail output.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Like eval IDs, build IDs don't convey useful information.
Also, make the job name link to the build rather than the job. When
people click on a build, they expect to go to the build page, not the
job page.
Scheduling is mostly based on jobset shares these days. So showing and
sorting by priority just wastes space and gives the incorrect
impression that Hydra executes builds in the order shown on the queue
page.