They will show up in machineTypes as (e.g.) x86_64-linux:local instead
of x86_64-linux. This is to prevent the Hydra provisioner from
creating machines for steps that are supposed to be executed locally.
It's easier for the Hydra provisioner to put host public keys in the
machines file than to separately manage the known_hosts file
(especially when the provisioner runs on a different machine).
This is necessary because the required system type can become
available later (e.g. by being provisioned by the
auto-scaler). However, in the future, we may want to fail steps if
they have been unsupported for more than a certain amount of time.
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
If there is no input named 'inputs', hydra-eval-jobs now passes an argument
named 'inputs': a set of lists in which each attribute corresponds to an
input defined in the jobset specification and each list element is a
different alternative ('alt') of that input.
Among other things, this allows for generic hydra expressions to be
shared amongst projects with similar structures but different sets of
specific inputs.
Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.
For example, a coverage build can emit the coverage percentage as
follows:
echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics
Graphs of all metrics for a job can be seen at
http://.../job/<project>/<jobset>/<job>#tabs-charts
Specific metrics are also visible at
http://.../job/<project>/<jobset>/<job>/metric/<metric>
The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
If Hydra isn't hosted at https://example.com/ but at something like
https://example.com/hydra/, the URL for /api/scmdiff ended up as
/api/scmdiff rather than /hydra/api/scmdiff.
This happened because we didn't use the URI resolver from the controller,
so we now use it to build up the whole URL, including the query string.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Without an index on (machine, stoptime desc), this requires a
sequential scan. And adding a whole index for this seems
overkill. (Possibly the queue runner could maintain this info more
efficiently.)
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).
A more refined scheme would be to have per-machine, per-path locks.
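A rough C++ sketch of what such per-machine, per-path locking could look like
(the names and types are illustrative, not the actual queue runner code):

  #include <map>
  #include <memory>
  #include <mutex>
  #include <string>
  #include <utility>

  /* Hypothetical registry of per-(machine, path) locks: only one thread at a
     time sends a given path to a given machine. */
  struct CopyLocks
  {
      std::mutex registryLock;
      std::map<std::pair<std::string, std::string>,
               std::shared_ptr<std::mutex>> locks;

      std::shared_ptr<std::mutex> get(const std::string & machine,
                                      const std::string & path)
      {
          std::lock_guard<std::mutex> guard(registryLock);
          auto & lock = locks[{machine, path}];
          if (!lock) lock = std::make_shared<std::mutex>();
          return lock;
      }
  };

  void copyPathTo(CopyLocks & copyLocks,
                  const std::string & machine, const std::string & path)
  {
      auto lock = copyLocks.get(machine, path);
      std::lock_guard<std::mutex> guard(*lock);
      /* ... re-check whether the machine already has the path, copy if not ... */
  }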
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandatory system
features field to include "local". For example:
localhost x86_64-linux,i686-linux - 10 100 - local
says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
Otherwise we never recover from reset daemon connections, e.g.
hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
...
The error is now handled in queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
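Schematically, the monitor thread now looks something like this (an
illustrative sketch, not the literal code):

  #include <chrono>
  #include <exception>
  #include <iostream>
  #include <thread>

  void queueMonitorLoop();  /* opens its own connection and processes the queue */

  /* queueMonitor() catches the error so the loop keeps running; the next call
     to queueMonitorLoop() then creates a fresh connection. */
  void queueMonitor()
  {
      while (true) {
          try {
              queueMonitorLoop();
          } catch (std::exception & e) {
              std::cerr << "queue monitor: " << e.what() << std::endl;
          }
          /* hypothetical back-off before reconnecting */
          std::this_thread::sleep_for(std::chrono::seconds(10));
      }
  }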
This is currently done by a separate program that periodically
calls "hydra-queue-runner --status". Eventually, I'll do this
in the queue runner directly.
Fixes #220.
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
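One way to express that cap, as a sketch using C++20's std::counting_semaphore
(the limit of 4 is the value mentioned above; the function name is
illustrative):

  #include <semaphore>

  /* At most 4 closure copies in flight at any time. */
  std::counting_semaphore<4> copySlots{4};

  void copyClosureTo(/* const Machine & machine, const PathSet & paths */)
  {
      copySlots.acquire();   /* block until one of the 4 slots is free */
      /* ... copy the closure to the remote machine ... */
      copySlots.release();   /* give the slot back */
  }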
While sorting machines by load, the load of a machine
(machine->currentJobs) can be changed by other threads. If that
happens, the comparator is no longer a proper ordering, in which case
std::sort() can segfault. So we now make a copy of currentJobs before
sorting.
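The pattern is roughly the following (a self-contained sketch; the real
Machine structure has more fields):

  #include <algorithm>
  #include <atomic>
  #include <memory>
  #include <vector>

  struct Machine {
      std::atomic<unsigned long> currentJobs{0};
      /* ... */
  };

  /* Snapshot each machine's load before sorting. If the comparator read
     machine->currentJobs directly, another thread could change it mid-sort,
     violating the strict weak ordering that std::sort() requires. */
  void sortByLoad(std::vector<std::shared_ptr<Machine>> & machines)
  {
      struct Entry { std::shared_ptr<Machine> machine; unsigned long currentJobs; };

      std::vector<Entry> entries;
      for (auto & m : machines)
          entries.push_back({m, m->currentJobs.load()});

      std::sort(entries.begin(), entries.end(),
          [](const Entry & a, const Entry & b) { return a.currentJobs < b.currentJobs; });

      for (size_t i = 0; i < entries.size(); ++i)
          machines[i] = entries[i].machine;
  }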
There is a slight possibility that the queue monitor and a builder
thread simultaneously decide to mark a build as finished. That's fine,
as long as we ensure the DB update is idempotent (as ensured by doing
"update Builds set finished = 1 ... where finished = 0").
If a build A depends on a derivation that is the top-level derivation
of some build B, then we should process B before A (meaning we
shouldn't make the derivation runnable before B has been
added). Otherwise, the derivation will be "accounted" to A rather than
B (so the build step will show up in the wrong build).
Aborted builds are now put back on the runnable queue and retried
after a certain time interval (currently 60 seconds for the first
retry, then tripled on each subsequent retry).
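In other words, the delay before the Nth retry grows roughly like this
(illustrative):

  /* 60 s before the first retry, tripled on each subsequent one:
     60, 180, 540, 1620, ... seconds. */
  unsigned int retryInterval(unsigned int nrRetries)
  {
      unsigned int delay = 60;
      for (unsigned int i = 1; i < nrRetries; ++i) delay *= 3;
      return delay;
  }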
hydra-queue-runner no longer polls the queue periodically; instead, it
sleeps until it receives a notification from PostgreSQL about a change to
the queue (a build being added, cancelled or restarted).
Also, for the "build added" case, we now only check for builds with an
ID greater than the previous greatest ID. This is much more efficient
if the queue is large.
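Schematically, with libpqxx, the monitor loop now does something along these
lines (the channel name, query and surrounding structure are illustrative,
not the exact Hydra code):

  #include <pqxx/pqxx>

  void queueMonitorLoop(pqxx::connection & conn)
  {
      {
          pqxx::nontransaction txn(conn);
          txn.exec("listen builds_added");  /* likewise for cancelled/restarted */
      }

      unsigned int lastBuildId = 0;

      while (true) {
          {
              /* Only look at builds newer than the last one we saw; much
                 cheaper than rescanning the whole queue. */
              pqxx::nontransaction txn(conn);
              auto rows = txn.exec_params(
                  "select id from Builds where finished = 0 and id > $1 order by id",
                  lastBuildId);
              for (auto const & row : rows) {
                  lastBuildId = row[0].as<unsigned int>();
                  /* ... load the build and create its build steps ... */
              }
          }

          conn.await_notification();  /* sleep until PostgreSQL signals a change */
      }
  }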
It just makes things unnecessarily complicated. We can just exit
without cleaning anything up, since the only thing to do is unmark
builds and build steps as busy. But we can do that by having systemd
call "hydra-queue-runner --unlock" from ExecStopPost.
If multiple threads create a step for the same build, they could get
the same "max(stepnr)" and allocate conflicting new step numbers. So
lock the BuildSteps table while doing this. We could use a different
isolation level, but this is easier.
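In libpqxx terms, the allocation is roughly (a sketch; the statement text may
differ from the real code):

  #include <pqxx/pqxx>

  /* Allocate the next step number for a build. The explicit table lock makes
     "select max(stepnr)" plus the subsequent insert atomic with respect to
     other threads doing the same thing. */
  int allocateStepNr(pqxx::work & txn, int buildId)
  {
      txn.exec("lock table BuildSteps in exclusive mode");
      auto r = txn.exec_params(
          "select coalesce(max(stepnr), 0) + 1 from BuildSteps where build = $1",
          buildId);
      return r[0][0].as<int>();  /* caller inserts the row and commits */
  }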
This removes the need for Nix's build-remote.pl.
Build logs are now written to $HYDRA_DATA/build-logs because
hydra-queue-runner doesn't have write permission to /nix/var/log.
When visiting the tail-reload page, the "unscrolled" version is shown for a
short time. To circumvent that, we now scroll down at the earliest
opportunity, bridging the gap between the document loading and the first
AJAX request coming in.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Quite a lot of build outputs have lines longer than the width of the
taillog <pre/> and thus visually produce more than 50 lines. This causes
the tail "box" to change height frequently, and to get to the bottom you
need to scroll down.
We now set a fixed line-height of 120% of the font size and cap the
maximum height based on that value (50 * 1.2 = 60). It's probably not
nice to override the line-height, but max-lines is currently only
available via browser-specific property names. After all, it's just for
the tail output; if people complain about the line-height, we can still
change it :-)
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now escape the tail content implicitly by not using .load() but instead
setting the text content with .text(), so no explicit escaping is needed on
our side.
This should get rid of a few formatting errors and possibly XSS, if
someone manages to place JS code in the tail of a build and to lure a
user to that tail output.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Like eval IDs, build IDs don't convey useful information.
Also, make the job name link to the build rather than the job. When
people click on a build, they expect to go to the build page, not the
job page.