Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.
For example, a coverage build can emit the coverage percentage as
follows:
echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics
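The example line above has three fields: a metric name, a numeric value and a unit. The C++ sketch below parses one such line. It assumes the fields are whitespace-separated and that the unit is optional. The `Metric` struct and `parseMetricLine` are hypothetical names, not Hydra's own parser.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record for one line of $out/nix-support/hydra-metrics.
// Assumed layout: "<name> <value> [<unit>]", whitespace-separated.
struct Metric {
    std::string name;
    double value = 0;
    std::string unit; // stays empty if the line has no unit
};

// Parse a single metrics line. Returns std::nullopt if the name or value
// is missing, or if the value is not entirely a number.
std::optional<Metric> parseMetricLine(const std::string & line)
{
    std::istringstream in(line);
    Metric m;
    if (!(in >> m.name)) return std::nullopt;

    std::string valueStr;
    if (!(in >> valueStr)) return std::nullopt;
    try {
        std::size_t used = 0;
        m.value = std::stod(valueStr, &used);
        if (used != valueStr.size()) return std::nullopt; // e.g. "12abc"
    } catch (const std::exception &) {
        return std::nullopt; // not a number at all
    }

    in >> m.unit; // optional third field
    return m;
}
```

For instance, `parseMetricLine("lineCoverage 87.5 %")` yields name `lineCoverage`, value 87.5 and unit `%`.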
Graphs of all metrics for a job can be seen at
http://.../job/<project>/<jobset>/<job>#tabs-charts
Specific metrics are also visible at
http://.../job/<project>/<jobset>/<job>/metric/<metric>
The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
If Hydra isn't hosted at the root of a site (https://example.com/) but
under a prefix such as https://example.com/hydra/, the URL for
/api/scmdiff ended up on /api/scmdiff rather than /hydra/api/scmdiff.
This is because we didn't use the controller's URI resolver, so we now
use it to build the whole URL, including the query string.
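The underlying problem is general. A hard-coded absolute path like "/api/scmdiff" throws away any deployment prefix. Resolving an app-relative path against the application's base URI keeps the prefix. The C++ sketch below illustrates that idea only and is not Hydra's Perl code. `uriFor` is a hypothetical helper, the query string is illustrative, and no URL escaping is performed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper in the spirit of a framework URI resolver: join the
// application's base URI (which may carry a prefix such as "/hydra/") with
// an app-relative path and an optional, already-encoded query string.
std::string uriFor(std::string base, const std::string & path,
                   const std::string & query = "")
{
    // Ensure exactly one '/' separates the base and the path.
    if (base.empty() || base.back() != '/') base += '/';
    std::string rel = path;
    while (!rel.empty() && rel.front() == '/') rel.erase(0, 1);

    std::string url = base + rel;
    if (!query.empty()) url += "?" + query;
    return url;
}
```

With a base of "https://example.com/hydra/", the path "/api/scmdiff" resolves under the prefix instead of at the site root.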
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Without an index on (machine, stoptime desc), this requires a
sequential scan, and adding a whole index just for this seems like
overkill. (Possibly the queue runner could maintain this info more
efficiently.)
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).
A more refined scheme would be to have per-machine, per-path locks.
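The per-machine, per-path scheme mentioned as a possible refinement could look roughly like the C++ sketch below. It hands out one mutex per (machine, path) pair. Threads that want to copy the same path to the same machine therefore serialise, and only the first one actually sends it. `PathSendLocks`, `ensurePath` and the `isValid`/`send` callbacks are hypothetical stand-ins, not Hydra's API. This sketch also never removes map entries, which a real implementation would need to address.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical registry of per-(machine, path) locks.
class PathSendLocks {
public:
    std::shared_ptr<std::mutex> get(const std::string & machine,
                                    const std::string & path)
    {
        std::lock_guard<std::mutex> guard(mapMutex); // protects `locks`
        auto & slot = locks[{machine, path}];
        if (!slot) slot = std::make_shared<std::mutex>();
        return slot;
    }
private:
    std::mutex mapMutex;
    std::map<std::pair<std::string, std::string>,
             std::shared_ptr<std::mutex>> locks;
};

// Copy `path` to `machine` unless it is already there. While one thread
// holds the lock for (machine, path), others wait and then find the path
// valid, so it is sent only once. `isValid` and `send` are assumed hooks.
template<typename IsValid, typename Send>
void ensurePath(PathSendLocks & locks, const std::string & machine,
                const std::string & path, IsValid isValid, Send send)
{
    auto lock = locks.get(machine, path);
    std::lock_guard<std::mutex> held(*lock);
    if (!isValid(machine, path)) send(machine, path);
}
```

Compared with a single per-machine lock, this lets different paths go to the same machine concurrently while still deduplicating identical copies.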
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandatory system
features field to include "local". For example:
localhost x86_64-linux,i686-linux - 10 100 - local
says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
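The matching rule can be sketched in C++ as follows. A machine accepts a step only if the step requires every one of the machine's mandatory features, and the machine offers every feature the step requires. The sketch assumes that "preferLocalBuild = true" is treated as requiring the "local" feature. That assumption also means such steps go only to machines that offer "local". The struct and field names are hypothetical, and this is not Hydra's actual code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified machine record (cf. the machines file line above).
struct Machine {
    std::set<std::string> systemTypes;       // e.g. {"x86_64-linux", "i686-linux"}
    std::set<std::string> supportedFeatures; // optional features
    std::set<std::string> mandatoryFeatures; // e.g. {"local"}
};

// Hypothetical build-step record.
struct Step {
    std::string system;
    std::set<std::string> requiredFeatures;
    bool preferLocalBuild = false;
};

bool supportsStep(const Machine & m, Step step)
{
    // Assumption: preferLocalBuild is expressed as a required "local" feature.
    if (step.preferLocalBuild) step.requiredFeatures.insert("local");

    if (!m.systemTypes.count(step.system)) return false;

    // The machine only takes steps that require all its mandatory features.
    for (auto & f : m.mandatoryFeatures)
        if (!step.requiredFeatures.count(f)) return false;

    // Every required feature must be offered (mandatory ones count as offered).
    for (auto & f : step.requiredFeatures)
        if (!m.supportedFeatures.count(f) && !m.mandatoryFeatures.count(f))
            return false;

    return true;
}
```

Under this rule, the "localhost" line above accepts only preferLocalBuild steps, and an ordinary remote machine rejects them.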
Otherwise we never recover from reset daemon connections, e.g.
hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
...
The error is now handled in queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
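The recovery pattern can be sketched in C++ as shown below. A connection is reused for as long as the work succeeds. Once an error escapes, the exception is caught and logged, the broken connection is dropped, and the next iteration opens a fresh one. `Connection`, `connect`, `work` and `monitorLoop` are hypothetical names, not Hydra's functions, and `maxRounds` exists only so the sketch terminates.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <memory>
#include <stdexcept>

// Stand-in for a daemon connection (hypothetical).
struct Connection { int id; };

void monitorLoop(const std::function<std::unique_ptr<Connection>()> & connect,
                 const std::function<void(Connection &)> & work,
                 int maxRounds)
{
    int rounds = 0;
    while (rounds < maxRounds) {
        auto conn = connect();               // fresh connection
        try {
            while (rounds < maxRounds) {     // reuse it while it works
                ++rounds;
                work(*conn);
            }
        } catch (const std::exception & e) {
            // Log and fall through: `conn` is destroyed here and the outer
            // loop reconnects instead of reusing a reset connection.
            std::cerr << "queue monitor error: " << e.what() << "\n";
        }
    }
}
```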
This is currently done by a separate program that periodically
calls "hydra-queue-runner --status". Eventually, I'll do this
in the queue runner directly.
Fixes #220.
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
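Capping concurrent closure copies at 4 is a counting-semaphore pattern. The C++ sketch below builds a small semaphore from a mutex and a condition variable, so it only needs C++11; C++20's std::counting_semaphore would also do. It adds an RAII guard that holds one slot for the duration of a copy. `Semaphore`, `SemaphoreGuard` and the commented-out `copyClosureSlots` are hypothetical names, not Hydra's.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class Semaphore {
public:
    explicit Semaphore(unsigned n) : available(n) {}

    void acquire() {                       // block until a slot is free
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return available > 0; });
        --available;
    }

    bool tryAcquire() {                    // non-blocking variant
        std::lock_guard<std::mutex> lk(m);
        if (available == 0) return false;
        --available;
        return true;
    }

    void release() {
        { std::lock_guard<std::mutex> lk(m); ++available; }
        cv.notify_one();                   // wake one waiting copier
    }

private:
    std::mutex m;
    std::condition_variable cv;
    unsigned available;
};

// Holds one slot while in scope, so a throwing copy still releases it.
class SemaphoreGuard {
public:
    explicit SemaphoreGuard(Semaphore & s) : sem(s) { sem.acquire(); }
    ~SemaphoreGuard() { sem.release(); }
    SemaphoreGuard(const SemaphoreGuard &) = delete;
    SemaphoreGuard & operator=(const SemaphoreGuard &) = delete;
private:
    Semaphore & sem;
};

// Hypothetical use in a builder thread:
//   static Semaphore copyClosureSlots(4);
//   { SemaphoreGuard g(copyClosureSlots); /* copy the closure */ }
```

The fifth thread to arrive simply blocks in `acquire()` until one of the four active copies finishes, which keeps disk seeks bounded without rejecting any work.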
While sorting machines by load, the load of a machine
(machine->currentJobs) can be changed by other threads. If that
happens, the comparator no longer induces a strict weak ordering, so
std::sort() has undefined behaviour and can in practice segfault. So
we now make a copy of currentJobs before sorting.
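The fix amounts to sorting on a snapshot. The C++ sketch below reads each machine's load once into a plain value, then sorts on those copies, so the comparator's inputs cannot change mid-sort. It sorts by load alone for simplicity. The `Machine`, `MachineInfo` and `sortByLoad` names are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical machine whose load other threads may change at any time.
struct Machine {
    std::atomic<unsigned> currentJobs{0};
};

// A machine paired with its load as read once, before sorting.
struct MachineInfo {
    std::shared_ptr<Machine> machine;
    unsigned currentJobs;
};

std::vector<MachineInfo> sortByLoad(const std::vector<std::shared_ptr<Machine>> & machines)
{
    std::vector<MachineInfo> infos;
    infos.reserve(machines.size());
    for (auto & m : machines)
        infos.push_back({m, m->currentJobs.load()}); // snapshot taken here

    // The comparator only looks at the snapshots, which never change,
    // so it is a valid strict weak ordering for the whole sort.
    std::sort(infos.begin(), infos.end(),
              [](const MachineInfo & a, const MachineInfo & b) {
                  return a.currentJobs < b.currentJobs;
              });
    return infos;
}
```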
There is a slight possibility that the queue monitor and a builder
thread simultaneously decide to mark a build as finished. That's fine,
as long as the DB update is idempotent, which we guarantee by doing
"update Builds set finished = 1 ... where finished = 0".
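The "where finished = 0" clause makes the update a conditional check-and-set. The first writer flips the flag; the second matches no rows and changes nothing. The C++ sketch below is an in-memory analogue using an atomic compare-and-exchange to show the same semantics. It is not how Hydra performs the update, and `BuildState` and `markFinished` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// In-memory stand-in for the Builds.finished column (hypothetical).
struct BuildState {
    std::atomic<int> finished{0};
};

// Analogue of "update Builds set finished = 1 ... where finished = 0":
// returns true if this call flipped 0 -> 1 (one row updated), false if
// another thread had already finished the build (zero rows updated).
bool markFinished(BuildState & b)
{
    int expected = 0;
    return b.finished.compare_exchange_strong(expected, 1);
}
```

Whichever thread loses the race sees `false` and leaves the state untouched, so running the operation twice has the same effect as running it once.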
If a build A depends on a derivation that is the top-level derivation
of some build B, then we should process B before A (meaning we
shouldn't make the derivation runnable before B has been
added). Otherwise, the derivation will be "accounted" to A rather than
B (so the build step will show up in the wrong build).
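One way to get this ordering is a depth-first visit. Before a build is processed, first process any new build whose top-level derivation appears among its dependencies. The C++ sketch below does this. It assumes each build lists the derivations it depends on, possibly transitively. `Build`, `deps` and `processingOrder` are hypothetical names, and the sketch is not the queue runner's actual algorithm.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical build record: its top-level derivation and the
// derivations it depends on.
struct Build {
    int id;
    std::string toplevel;
    std::set<std::string> deps;
};

// Return build IDs in an order where, if A depends on B's top-level
// derivation, B comes before A. The `visited` set makes each build appear
// once and stops the recursion from looping on a cycle.
std::vector<int> processingOrder(const std::vector<Build> & builds)
{
    std::map<std::string, const Build *> byToplevel;
    for (auto & b : builds) byToplevel[b.toplevel] = &b;

    std::set<int> visited;
    std::vector<int> order;
    std::function<void(const Build &)> visit = [&](const Build & b) {
        if (!visited.insert(b.id).second) return;   // already handled
        for (auto & d : b.deps) {
            auto it = byToplevel.find(d);
            if (it != byToplevel.end()) visit(*it->second); // B before A
        }
        order.push_back(b.id);
    };
    for (auto & b : builds) visit(b);
    return order;
}
```

For builds 1 (a.drv, depending on b.drv), 2 (b.drv, depending on c.drv) and 3 (c.drv), the order comes out as 3, 2, 1.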
Aborted builds are now put back on the runnable queue and retried
after a certain time interval (currently 60 seconds for the first
retry, then tripled on each subsequent retry).
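The retry schedule is a geometric backoff, 60 × 3^(n−1) seconds for the n-th retry, which gives 60 s, 180 s, 540 s, and so on. The C++ sketch below computes it. The 60 s base and factor of 3 come from the text above. The function name `retryDelay` is hypothetical, and the sketch applies no upper cap, which a real implementation would probably want.

```cpp
#include <cassert>
#include <chrono>

// Delay before the n-th retry (1-based): 60 s for the first retry,
// then tripled for each subsequent one. Attempt 0 is treated like 1.
std::chrono::seconds retryDelay(unsigned attempt)
{
    long long delay = 60;
    for (unsigned i = 1; i < attempt; ++i) delay *= 3;
    return std::chrono::seconds(delay);
}
```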