This is to properly separate channels from regular jobs and also make
sure that we can always iterate on them, no matter whether the build has
failed. The reason why we were not able to do this until now was because
we were iterating on the build products, and whenever some constituent
of a channel job has failed, we didn't get a build output.
So whenever there is a meta.isHydraChannel, we can now properly
distinguish it from the other jobs.
I still don't have any clue, why "make -C src/sql update-dbix" without
*any* modifications tries to create additional schema definitions. But
I've checked the md5sums of the existing schema definitions and they
don't seem to match, so it seems that they already have been tampered
with.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Now we can provide different channel expressions for one particular
channel build. Not sure yet how this would be useful, but I found it
more appropriate to use a type instead of a subtype of "file".
This should get us consistent with the provious commit.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This is to get a bit more consistency among channel builds but doesn't
do a radical change on the display. Ideally we may want to have a
channel overview with all the constituents and a small help showing how
the user can add the channel.
Unfortunately, this also introduces an inconsistency: We previously used
the *subtype* "channel", but now we're expecting "channel" as the type
of the product, so we need to change this for the channels overview as
well.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
It's very similar to "jobs" and the code is pretty much the same, except
that we don't do filtering on it. At least it doesn't waste space for a
filter option when there are usually WAY less channel jobs than ordinary
jobs.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Currently I'm using a (not very well) downscaled version of the NixOS
logo, so we want to replace it by a proper image ASAP.
Other than that, the idea is to have something like this in
hydra-build-products:
file channel $out/channel.tar.bz2
Right now of course, it's only displayed at the corresponding builds, so
we might want to have aggregates on all channels for a project, jobset
or maybe even single jobs?
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
They will show up in machineTypes as (e.g.) x86_64-linux:local instead
of x86_64-linux. This is to prevent the Hydra provisioner from
creating machines for steps that are supposed to be executed locally.
It's easier for the Hydra provisioner to put host public keys in the
machines file than to separately manage the known_hosts file
(especially when the provisioner runs on a different machine).
This is necessary because the required system type can become
available later (e.g. by being provisioned by the
auto-scaler). However, in the future, we may want to fail steps if
they have been unsupported for more than a certain amount of time.
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
If there is no input named 'inputs', hydra-eval-jobs now passes in a set
of lists, where each attribute corresponds to an input defined in the
jobset specification and each list element is a different input alt, as
an argument named 'inputs'.
Among other things, this allows for generic hydra expressions to be
shared amongst projects with similar structures but different sets of
specific inputs.
Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.
For example, a coverage build can emit the coverage percentage as
follows:
echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics
Graphs of all metrics for a job can be seen at
http://.../job/<project>/<jobset>/<job>#tabs-charts
Specific metrics are also visible at
http://.../job/<project>/<jobset>/<job>/metric/<metric>
The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
If Hydra isn't hosted on https://example.com/ but something like
https://example.com/hydra/, the URL for /api/scmdiff would have ended up
on /api/scmdiff rather than /hydra/api/scmdiff.
This is because we didn't use the URI resolver from the controller,
hence we're using it now to build up the whole URL including the query
string.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Without an index on (machine, stoptime desc), this requires a
sequential scan. And adding a whole index for this seems
overkill. (Possibly the queue runner could maintain this info more
efficiently.)
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).
A more refined scheme would be to have per machine, per path locks.
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandary system
features field to include "local". For example:
localhost x86_64-linux,i686-linux - 10 100 - local
says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
Otherwise we never recover from reset daemon connections, e.g.
hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
...
The error is now handled queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
This is currently done by a separate program that periodically
calls "hydra-queue-runner --status". Eventually, I'll do this
in the queue runner directly.
Fixes#220.
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
While sorting machines by load, the load of a machine
(machine->currentJobs) can be changed by other threads. If that
happens, the comparator is no longer a proper ordering, in which case
std::sort() can segfault. So we now make a copy of currentJobs before
sorting.
There is a slight possibility that the queue monitor and a builder
thread simultaneously decide to mark a build as finished. That's fine,
as long as we ensure the DB update is idempotent (as ensured by doing
"update Builds set finished = 1 ... where finished = 0").
If a build A depends on a derivation that is the top-level derivation
of some build B, then we should process B before A (meaning we
shouldn't make the derivation runnable before B has been
added). Otherwise, the derivation will be "accounted" to A rather than
B (so the build step will show up in the wrong build).
Aborted builds are now put back on the runnable queue and retried
after a certain time interval (currently 60 seconds for the first
retry, then tripled on each subsequent retry).