When I browse failed builds in a jobset eval on Hydra, I regularly
mistake temporary issues such as timeouts (which probably disappear at
the next eval) for actual build failures.
To prevent this, I think using the stopsign SVG for builds that timed
out or exceeded the log limit is a reasonable choice, for the following
reasons:
* A user can now distinguish between actual build errors (like
compilation failures or oversized outputs) and (usually) temporary
issues (like a bloated log or a timeout).
* The stopsign is also used for aborted jobs, but those are shown in a
different tab, so they can't be confused with timeouts.
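A minimal sketch of the icon choice described above. The enum values are assumed to follow Hydra's numeric build-status codes and should be checked against the real schema. The icon file names (`checkmark.svg`, `stopsign.svg`, `error.svg`) and the `status_icon` helper are hypothetical. The point is only the grouping: timeouts and log-limit overruns share the stopsign with aborted builds, while real failures keep the error icon.

```python
from enum import IntEnum


class BuildStatus(IntEnum):
    # Assumed numeric codes, modelled on Hydra's build statuses.
    SUCCESS = 0
    FAILED = 1
    DEP_FAILED = 2
    ABORTED = 3
    FAILED_WITH_OUTPUT = 6
    TIMED_OUT = 7
    LOG_LIMIT_EXCEEDED = 10
    OUTPUT_LIMIT_EXCEEDED = 11


# Statuses that usually go away at the next eval, plus aborted builds
# (which are listed in a separate tab anyway).
STOPSIGN_STATUSES = frozenset(
    {BuildStatus.ABORTED, BuildStatus.TIMED_OUT, BuildStatus.LOG_LIMIT_EXCEEDED}
)


def status_icon(status: BuildStatus) -> str:
    """Pick an icon for a finished build (file names are hypothetical)."""
    if status is BuildStatus.SUCCESS:
        return "checkmark.svg"
    if status in STOPSIGN_STATUSES:
        return "stopsign.svg"
    # Real failures, e.g. compilation errors or oversized outputs.
    return "error.svg"


for s in BuildStatus:
    print(f"{s.name:<22} -> {status_icon(s)}")
```

Note that an oversized output (`OUTPUT_LIMIT_EXCEEDED`) deliberately stays in the error group, matching the first bullet above.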
Previously, automatically evaluated jobsets were evaluated regularly,
on a schedule: a new evaluation was created every checkInterval seconds
(assuming something had changed). This model works well for
architectures where our build farm can easily keep up with demand.
This commit adds a new type of evaluation, called ONE_AT_A_TIME, which
only schedules a new evaluation if the previous evaluation of the
jobset has no unfinished builds.
This model of evaluation lets us have 'low-tier' architectures.
For example, we could now have a jobset for ARMv7l builds where the
build farm only has a single, underpowered ARMv7l builder.
Configuring that jobset as ONE_AT_A_TIME will create an evaluation
and then won't schedule another evaluation until every job of
the existing evaluation is complete.
This way, the cache will have a complete collection of pre-built
software for some commits, but the underpowered architecture will
never become backlogged in ancient revisions.
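A rough sketch of the scheduling decision, in Python rather than Hydra's own code. The `Jobset` and `Evaluation` classes and `should_evaluate` are hypothetical. It also assumes that ONE_AT_A_TIME jobsets still wait for checkInterval, and it leaves out the later "did anything change?" check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobsetState(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    ONE_AT_A_TIME = "one_at_a_time"


@dataclass
class Evaluation:
    unfinished_builds: int  # builds of this eval still queued or running


@dataclass
class Jobset:
    state: JobsetState
    check_interval: int  # seconds between checks (checkInterval)
    last_checked: int  # Unix time of the last check
    last_eval: Optional[Evaluation] = None


def should_evaluate(jobset: Jobset, now: int) -> bool:
    """Decide whether the evaluator should check this jobset now."""
    if jobset.state is JobsetState.DISABLED:
        return False
    if now < jobset.last_checked + jobset.check_interval:
        return False  # not due yet
    if jobset.state is JobsetState.ONE_AT_A_TIME and jobset.last_eval is not None:
        # Hold off until every build of the previous eval has finished.
        return jobset.last_eval.unfinished_builds == 0
    return True


armv7l = Jobset(JobsetState.ONE_AT_A_TIME, check_interval=300,
                last_checked=0, last_eval=Evaluation(unfinished_builds=42))
print(should_evaluate(armv7l, now=3600))  # False: previous eval still busy
```

A regular ENABLED jobset in the same situation would be re-evaluated, which is exactly how a slow builder ends up backlogged.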
No more need for a reproduction script! Hydra now just says something like:
If you have Nix installed, you can reproduce this build on your own
machine by running the following command:
# nix build github:edolstra/dwarffs/09c823e977946668b63ad6c88ed358b48220f124:hydraJobs.build.x86_64-linux
When I click "n builds omitted", I'm taken back to the first tab of the
jobset. This is extremely counter-intuitive; the notice should instead
link to the currently open tab.
Deleting a jobset is a good way to make Hydra hang. (E.g. we had a
deletion of nixos:gcc-7 running for more than 12 hours and blocking
UPDATE statements from hydra-queue-runner.) Generally it's better to
just disable or hide an old jobset anyway.
Users frequently want Hydra access just to restart jobs. However,
prior to this commit, the only way to grant that access was to give
them full admin access, which isn't necessarily what we want.
With a restart-jobs role, we can grant this privilege to users who are
known to the community and want to help but aren't long-time members.
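A minimal sketch of the permission check this role implies. The role strings come from the text above. `can_restart_jobs` and `restart_build` are hypothetical stand-ins, not Hydra's actual controller code.

```python
from typing import AbstractSet

ADMIN = "admin"
RESTART_JOBS = "restart-jobs"


def can_restart_jobs(roles: AbstractSet[str]) -> bool:
    """Admins keep their existing rights; restart-jobs grants only this one."""
    return ADMIN in roles or RESTART_JOBS in roles


def restart_build(build_id: int, roles: AbstractSet[str]) -> str:
    # Hypothetical handler standing in for Hydra's restart action.
    if not can_restart_jobs(roles):
        raise PermissionError("restarting builds requires admin or restart-jobs")
    return f"build {build_id} restarted"
```

The design point is that restart-jobs is checked alongside admin rather than implying it. A helper holding only that role can restart builds but still can't edit projects or jobsets.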
I haven't tested this commit, but it looks good to me...
* The "Jobset" page now shows when evaluations are in progress (rather
than just pending).
* Restored the ability to run a single evaluation from the command line
with "hydra-evaluator <project> <jobset>".
* Fixed some consistency issues between the jobset status in PostgreSQL
and in hydra-evaluator. In particular, "lastCheckedTime" was never
updated internally.
Setting
xxx-jobset-repeats = patchelf:master:2
will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
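A small sketch of how such a setting could be interpreted. The `project:jobset:N` shape and the "N additional times" semantics come from the text above. The assumption that several entries are whitespace-separated, along with the helper names `parse_jobset_repeats` and `total_rounds`, are hypothetical.

```python
from typing import Dict, Tuple


def parse_jobset_repeats(value: str) -> Dict[Tuple[str, str], int]:
    """Parse 'project:jobset:N' entries, assumed to be whitespace-separated."""
    repeats: Dict[Tuple[str, str], int] = {}
    for entry in value.split():
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"bad value in xxx-jobset-repeats: {entry!r}")
        project, jobset, extra = parts
        repeats[(project, jobset)] = int(extra)
    return repeats


def total_rounds(repeats: Dict[Tuple[str, str], int], project: str, jobset: str) -> int:
    # N *additional* repeats means N + 1 builds in total; other jobsets build once.
    return repeats.get((project, jobset), 0) + 1


repeats = parse_jobset_repeats("patchelf:master:2")
print(total_rounds(repeats, "patchelf", "master"))  # 3
print(total_rounds(repeats, "nixpkgs", "trunk"))  # 1
```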
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.
Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).
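A simplified model of how the two mechanisms combine. It compares one output hash per round, whereas in practice the rounds are run and compared via Nix's build-repeat. The minimum of one extra round for `isDeterministic` derivations is an assumption ("> 0" in the text). `StepResult`, `build_repeat` and `check_rounds` are hypothetical names.

```python
from enum import Enum
from typing import List, Tuple


class StepResult(Enum):
    SUCCESS = "success"
    NOT_DETERMINISTIC = "not-deterministic"


def build_repeat(configured_repeats: int, is_deterministic: bool) -> int:
    # isDeterministic forces at least one extra round (1 is an assumed minimum).
    return max(configured_repeats, 1) if is_deterministic else configured_repeats


def check_rounds(output_hashes: List[str], is_deterministic: bool) -> Tuple[StepResult, bool]:
    """Compare the output hash of every round.

    Returns the step result and whether a mismatch was seen, so that a
    non-fatal mismatch can still be recorded in the database.
    """
    mismatch = len(set(output_hashes)) > 1
    if mismatch and is_deterministic:
        return StepResult.NOT_DETERMINISTIC, True
    return StepResult.SUCCESS, mismatch
```

For a jobset listed in xxx-jobset-repeats, a mismatch on an ordinary derivation still yields SUCCESS, but the returned flag lets the lack of determinism be recorded. Only derivations that declare `isDeterministic = true` turn the same mismatch into a failed step.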