At the moment, aggregate jobs can easily break and cause the entire
evaluation to fail, which is not ideal. For Nixpkgs, we do have some
important aggregate jobs (like `tested`), but for debugging and building
purposes it's still useful to get a partial result even if the channel
won't actually advance.
This commit changes the behaviour of hydra-eval-jobs such that it
aggregates any errors found during the construction of an aggregate, and
will instead annotate the job with the evaluation failure such that it
shows up in a "cleaner" way.
There are really two types of failure that we care about: one is where
the attribute just ends up missing altogether in the final output, and
also where the attribute is in the output but fails to evaluate. Both
are handled here.
Note that this does mean that the same error message may be output
multiple times, but this aids debuggability because it'll be much
clearer what's blocking the job from being created.
It might happen that a job from the aggregate returned an error!
This is what the vague "[json.exception.type_error.302] type must be string, but is null"
was all about in this instance; there was no `drvPath` to stringify!
So we now actively watch for errors and copy them to the aggregate job.
The vague "[json.exception.type_error.302] type must be string, but is null"
is **absolutely** unhelpful in the way Hydra currently handles it on
evaluation.
This is handling *unexpected* errors only; the following commit will
handle the specific instance of the previously mentioned error.
Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.
This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.
[1] https://github.com/NixOS/nix/pull/3883
Some time ago the data structure for maintainer descriptions in
`nixpkgs` changed from a simple attr set with maintainer emails as
values to an attribute set where the maintainer' nick is associated to
an attribute set with email, GitHub handle and full name.
Hydra can either parse a Nix list or fetches `shortName` from the
associated attribute set (which is used for `meta.licenses` as each
value in it contains a `shortName`). This behavior needs to be
replicated for maintainers to retrieve the emails for `hydra-notify`.
This change is backwards-compatible since `queryMetaStrings` is still
able to understand lists, so old versions of `nixpkgs` or packages using
the old maintainer data structure remain usable.
This is because setting only the initial heap size to more than
the default value (or the configured value) will cause all initial evals
until maxHeapSize expands to the given value to abort.
The 1.1 multiplier comes from the the configured defaults on NixOS' hydra,
and from the previous multiplier used before
7876cf677c.
The job has been failing since https://hydra.nixos.org/eval/1461286
with the following error:
hydra-eval-jobs.cc:278:17: error: 'evalSettings' was not declared in this scope
evalSettings.restrictEval = true;
^~~~~~~~~~~~
This is likely due to a typo in 0882519 where that line and the
corresponding comment were moved, and `settings` was changed in that
one place to `evalSettings`.
I reproduced the error by running `nix-build release.nix -A
build.x86_64-linux` on my machine, and this small change fixes it.
You can now set 'evaluator_max_heap_size' to make hydra-eval-jobs
restart itself if the Boehm heap exceeds the specified size.
For example, with 'evaluator_max_heap_size = 256000000',
$ hydra-eval-jobs '<nixpkgs/pkgs/top-level/release.nix>' -I nixpkgs=channel:nixos-17.09
has a max RSS of .56 GiB rather than 4.7 GiB.
Unfortunately it doesn't help much for the NixOS jobsets because of
the "tested" job which requires a huge amount of memory all by itself.
This cannot be done in the hydra-evaluator systemd unit, since then
every other Nix process (e.g. hydra-evaluator and nix-prefetch-*) will
also allocate the specified heap size, probably leading to OOM.