When I browse failed builds in a jobset-eval on Hydra, I regularly
mistake actual build-failures with temporary issues like timeouts (that
probably disappear at the next eval).
To prevent this kind of issue, I figured that using the stopsign-svg for
builds with timeouts or exceeded log-limits is a reasonable choice for
the following reasons:
* A user can now distinguish between actual build-errors (like
compilation-failures or oversized outputs) and (usually) temporary issues
(like a bloated log or a timeout).
* The stopsign is also used for aborted jobs that are shown in a
different tab and can't be confused with timeouts for that reason.
Declarative jobsets were broken by the Nix update, causing
nix cat-file to break silently.
This commit restores declarative jobsets, based on top of a commit
making it easier to see what broke.
In the past, jobsets which are automatically evaluated are evaluated
regularly, on a schedule. This schedule means a new evaluation is
created every checkInterval seconds (assuming something changed.)
This model works well for architectures where our build farm can
easily keep up with demand.
This commit adds a new type of evaluation, called ONE_AT_A_TIME, which
only schedules a new evaluation if the previous evaluation of the
jobset has no unfinished builds.
This model of evaluation lets us have 'low-tier' architectures.
For example, we could now have a jobset for ARMv7l builds, where
the buildfarm only has a single, underpowered ARMv7l builder.
Configuring that jobset as ONE_AT_A_TIME will create an evaluation
and then won't schedule another evaluation until every job of
the existing evaluation is complete.
This way, the cache will have a complete collection of pre-built
software for some commits, but the underpowered architecture will
never become backlogged in ancient revisions.
A postgresql column which is non-null and unique is treated with
the same optimisations as a primary key, so we have no need to
try and recreate the `id` as the primary key.
No read paths are impacted by this change, and the database will
automatically create an ID for each insert. Thus, no code needs to
change.
hydra.nixos.org is already running this rev, and it should be safe to
apply to everyone else. If we make changes to this migration, we'll
need to write another migration anyway.
Lowercasing is due to postgresql not having case-sensitive table names.
It always technically workde before, but those table names never
existed literally.
The switch to generating from postgresql is to handle an upcoming
addition of an auto-incrementign ID to the Jobset table. Sqlite doesn't
seem to be able to handle the table having an auto incrementing ID
field which isn't the primary key, but we can't change the primary
key trivially.
Since hydra doesn't support sqlite and hasn't for many year anyway,
it is easier to just generate from pgsql directly.
Building on macOS with the latest nixpkgs master and NixOS/nixpkgs#77147
fails. It seems some `std::experimental` (optional) for instance are
not available as `experimental`, but are in `std`. Also `toJSON` is
missing for `atomic< unsigned long long >`.
In a NixOS container, cmdBuildDerivation doesn't work because we're
not privileged. But we also don't need it because the store already
has the derivation.
Also, don't copy from/to the store since this gives errors about
missing signatures.
This attribute allows to know if an error occurred or not: when an
error occurs, errormsg is not an empty string. Note we can not use the
errormsg attribute because it can be arbitrarily long and is excluded
from the jobset API response.
This adds the following (pre-existing) attributes to the jobset response:
- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
(cherry picked from commit 438ddf5289)
Plugins are now disabled at startup time unless there is some relevant
configuration in hydra.conf. This avoids hydra-notify having to do a
lot of redundant work (a lot of plugins did a lot of database queries
*before* deciding they were disabled).
Note: BitBucketStatus users will need to add 'enable_bitbucket_status
= 1' to hydra.conf.
* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.
* 'eval_failed' has the format '<tmpId>'. (The cause of the error can
be found in the database.)
* 'eval_added' has the format '<tmpId>:<evalId>'.
It now receives notifications about started/finished builds/steps via
PostgreSQL. This gets rid of the (substantial) overhead of starting
hydra-notify for every event. It also allows other programs (even on
other machines) to listen to Hydra notifications.
This adds a `InfluxDBNotification` plugin which is configured as:
```
<influxdb>
url = http://127.0.0.1:8086
db = hydra
</influxdb>
```
which will write a notification for every finished job to the
configured database in InfluxDB looking like:
```
hydra_build_status,cached=false,job=job,jobset=default,project=sample,repo=default,result=success,status=success,system=x86_64-linux build_id="1",build_status=0i,closure_size=584i,duration=0i,main_build_id="1",queued=0i,size=168i 1564156212
```
The creation of the `pg_trgm` extension needs superuser power. So,
this patch makes the extension creation in the Hydra NixOS module when
a local database is used.
If it is not possible to create this extension (remote database for
instance with nosuperuser), the creation of the `pg_trgm` index is
skipped (this index speedup queries on builds.drvpath) and warnings
are emitted:
initialising the Hydra database schema...
WARNING: Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
WARNING: HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
WARNING: The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)
This allows to keep smooth migrations: the migration process doesn't
require a manual step (but this manual step is recommended on big
remote databases).
The search query uses the LIKE operator which requires a sequential
scan (it can't use the already existing B-tree index). This new
index (trigram) avoids a sequential scan of the builds table when the
LIKE operator is used.
Here is the analyze of a request on the builds table with this index:
explain analyze select * from builds where drvpath like '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on builds (cost=128.00..132.01 rows=1 width=492) (actual time=0.070..0.077 rows=1 loops=1)
Recheck Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
-> Bitmap Index Scan on indextrgmbuildsondrvpath (cost=0.00..128.00 rows=1 width=0) (actual time=0.047..0.047 rows=3 loops=1)
Index Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
Total runtime: 0.206 ms
(5 rows)
Currently, a full store path has to be provided to search in
builds. This patch permits to search jobs with a output path or
derivation hash.
Usecase: we are building Docker images with Hydra. The tag of the
Docker image is the hash of the image output path. This patch would
allow us to find back the build job from the tag of a running
container image.
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
No more need for a reproduction script! It just says something like
If you have Nix installed, you can reproduce this build on your own
machine by running the following command:
# nix build github:edolstra/dwarffs/09c823e977946668b63ad6c88ed358b48220f124:hydraJobs.build.x86_64-linux
This plugin expects as inputs to a jobset the following:
- gitlab_status_repo => Name of the repository input pointing to that
status updates should be POST'ed, i.e. the jobset has a git input
"nixexprs": "https://gitlab.example.com/project/nixexprs", in which
case "gitlab_status_repo" would be "nixexprs".
- gitlab_project_id => ID of the project in Gitlab, i.e. in the above
case the ID in gitlab of "nixexprs"
The hydra-queue-runner opens a connection to the builder. If the
builder is 'localhost' it starts `nix-store`, otherwise it starts
'ssh'.
Currently, if the hydra-queue-runner can not start `nix-store` (not in
the PATH for instance), the error message is:
cannot connect to ‘localhost’: error: cannot start ssh: No such file
or directory
This is not useful since ssh is actually not started:/
With this patch the error message is now:
cannot connect to ‘localhost’: error: cannot start nix-store: No such file
or directory
Some time ago the data structure for maintainer descriptions in
`nixpkgs` changed from a simple attr set with maintainer emails as
values to an attribute set where the maintainer' nick is associated to
an attribute set with email, GitHub handle and full name.
Hydra can either parse a Nix list or fetches `shortName` from the
associated attribute set (which is used for `meta.licenses` as each
value in it contains a `shortName`). This behavior needs to be
replicated for maintainers to retrieve the emails for `hydra-notify`.
This change is backwards-compatible since `queryMetaStrings` is still
able to understand lists, so old versions of `nixpkgs` or packages using
the old maintainer data structure remain usable.
This is because setting only the initial heap size to more than
the default value (or the configured value) will cause all initial evals
until maxHeapSize expands to the given value to abort.
The 1.1 multiplier comes from the the configured defaults on NixOS' hydra,
and from the previous multiplier used before
7876cf677c.
In order to access protected or private repositories. Using the target
repository URL along with the merge-request ref instead of the source
repository url and branch is necessary to avoid running into issues if
the source repository is not actually accessible to the user Hydra is
authenticating as.
Thanks Alexei Robyn for this patch.
When I press "n builds omitted" I get back to the first tab of a jobset.
This is extremely counter-intuitive, instead this notice should link to
the currently opened tab.
The job has been failing since https://hydra.nixos.org/eval/1461286
with the following error:
hydra-eval-jobs.cc:278:17: error: 'evalSettings' was not declared in this scope
evalSettings.restrictEval = true;
^~~~~~~~~~~~
This is likely due to a typo in 0882519 where that line and the
corresponding comment were moved, and `settings` was changed in that
one place to `evalSettings`.
I reproduced the error by running `nix-build release.nix -A
build.x86_64-linux` on my machine, and this small change fixes it.
You can now set 'evaluator_max_heap_size' to make hydra-eval-jobs
restart itself if the Boehm heap exceeds the specified size.
For example, with 'evaluator_max_heap_size = 256000000',
$ hydra-eval-jobs '<nixpkgs/pkgs/top-level/release.nix>' -I nixpkgs=channel:nixos-17.09
has a max RSS of .56 GiB rather than 4.7 GiB.
Unfortunately it doesn't help much for the NixOS jobsets because of
the "tested" job which requires a huge amount of memory all by itself.
This cannot be done in the hydra-evaluator systemd unit, since then
every other Nix process (e.g. hydra-evaluator and nix-prefetch-*) will
also allocate the specified heap size, probably leading to OOM.
This is a good way to make Hydra hang. (E.g. we had a deletion of
nixos:gcc-7 running for > 12 hours and blocking UPDATE statements from
hydra-queue-runner.) Generally it's better to just disable/hide an old
jobset anyway.