Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.
This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.
[1] https://github.com/NixOS/nix/pull/3883
With the current implementation, if ANY hash was found inside the decl
spec, the spec would be treated as static. This is problematic since
`inputs` is a hash and hence any configuration would be handled as a
static one.
This fixes the code to match the documentation and only switch to static
processing when ALL values are hashes.
As of https://github.com/NixOS/hydra/pull/737 (removal of sqlite
dependency), the only supported database is Postgresql.
This change removes all references to hydra-postgresql.sql file. This
file is generated using a cpp on hydra.sql, but doesn't differ from
hydra.sql at all.
PathInput plugin keeps a cache of path evaluations. This cache is simple, and
path is not checked more than once every N seconds, where N=30. The caching is
there to avoid expensive calls to `nix-store --add`.
This change makes the validity period configurable. The main use case is
`api-test.pl` which was implemented wrong for a while, as the invocation of
`hydra-eval-jobset` would return the previous evaluation, claiming there are no
changes. The test has been fixed to check better for a new evaluation.
`build_finished` Postgres event will never be fired for the dependent builds.
For example, on our Hydra, the following query always returns increasing
numbers, even though all notifications have been delivered:
```
hydra=> select count(1) from builds where notificationpendingsince is not null;
count
-------
4583
(1 row)
```
Thus, we have to iterate over all dependent builds and mark their
`notificationpendingsince` as `null`, otherwise they will pile up until
the next restart of hydra-notify, when they will get delivered.
When deploying Hydra different than hydra.nixos.org one may encounter a problem
as building any job that uses IFD fails with:
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/1jm02mfiv58rpy8zrx95cpqxzsp64ssh-source.drv' during evaluation but 'allow-import-from-derivation' is false"
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/av3jr8ix4qcadq2wm3y3hplvxwzlhl4y-source.drv' during evaluation but 'allow-import-from-derivation' is false"
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize
'/nix/store/2jm02mfiv58rpy8zrx95cpqxzsp64ssh-source.drv' during evaluation but
'allow-import-from-derivation' is false"
The recent change enforced passing `--no-allow-import-from-derivation`
to `hydra-eval-job` unconditionally. This change makes it configurable and
defaults to **NOT PASSING IT** -- most of the deployments allow IFDs.
The configuration option is called `allow_import_from_derivation` and
defaults to `true`. It is interpreted as a boolean, with only true option being
`true`.
Taken from `Perl::Critic`:
A common idiom in perl for dealing with possible errors is to use `eval`
followed by a check of `$@`/`$EVAL_ERROR`:
eval {
...
};
if ($EVAL_ERROR) {
...
}
There's a problem with this: the value of `$EVAL_ERROR` (`$@`) can change
between the end of the `eval` and the `if` statement. The issue are object
destructors:
package Foo;
...
sub DESTROY {
...
eval { ... };
...
}
package main;
eval {
my $foo = Foo->new();
...
};
if ($EVAL_ERROR) {
...
}
Assuming there are no other references to `$foo` created, when the
`eval` block in `main` is exited, `Foo::DESTROY()` will be invoked,
regardless of whether the `eval` finished normally or not. If the `eval`
in `main` fails, but the `eval` in `Foo::DESTROY()` succeeds, then
`$EVAL_ERROR` will be empty by the time that the `if` is executed.
Additional issues arise if you depend upon the exact contents of
`$EVAL_ERROR` and both `eval`s fail, because the messages from both will
be concatenated.
Even if there isn't an `eval` directly in the `DESTROY()` method code,
it may invoke code that does use `eval` or otherwise affects
`$EVAL_ERROR`.
The solution is to ensure that, upon normal exit, an `eval` returns a
true value and to test that value:
# Constructors are no problem.
my $object = eval { Class->new() };
# To cover the possiblity that an operation may correctly return a
# false value, end the block with "1":
if ( eval { something(); 1 } ) {
...
}
eval {
...
1;
}
or do {
# Error handling here
};
Unfortunately, you can't use the `defined` function to test the result;
`eval` returns an empty string on failure.
Various modules have been written to take some of the pain out of
properly localizing and checking `$@`/`$EVAL_ERROR`. For example:
use Try::Tiny;
try {
...
} catch {
# Error handling here;
# The exception is in $_/$ARG, not $@/$EVAL_ERROR.
}; # Note semicolon.
"But we don't use DESTROY() anywhere in our code!" you say. That may be
the case, but do any of the third-party modules you use have them? What
about any you may use in the future or updated versions of the ones you
already use?
The original code would return standard "Please come back later" page when there
are only fetch errors on a newly setup declarative project. The problem is that
there are two types of errors: standard errors and fetch errors. Each is
acompanied by a corresponding field for time of occurence. Standard errors use
'errortime', while fetch errors have 'lastchecktime' set to the time of the
error. Unfortunately, jobset.tt file was only using 'errortime' for displaying
the time. This would result in the following errors in logs:
Couldn't render template "date error - bad time/date string: expects 'hⓂ️s dⓂ️y' got: ''
This change includes using 'lastchecktime' when rendering the error times.
The current implementation will pass all values to `create_or_update` method. The
missing values will end up as `undef` (or `NULL`) when assigned to `%update`.
Thus, for columns that are NOT NULL, when, for example, flakes are not used,
will result in a horrible:
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed:
ERROR: null value in column "type" violates not-null constraint
DETAIL: Failing row contains (.jobsets, 118, hydra, hydra jobsets, src, hydra/jobsets.nix, null,
null, null, 1589536378, 1, 0, 0, , 3, 30, 100, null, null, 1589536379, null, null). [for Statement
"UPDATE jobsets SET checkinterval = ?, description = ?, enableemail = ?, nixexprinput = ?,
nixexprpath = ?, type = ? WHERE ( ( name = ? AND project = ? ) )" with ParamValues: 1='30',
2='hydra jobsets', 3='0', 4='src', 5='hydra/jobsets.nix', 6=undef, 7='.jobsets', 8='hydra'] at
/nix/store/lsf81ip9ybxihk5praf2n0nh14a6i9j0-hydra-0.1.19700101.DIRTY/libexec/hydra/lib/Hydra/Helper/AddBuilds.pm line 50
This change just omits adding such values to `%update`, which results in
PostgreSQL assigning the default values.
The previous code converted option values to ints when the value
contained a digit somewhere. This is too eager since it also converts
strings like `release-0.2` to an int which should not happen.
We now only convert to int when the value is an integer.
This plugin is a counterpart to GithubPulls plugin. Instead of fetching pull
requests, it will fetch all references (branches and tags) that start with a
particular prefix.
The plugin is a copy of GithubPulls plugin with appropriate changes to call the
right API and parse the config matching the need.
To quote the function's comment:
Awful hack to handle timeouts in SQLite: just retry the transaction.
DBD::SQLite *has* a 30 second retry window, but apparently it
doesn't work.
Since SQLite is now dropped entirely, this wrapper can be removed
completely.
SQLite isn't properly supported by Hydra for a few years now[1], but
Hydra still depends on it. Apart from a slightly bigger closure this can
cause confusion by users since Hydra picks up SQLite rather than
PostgreSQL by default if HYDRA_DBI isn't configured properly[2]
[1] 78974abb69
[2] https://logs.nix.samueldr.com/nixos-dev/2020-04-10#3297342;
If we don't see machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.
(cherry picked from commit f5cdbfe21d)
If we don't see machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.
When I browse failed builds in a jobset-eval on Hydra, I regularly
mistake actual build-failures with temporary issues like timeouts (that
probably disappear at the next eval).
To prevent this kind of issue, I figured that using the stopsign-svg for
builds with timeouts or exceeded log-limits is a reasonable choice for
the following reasons:
* A user can now distinguish between actual build-errors (like
compilation-failures or oversized outputs) and (usually) temporary issues
(like a bloated log or a timeout).
* The stopsign is also used for aborted jobs that are shown in a
different tab and can't be confused with timeouts for that reason.
Declarative jobsets were broken by the Nix update, causing
nix cat-file to break silently.
This commit restores declarative jobsets, based on top of a commit
making it easier to see what broke.
In the past, jobsets which are automatically evaluated are evaluated
regularly, on a schedule. This schedule means a new evaluation is
created every checkInterval seconds (assuming something changed.)
This model works well for architectures where our build farm can
easily keep up with demand.
This commit adds a new type of evaluation, called ONE_AT_A_TIME, which
only schedules a new evaluation if the previous evaluation of the
jobset has no unfinished builds.
This model of evaluation lets us have 'low-tier' architectures.
For example, we could now have a jobset for ARMv7l builds, where
the buildfarm only has a single, underpowered ARMv7l builder.
Configuring that jobset as ONE_AT_A_TIME will create an evaluation
and then won't schedule another evaluation until every job of
the existing evaluation is complete.
This way, the cache will have a complete collection of pre-built
software for some commits, but the underpowered architecture will
never become backlogged in ancient revisions.
A postgresql column which is non-null and unique is treated with
the same optimisations as a primary key, so we have no need to
try and recreate the `id` as the primary key.
No read paths are impacted by this change, and the database will
automatically create an ID for each insert. Thus, no code needs to
change.
hydra.nixos.org is already running this rev, and it should be safe to
apply to everyone else. If we make changes to this migration, we'll
need to write another migration anyway.
Lowercasing is due to postgresql not having case-sensitive table names.
It always technically workde before, but those table names never
existed literally.
The switch to generating from postgresql is to handle an upcoming
addition of an auto-incrementign ID to the Jobset table. Sqlite doesn't
seem to be able to handle the table having an auto incrementing ID
field which isn't the primary key, but we can't change the primary
key trivially.
Since hydra doesn't support sqlite and hasn't for many year anyway,
it is easier to just generate from pgsql directly.
Building on macOS with the latest nixpkgs master and NixOS/nixpkgs#77147
fails. It seems some `std::experimental` (optional) for instance are
not available as `experimental`, but are in `std`. Also `toJSON` is
missing for `atomic< unsigned long long >`.
In a NixOS container, cmdBuildDerivation doesn't work because we're
not privileged. But we also don't need it because the store already
has the derivation.
Also, don't copy from/to the store since this gives errors about
missing signatures.
This attribute allows to know if an error occurred or not: when an
error occurs, errormsg is not an empty string. Note we can not use the
errormsg attribute because it can be arbitrarily long and is excluded
from the jobset API response.
This adds the following (pre-existing) attributes to the jobset response:
- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
(cherry picked from commit 438ddf5289)
Plugins are now disabled at startup time unless there is some relevant
configuration in hydra.conf. This avoids hydra-notify having to do a
lot of redundant work (a lot of plugins did a lot of database queries
*before* deciding they were disabled).
Note: BitBucketStatus users will need to add 'enable_bitbucket_status
= 1' to hydra.conf.
* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.
* 'eval_failed' has the format '<tmpId>'. (The cause of the error can
be found in the database.)
* 'eval_added' has the format '<tmpId>:<evalId>'.
It now receives notifications about started/finished builds/steps via
PostgreSQL. This gets rid of the (substantial) overhead of starting
hydra-notify for every event. It also allows other programs (even on
other machines) to listen to Hydra notifications.
This adds a `InfluxDBNotification` plugin which is configured as:
```
<influxdb>
url = http://127.0.0.1:8086
db = hydra
</influxdb>
```
which will write a notification for every finished job to the
configured database in InfluxDB looking like:
```
hydra_build_status,cached=false,job=job,jobset=default,project=sample,repo=default,result=success,status=success,system=x86_64-linux build_id="1",build_status=0i,closure_size=584i,duration=0i,main_build_id="1",queued=0i,size=168i 1564156212
```
The creation of the `pg_trgm` extension needs superuser power. So,
this patch makes the extension creation in the Hydra NixOS module when
a local database is used.
If it is not possible to create this extension (remote database for
instance with nosuperuser), the creation of the `pg_trgm` index is
skipped (this index speedup queries on builds.drvpath) and warnings
are emitted:
initialising the Hydra database schema...
WARNING: Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
WARNING: HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
WARNING: The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)
This allows to keep smooth migrations: the migration process doesn't
require a manual step (but this manual step is recommended on big
remote databases).
The search query uses the LIKE operator which requires a sequential
scan (it can't use the already existing B-tree index). This new
index (trigram) avoids a sequential scan of the builds table when the
LIKE operator is used.
Here is the analyze of a request on the builds table with this index:
explain analyze select * from builds where drvpath like '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on builds (cost=128.00..132.01 rows=1 width=492) (actual time=0.070..0.077 rows=1 loops=1)
Recheck Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
-> Bitmap Index Scan on indextrgmbuildsondrvpath (cost=0.00..128.00 rows=1 width=0) (actual time=0.047..0.047 rows=3 loops=1)
Index Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
Total runtime: 0.206 ms
(5 rows)
Currently, a full store path has to be provided to search in
builds. This patch permits to search jobs with a output path or
derivation hash.
Usecase: we are building Docker images with Hydra. The tag of the
Docker image is the hash of the image output path. This patch would
allow us to find back the build job from the tag of a running
container image.
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
No more need for a reproduction script! It just says something like
If you have Nix installed, you can reproduce this build on your own
machine by running the following command:
# nix build github:edolstra/dwarffs/09c823e977946668b63ad6c88ed358b48220f124:hydraJobs.build.x86_64-linux
This plugin expects as inputs to a jobset the following:
- gitlab_status_repo => Name of the repository input pointing to that
status updates should be POST'ed, i.e. the jobset has a git input
"nixexprs": "https://gitlab.example.com/project/nixexprs", in which
case "gitlab_status_repo" would be "nixexprs".
- gitlab_project_id => ID of the project in Gitlab, i.e. in the above
case the ID in gitlab of "nixexprs"
The hydra-queue-runner opens a connection to the builder. If the
builder is 'localhost' it starts `nix-store`, otherwise it starts
'ssh'.
Currently, if the hydra-queue-runner can not start `nix-store` (not in
the PATH for instance), the error message is:
cannot connect to ‘localhost’: error: cannot start ssh: No such file
or directory
This is not useful since ssh is actually not started:/
With this patch the error message is now:
cannot connect to ‘localhost’: error: cannot start nix-store: No such file
or directory
Some time ago the data structure for maintainer descriptions in
`nixpkgs` changed from a simple attr set with maintainer emails as
values to an attribute set where the maintainer' nick is associated to
an attribute set with email, GitHub handle and full name.
Hydra can either parse a Nix list or fetches `shortName` from the
associated attribute set (which is used for `meta.licenses` as each
value in it contains a `shortName`). This behavior needs to be
replicated for maintainers to retrieve the emails for `hydra-notify`.
This change is backwards-compatible since `queryMetaStrings` is still
able to understand lists, so old versions of `nixpkgs` or packages using
the old maintainer data structure remain usable.
This is because setting only the initial heap size to more than
the default value (or the configured value) will cause all initial evals
until maxHeapSize expands to the given value to abort.
The 1.1 multiplier comes from the the configured defaults on NixOS' hydra,
and from the previous multiplier used before
7876cf677c.
In order to access protected or private repositories. Using the target
repository URL along with the merge-request ref instead of the source
repository url and branch is necessary to avoid running into issues if
the source repository is not actually accessible to the user Hydra is
authenticating as.
Thanks Alexei Robyn for this patch.
The PathInput input for local paths was previously enhanced to allow
URLs for which it would use a nix-prefetch-url operation. This change
updates the prompt for the declarative input type to indicate this
capability.
When I press "n builds omitted" I get back to the first tab of a jobset.
This is extremely counter-intuitive, instead this notice should link to
the currently opened tab.
The job has been failing since https://hydra.nixos.org/eval/1461286
with the following error:
hydra-eval-jobs.cc:278:17: error: 'evalSettings' was not declared in this scope
evalSettings.restrictEval = true;
^~~~~~~~~~~~
This is likely due to a typo in 0882519 where that line and the
corresponding comment were moved, and `settings` was changed in that
one place to `evalSettings`.
I reproduced the error by running `nix-build release.nix -A
build.x86_64-linux` on my machine, and this small change fixes it.
You can now set 'evaluator_max_heap_size' to make hydra-eval-jobs
restart itself if the Boehm heap exceeds the specified size.
For example, with 'evaluator_max_heap_size = 256000000',
$ hydra-eval-jobs '<nixpkgs/pkgs/top-level/release.nix>' -I nixpkgs=channel:nixos-17.09
has a max RSS of .56 GiB rather than 4.7 GiB.
Unfortunately it doesn't help much for the NixOS jobsets because of
the "tested" job which requires a huge amount of memory all by itself.
This cannot be done in the hydra-evaluator systemd unit, since then
every other Nix process (e.g. hydra-evaluator and nix-prefetch-*) will
also allocate the specified heap size, probably leading to OOM.
This is a good way to make Hydra hang. (E.g. we had a deletion of
nixos:gcc-7 running for > 12 hours and blocking UPDATE statements from
hydra-queue-runner.) Generally it's better to just disable/hide an old
jobset anyway.
Frequently users want Hydra access just to restart jobs. However,
prior to this commit the only way to grant that access was by giving
them full Admin access which isn't necessarily what we want to do.
By having a restart-jobs role, we can grant this privilege to users
who are known to the community and want to help, but aren't long-time
members.
I haven't tested this commit, but it looks good to me...
When using the "build" or "sysbuild" jobset input types in conjunction
with a binary cache store, the evaluator needs to be able to fetch
store paths from the binary cache. Typical usage:
store_uri = s3://nix-test-cache?secret-key=...
eval_substituter = s3://nix-test-cache
Also, the public key of the binary cache must be added to
binary-cache-public-keys in nix.conf, otherwise the local nix-daemon
won't allow the store paths to be copied over.
Also, remove support in hydra-eval-jobs for multiple jobset input
alternatives. The web interface hasn't supported this in a long
time. Thus we can use the regular "--arg" handler.
This makes downloading/viewing build results work with binary cache
stores. For good performance, this should be used in conjunction with
ca580bec35,
i.e. you should set store_uri to something like
s3://my-cache?local-nar-cache=/tmp/nar-cache
to cache NARs between requests.
When creating a Hydra user with the `hydra-create-user` command, you can now
provide a SHA1 password hash with the `--password-hash` flag. This is useful for
the upcoming work on Fully Declarative Hydra, since the end user should not have
to specify plaintext passwords in their `configuration.nix` file.
Thus, we no longer hold the send lock while substituting missing paths
on the build machine. This is a good thing in particular for macOS
builders which have a tendency to hang forever in curl downloads.
Previously, when hydra-queue-runner was restarted, any pending "build
finished" notifications were lost. Now hydra-queue-runner marks
finished but unnotified builds in the database and uses that to run
pending notifications at startup.
The queue runner can now run up to ‘max-concurrent-notifications’ in
parallel (default is 2). This is useful when some hydra-notify
invocations can take a long time to complete (e.g. because they need
to compress a giant build log) and we don't want this to block all
other notifications.
As @dtzWill discovered, with the concurrent hydra-evaluator, there can
be multiple active transactions adding builds to the database. As a
result, builds can become visible in a non-monotonically increasing
order, breaking the queue monitor's assumption that build IDs only go
up.
The fix is to have hydra-eval-jobset provide the lowest build ID it
just added in the builds_added notification, and have the queue
monitor check from there.
Fixes#496.
This plugin will post to the build status system in BitBucket. In order
to use it you need to add to ExtraConfig
<bitbucket>
username = bitbucket_username
password = bitbucket_password
</bitbucket>
You can use an application password https://blog.bitbucket.org/2016/06/06/app-passwords-bitbucket-cloud/
This can take an excessive amount of time. For example, on
hydra.nixos.org, a call to hydra-notify takes 0.7s even if there are
no plugins. So for an eval with ~45K new builds, the calls to
hydra-notify add up to about 9 hours.
The proper fix would be to pass a list of build IDs, or an eval ID.
This can be used with declarative projects to build PRs.
The github_authorization section should contain verbatim Authorization header contents keyed by repo owner for private repos
1. From the hydra configuration file.
The configuration is loaded from the "git-input" block.
Currently only the "timeout" variable is been looked up in the file.
<git-input>
# general timeout
timeout = 400
<input-name>
# specific timeout for a particular input name
timeout = 400
</input-name>
# use quotes when the input name has spaces
<"foot with spaces">
# specific timeout for a particular input name
timeout = 400
</"foo with spaces">
</git-input>
2. As an argument in the input value after the repo url and branch (and after the deepClone if is defined)
"timeout=<value>"
The preference on which value is used:
1. input value
2. Block with the name of the input in the <git-input> block
3. "timeout" inside the <git-input> block
4. Default value of 600 seconds. (original hard-coded value)
The code is generalized for more values to be configured, it might be too much
for a single value on a single plugin.
Adding a 96-core aarch64 build machine to the build farm caused the
potential number of database connections to increase a lot, so we
started hitting the Postgres connection limit.
* The "Jobset" page now shows when evaluations are in progress (rather
than just pending).
* Restored the ability to do a single evaluation from the command line
by doing "hydra-evaluator <project> <jobset>".
* Fix some consistency issues between jobset status in PostgreSQL and
in hydra-evaluator. In particular, "lastCheckedTime" was never
updated internally.
Setting
xxx-jobset-repeats = patchelf:master:2
will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.
Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).