Passwords that are stored as SHA-1 hashes will be transparently upgraded to Argon2,
and future comparisons will use Argon2.
Co-authored-by: Graham Christensen <graham@grahamc.com>
The default password comparison logic does not use
constant time validation. Switching to constant time
offers a meager improvement by removing a timing
oracle.
A preparatory step in moving to Argon2id password storage, since we'll still need
this change afterwards for validating existing passwords.
Co-authored-by: Graham Christensen <graham@grahamc.com>
Some time in the last decade the plugin switched to preferring
a flatter namespace for realm config.
Co-authored-by: Graham Christensen <graham@grahamc.com>
In Nix the protocol was slightly altered[1] to also contain more
information about realisations. This however wasn't read from the pipe
that was used to read from the store.
After the `cmdBuildDerivation` command which caused this issue, Hydra
will issue a `cmdQueryPathInfos` that tries to read from the remote
store as well. However, there is still leftover data to read from the
previous command, and thus Nix fails to properly allocate the expected
string.
[1] See rev a2b69660a9b326b95d48bd222993c5225bbd5b5f
Fixes #898
Co-authored-by: Graham Christensen <graham@grahamc.com>
... but just fixing up merge conflicts from the introduction of flakes
and the removal of the Jobs table.
This is a breaking change. Previously, packages named `packageset.foo`
would be exposed in the fake derivation channel as `packageset-foo`.
Presumably this was done to avoid needing to track attribute sets, and
to avoid the complexity. I think this now correctly handles the
complexity and properly mirrors the input expressions layout.
Previously, the build ID would never flow through channels which
exited.
This patch tracks the buildOne state as part of State and exits instead of
waiting forever for new work.
The code around buildOne is a bit rough, which makes this a bit awkward to
implement, but since it is only used for testing, the value of improving it
on its own is questionable.
A reproduce script includes a log line that may resemble:
> using these flags: --arg nixpkgs { outPath = /tmp/build-137689173/nixpkgs/source; rev = "fdc872fa200a32456f12cc849d33b1fdbd6a933c"; shortRev = "fdc872f"; revCount = 273100; } -I nixpkgs=/tmp/build-137689173/nixpkgs/source --arg officialRelease false --option extra-binary-caches https://hydra.nixos.org/ --option system x86_64-linux /tmp/build-137689173/nixpkgs/source/pkgs/top-level/release.nix -A
These are passed along to nix-build and that's fine and dandy, but you can't just copy-paste this as is, as the `{}` introduces a syntax error and the value accompanying `-A` is `''`.
A very naive approach is to just `printf "%q"` the individual args, which makes them safe to copy-paste. Unfortunately, this looks awful due to the liberal use of backslash escapes:
```
$ printf "%q" '{ outPath = /tmp/build-137689173/nixpkgs/source; rev = "fdc872fa200a32456f12cc849d33b1fdbd6a933c"; shortRev = "fdc872f"; revCount = 273100; }'
\{\ outPath\ =\ /tmp/build-137689173/nixpkgs/source\;\ rev\ =\ \"fdc872fa200a32456f12cc849d33b1fdbd6a933c\"\;\ shortRev\ =\ \"fdc872f\"\;\ revCount\ =\ 273100\;\ \}
```
Alternatively, if we just use `set -x` before we execute nix-build, we'll get the whole invocation in a friendly, copy-pastable format that nicely displays `{}`-enclosed content and preserves the empty arg following `-A`:
```
running nix-build...
using this invocation:
+ nix-build --arg nixpkgs '{ outPath = /tmp/build-138165173/nixpkgs/source; rev = "e0e4484f2c028d2269f5ebad0660a51bbe46caa4"; shortRev = "e0e4484"; revCount = 274008; }' -I nixpkgs=/tmp/build-138165173/nixpkgs/source --arg officialRelease false --option extra-binary-caches https://hydra.nixos.org/ --option system x86_64-linux /tmp/build-138165173/nixpkgs/source/pkgs/top-level/release.nix -A ''
```
The queue runner used to special-case `localhost` as a remote builder:
Rather than using the normal remote-build (using the
`cmdBuildDerivation` command), it was using the (generally less
efficient, except when running against localhost) `cmdBuildPaths`
command because the latter didn't require a privileged Nix user (which made
testing easier − in particular, it allowed running Hydra in a container).
However:
1. this means that the build loop can follow two distinct code paths depending
on the setup, the irony being that the most commonly used one in production
(the “non-localhost” case) isn't the one used in the testsuite (because all
the tests run against a local store);
2. It turns out that the “localhost” version is buggy in relatively obvious
ways − in particular a failure in a fixed-output derivation or a hash
mismatch isn't reported properly;
3. If the “run in a container” use-case is indeed that important, it can be
(partially) restored using a chroot store (which wouldn't behave exactly
the same way of course, but would be more than good enough for testing)
The current check happening in jobsets is incorrect.
The wanted constraint is stated as follows:
- If type is 0 (legacy), then the flake field should be null, and
both nixExprInput and nixExprPath should be non-null
- If type is 1 (flake), then the flake field should be non-null, and
both nixExprInput and nixExprPath should be null
The current version will not catch (i.e. it will accept) situations
where you have, for instance:
type = 1, nixExprPath null, nixExprInput non-null, flake non-null
This commit fixes that.
I split that into two constraints, to make it more readable and
easier to extend if a new type appears in the future.
The complete query could instead be:
( type = 0
AND nixExprInput IS NOT NULL AND nixExprPath IS NOT NULL AND flake IS NULL )
OR ( type = 1
AND nixExprInput IS NULL AND nixExprPath IS NULL AND flake IS NOT NULL )
(but an "OR" cannot be split, hence the other formulation)
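For reference, the split formulation would look roughly like this (the constraint names are illustrative, not necessarily the ones used):
```
-- two CHECKs, one per jobset type, so any unexpected combination of fields is rejected
ALTER TABLE Jobsets
    ADD CONSTRAINT jobsets_legacy_fields_check
        CHECK ((type = 0) = (nixExprInput IS NOT NULL AND nixExprPath IS NOT NULL AND flake IS NULL)),
    ADD CONSTRAINT jobsets_flake_fields_check
        CHECK ((type = 1) = (nixExprInput IS NULL AND nixExprPath IS NULL AND flake IS NOT NULL));
```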
DBIx likes to eagerly select all columns, without a good way to tell
it not to. Therefore, this splits this one large column into its own
table.
I'd also like to make "jobsets" use this table too, but that is on hold
to stop the bleeding caused by the extreme amount of traffic this is
causing.
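A minimal sketch of the shape of such a split (the table and column names here are hypothetical, only meant to illustrate the idea):
```
-- hypothetical: hoist the oversized column into a side table so that routine
-- SELECTs on the hot table no longer drag the large text column along
CREATE TABLE EvaluationErrors (
    id        SERIAL PRIMARY KEY,
    errorMsg  TEXT,
    errorTime INTEGER
);
```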
The database has these constraints:
check ((type = 0) = (nixExprInput is not null and nixExprPath is not null)),
check ((type = 1) = (flake is not null)),
which prevented switching to flakes in a declarative jobspec, since the
nixexpr{path,input} fields were not nulled in such an update
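For illustration, the kind of update a declarative switch to flakes now has to perform to satisfy both constraints (the values are made up):
```
-- switching a declarative jobset to a flake must also clear the legacy fields
UPDATE jobsets
   SET type = 1,
       flake = 'github:example/repo',
       nixExprInput = NULL,
       nixExprPath = NULL
 WHERE project = 'example' AND name = '.jobsets';
```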
Co-Authored-By: Graham Christensen <graham@grahamc.com>
This search query is pretty heavy. Defaulting to 500 has caused
Hydra's web UI to appear to be down. Since 500 can take it down, users
probably shouldn't be allowed to ask for that many.
Duplicating this data on every record of the builds table costs
approximately 4 GB of duplicated data.
Note that the database migration included took about 4h45m on an
untuned server which uses very slow rotational disks in a RAID5 setup,
with not a lot of RAM. I imagine in production it might take an hour
or two, but not 4. If this should become a chunked migration, I can do
that.
Note: Because of the question about chunked migrations, I have NOT
YET tested this migration thoroughly enough for merge.
Looking at AWS' Performance Insights for a Hydra instance, I found
the hydra-queue-runner's query:
select id, buildStatus, releaseName, closureSize, size
from Builds b
join BuildOutputs o on b.id = o.build
where
finished = ?
and (buildStatus = ? or buildStatus = ?)
and path = $1
was the slowest query by at least 10x. Running an explain on this
showed why:
hydra=> explain select id, buildStatus, releaseName, closureSize, size
from Builds b join BuildOutputs o on b.id = o.build where
finished = 1 and (buildStatus = 0 or buildStatus = 6) and
path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
QUERY PLAN
------------------------------------------------------------------------
Gather (cost=1000.43..33718.98 rows=2 width=56)
Workers Planned: 2
-> Nested Loop (cost=0.43..32718.78 rows=1 width=56)
-> Parallel Seq Scan on buildoutputs o (cost=0.00..32710.32 rows=1 width=4)
Filter: (path = '/nix/store/s93kh...snip...'::text)
-> Index Scan using indexbuildsonjobsetidfinishedid on builds b
(cost=0.43..8.45 rows=1 width=56)
Index Cond: ((id = o.build) AND (finished = 1))
Filter: ((buildstatus = 0) OR (buildstatus = 6))
(8 rows)
A parallel sequential scan is definitely better than a sequential scan, but the
cost ranging from 0 to 32710 is not great. Looking at the table, I saw the `path`
column is completely unindexed:
hydra=> \d buildoutputs
Table "public.buildoutputs"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
build | integer | | not null |
name | text | | not null |
path | text | | not null |
Indexes:
"buildoutputs_pkey" PRIMARY KEY, btree (build, name)
Foreign-key constraints:
"buildoutputs_build_fkey" FOREIGN KEY (build) REFERENCES builds(id)
ON DELETE CASCADE
Since we always do exact matches on the path and don't care about ordering,
and since the path column has very high cardinality, a `hash` index is a
good candidate. Note that I did test a btree index and it performed
similarly well, but slightly worse.
After creating the index (this took about 10 seconds) on a test database:
create index IndexBuildOutputsPath on BuildOutputs using hash(path);
We get a *significantly* reduced cost:
hydra=> explain select id, buildStatus, releaseName, closureSize, size
hydra-> from Builds b join BuildOutputs o on b.id = o.build where
hydra-> finished = 1 and (buildStatus = 0 or buildStatus = 6) and
hydra-> path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..41.41 rows=2 width=56)
-> Index Scan using buildoutputs_path_hash on buildoutputs o (cost=0.00..16.05 rows=3 width=4)
Index Cond: (path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1'::text)
-> Index Scan using indexbuildsonjobsetidfinishedid on builds b (cost=0.43..8.45 rows=1 width=56)
Index Cond: ((id = o.build) AND (finished = 1))
Filter: ((buildstatus = 0) OR (buildstatus = 6))
(6 rows)
For direct comparison, the overall query plan was changed:
From: Gather (cost=1000.43..33718.98 rows=2 width=56)
To:   Nested Loop (cost=0.43..41.41 rows=2 width=56)
and the query plan for buildoutputs changed from a maximum cost of
32,710 down to 16.
In practical terms, the query's planning and execution time was reduced:
Before (ms) | Try 1 | Try 2 | Try 3
------------+---------+---------+--------
Planning | 0.898 | 0.416 | 0.383
Execution | 138.644 | 172.331 | 375.585
After (ms) | Try 1 | Try 2 | Try 3
------------+---------+---------+--------
Planning | 0.298 | 0.290 | 0.296
Execution | 219.625 | 0.035 | 0.034
Requires the following configuration options:
enable_github_login = 1
github_client_id
github_client_secret
Or github_client_secret_file which points to a file with the secret
Fixes this error:
ERROR: failed to process declarative jobset test:inputs,
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
execute failed: ERROR: null value in column "emailoverride" violates
not-null constraint
This would start happening if the network connection between the Hydra
server and the remote build server breaks after successfully importing
at least one output of a derivation, but before having finished
importing all outputs.
Fixes #816.
These make the hydra-queue-runner logs very noisy even when not using the GitlabStatus plugin.
Also, they shouldn't be necessary except when developing the plugin itself and should have been removed before release.
It might happen that a job from the aggregate returned an error!
This is what the vague "[json.exception.type_error.302] type must be string, but is null"
was all about in this instance; there was no `drvPath` to stringify!
So we now actively watch for errors and copy them to the aggregate job.
The vague "[json.exception.type_error.302] type must be string, but is null"
is **absolutely** unhelpful in the way Hydra currently handles it on
evaluation.
This is handling *unexpected* errors only; the following commit will
handle the specific instance of the previously mentioned error.
Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.
This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.
[1] https://github.com/NixOS/nix/pull/3883
With the current implementation, if ANY hash was found inside the decl
spec, the spec would be treated as static. This is problematic since
`inputs` is a hash and hence any configuration would be handled as a
static one.
This fixes the code to match the documentation and only switch to static
processing when ALL values are hashes.
As of https://github.com/NixOS/hydra/pull/737 (removal of the SQLite
dependency), the only supported database is PostgreSQL.
This change removes all references to the hydra-postgresql.sql file. This
file is generated by running cpp over hydra.sql, but doesn't differ from
hydra.sql at all.
The PathInput plugin keeps a cache of path evaluations. This cache is simple: a
path is not checked more than once every N seconds, where N=30. The caching is
there to avoid expensive calls to `nix-store --add`.
This change makes the validity period configurable. The main use case is
`api-test.pl`, which was implemented incorrectly for a while, as the invocation of
`hydra-eval-jobset` would return the previous evaluation, claiming there were no
changes. The test has been fixed to properly check for a new evaluation.
The `build_finished` Postgres event will never be fired for dependent builds.
For example, on our Hydra, the following query always returns increasing
numbers, even though all notifications have been delivered:
```
hydra=> select count(1) from builds where notificationpendingsince is not null;
count
-------
4583
(1 row)
```
Thus, we have to iterate over all dependent builds and mark their
`notificationpendingsince` as `null`, otherwise they will pile up until
the next restart of hydra-notify, when they will get delivered.
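Roughly, the effect per dependent build is the following (a sketch; the actual code goes through the ORM, and $1 stands for each dependent build's id):
```
-- clear the pending flag for a build whose build_finished event will never fire
UPDATE builds
   SET notificationpendingsince = NULL
 WHERE id = $1;
```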
When deploying a Hydra other than hydra.nixos.org, one may encounter a problem
where building any job that uses IFD fails with:
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/1jm02mfiv58rpy8zrx95cpqxzsp64ssh-source.drv' during evaluation but 'allow-import-from-derivation' is false"
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/av3jr8ix4qcadq2wm3y3hplvxwzlhl4y-source.drv' during evaluation but 'allow-import-from-derivation' is false"
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/2jm02mfiv58rpy8zrx95cpqxzsp64ssh-source.drv' during evaluation but 'allow-import-from-derivation' is false"
The recent change enforced passing `--no-allow-import-from-derivation`
to `hydra-eval-jobs` unconditionally. This change makes it configurable and
defaults to **NOT PASSING IT** -- most deployments allow IFD.
The configuration option is called `allow_import_from_derivation` and
defaults to `true`. It is interpreted as a boolean, where the only value
treated as true is the literal `true`.
Taken from `Perl::Critic`:
A common idiom in perl for dealing with possible errors is to use `eval`
followed by a check of `$@`/`$EVAL_ERROR`:
eval {
...
};
if ($EVAL_ERROR) {
...
}
There's a problem with this: the value of `$EVAL_ERROR` (`$@`) can change
between the end of the `eval` and the `if` statement. The issue is object
destructors:
package Foo;
...
sub DESTROY {
...
eval { ... };
...
}
package main;
eval {
my $foo = Foo->new();
...
};
if ($EVAL_ERROR) {
...
}
Assuming there are no other references to `$foo` created, when the
`eval` block in `main` is exited, `Foo::DESTROY()` will be invoked,
regardless of whether the `eval` finished normally or not. If the `eval`
in `main` fails, but the `eval` in `Foo::DESTROY()` succeeds, then
`$EVAL_ERROR` will be empty by the time that the `if` is executed.
Additional issues arise if you depend upon the exact contents of
`$EVAL_ERROR` and both `eval`s fail, because the messages from both will
be concatenated.
Even if there isn't an `eval` directly in the `DESTROY()` method code,
it may invoke code that does use `eval` or otherwise affects
`$EVAL_ERROR`.
The solution is to ensure that, upon normal exit, an `eval` returns a
true value and to test that value:
# Constructors are no problem.
my $object = eval { Class->new() };
# To cover the possibility that an operation may correctly return a
# false value, end the block with "1":
if ( eval { something(); 1 } ) {
...
}
eval {
...
1;
}
or do {
# Error handling here
};
Unfortunately, you can't use the `defined` function to test the result;
`eval` returns an empty string on failure.
Various modules have been written to take some of the pain out of
properly localizing and checking `$@`/`$EVAL_ERROR`. For example:
use Try::Tiny;
try {
...
} catch {
# Error handling here;
# The exception is in $_/$ARG, not $@/$EVAL_ERROR.
}; # Note semicolon.
"But we don't use DESTROY() anywhere in our code!" you say. That may be
the case, but do any of the third-party modules you use have them? What
about any you may use in the future or updated versions of the ones you
already use?
The original code would return the standard "Please come back later" page when there
are only fetch errors on a newly set up declarative project. The problem is that
there are two types of errors: standard errors and fetch errors. Each is
accompanied by a corresponding field for the time of occurrence. Standard errors use
'errortime', while fetch errors have 'lastchecktime' set to the time of the
error. Unfortunately, the jobset.tt file was only using 'errortime' for displaying
the time. This would result in the following errors in the logs:
Couldn't render template "date error - bad time/date string: expects 'h:m:s d:m:y' got: ''
This change includes using 'lastchecktime' when rendering the error times.
The current implementation will pass all values to the `create_or_update` method. The
missing values will end up as `undef` (or `NULL`) when assigned to `%update`.
Thus, for columns that are NOT NULL, this will result in a horrible error when,
for example, flakes are not used:
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed:
ERROR: null value in column "type" violates not-null constraint
DETAIL: Failing row contains (.jobsets, 118, hydra, hydra jobsets, src, hydra/jobsets.nix, null,
null, null, 1589536378, 1, 0, 0, , 3, 30, 100, null, null, 1589536379, null, null). [for Statement
"UPDATE jobsets SET checkinterval = ?, description = ?, enableemail = ?, nixexprinput = ?,
nixexprpath = ?, type = ? WHERE ( ( name = ? AND project = ? ) )" with ParamValues: 1='30',
2='hydra jobsets', 3='0', 4='src', 5='hydra/jobsets.nix', 6=undef, 7='.jobsets', 8='hydra'] at
/nix/store/lsf81ip9ybxihk5praf2n0nh14a6i9j0-hydra-0.1.19700101.DIRTY/libexec/hydra/lib/Hydra/Helper/AddBuilds.pm line 50
This change just omits adding such values to `%update`, which results in
PostgreSQL assigning the default values.
The previous code converted option values to ints when the value
contained a digit somewhere. This is too eager since it also converts
strings like `release-0.2` to an int which should not happen.
We now only convert to int when the value is an integer.
This plugin is a counterpart to GithubPulls plugin. Instead of fetching pull
requests, it will fetch all references (branches and tags) that start with a
particular prefix.
The plugin is a copy of GithubPulls plugin with appropriate changes to call the
right API and parse the config matching the need.
To quote the function's comment:
Awful hack to handle timeouts in SQLite: just retry the transaction.
DBD::SQLite *has* a 30 second retry window, but apparently it
doesn't work.
Since SQLite is now dropped entirely, this wrapper can be removed
completely.
SQLite hasn't been properly supported by Hydra for a few years now[1], but
Hydra still depends on it. Apart from a slightly bigger closure, this can
cause confusion for users, since Hydra picks up SQLite rather than
PostgreSQL by default if HYDRA_DBI isn't configured properly[2].
[1] 78974abb69
[2] https://logs.nix.samueldr.com/nixos-dev/2020-04-10#3297342;
If we don't see a machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.
(cherry picked from commit f5cdbfe21d)
If we don't see a machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.
When I browse failed builds in a jobset evaluation on Hydra, I regularly
mistake actual build failures for temporary issues like timeouts (which
probably disappear at the next eval).
To prevent this kind of issue, I figured that using the stopsign-svg for
builds with timeouts or exceeded log-limits is a reasonable choice for
the following reasons:
* A user can now distinguish between actual build-errors (like
compilation-failures or oversized outputs) and (usually) temporary issues
(like a bloated log or a timeout).
* The stopsign is also used for aborted jobs that are shown in a
different tab and can't be confused with timeouts for that reason.
Declarative jobsets were broken by the Nix update, causing
nix cat-file to break silently.
This commit restores declarative jobsets, based on top of a commit
making it easier to see what broke.
Until now, jobsets which are automatically evaluated have been evaluated
regularly, on a schedule. This schedule means a new evaluation is
created every checkInterval seconds (assuming something changed).
This model works well for architectures where our build farm can
easily keep up with demand.
This commit adds a new type of evaluation, called ONE_AT_A_TIME, which
only schedules a new evaluation if the previous evaluation of the
jobset has no unfinished builds.
This model of evaluation lets us have 'low-tier' architectures.
For example, we could now have a jobset for ARMv7l builds, where
the buildfarm only has a single, underpowered ARMv7l builder.
Configuring that jobset as ONE_AT_A_TIME will create an evaluation
and then won't schedule another evaluation until every job of
the existing evaluation is complete.
This way, the cache will have a complete collection of pre-built
software for some commits, but the underpowered architecture will
never become backlogged in ancient revisions.
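Expressed as SQL, the gate looks roughly like this (a sketch; the table and column names are assumptions about the schema, and the project/jobset values are illustrative):
```
-- only create a new evaluation when the latest one has no unfinished builds
SELECT count(*) = 0 AS ready
  FROM builds b
  JOIN jobsetevalmembers m ON m.build = b.id
 WHERE m.eval = (SELECT max(id) FROM jobsetevals
                  WHERE project = 'example' AND jobset = 'armv7l')
   AND b.finished = 0;
```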
A postgresql column which is non-null and unique is treated with
the same optimisations as a primary key, so we have no need to
try and recreate the `id` as the primary key.
No read paths are impacted by this change, and the database will
automatically create an ID for each insert. Thus, no code needs to
change.
hydra.nixos.org is already running this rev, and it should be safe to
apply to everyone else. If we make changes to this migration, we'll
need to write another migration anyway.
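The shape of the change is roughly the following (a sketch; the actual migration may be worded differently):
```
-- a non-null unique serial column is indexed and auto-populated on insert,
-- without having to touch the existing primary key
ALTER TABLE Jobsets ADD COLUMN id SERIAL NOT NULL UNIQUE;
```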
Lowercasing is due to PostgreSQL not having case-sensitive table names.
It always technically worked before, but those table names never
existed literally.
The switch to generating from PostgreSQL is to handle an upcoming
addition of an auto-incrementing ID to the Jobset table. SQLite doesn't
seem to be able to handle the table having an auto-incrementing ID
field which isn't the primary key, but we can't change the primary
key trivially.
Since Hydra doesn't support SQLite and hasn't for many years anyway,
it is easier to just generate from pgsql directly.
Building on macOS with the latest nixpkgs master and NixOS/nixpkgs#77147
fails. It seems some things, `std::optional` for instance, are no longer
available under `std::experimental`, but are in `std`. Also `toJSON` is
missing for `atomic< unsigned long long >`.
In a NixOS container, cmdBuildDerivation doesn't work because we're
not privileged. But we also don't need it because the store already
has the derivation.
Also, don't copy from/to the store since this gives errors about
missing signatures.
This attribute makes it possible to know whether an error occurred: when an
error occurs, errormsg is not an empty string. Note we cannot use the
errormsg attribute itself because it can be arbitrarily long and is excluded
from the jobset API response.
This adds the following (pre-existing) attributes to the jobset response:
- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
(cherry picked from commit 438ddf5289)
Plugins are now disabled at startup time unless there is some relevant
configuration in hydra.conf. This avoids hydra-notify having to do a
lot of redundant work (a lot of plugins did a lot of database queries
*before* deciding they were disabled).
Note: BitBucketStatus users will need to add 'enable_bitbucket_status
= 1' to hydra.conf.
* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.
* 'eval_failed' has the format '<tmpId>'. (The cause of the error can
be found in the database.)
* 'eval_added' has the format '<tmpId>:<evalId>'.
It now receives notifications about started/finished builds/steps via
PostgreSQL. This gets rid of the (substantial) overhead of starting
hydra-notify for every event. It also allows other programs (even on
other machines) to listen to Hydra notifications.
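Under the hood this is plain PostgreSQL LISTEN/NOTIFY; a minimal sketch (the channel name matches the build_finished event mentioned elsewhere, the payload is illustrative):
```
-- hydra-notify subscribes once at startup...
LISTEN build_finished;
-- ...and the sender announces events as they happen
NOTIFY build_finished, '12345';
```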
This adds a `InfluxDBNotification` plugin which is configured as:
```
<influxdb>
url = http://127.0.0.1:8086
db = hydra
</influxdb>
```
which will write a notification for every finished job to the
configured database in InfluxDB looking like:
```
hydra_build_status,cached=false,job=job,jobset=default,project=sample,repo=default,result=success,status=success,system=x86_64-linux build_id="1",build_status=0i,closure_size=584i,duration=0i,main_build_id="1",queued=0i,size=168i 1564156212
```
The creation of the `pg_trgm` extension needs superuser power. So,
this patch creates the extension from the Hydra NixOS module when
a local database is used.
If it is not possible to create this extension (for instance, a remote
database without superuser access), the creation of the `pg_trgm` index is
skipped (this index speeds up queries on builds.drvpath) and warnings
are emitted:
initialising the Hydra database schema...
WARNING: Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
WARNING: HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
WARNING: The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)
This keeps migrations smooth: the migration process doesn't
require a manual step (but this manual step is recommended on big
remote databases).
The search query uses the LIKE operator which requires a sequential
scan (it can't use the already existing B-tree index). This new
index (trigram) avoids a sequential scan of the builds table when the
LIKE operator is used.
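The index itself is created along these lines (the index name is taken from the plan below; whether it is a GIN or GiST trigram index is an assumption here):
```
-- trigram index so that LIKE '%...%' lookups on drvpath can use an index scan
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX IndexTrgmBuildsOnDrvpath ON builds USING gin (drvpath gin_trgm_ops);
```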
Here is the EXPLAIN ANALYZE of a query on the builds table with this index:
explain analyze select * from builds where drvpath like '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on builds (cost=128.00..132.01 rows=1 width=492) (actual time=0.070..0.077 rows=1 loops=1)
Recheck Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
-> Bitmap Index Scan on indextrgmbuildsondrvpath (cost=0.00..128.00 rows=1 width=0) (actual time=0.047..0.047 rows=3 loops=1)
Index Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
Total runtime: 0.206 ms
(5 rows)
Currently, a full store path has to be provided to search in
builds. This patch permits searching for jobs by an output path or
derivation hash.
Use case: we are building Docker images with Hydra. The tag of the
Docker image is the hash of the image's output path. This patch would
allow us to find the build job from the tag of a running
container image.
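A sketch of the lookup this enables (the hash is reused from the example above; the real search also matches drvpath):
```
-- find builds whose output path contains a given store hash
SELECT b.id
  FROM builds b
  JOIN buildoutputs o ON o.build = b.id
 WHERE o.path LIKE '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%';
```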
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
No more need for a reproduction script! It just says something like
If you have Nix installed, you can reproduce this build on your own
machine by running the following command:
# nix build github:edolstra/dwarffs/09c823e977946668b63ad6c88ed358b48220f124:hydraJobs.build.x86_64-linux
This plugin expects as inputs to a jobset the following:
- gitlab_status_repo => Name of the repository input pointing to the repo
to which status updates should be POSTed, i.e. if the jobset has a git input
"nixexprs": "https://gitlab.example.com/project/nixexprs", then
"gitlab_status_repo" would be "nixexprs".
- gitlab_project_id => ID of the project in Gitlab, i.e. in the above
case the ID in gitlab of "nixexprs"
The hydra-queue-runner opens a connection to the builder. If the
builder is 'localhost' it starts `nix-store`, otherwise it starts
'ssh'.
Currently, if the hydra-queue-runner can not start `nix-store` (not in
the PATH for instance), the error message is:
cannot connect to ‘localhost’: error: cannot start ssh: No such file
or directory
This is not useful since ssh is actually not started :/
With this patch the error message is now:
cannot connect to ‘localhost’: error: cannot start nix-store: No such file
or directory
Some time ago the data structure for maintainer descriptions in
`nixpkgs` changed from a simple attr set with maintainer emails as
values to an attribute set where the maintainer's nick is associated with
an attribute set containing email, GitHub handle and full name.
Hydra can either parse a Nix list or fetch `shortName` from the
associated attribute set (which is used for `meta.licenses`, as each
value in it contains a `shortName`). This behavior needs to be
replicated for maintainers to retrieve the emails for `hydra-notify`.
This change is backwards-compatible since `queryMetaStrings` is still
able to understand lists, so old versions of `nixpkgs` or packages using
the old maintainer data structure remain usable.
This is because setting only the initial heap size to more than
the default value (or the configured value) will cause all initial evals
until maxHeapSize expands to the given value to abort.
The 1.1 multiplier comes from the configured defaults on NixOS' hydra,
and from the previous multiplier used before
7876cf677c.
This is needed in order to access protected or private repositories. Using the target
repository URL along with the merge-request ref instead of the source
repository url and branch is necessary to avoid running into issues if
the source repository is not actually accessible to the user Hydra is
authenticating as.
Thanks Alexei Robyn for this patch.
The PathInput input for local paths was previously enhanced to allow
URLs for which it would use a nix-prefetch-url operation. This change
updates the prompt for the declarative input type to indicate this
capability.
When I press "n builds omitted" I get back to the first tab of a jobset.
This is extremely counter-intuitive; instead, this notice should link to
the currently opened tab.
The job has been failing since https://hydra.nixos.org/eval/1461286
with the following error:
hydra-eval-jobs.cc:278:17: error: 'evalSettings' was not declared in this scope
evalSettings.restrictEval = true;
^~~~~~~~~~~~
This is likely due to a typo in 0882519 where that line and the
corresponding comment were moved, and `settings` was changed in that
one place to `evalSettings`.
I reproduced the error by running `nix-build release.nix -A
build.x86_64-linux` on my machine, and this small change fixes it.
You can now set 'evaluator_max_heap_size' to make hydra-eval-jobs
restart itself if the Boehm heap exceeds the specified size.
For example, with 'evaluator_max_heap_size = 256000000',
$ hydra-eval-jobs '<nixpkgs/pkgs/top-level/release.nix>' -I nixpkgs=channel:nixos-17.09
has a max RSS of .56 GiB rather than 4.7 GiB.
Unfortunately it doesn't help much for the NixOS jobsets because of
the "tested" job which requires a huge amount of memory all by itself.
This cannot be done in the hydra-evaluator systemd unit, since then
every other Nix process (e.g. hydra-evaluator and nix-prefetch-*) will
also allocate the specified heap size, probably leading to OOM.
This is a good way to make Hydra hang. (E.g. we had a deletion of
nixos:gcc-7 running for > 12 hours and blocking UPDATE statements from
hydra-queue-runner.) Generally it's better to just disable/hide an old
jobset anyway.
Frequently users want Hydra access just to restart jobs. However,
prior to this commit the only way to grant that access was by giving
them full Admin access which isn't necessarily what we want to do.
By having a restart-jobs role, we can grant this privilege to users
who are known to the community and want to help, but aren't long-time
members.
I haven't tested this commit, but it looks good to me...
When using the "build" or "sysbuild" jobset input types in conjunction
with a binary cache store, the evaluator needs to be able to fetch
store paths from the binary cache. Typical usage:
store_uri = s3://nix-test-cache?secret-key=...
eval_substituter = s3://nix-test-cache
Also, the public key of the binary cache must be added to
binary-cache-public-keys in nix.conf, otherwise the local nix-daemon
won't allow the store paths to be copied over.
Also, remove support in hydra-eval-jobs for multiple jobset input
alternatives. The web interface hasn't supported this in a long
time. Thus we can use the regular "--arg" handler.
This makes downloading/viewing build results work with binary cache
stores. For good performance, this should be used in conjunction with
ca580bec35,
i.e. you should set store_uri to something like
s3://my-cache?local-nar-cache=/tmp/nar-cache
to cache NARs between requests.
When creating a Hydra user with the `hydra-create-user` command, you can now
provide a SHA1 password hash with the `--password-hash` flag. This is useful for
the upcoming work on Fully Declarative Hydra, since the end user should not have
to specify plaintext passwords in their `configuration.nix` file.
Thus, we no longer hold the send lock while substituting missing paths
on the build machine. This is a good thing in particular for macOS
builders which have a tendency to hang forever in curl downloads.
Previously, when hydra-queue-runner was restarted, any pending "build
finished" notifications were lost. Now hydra-queue-runner marks
finished but unnotified builds in the database and uses that to run
pending notifications at startup.
The queue runner can now run up to ‘max-concurrent-notifications’ in
parallel (default is 2). This is useful when some hydra-notify
invocations can take a long time to complete (e.g. because they need
to compress a giant build log) and we don't want this to block all
other notifications.
As @dtzWill discovered, with the concurrent hydra-evaluator, there can
be multiple active transactions adding builds to the database. As a
result, builds can become visible in a non-monotonically increasing
order, breaking the queue monitor's assumption that build IDs only go
up.
The fix is to have hydra-eval-jobset provide the lowest build ID it
just added in the builds_added notification, and have the queue
monitor check from there.
Fixes #496.
This plugin will post to the build status system in BitBucket. In order
to use it you need to add to ExtraConfig
<bitbucket>
username = bitbucket_username
password = bitbucket_password
</bitbucket>
You can use an application password https://blog.bitbucket.org/2016/06/06/app-passwords-bitbucket-cloud/
This can take an excessive amount of time. For example, on
hydra.nixos.org, a call to hydra-notify takes 0.7s even if there are
no plugins. So for an eval with ~45K new builds, the calls to
hydra-notify add up to about 9 hours.
The proper fix would be to pass a list of build IDs, or an eval ID.
This can be used with declarative projects to build PRs.
The github_authorization section should contain verbatim Authorization header contents keyed by repo owner for private repos
1. From the hydra configuration file.
The configuration is loaded from the "git-input" block.
Currently only the "timeout" variable is looked up in the file.
<git-input>
# general timeout
timeout = 400
<input-name>
# specific timeout for a particular input name
timeout = 400
</input-name>
# use quotes when the input name has spaces
<"foot with spaces">
# specific timeout for a particular input name
timeout = 400
</"foo with spaces">
</git-input>
2. As an argument in the input value after the repo url and branch (and after the deepClone, if it is defined):
"timeout=<value>"
The order of precedence for which value is used:
1. input value
2. Block with the name of the input in the <git-input> block
3. "timeout" inside the <git-input> block
4. Default value of 600 seconds. (original hard-coded value)
The code is generalized so that more values can be configured; it might be too much
for a single value in a single plugin.
Adding a 96-core aarch64 build machine to the build farm caused the
potential number of database connections to increase a lot, so we
started hitting the Postgres connection limit.
* The "Jobset" page now shows when evaluations are in progress (rather
than just pending).
* Restored the ability to do a single evaluation from the command line
by doing "hydra-evaluator <project> <jobset>".
* Fix some consistency issues between jobset status in PostgreSQL and
in hydra-evaluator. In particular, "lastCheckedTime" was never
updated internally.
Setting
xxx-jobset-repeats = patchelf:master:2
will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.
Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).
We now take into account the memory necessary for compressing the NAR
being exported to the binary cache, plus xz compression overhead.
Also, we now release the memory tokens for the NAR accessor *after*
releasing the NAR accessor. Previously the memory for the NAR accessor
might still be in use while another thread does an allocation, causing
the maximum to be exceeded temporarily.
Also, use notify_all instead of notify_one to wake up memory token
waiters. This is not very nice, but not every waiter is requesting the
same number of tokens, so some might be able to proceed.
If a step is cancelled just as its builder step is starting,
doBuildStep() will return sRetry. This causes builder() to make the
step runnable again, since the queue monitor may have added new builds
referencing it. The idea is that if the latter condition is not true,
the step's reference count will drop to zero and it will be
deleted. However, if the dispatcher thread sees and locks the step
before the reference count can drop to zero in the builder thread, the
dispatcher thread will start a new builder thread for the step. Thus
the step can be kept alive for an indefinite amount of time.
The fix is for State::builder() to use a weak pointer to the step, to
ensure that the step's reference count can drop to zero *before* it's
added to the runnable queue.
This was a bad idea because pthread_cancel() is unsalvageably broken
in C++. Destructors are not allowed to throw exceptions (especially in
C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind
exception inside any destructor that invokes a cancellation
point. (This exception can be caught but *must* be rethrown.) So let's
just kill the builder process instead.
It was hitting
assert(reservation.unique());
Since we do want the machine reservation to be released before calling
wakeDispatcher(), let's use a different object for keeping track of
active steps.
We now kill active build steps when there are no more referring
builds. This is useful e.g. for preventing cancelled multi-hour TPC-H
benchmark runs from hogging build machines.
If two active steps of the same build failed, then the first would be
marked as "failed", but the second would end up as "orphaned", causing
it to be marked as "aborted" later on. Now it's correctly marked as
"failed".
Without this, if (failed or aborted) derivations have been
garbage-collected, there is no way to restart them, which is very
annoying. Now we set a forceEval flag in the jobset to cause it to be
re-evaluated even if none of the inputs have changed.