The indentation in the hydra.conf makes it possible to include multi-line
strings without it being likely that the contents of the tracker
is mis-parsed or interrupts tho config parser.
It isn't impossible / foolproof probably, but it shouldn't be likely.
Currently we only track how long individual plugins take.
With #1083 we stop executing a lot of plugins, but we
don't have a way to measure its practical impact on the
execution time of handling events.
This happens with flake jobsets for obvious reasons (namely, that nixexprinput
and nixexprpath may be undefined for a flake jobset).
12:38:59 hydra-evaluator.1 | Use of uninitialized value $args[0] in join or string at /home/vin/workspace/vcs/hydra/src/script/hydra-eval-jobset line 648.
12:38:59 hydra-evaluator.1 | Use of uninitialized value $args[1] in join or string at /home/vin/workspace/vcs/hydra/src/script/hydra-eval-jobset line 648.
11:38:20 hydra-server.1 | DEPRECATION WARNING: The Regex dispatch type is deprecated.
11:38:20 hydra-server.1 | It is recommended that you convert Regex and LocalRegex
11:38:20 hydra-server.1 | methods to Chained methods. at /nix/store/aa6gw57fnahd4824pbhmvcs0jlypmynq-hydra-perl-deps/lib/perl5/site_perl/5.32.1/Catalyst/DispatchType/Regex.pm line 210.
12:34:12 hydra-server.1 | Use of uninitialized value $s in substitution (s///) at /home/vin/workspace/vcs/hydra/src/script/../lib/Hydra/Helper/CatalystUtils.pm line 283, <$fh> line 1.
11:28:20 hydra-server.1 | [warn] Unicode::Encoding plugin is auto-applied, please remove this from your appclass and make sure to define "encoding" config
12:10:15 hydra-notify.1 | %channels_to_events{...} in scalar context better written as $channels_to_events{...} at /home/vin/workspace/vcs/hydra/src/lib/Hydra/Event.pm line 20.
At the moment, aggregate jobs can easily break and cause the entire
evaluation to fail, which is not ideal. For Nixpkgs, we do have some
important aggregate jobs (like `tested`), but for debugging and building
purposes it's still useful to get a partial result even if the channel
won't actually advance.
This commit changes the behaviour of hydra-eval-jobs such that it
aggregates any errors found during the construction of an aggregate, and
will instead annotate the job with the evaluation failure such that it
shows up in a "cleaner" way.
There are really two types of failure that we care about: one is where
the attribute just ends up missing altogether in the final output, and
also where the attribute is in the output but fails to evaluate. Both
are handled here.
Note that this does mean that the same error message may be output
multiple times, but this aids debuggability because it'll be much
clearer what's blocking the job from being created.
Fix parsing breakage from #1003: assigning the lines to $lines broke chomp and the filters.
This test validates the parsing works as expected, and also fixes
a minor bug where '-' in features isn't pruned, like in the C++
repo.
Fixes
mv: cannot move './static/bootstrap-4.3.1-dist' to './static/bootstrap/bootstrap-4.3.1-dist': Directory not empty
when 'make' is called more than once.
When I take a look at *all* failing builds (by clicking at `[...] more
jobs omitted`) and I try to compare the failures to another jobset, I'd
like to still view *all* failing builds in the compare-view.
This wasn't the case before since the `full=`-param was ignored by the
compare-buttons.
Yet again, manual testing is proving to be insufficient. I'm pretty
sure I wrote this code but lost it in a rebase, or perhaps the switch
to result classes.
At any rate, this implements the actual "fetch a retry row and run it"
for the hydra-notify daemon.
Tested by hand.
Get the number of seconds before the next retriable task is ready.
This number is specifically intended to be used as a timeout, where
`undef` means never time out.
This gives us a place to put helper functions that act on entire
tables, not just individual records.
This should be a backwards compatible change, except in places we're
manually using result class names.
Declarative jobsets were sort of tucked in to the event hanlder
itself. It turned out that it could have been implemented as a
plugin without much trouble.
Without this commit, two jobsets using the same repository as input,
but different `deepClone` options, end up incorrectly sharing the same
"checkout" for a given (`uri`, `branch`, `revision`) tuple. The
presence or absence of `.git` is determined by the jobset execution
order.
This patch adds the missing `isDeepClone` boolean to the cache key.
The database upgrade script empties the `CachedGitInputs` table, as we
don't know if existing checkouts are deep clones. Unfortunately, this
generally forces rebuilds even for correct `deepClone` checkouts, as
the binary contents of `.git` are not deterministic.
Fixes#510
I broke this when I added `me.` in f1e75c8bff
I added me. to disambiguate `id`, but:
* eval.id works on the per-build page
* me.id works on the other pages
* Just id works everywhere if I drop:
, prefetch => { evaluationerror => [ ] },
but this causes a query per row to collect the evaluationerror
records later, this becomes significantly slow on non-trivial
datasets.
Using evals->current_source_alias will use the correct alias
whether it is me or eval or something else.
Exposes metrics:
* http_request_duration_seconds_bucket
* http_request_size_bytes_bucket
* http_response_size_bytes_bucket
* http_requests_total
with labels of action and controller to help identify popular
endpoints and their performance characteristics.
If the project isn't declarative, who cares about it in the response? After
setting the `declfile` to an empty string, everything related to declarative-
ness is wiped out, anyways.
It appears the Jobs table was removed in
8adb433e3b, but the Jobsets schema was never
updated to reflect this. This relationship was added in
efa1f1d4fb, roughly 3 months prior.
Previously, one would see a message similar to the following logged when
deleting a jobset:
17:38:23 hydra-server.1 | DBIx::Class::Relationship::CascadeActions::delete(): Skipping cascade delete on relationship 'jobs' - related resultsource 'Hydra::Schema::Jobs' is not registered with this schema at /home/vin/workspace/vcs/hydra/src/script/../lib/Hydra/Controller/Jobset.pm line 106
Something in the upgrade of Bootstrap and JQuery broke lazy tab loading.
I don't understand what is providing the tab behavior, how it should
work, or what the correct fix is.
I can tell you that this patch fixes the issue: when loading a tab
with a URL fragment deep-linking to a lazily loaded tab... it now
loads.
Close#959
This appears to have been broken in ac3e8a4a59,
which removed the `jobsetevals` column from the Projects schema, but didn't
update the Controller accordingly.
Fixes the test added in the previous commit.
To further align with the API, we return custom JSON in order to display a
`visible` field rather than `hidden` -- a `PUT` request expects `visible`, while
a `GET` request returns `hidden`.
This also allows us to rename the `jobsetinputs` field to `inputs` for the same
reason: `PUT` expects `inputs`, while `GET` returns `jobsetinputs`.
`PUT /jobsets/{project-id}/{jobset-id}` expects a JSON object `inputs` which
maps a name to a name, a type, a value, and a boolean that enables emailing
responsible parties. However, `GET /jobsets/{project-id}/{jobset-id}` responds
with an object that doesn't contain a value, but does contain a jobsetinputalts
(which is old and should be unused).
This commit aligns the two by removing the old and unused `jobsetinputalts` from
the response and replaces it with `value`.
* made all columns available via the API (except for forceeval)
* renamed flakeref to flake to unify the API with the database schema
* renamed inputs to jobsetinputs to unify the API with the database schema
The checkbox is only enabled if `email_notification = 1` is set in
`hydra.conf`. However, when creating jobset (in contrast to the edit
form), the checkbox is always disabled because the `emailNotification`
parameter in Catalyst's stash was missing.
Passwords that are sha1 will be transparently upgraded to argon2,
and future comparisons will use Argon2
Co-authored-by: Graham Christensen <graham@grahamc.com>
The default password comparison logic does not use
constant time validation. Switching to constant time
offers a meager improvement by removing a timing
oracle.
A prepatory step in moving to Argon2id password storage, since we'll need this change anyway after
for validating existing passwords.
Co-authored-by: Graham Christensen <graham@grahamc.com>
Some time in the last decade the plugin switched to preferring
a flatter namespace for realm config.
Co-authored-by: Graham Christensen <graham@grahamc.com>
In Nix the protocol was slightly altered[1] to also contain more
information about realisations. This however wasn't read from the pipe
that was used to read from the store.
After the `cmdBuildDerivation` command which caused this issue, Hydra
will issue a `cmdQueryPathInfos` that tries to read from the remote
store as well. However, there's still left over to read from the
previous command and thus Nix fails to properly allocate the expected
string.
[1] See rev a2b69660a9b326b95d48bd222993c5225bbd5b5f
Fixes#898
Co-authored-by: Graham Christensen <graham@grahamc.com>
... but just fixing up merge conflicts from the introduction of flakes
and the removal of the Jobs table.
This is a breaking change. Previously, packages named `packageset.foo`
would be exposed in the fake derivation channel as `packageset-foo`.
Presumably this was done to avoid needing to track attribute sets, and
to avoid the complexity. I think this now correctly handles the
complexity and properly mirrors the input expressions layout.
Previously, the build ID would never flow through channels which
exited.
This patch tracks the buildOne state as part of State and exits avoids
waiting forever for new work.
The code around buildOnly is a bit rough, making this a bit weird to
implement but since it is only used for testing the value of improving
it on its own is a bit questionable.
A reproduce script includes a logline that may resemble:
> using these flags: --arg nixpkgs { outPath = /tmp/build-137689173/nixpkgs/source; rev = "fdc872fa200a32456f12cc849d33b1fdbd6a933c"; shortRev = "fdc872f"; revCount = 273100; } -I nixpkgs=/tmp/build-137689173/nixpkgs/source --arg officialRelease false --option extra-binary-caches https://hydra.nixos.org/ --option system x86_64-linux /tmp/build-137689173/nixpkgs/source/pkgs/top-level/release.nix -A
These are passed along to nix-build and that's fine and dandy, but you can't just copy-paste this as is, as the `{}` introduces a syntax error and the value accompanying `-A` is `''`.
A very naive approach is to just `printf "%q"` the individual args, which makes them safe to copy-paste. Unfortunately, this looks awful due to the liberal usage of slashes:
```
$ printf "%q" '{ outPath = /tmp/build-137689173/nixpkgs/source; rev = "fdc872fa200a32456f12cc849d33b1fdbd6a933c"; shortRev = "fdc872f"; revCount = 273100; }'
\{\ outPath\ =\ /tmp/build-137689173/nixpkgs/source\;\ rev\ =\ \"fdc872fa200a32456f12cc849d33b1fdbd6a933c\"\;\ shortRev\ =\ \"fdc872f\"\;\ revCount\ =\ 273100\;\ \}
```
Alternatively, if we just use `set -x` before we execute nix-build, we'll get the whole invocation in a friendly, copy-pastable format that nicely displays `{}`-enclosed content and preserves the empty arg following `-A`:
```
running nix-build...
using this invocation:
+ nix-build --arg nixpkgs '{ outPath = /tmp/build-138165173/nixpkgs/source; rev = "e0e4484f2c028d2269f5ebad0660a51bbe46caa4"; shortRev = "e0e4484"; revCount = 274008; }' -I nixpkgs=/tmp/build-138165173/nixpkgs/source --arg officialRelease false --option extra-binary-caches https://hydra.nixos.org/ --option system x86_64-linux /tmp/build-138165173/nixpkgs/source/pkgs/top-level/release.nix -A ''
```
The queue runner used to special-case `localhost` as a remote builder:
Rather than using the normal remote-build (using the
`cmdBuildDerivation` command), it was using the (generally less
efficient, except when running against localhost) `cmdBuildPaths`
command because the latter didn't require a privileged Nix user (so made
testing easier − allowing to run hydra in a container in particular).
However:
1. this means that the build loop can follow two discint code paths depending
on the setup, the irony being that the most commonly used one in production
(the “non-localhost” case) isn't the one used in the testsuite (because all
the tests run against a local store);
2. It turns out that the “localhost” version is buggy in relatively obvious
ways − in particular a failure in a fixed-output derivation or a hash
mismatch isn't reported properly;
3. If the “run in a container” use-case is indeed that important, it can be
(partially) restored using a chroot store (which wouldn't behave excactly
the same way of course, but would be more than good-enough for testing)
The current check happening in jobsets is incorrect.
The wanted constraint is stated as follow :
- If type is 0 (legacy), then the flake field should be null, and
both nixExprInput and nixExprPath should be non-null
- If type is 1 (flake), then the flake field should be non-null, and
both nixExprInput and nixExprPath should be null
The current version will not catch (i.e. it will accept) situations
where you have for instance :
type = 1, nixExprPath null, nixExprInput non-null, flake non-null
This commit fixes that.
I split(ted) that into two constraints, to make it more readable and
easier to extend if a new type appears in the future.
The complete query could be instead :
( type = 0
AND nixExprInput IS NOT NULL AND nixExprPath IS NOT NULL AND flake IS NULL )
OR ( type = 1
AND nixExprInput IS NULL AND nixExprPath IS NULL AND flake IS NOT NULL )
(but an "OR" cannot be split, hence the other formulation)
DBIx likes to eagerly select all columns without a way to really tell
it so. Therefore, this splits this one large column in to its own
table.
I'd also like to make "jobsets" use this table too, but that is on hold
to stop the bleeding caused by the extreme amount of traffic this is
causing.
The database has these constraints:
check ((type = 0) = (nixExprInput is not null and nixExprPath is not null)),
check ((type = 1) = (flake is not null)),
which prevented switching to flakes in a declarative jobspec, since the
nixexpr{path,input} fields were not nulled in such an update
Co-Authored-By: Graham Christensen <graham@grahamc.com>
This search query is pretty heavy. Defaulting to 500 has caused
Hydra's web UI to appear to be down. Since 500 can take it down, users
probably shouldn't be allowed t ask for that many.
Duplicating this data on every record of the builds table cost
approximately 4G of duplication.
Note that the database migration included took about 4h45m on an
untuned server which uses very slow rotational disks in a RAID5 setup,
with not a lot of RAM. I imagine in production it might take an hour
or two, but not 4. If this should become a chunked migration, I can do
that.
Note: Because of the question about chunked migrations, I have NOT
YET tested this migration thoroughly enough for merge.
Looking at AWS' Performance Insights for a Hydra instance, I found
the hydra-queue-runner's query:
select id, buildStatus, releaseName, closureSize, size
from Builds b
join BuildOutputs o on b.id = o.build
where
finished = ?
and (buildStatus = ? or buildStatus = ?)
and path = $1
was the slowest query by at least 10x. Running an explain on this
showed why:
hydra=> explain select id, buildStatus, releaseName, closureSize, size
from Builds b join BuildOutputs o on b.id = o.build where
finished = 1 and (buildStatus = 0 or buildStatus = 6) and
path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
QUERY PLAN
------------------------------------------------------------------------
Gather (cost=1000.43..33718.98 rows=2 width=56)
Workers Planned: 2
-> Nested Loop (cost=0.43..32718.78 rows=1 width=56)
-> Parallel Seq Scan on buildoutputs o (cost=0.00..32710.32
rows=1
width=4)
Filter: (path = '/nix/store/s93kh...snip...'::text)
-> Index Scan using indexbuildsonjobsetidfinishedid on builds b
(cost=0.43..8.45 rows=1 width=56)
Index Cond: ((id = o.build) AND (finished = 1))
Filter: ((buildstatus = 0) OR (buildstatus = 6))
(8 rows)
A paralell sequential scan is definitely better than a sequential scan, but the
cost ranging from 0 to 32710 is not great. Looking at the table, I saw the `path`
column is completely unindex:
hydra=> \d buildoutputs
Table "public.buildoutputs"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
build | integer | | not null |
name | text | | not null |
path | text | | not null |
Indexes:
"buildoutputs_pkey" PRIMARY KEY, btree (build, name)
Foreign-key constraints:
"buildoutputs_build_fkey" FOREIGN KEY (build) REFERENCES builds(id)
ON DELETE CASCADE
Since we always do exact matches on the path and don't care about ordering,
and since the path column is very high cardinality a `hash` index is a
good candidate. Note that I did test a btree index and it performed
similarly well, but slightly worse.
After creating the index (this took about 10 seconds) on a test database:
create index IndexBuildOutputsPath on BuildOutputs using hash(path);
We get a *significantly* reduced cost:
hydra=> explain select id, buildStatus, releaseName, closureSize, size
hydra-> from Builds b join BuildOutputs o on b.id = o.build where
hydra-> finished = 1 and (buildStatus = 0 or buildStatus = 6) and
hydra-> path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..41.41 rows=2 width=56)
-> Index Scan using buildoutputs_path_hash on buildoutputs o (cost=0.00..16.05 rows=3 width=4)
Index Cond: (path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1'::text)
-> Index Scan using indexbuildsonjobsetidfinishedid on builds b (cost=0.43..8.45 rows=1 width=56)
Index Cond: ((id = o.build) AND (finished = 1))
Filter: ((buildstatus = 0) OR (buildstatus = 6))
(6 rows)
For direct comparison, the overall query plan was changed:
From: Gather (cost=1000.43..33718.98 rows=2 width=56)
To: Nested Loop (cost= 0.43.....41.41 rows=2 width=56)
and the query plan for buildoutputs changed from a maximum cost of
32,710 down to 16.
In practical terms, the query's planning and execution time was reduced:
Before (ms) | Try 1 | Try 2 | Try 3
------------+---------+---------+--------
Planning | 0.898 | 0.416 | 0.383
Execution | 138.644 | 172.331 | 375.585
After (ms) | Try 1 | Try 2 | Try 3
------------+---------+---------+--------
Planning | 0.298 | 0.290 | 0.296
Execution | 219.625 | 0.035 | 0.034
Requires the following configuration options
enable_github_login = 1
github_client_id
github_client_secret
Or github_client_secret_file which points to a file with the secret
Fixes this error:
ERROR: failed to process declarative jobset test:inputs,
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
execute failed: ERROR: null value in column "emailoverride" violates
not-null constraint
This would start happening if the network connection between the Hydra
server and the remote build server breaks after sucessfully importing
at least one output of a derivation, but before having finished
importing all outputs.
Fixes#816.
These make the hydra-queue-runner logs very noisy even when not using the GitlabStatus plugin.
Also, they shouldn't be necessary except when developing the plugin itself and should have been removed before release.
It might happen that a job from the aggregate returned an error!
This is what the vague "[json.exception.type_error.302] type must be string, but is null"
was all about in this instance; there was no `drvPath` to stringify!
So we now actively watch for errors and copy them to the aggregate job.
The vague "[json.exception.type_error.302] type must be string, but is null"
is **absolutely** unhelpful in the way Hydra currently handles it on
evaluation.
This is handling *unexpected* errors only; the following commit will
handle the specific instance of the previously mentioned error.
Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.
This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.
[1] https://github.com/NixOS/nix/pull/3883
With the current implementation, if ANY hash was found inside the decl
spec, the spec would be treated as static. This is problematic since
`inputs` is a hash and hence any configuration would be handled as a
static one.
This fixes the code to match the documentation and only switch to static
processing when ALL values are hashes.
As of https://github.com/NixOS/hydra/pull/737 (removal of sqlite
dependency), the only supported database is Postgresql.
This change removes all references to hydra-postgresql.sql file. This
file is generated using a cpp on hydra.sql, but doesn't differ from
hydra.sql at all.
PathInput plugin keeps a cache of path evaluations. This cache is simple, and
path is not checked more than once every N seconds, where N=30. The caching is
there to avoid expensive calls to `nix-store --add`.
This change makes the validity period configurable. The main use case is
`api-test.pl` which was implemented wrong for a while, as the invocation of
`hydra-eval-jobset` would return the previous evaluation, claiming there are no
changes. The test has been fixed to check better for a new evaluation.
`build_finished` Postgres event will never be fired for the dependent builds.
For example, on our Hydra, the following query always returns increasing
numbers, even though all notifications have been delivered:
```
hydra=> select count(1) from builds where notificationpendingsince is not null;
count
-------
4583
(1 row)
```
Thus, we have to iterate over all dependent builds and mark their
`notificationpendingsince` as `null`, otherwise they will pile up until
the next restart of hydra-notify, when they will get delivered.
When deploying Hydra different than hydra.nixos.org one may encounter a problem
as building any job that uses IFD fails with:
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/1jm02mfiv58rpy8zrx95cpqxzsp64ssh-source.drv' during evaluation but 'allow-import-from-derivation' is false"
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/av3jr8ix4qcadq2wm3y3hplvxwzlhl4y-source.drv' during evaluation but 'allow-import-from-derivation' is false"
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize
'/nix/store/2jm02mfiv58rpy8zrx95cpqxzsp64ssh-source.drv' during evaluation but
'allow-import-from-derivation' is false"
The recent change enforced passing `--no-allow-import-from-derivation`
to `hydra-eval-job` unconditionally. This change makes it configurable and
defaults to **NOT PASSING IT** -- most of the deployments allow IFDs.
The configuration option is called `allow_import_from_derivation` and
defaults to `true`. It is interpreted as a boolean, with only true option being
`true`.