If we don't see a machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.
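A minimal sketch of what this could look like in hydra.conf (the one-hour
value is purely illustrative):
  # abort a step if no machine has supported it for an hour
  max_unsupported_time = 3600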
When I browse failed builds in a jobset eval on Hydra, I regularly
mistake actual build failures for temporary issues like timeouts (which
probably disappear at the next eval).
To prevent this kind of issue, I figured that using the stopsign-svg for
builds with timeouts or exceeded log limits is a reasonable choice for
the following reasons:
* A user can now distinguish between actual build errors (like
compilation failures or oversized outputs) and (usually) temporary issues
(like a bloated log or a timeout).
* The stopsign is also used for aborted jobs, which are shown in a
different tab and can't be confused with timeouts for that reason.
Declarative jobsets were broken by the Nix update, causing
nix cat-file to break silently.
This commit restores declarative jobsets, based on top of a commit
making it easier to see what broke.
In the past, jobsets which are automatically evaluated were evaluated
regularly, on a schedule. This schedule means a new evaluation is
created every checkInterval seconds (assuming something has changed).
This model works well for architectures where our build farm can
easily keep up with demand.
This commit adds a new type of evaluation, called ONE_AT_A_TIME, which
only schedules a new evaluation if the previous evaluation of the
jobset has no unfinished builds.
This model of evaluation lets us have 'low-tier' architectures.
For example, we could now have a jobset for ARMv7l builds, where
the build farm only has a single, underpowered ARMv7l builder.
Configuring that jobset as ONE_AT_A_TIME will create an evaluation
and then won't schedule another evaluation until every job of
the existing evaluation is complete.
This way, the cache will have a complete collection of pre-built
software for some commits, but the underpowered architecture will
never become backlogged in ancient revisions.
A PostgreSQL column which is non-null and unique is treated with
the same optimisations as a primary key, so there is no need to
try to recreate the `id` as the primary key.
No read paths are impacted by this change, and the database will
automatically create an ID for each insert. Thus, no code needs to
change.
hydra.nixos.org is already running this rev, and it should be safe to
apply to everyone else. If we make changes to this migration, we'll
need to write another migration anyway.
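A rough, illustrative SQL sketch of the idea (not the exact migration): an
auto-incrementing column that is NOT NULL and UNIQUE, while the existing
primary key stays untouched:
  -- illustrative only: the new id gets an index and the same planner
  -- treatment as a primary key, so nothing else has to change
  ALTER TABLE jobsets ADD COLUMN id SERIAL NOT NULL UNIQUE;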
Lowercasing is due to PostgreSQL not having case-sensitive table names.
It always technically worked before, but those table names never
existed literally.
The switch to generating from PostgreSQL is to handle an upcoming
addition of an auto-incrementing ID to the Jobset table. SQLite doesn't
seem to be able to handle the table having an auto-incrementing ID
field which isn't the primary key, but we can't change the primary
key trivially.
Since Hydra doesn't support SQLite and hasn't for many years anyway,
it is easier to just generate from PostgreSQL directly.
Building on macOS with the latest nixpkgs master and NixOS/nixpkgs#77147
fails. It seems that some `std::experimental` types (`optional`, for
instance) are no longer available under `experimental`, but are in
`std`. Also, `toJSON` is missing for `atomic<unsigned long long>`.
In a NixOS container, cmdBuildDerivation doesn't work because we're
not privileged. But we also don't need it because the store already
has the derivation.
Also, don't copy from/to the store since this gives errors about
missing signatures.
This attribute makes it possible to know whether an error occurred: when an
error occurs, errormsg is not an empty string. Note that we cannot use the
errormsg attribute itself because it can be arbitrarily long and is excluded
from the jobset API response.
This adds the following (pre-existing) attributes to the jobset response:
- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
(cherry picked from commit 438ddf5289)
Plugins are now disabled at startup time unless there is some relevant
configuration in hydra.conf. This avoids hydra-notify having to do a
lot of redundant work (a lot of plugins did a lot of database queries
*before* deciding they were disabled).
Note: BitBucketStatus users will need to add 'enable_bitbucket_status
= 1' to hydra.conf.
* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.
* 'eval_failed' has the format '<tmpId>'. (The cause of the error can
be found in the database.)
* 'eval_added' has the format '<tmpId>:<evalId>'.
It now receives notifications about started/finished builds/steps via
PostgreSQL. This gets rid of the (substantial) overhead of starting
hydra-notify for every event. It also allows other programs (even on
other machines) to listen to Hydra notifications.
This adds a `InfluxDBNotification` plugin which is configured as:
```
<influxdb>
url = http://127.0.0.1:8086
db = hydra
</influxdb>
```
which will write a notification for every finished job to the
configured database in InfluxDB looking like:
```
hydra_build_status,cached=false,job=job,jobset=default,project=sample,repo=default,result=success,status=success,system=x86_64-linux build_id="1",build_status=0i,closure_size=584i,duration=0i,main_build_id="1",queued=0i,size=168i 1564156212
```
The creation of the `pg_trgm` extension requires superuser privileges. So,
this patch creates the extension from the Hydra NixOS module when
a local database is used.
If it is not possible to create this extension (for instance on a remote
database with a non-superuser role), the creation of the `pg_trgm` index
is skipped (this index speeds up queries on builds.drvpath) and warnings
are emitted:
initialising the Hydra database schema...
WARNING: Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
WARNING: HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
WARNING: The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)
This keeps migrations smooth: the migration process doesn't
require a manual step (but the manual step is recommended on big
remote databases).
The search query uses the LIKE operator which requires a sequential
scan (it can't use the already existing B-tree index). This new
index (trigram) avoids a sequential scan of the builds table when the
LIKE operator is used.
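A sketch of the statements involved (the index name matches the plan below;
the exact DDL in the migration may differ):
  -- trigram extension; creating it requires superuser privileges
  CREATE EXTENSION IF NOT EXISTS pg_trgm;
  -- GIN trigram index so LIKE '%hash%' can use an index scan
  CREATE INDEX IndexTrgmBuildsOnDrvpath
      ON builds USING gin (drvpath gin_trgm_ops);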
Here is the EXPLAIN ANALYZE output of a query on the builds table with this index:
explain analyze select * from builds where drvpath like '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on builds (cost=128.00..132.01 rows=1 width=492) (actual time=0.070..0.077 rows=1 loops=1)
Recheck Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
-> Bitmap Index Scan on indextrgmbuildsondrvpath (cost=0.00..128.00 rows=1 width=0) (actual time=0.047..0.047 rows=3 loops=1)
Index Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
Total runtime: 0.206 ms
(5 rows)
Currently, a full store path has to be provided to search in
builds. This patch permits searching for jobs by an output path or
derivation hash.
Use case: we are building Docker images with Hydra. The tag of the
Docker image is the hash of the image output path. This patch would
allow us to find the build job from the tag of a running
container image.
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
No more need for a reproduction script! It just says something like
If you have Nix installed, you can reproduce this build on your own
machine by running the following command:
# nix build github:edolstra/dwarffs/09c823e977946668b63ad6c88ed358b48220f124:hydraJobs.build.x86_64-linux
This plugin expects the following inputs to a jobset:
- gitlab_status_repo => name of the repository input pointing to the
repository that status updates should be POSTed to, i.e. if the jobset has
a git input "nixexprs": "https://gitlab.example.com/project/nixexprs", then
"gitlab_status_repo" would be "nixexprs".
- gitlab_project_id => ID of the project in GitLab, i.e. in the above
case the GitLab ID of "nixexprs".
The hydra-queue-runner opens a connection to the builder. If the
builder is 'localhost' it starts `nix-store`, otherwise it starts
'ssh'.
Currently, if hydra-queue-runner cannot start `nix-store` (not in
the PATH, for instance), the error message is:
cannot connect to ‘localhost’: error: cannot start ssh: No such file
or directory
This is not useful since ssh is actually not started :/
With this patch the error message is now:
cannot connect to ‘localhost’: error: cannot start nix-store: No such file
or directory
Some time ago the data structure for maintainer descriptions in
`nixpkgs` changed from a simple attribute set with maintainer emails as
values to an attribute set where the maintainer's nick is associated with
an attribute set containing email, GitHub handle and full name.
Hydra can either parse a Nix list or fetch `shortName` from the
associated attribute set (which is used for `meta.licenses`, as each
value in it contains a `shortName`). This behavior needs to be
replicated for maintainers to retrieve the emails for `hydra-notify`.
This change is backwards-compatible since `queryMetaStrings` is still
able to understand lists, so old versions of `nixpkgs` or packages using
the old maintainer data structure remain usable.
This is because setting only the initial heap size to more than
the default (or configured) value will cause all initial evals
to abort until maxHeapSize expands to the given value.
The 1.1 multiplier comes from the configured defaults on NixOS' Hydra,
and from the previous multiplier used before
7876cf677c.
This is needed in order to access protected or private repositories. Using
the target repository URL along with the merge-request ref, instead of the
source repository URL and branch, is necessary to avoid running into issues
if the source repository is not actually accessible to the user Hydra is
authenticating as.
Thanks to Alexei Robyn for this patch.
The PathInput input for local paths was previously enhanced to allow
URLs for which it would use a nix-prefetch-url operation. This change
updates the prompt for the declarative input type to indicate this
capability.
When I press "n builds omitted" I get back to the first tab of a jobset.
This is extremely counter-intuitive, instead this notice should link to
the currently opened tab.
The job has been failing since https://hydra.nixos.org/eval/1461286
with the following error:
hydra-eval-jobs.cc:278:17: error: 'evalSettings' was not declared in this scope
evalSettings.restrictEval = true;
^~~~~~~~~~~~
This is likely due to a typo in 0882519 where that line and the
corresponding comment were moved, and `settings` was changed in that
one place to `evalSettings`.
I reproduced the error by running `nix-build release.nix -A
build.x86_64-linux` on my machine, and this small change fixes it.
You can now set 'evaluator_max_heap_size' to make hydra-eval-jobs
restart itself if the Boehm heap exceeds the specified size.
For example, with 'evaluator_max_heap_size = 256000000',
$ hydra-eval-jobs '<nixpkgs/pkgs/top-level/release.nix>' -I nixpkgs=channel:nixos-17.09
has a max RSS of .56 GiB rather than 4.7 GiB.
Unfortunately it doesn't help much for the NixOS jobsets because of
the "tested" job which requires a huge amount of memory all by itself.
This cannot be done in the hydra-evaluator systemd unit, since then
every other Nix process (e.g. hydra-evaluator and nix-prefetch-*) will
also allocate the specified heap size, probably leading to OOM.
This is a good way to make Hydra hang. (E.g. we had a deletion of
nixos:gcc-7 running for > 12 hours and blocking UPDATE statements from
hydra-queue-runner.) Generally it's better to just disable/hide an old
jobset anyway.
Frequently users want Hydra access just to restart jobs. However,
prior to this commit the only way to grant that access was by giving
them full Admin access which isn't necessarily what we want to do.
By having a restart-jobs role, we can grant this privilege to users
who are known to the community and want to help, but aren't long-time
members.
I haven't tested this commit, but it looks good to me...
When using the "build" or "sysbuild" jobset input types in conjunction
with a binary cache store, the evaluator needs to be able to fetch
store paths from the binary cache. Typical usage:
store_uri = s3://nix-test-cache?secret-key=...
eval_substituter = s3://nix-test-cache
Also, the public key of the binary cache must be added to
binary-cache-public-keys in nix.conf, otherwise the local nix-daemon
won't allow the store paths to be copied over.
Also, remove support in hydra-eval-jobs for multiple jobset input
alternatives. The web interface hasn't supported this in a long
time. Thus we can use the regular "--arg" handler.
This makes downloading/viewing build results work with binary cache
stores. For good performance, this should be used in conjunction with
ca580bec35,
i.e. you should set store_uri to something like
s3://my-cache?local-nar-cache=/tmp/nar-cache
to cache NARs between requests.
When creating a Hydra user with the `hydra-create-user` command, you can now
provide a SHA1 password hash with the `--password-hash` flag. This is useful for
the upcoming work on Fully Declarative Hydra, since the end user should not have
to specify plaintext passwords in their `configuration.nix` file.
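A hypothetical invocation (the user name and password are made up; the hash
is simply the SHA1 of the intended password):
  hydra-create-user alice \
    --password-hash "$(echo -n 'correct-horse' | sha1sum | cut -d' ' -f1)"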
Thus, we no longer hold the send lock while substituting missing paths
on the build machine. This is a good thing in particular for macOS
builders which have a tendency to hang forever in curl downloads.
Previously, when hydra-queue-runner was restarted, any pending "build
finished" notifications were lost. Now hydra-queue-runner marks
finished but unnotified builds in the database and uses that to run
pending notifications at startup.
The queue runner can now run up to ‘max-concurrent-notifications’ in
parallel (default is 2). This is useful when some hydra-notify
invocations can take a long time to complete (e.g. because they need
to compress a giant build log) and we don't want this to block all
other notifications.
As @dtzWill discovered, with the concurrent hydra-evaluator, there can
be multiple active transactions adding builds to the database. As a
result, builds can become visible in a non-monotonically increasing
order, breaking the queue monitor's assumption that build IDs only go
up.
The fix is to have hydra-eval-jobset provide the lowest build ID it
just added in the builds_added notification, and have the queue
monitor check from there.
Fixes #496.
This plugin will post to the build status system in BitBucket. In order
to use it you need to add to ExtraConfig
<bitbucket>
username = bitbucket_username
password = bitbucket_password
</bitbucket>
You can use an application password: https://blog.bitbucket.org/2016/06/06/app-passwords-bitbucket-cloud/
This can take an excessive amount of time. For example, on
hydra.nixos.org, a call to hydra-notify takes 0.7s even if there are
no plugins. So for an eval with ~45K new builds, the calls to
hydra-notify add up to about 9 hours.
The proper fix would be to pass a list of build IDs, or an eval ID.
This can be used with declarative projects to build PRs.
The github_authorization section should contain verbatim Authorization
header contents, keyed by repository owner, for private repositories.
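A hypothetical section (the owner name and token are placeholders; the
layout is an assumption based on the description above):
  <github_authorization>
    my-org = token ghp_xxxxxxxxxxxxxxxxxxxx
  </github_authorization>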
1. From the Hydra configuration file.
The configuration is loaded from the "git-input" block.
Currently only the "timeout" variable is looked up in the file.
<git-input>
# general timeout
timeout = 400
<input-name>
# specific timeout for a particular input name
timeout = 400
</input-name>
# use quotes when the input name has spaces
<"foot with spaces">
# specific timeout for a particular input name
timeout = 400
</"foo with spaces">
</git-input>
2. As an argument in the input value, after the repo URL and branch (and after deepClone, if it is defined):
"timeout=<value>"
The precedence, from highest to lowest, is:
1. input value
2. Block with the name of the input in the <git-input> block
3. "timeout" inside the <git-input> block
4. Default value of 600 seconds. (original hard-coded value)
The code is generalized so that more values can be configured; it might be
too much machinery for a single value in a single plugin.
Adding a 96-core aarch64 build machine to the build farm caused the
potential number of database connections to increase a lot, so we
started hitting the Postgres connection limit.
* The "Jobset" page now shows when evaluations are in progress (rather
than just pending).
* Restored the ability to do a single evaluation from the command line
by doing "hydra-evaluator <project> <jobset>".
* Fix some consistency issues between jobset status in PostgreSQL and
in hydra-evaluator. In particular, "lastCheckedTime" was never
updated internally.
Setting
xxx-jobset-repeats = patchelf:master:2
will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.
Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).
We now take into account the memory necessary for compressing the NAR
being exported to the binary cache, plus xz compression overhead.
Also, we now release the memory tokens for the NAR accessor *after*
releasing the NAR accessor. Previously the memory for the NAR accessor
might still be in use while another thread does an allocation, causing
the maximum to be exceeded temporarily.
Also, use notify_all instead of notify_one to wake up memory token
waiters. This is not very nice, but not every waiter is requesting the
same number of tokens, so some might be able to proceed.
If a step is cancelled just as its builder step is starting,
doBuildStep() will return sRetry. This causes builder() to make the
step runnable again, since the queue monitor may have added new builds
referencing it. The idea is that if the latter condition is not true,
the step's reference count will drop to zero and it will be
deleted. However, if the dispatcher thread sees and locks the step
before the reference count can drop to zero in the builder thread, the
dispatcher thread will start a new builder thread for the step. Thus
the step can be kept alive for an indefinite amount of time.
The fix is for State::builder() to use a weak pointer to the step, to
ensure that the step's reference count can drop to zero *before* it's
added to the runnable queue.
This was a bad idea because pthread_cancel() is unsalvageably broken
in C++. Destructors are not allowed to throw exceptions (especially in
C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind
exception inside any destructor that invokes a cancellation
point. (This exception can be caught but *must* be rethrown.) So let's
just kill the builder process instead.
It was hitting
assert(reservation.unique());
Since we do want the machine reservation to be released before calling
wakeDispatcher(), let's use a different object for keeping track of
active steps.
We now kill active build steps when there are no more referring
builds. This is useful e.g. for preventing cancelled multi-hour TPC-H
benchmark runs from hogging build machines.
If two active steps of the same build failed, then the first would be
marked as "failed", but the second would end up as "orphaned", causing
it to be marked as "aborted" later on. Now it's correctly marked as
"failed".
Without this, if (failed or aborted) derivations have been
garbage-collected, there is no way to restart them, which is very
annoying. Now we set a forceEval flag in the jobset to cause it to be
re-evaluated even if none of the inputs have changed.
‘basicDrv.inputSrcs’ also contains the outputs of inputDrvs. These
don't necessarily exist in the local store, so copying them may cause
an exception. We should only copy the real inputSrcs.
Some Hydra API requests were vulnerable to XSRF attacks, e.g. you
could have a form on another website using http://hydra/logout as the
form action. So we now require POST requests to come from the same
origin.
Reported by Hans-Christian Esperer.
This rewrites the top-level loop of hydra-evaluator in C++. The Perl
stuff is moved into hydra-eval-jobset. (Rewriting the entire evaluator
would be nice but is a bit too much work.) The new version has some
advantages:
* It can run multiple jobset evaluations in parallel.
* It uses PostgreSQL notifications so it doesn't have to poll the
database. So if a jobset is triggered via the web interface or from
a GitHub / Bitbucket webhook, evaluation of the jobset will start
almost instantaneously (assuming the evaluator is not at its
concurrency limit).
* It imposes a timeout on evaluations. So if e.g. hydra-eval-jobset
hangs connecting to a Mercurial server, it will eventually be
killed.
This prevents the server from gradually filling up due to store paths
fetched by hydra-server that then get turned into a GC root by
hydra-update-gc-roots.
Dashboards can now be marked as publicly visible in the user
preferences. The dashboard URL has changed from /user/<name>/dashboard
to /dashboard/<name> because /user/<name> requires being logged in as
<name> or as an admin.
This allows fully declarative project specifications. This is best
illustrated by example:
* I create a new project, setting the declarative spec file to
"spec.json" and the declarative input to a git repo pointing
at git://github.com/shlevy/declarative-hydra-example.git
* hydra creates a special ".jobsets" jobset alongside the project
* Just before evaluating the ".jobsets" jobset, hydra fetches
declarative-hydra-example.git, reads spec.json as a jobset spec,
and updates the jobset's configuration accordingly:
{
  "enabled": 1,
  "hidden": false,
  "description": "Jobsets",
  "nixexprinput": "src",
  "nixexprpath": "default.nix",
  "checkinterval": 300,
  "schedulingshares": 100,
  "enableemail": false,
  "emailoverride": "",
  "keepnr": 3,
  "inputs": {
    "src": { "type": "git", "value": "git://github.com/shlevy/declarative-hydra-example.git", "emailresponsible": false },
    "nixpkgs": { "type": "git", "value": "git://github.com/NixOS/nixpkgs.git release-16.03", "emailresponsible": false }
  }
}
* When the "jobsets" job of the ".jobsets" jobset completes, hydra
reads its output as a JSON representation of a dictionary of
jobset specs and creates a jobset named "master" configured
accordingly (In this example, this is the same configuration as
.jobsets itself, except using release.nix instead of default.nix):
{
  "enabled": 1,
  "hidden": false,
  "description": "js",
  "nixexprinput": "src",
  "nixexprpath": "release.nix",
  "checkinterval": 300,
  "schedulingshares": 100,
  "enableemail": false,
  "emailoverride": "",
  "keepnr": 3,
  "inputs": {
    "src": { "type": "git", "value": "git://github.com/shlevy/declarative-hydra-example.git", "emailresponsible": false },
    "nixpkgs": { "type": "git", "value": "git://github.com/NixOS/nixpkgs.git release-16.03", "emailresponsible": false }
  }
}
Currently, the hydra.nixos.org queue contains 1000s of Darwin builds
that all depend on a stdenv-darwin that previously failed. However,
before, first createStep() would construct a dependency graph for each
build, then getQueuedBuilds() would discover that one of the steps had
failed previously and discard all those steps. Since the graph
construction involves a lot of uncached calls to isValidPath(), this
took several seconds per build.
Now createStep() detects the previous failure right away and bails
out.
These are build steps that remain "busy" in the database even though
they have finished, because they couldn't be updated (e.g. due to a
PostgreSQL connection problem). To prevent them from showing up as
busy in the "Machine status" page, we now periodically purge them.
Previously, if the queue monitor thread encountered a build that Hydra
had previously built, it downloaded the output paths from the binary
cache just to determine the build products and metrics. This is very
inefficient. In particular, when doing something like merging
nixpkgs:staging into nixpkgs:master, the queue monitor thread will be
locked up for a long time fetching files from S3, causing the build
farm to be mostly idle.
Of course this is entirely unnecessary, since the build
products/metrics are already in the Hydra database. So now we just
look up a previous build with the same output path, and copy the
products/metrics.
Multiple <githubstatus> sections are possible (a sketch follows the list):
* jobs: regexp for jobs to match
* inputs: the input which corresponds to the github repo/rev whose
status we want to report. Can be repeated
* authorization: Verbatim contents of the Authorization header. See
https://developer.github.com/v3/#authentication.
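A hypothetical section built from the attributes above (the job regexp,
input name and token are placeholders):
  <githubstatus>
    jobs = myproject:master:release.*
    inputs = src
    authorization = token <your-github-token>
  </githubstatus>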
Otherwise, the browser may mix up HTML and JSON responses if it has
requested both. For example, hitting the back button to return to a
job metric page will show a JSON response, because that was the last
thing the browser fetched for that URL.
This requires Catalyst::Action::Rest >= 1.20.
The previous query
select count(*) from builds b left join buildsteps s on s.build = b.id where busy = 1 and finished = 0
is suddenly taking several minutes. Probably PostgreSQL decided to use
a suboptimal query plan.
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.
max-output-size = 1000000000
The default is 2 GiB.
Also refactored the build error / status handling a bit.
When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.
The fix is to use a TokenServer to prevent the builder threads from
storing more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that cause the
limit to be exceeded will block until other threads have finished.
The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
The old page didn't scale very well if you have 150K builds in the
queue, in fact it tended to make browsers hang. The new one just
shows, for each jobset, the number of queued builds. The actual builds
can be seen by going to the corresponding jobset page and looking at
the evals.
Same problem as d744362e4a.
(truncated gdb backtrace: frames in gcc-4.9.3's predefined_ops.h:166,
stl_algo.h:1827 and stl_algo.h:4717)
Respects <slack> blocks in the hydra config, with attributes:
* jobs: a regexp matching the job name (in the format project:jobset:job)
* url: The URL to a slack incoming webhook
* force: If true, always send messages. Otherwise, only when the build status changes
Multiple <slack> blocks are allowed; a sketch follows.
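A hypothetical block using these attributes (the job regexp and webhook URL
are placeholders):
  <slack>
    jobs = myproject:master:.*
    url = https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXX
    force = false
  </slack>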
To use the local Nix store (default):
store_mode = direct
To use a local binary cache:
store_mode = local-binary-cache
binary_cache_dir = /var/lib/hydra/binary-cache
To use an S3 bucket:
store_mode = s3-binary-cache
binary_cache_s3_bucket = my-nix-bucket
Also, respect binary_cache_{secret,public}_key_file for signing the
binary cache.
The queue runner no longer uses this field, and it doesn't provide
very interesting historical data (mostly SSH failures), but it takes
up a lot of space. Also, it contained some bad UTF-8 which was
preventing an upgrade to Postgres 9.5, so a good occasion to get rid
of it.
The required configuration in hydra.conf:
enable_google_login = 1
google_client_id = 238429sdjkds....apps.googleusercontent.com
and optionally persona_allowed_domains to restrict to one or more
domains.
This is necessary given the current size of the Nixpkgs/NixOS
jobsets. Once we have a Nix store + Postgres on SSD, we can reduce
this again.
Should really make this configurable...
The uid split a while back caused the web interface to create GC roots
in /nix/var/nix/gcroots/per-user/hydra-www, where they wouldn't be
purged by hydra-update-gc-roots. Thus restarted builds would
accumulate forever. The fix is to keep the roots in a shared directory
with gid=hydra.
Regression introduced by 1fdc258de0.
The commit introduced a channel/custom PathPart which uses the new
custom channel expressions, but I forgot to remove CaptureArgs, so the
URL really is channel/latest/ignored-value.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Reported-by: Peter Simons <simons@cryp.to>
This removes the "busy", "locker" and "logfile" columns, which are no
longer used by the queue runner. The "Running builds" page now only
shows builds that have an active build step.
Previously, priority bumps could take a long time to get noticed if
getQueuedBuilds() was busy processing zillions of queue
additions. (This was made worse by the reintroduction of substitute
checking.)
We have this set in upgrade-42.sql, so it's better to stay consistent
with the basic SQL file to avoid problems with new Hydra installations.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Reported-by: Eelco Dolstra <eelco.dolstra@logicblox.com>
There is still a tiny window between the calls to nix-prefetch-* and
addTempRoot. This could be eliminated by adding a "-o" option to
nix-prefetch-*, or by not using those scripts at all (and use
addToStore directly).
This allows Hydra to use binaries from available binary caches. It
makes the queue monitor thread quite a bit slower, so if you don't
want to use binary caches, it's better to add "--option
build-use-substitutes false" to the hydra-queue-runner invocation.
Fixed #243.
The last paragraph talks about package installation for the "following"
jobs, but it only applies to generic channels, so let's only display it
there.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
So this is the final part which is needed in order to be able to deliver
custom channels, everything else is now just polishing.
We do this by simply redirecting to the build product download URL and
we use binary_cache_url the same way as in NixChannel.pm.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We should now get an overview and help text on how to add a particular
channel and also a bit of information about the builds that are required
for a channel to get upgraded.
Right now we only select the latest successful build in the latest
successful evaluation, so if someone wants to have more information about
which channel has failed, (s)he still has to look at the "Channels" tab
of the jobset.
We can make this more fancy at some later point if this is really
needed, because right now we're only interested in the latest build,
because it's the only thing necessary to deliver the channel contents.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
It's actually lower-case _despite_ the spelling in the SQL file(s),
because the schema auto-generator from DBIx::Class doesn't take it into
account, since it's working on SQLite and the latter seems to ignore
case.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We want to have the contents and details of channel expressions as well, and
we already have that in product.type == file, so why not reuse the same
for the channel expression?
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now have a searchBuildsAndEvalsForJobset, which creates such a
mapping for us, so we don't need to duplicate code in jobs_tab and
channels_tab.
Also, we're going to use this for the overview of a particular channel
as well, so it makes sense to put it in CatalystUtils instead of
directly in Jobset.pm.
Instead of eval->jobs, it's now eval->builds, because it's really an
aggregate over the builds schema, rather than the job schema.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We only allow channel/latest anyway, so it really doesn't make sense to
explicitly specify this in the PathPart and provide other dispatchers
once we have more than just "latest", which greatly simplifies the
dispatch tree.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now have a column for that, so no need for counting rows which was a
bit inefficient anyway, because we only would have needed the first row
in the result.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Now that we have our dedicated "Channels" tab, there is no need anymore
to show redundant information.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now no longer need that additional join of the build outputs and can
solely use the isChannel column of the Builds table to determine whether
it's a channel build.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This is to properly separate channels from regular jobs and also make
sure that we can always iterate on them, no matter whether the build has
failed. The reason why we were not able to do this until now was because
we were iterating on the build products, and whenever some constituent
of a channel job has failed, we didn't get a build output.
So whenever there is a meta.isHydraChannel, we can now properly
distinguish it from the other jobs.
I still don't have any clue why "make -C src/sql update-dbix" without
*any* modifications tries to create additional schema definitions. But
I've checked the md5sums of the existing schema definitions and they
don't seem to match, so it seems that they already have been tampered
with.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Now we can provide different channel expressions for one particular
channel build. Not sure yet how this would be useful, but I found it
more appropriate to use a type instead of a subtype of "file".
This should get us consistent with the previous commit.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This is to get a bit more consistency among channel builds but doesn't
do a radical change on the display. Ideally we may want to have a
channel overview with all the constituents and a small help showing how
the user can add the channel.
Unfortunately, this also introduces an inconsistency: We previously used
the *subtype* "channel", but now we're expecting "channel" as the type
of the product, so we need to change this for the channels overview as
well.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
It's very similar to "jobs" and the code is pretty much the same, except
that we don't do filtering on it. At least it doesn't waste space for a
filter option when there are usually WAY fewer channel jobs than ordinary
jobs.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Currently I'm using a (not very well) downscaled version of the NixOS
logo, so we want to replace it with a proper image ASAP.
Other than that, the idea is to have something like this in
hydra-build-products:
file channel $out/channel.tar.bz2
Right now of course, it's only displayed at the corresponding builds, so
we might want to have aggregates on all channels for a project, jobset
or maybe even single jobs?
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
They will show up in machineTypes as (e.g.) x86_64-linux:local instead
of x86_64-linux. This is to prevent the Hydra provisioner from
creating machines for steps that are supposed to be executed locally.
It's easier for the Hydra provisioner to put host public keys in the
machines file than to separately manage the known_hosts file
(especially when the provisioner runs on a different machine).
This is necessary because the required system type can become
available later (e.g. by being provisioned by the
auto-scaler). However, in the future, we may want to fail steps if
they have been unsupported for more than a certain amount of time.
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
If there is no input named 'inputs', hydra-eval-jobs now passes in an
argument named 'inputs': a set of lists, where each attribute corresponds
to an input defined in the jobset specification and each list element is
a different input alt.
Among other things, this allows for generic hydra expressions to be
shared amongst projects with similar structures but different sets of
specific inputs.
Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.
For example, a coverage build can emit the coverage percentage as
follows:
echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics
Graphs of all metrics for a job can be seen at
http://.../job/<project>/<jobset>/<job>#tabs-charts
Specific metrics are also visible at
http://.../job/<project>/<jobset>/<job>/metric/<metric>
The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
If Hydra isn't hosted on https://example.com/ but something like
https://example.com/hydra/, the URL for /api/scmdiff would have ended up
on /api/scmdiff rather than /hydra/api/scmdiff.
This is because we didn't use the URI resolver from the controller,
hence we're using it now to build up the whole URL including the query
string.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Without an index on (machine, stoptime desc), this requires a
sequential scan. And adding a whole index for this seems
overkill. (Possibly the queue runner could maintain this info more
efficiently.)
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).
A more refined scheme would be to have per machine, per path locks.
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandatory system
features field to include "local". For example:
localhost x86_64-linux,i686-linux - 10 100 - local
says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
Otherwise we never recover from reset daemon connections, e.g.
hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
...
The error is now handled in queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
This is currently done by a separate program that periodically
calls "hydra-queue-runner --status". Eventually, I'll do this
in the queue runner directly.
Fixes #220.
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
While sorting machines by load, the load of a machine
(machine->currentJobs) can be changed by other threads. If that
happens, the comparator is no longer a proper ordering, in which case
std::sort() can segfault. So we now make a copy of currentJobs before
sorting.
There is a slight possibility that the queue monitor and a builder
thread simultaneously decide to mark a build as finished. That's fine,
as long as we ensure the DB update is idempotent (as ensured by doing
"update Builds set finished = 1 ... where finished = 0").
If a build A depends on a derivation that is the top-level derivation
of some build B, then we should process B before A (meaning we
shouldn't make the derivation runnable before B has been
added). Otherwise, the derivation will be "accounted" to A rather than
B (so the build step will show up in the wrong build).
Aborted builds are now put back on the runnable queue and retried
after a certain time interval (currently 60 seconds for the first
retry, then tripled on each subsequent retry).
Hydra-queue-runner now no longer polls the queue periodically, but
instead sleeps until it receives a notification from PostgreSQL about
a change to the queue (build added, build cancelled or build
restarted).
Also, for the "build added" case, we now only check for builds with an
ID greater than the previous greatest ID. This is much more efficient
if the queue is large.
It just makes things unnecessarily complicated. We can just exit
without cleaning anything up, since the only thing to do is unmark
builds and build steps as busy. But we can do that by having systemd
call "hydra-queue-runner --unlock" from ExecStopPost.
If multiple threads create a step for the same build, they could get
the same "max(stepnr)" and allocate conflicting new step numbers. So
lock the BuildSteps table while doing this. We could use a different
isolation level, but this is easier.
This removes the need for Nix's build-remote.pl.
Build logs are now written to $HYDRA_DATA/build-logs because
hydra-queue-runner doesn't have write permission to /nix/var/log.
When visiting the tail-reload page, for a short amount of time the
"unscrolled" version is shown. To circumvent that, let's scroll down
immediately at the first possibility to fill the gap between the loading
of the document and the first AJAX request coming in.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
There are quite a lot of build outputs which have lines with a length
exceeding the width of the taillog <pre/> and thus visually produce more
lines than 50. This causes the tail "box" to change height frequently
and to get to the bottom you need to scroll down.
We now set a fixed line-height to 120% of the font size and cap the
maximum height based on that value (50 * 1.2 = 60). It's probably not
nice to override the line-height, but max-lines is currently only
available using browser-specific property names. But after all it's just
for the tail output, if people complain about the line-height, we can
still change it :-)
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We're just implicitly escaping the tail content by not using .load() but
explicitly setting the text content using .text(), so that escaping
isn't needed on our side.
This should get rid of a few formatting errors and possibly XSS if
someone manages to place JS code in the tail of a build and manages to
lure a user to that tail output.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>