`dev-notes` are severely outdated. I dropped everything except one note
that I moved to hacking.md. The parts about creating users are also
covered elsewhere.
The `update-dbix` part got a just command to make it discoverable again.
There are some known regressions regarding local testing setups - since
everything was kinda half written with the expectation that build dir =
source dir (which should not be true anymore). But everything builds and
the test suite runs fine, after several hours spent debugging random
crashes in libpqxx with MALLOC_PERTURB_...
This verison has a worse UI, but also chnages the schema less: One
non-null constraint is removed, but no new columns are added.
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: regnat <rg@regnat.ovh>
This gives us a place to put helper functions that act on entire
tables, not just individual records.
This should be a backwards compatible change, except in places we're
manually using result class names.
Without this commit, two jobsets using the same repository as input,
but different `deepClone` options, end up incorrectly sharing the same
"checkout" for a given (`uri`, `branch`, `revision`) tuple. The
presence or absence of `.git` is determined by the jobset execution
order.
This patch adds the missing `isDeepClone` boolean to the cache key.
The database upgrade script empties the `CachedGitInputs` table, as we
don't know if existing checkouts are deep clones. Unfortunately, this
generally forces rebuilds even for correct `deepClone` checkouts, as
the binary contents of `.git` are not deterministic.
Fixes#510
The current check happening in jobsets is incorrect.
The wanted constraint is stated as follow :
- If type is 0 (legacy), then the flake field should be null, and
both nixExprInput and nixExprPath should be non-null
- If type is 1 (flake), then the flake field should be non-null, and
both nixExprInput and nixExprPath should be null
The current version will not catch (i.e. it will accept) situations
where you have for instance :
type = 1, nixExprPath null, nixExprInput non-null, flake non-null
This commit fixes that.
I split(ted) that into two constraints, to make it more readable and
easier to extend if a new type appears in the future.
The complete query could be instead :
( type = 0
AND nixExprInput IS NOT NULL AND nixExprPath IS NOT NULL AND flake IS NULL )
OR ( type = 1
AND nixExprInput IS NULL AND nixExprPath IS NULL AND flake IS NOT NULL )
(but an "OR" cannot be split, hence the other formulation)
DBIx likes to eagerly select all columns without a way to really tell
it so. Therefore, this splits this one large column in to its own
table.
I'd also like to make "jobsets" use this table too, but that is on hold
to stop the bleeding caused by the extreme amount of traffic this is
causing.
Duplicating this data on every record of the builds table cost
approximately 4G of duplication.
Note that the database migration included took about 4h45m on an
untuned server which uses very slow rotational disks in a RAID5 setup,
with not a lot of RAM. I imagine in production it might take an hour
or two, but not 4. If this should become a chunked migration, I can do
that.
Note: Because of the question about chunked migrations, I have NOT
YET tested this migration thoroughly enough for merge.
Looking at AWS' Performance Insights for a Hydra instance, I found
the hydra-queue-runner's query:
select id, buildStatus, releaseName, closureSize, size
from Builds b
join BuildOutputs o on b.id = o.build
where
finished = ?
and (buildStatus = ? or buildStatus = ?)
and path = $1
was the slowest query by at least 10x. Running an explain on this
showed why:
hydra=> explain select id, buildStatus, releaseName, closureSize, size
from Builds b join BuildOutputs o on b.id = o.build where
finished = 1 and (buildStatus = 0 or buildStatus = 6) and
path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
QUERY PLAN
------------------------------------------------------------------------
Gather (cost=1000.43..33718.98 rows=2 width=56)
Workers Planned: 2
-> Nested Loop (cost=0.43..32718.78 rows=1 width=56)
-> Parallel Seq Scan on buildoutputs o (cost=0.00..32710.32
rows=1
width=4)
Filter: (path = '/nix/store/s93kh...snip...'::text)
-> Index Scan using indexbuildsonjobsetidfinishedid on builds b
(cost=0.43..8.45 rows=1 width=56)
Index Cond: ((id = o.build) AND (finished = 1))
Filter: ((buildstatus = 0) OR (buildstatus = 6))
(8 rows)
A paralell sequential scan is definitely better than a sequential scan, but the
cost ranging from 0 to 32710 is not great. Looking at the table, I saw the `path`
column is completely unindex:
hydra=> \d buildoutputs
Table "public.buildoutputs"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
build | integer | | not null |
name | text | | not null |
path | text | | not null |
Indexes:
"buildoutputs_pkey" PRIMARY KEY, btree (build, name)
Foreign-key constraints:
"buildoutputs_build_fkey" FOREIGN KEY (build) REFERENCES builds(id)
ON DELETE CASCADE
Since we always do exact matches on the path and don't care about ordering,
and since the path column is very high cardinality a `hash` index is a
good candidate. Note that I did test a btree index and it performed
similarly well, but slightly worse.
After creating the index (this took about 10 seconds) on a test database:
create index IndexBuildOutputsPath on BuildOutputs using hash(path);
We get a *significantly* reduced cost:
hydra=> explain select id, buildStatus, releaseName, closureSize, size
hydra-> from Builds b join BuildOutputs o on b.id = o.build where
hydra-> finished = 1 and (buildStatus = 0 or buildStatus = 6) and
hydra-> path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..41.41 rows=2 width=56)
-> Index Scan using buildoutputs_path_hash on buildoutputs o (cost=0.00..16.05 rows=3 width=4)
Index Cond: (path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1'::text)
-> Index Scan using indexbuildsonjobsetidfinishedid on builds b (cost=0.43..8.45 rows=1 width=56)
Index Cond: ((id = o.build) AND (finished = 1))
Filter: ((buildstatus = 0) OR (buildstatus = 6))
(6 rows)
For direct comparison, the overall query plan was changed:
From: Gather (cost=1000.43..33718.98 rows=2 width=56)
To: Nested Loop (cost= 0.43.....41.41 rows=2 width=56)
and the query plan for buildoutputs changed from a maximum cost of
32,710 down to 16.
In practical terms, the query's planning and execution time was reduced:
Before (ms) | Try 1 | Try 2 | Try 3
------------+---------+---------+--------
Planning | 0.898 | 0.416 | 0.383
Execution | 138.644 | 172.331 | 375.585
After (ms) | Try 1 | Try 2 | Try 3
------------+---------+---------+--------
Planning | 0.298 | 0.290 | 0.296
Execution | 219.625 | 0.035 | 0.034
Requires the following configuration options
enable_github_login = 1
github_client_id
github_client_secret
Or github_client_secret_file which points to a file with the secret
As of https://github.com/NixOS/hydra/pull/737 (removal of sqlite
dependency), the only supported database is Postgresql.
This change removes all references to hydra-postgresql.sql file. This
file is generated using a cpp on hydra.sql, but doesn't differ from
hydra.sql at all.
SQLite isn't properly supported by Hydra for a few years now[1], but
Hydra still depends on it. Apart from a slightly bigger closure this can
cause confusion by users since Hydra picks up SQLite rather than
PostgreSQL by default if HYDRA_DBI isn't configured properly[2]
[1] 78974abb69
[2] https://logs.nix.samueldr.com/nixos-dev/2020-04-10#3297342;
In the past, jobsets which are automatically evaluated are evaluated
regularly, on a schedule. This schedule means a new evaluation is
created every checkInterval seconds (assuming something changed.)
This model works well for architectures where our build farm can
easily keep up with demand.
This commit adds a new type of evaluation, called ONE_AT_A_TIME, which
only schedules a new evaluation if the previous evaluation of the
jobset has no unfinished builds.
This model of evaluation lets us have 'low-tier' architectures.
For example, we could now have a jobset for ARMv7l builds, where
the buildfarm only has a single, underpowered ARMv7l builder.
Configuring that jobset as ONE_AT_A_TIME will create an evaluation
and then won't schedule another evaluation until every job of
the existing evaluation is complete.
This way, the cache will have a complete collection of pre-built
software for some commits, but the underpowered architecture will
never become backlogged in ancient revisions.