Commit graph

228 commits

Author SHA1 Message Date
e987f74954
doc: drop dev-notes & make update-dbix more discoverable
`dev-notes` are severely outdated. I dropped everything except one note
that I moved to hacking.md. The parts about creating users are also
covered elsewhere.

The `update-dbix` part got a just command to make it discoverable again.
2024-08-18 14:47:09 +02:00
4b886d9c45
autotools -> meson
There are some known regressions regarding local testing setups - since
everything was kinda half written with the expectation that build dir =
source dir (which should not be true anymore). But everything builds and
the test suite runs fine, after several hours spent debugging random
crashes in libpqxx with MALLOC_PERTURB_...
2024-07-22 22:30:41 +02:00
John Ericson
b503280256 Add migration to drop non-null constraints 2024-01-26 11:53:58 -05:00
John Ericson
323b556dc8 Minimal CA support
This verison has a worse UI, but also chnages the schema less: One
non-null constraint is removed, but no new columns are added.

Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: regnat <rg@regnat.ovh>
2024-01-26 00:34:58 -05:00
Cole Helbling
810d2e6b51 Drop unused IndexBuildOutputsOnPath index
Also it's larger than the actual table it's indexing lol.

    -[ RECORD 30 ]----------+-----------------------------------------
    table_name              | buildoutputs
    index_name              | indexbuildoutputsonpath
    index_scans_count       | 0
    index_size              | 31 GB
    table_reads_index_count | 2128699937
    table_reads_seq_count   | 0
    table_reads_count       | 2128699937
    table_writes_count      | 22442976
    table_size              | 28 GB
2023-03-06 07:56:05 -08:00
Graham Christensen
85a53694c8 sql: add enable_dynamic_run_command to the Project as well 2022-02-01 10:58:54 -05:00
Graham Christensen
a9bfabd672 sql: add a migration for enable_dynamic_run_command 2022-02-01 10:58:23 -05:00
Graham Christensen
97a1d2d1d4 Jobsets: add enable_dynamic_run_command 2022-02-01 10:57:30 -05:00
Cole Helbling
b57345ba1f hydra.sql: add IndexRunCommandLogsOnBuildID index 2022-01-31 12:56:34 -08:00
Cole Helbling
d0b6329aa8 sql/upgrade-81: remove unnecessary comment 2022-01-31 12:55:36 -08:00
Graham Christensen
cf49a05ff5 RunCommandLogs: add a uuid to each log entry 2022-01-31 08:58:33 -08:00
Graham Christensen
f120909547 builds: drop project, jobset columns
Indexes were haphazardly dropped.
2022-01-15 15:58:02 -05:00
Graham Christensen
2f382ba067 Add migration 79: RunCommand logs 2022-01-07 15:05:33 -05:00
Graham Christensen
52843195db RunCommandLogs: init table 2022-01-07 15:05:33 -05:00
Graham Christensen
ecb4697930 update-dbix: overwrite modifications
Prevents authors from mistakenly corrupting the hashes
2021-11-19 15:02:07 -05:00
Graham Christensen
ff888032eb SystemTypes: drop database table. It was originally removed in #65, but put back in fcd511c4de, and now totally unused. 2021-10-24 21:38:04 -04:00
Eelco Dolstra
2745226ada
Merge pull request #1003 from DeterminateSystems/perlcritic-level-4
perlcritic: level 4
2021-09-27 20:23:55 +02:00
Your Name
4677a7c894 perlcritic: use strict, use warnings 2021-09-06 22:13:33 -04:00
Graham Christensen
c4134c8e84 TaskRetries: init table 2021-09-02 10:06:26 -04:00
Graham Christensen
fa57fb8f25 hydra.sql: explain update-dbix.pl map 2021-08-26 22:10:19 -04:00
Graham Christensen
397d13a300 DBIx::Class: migrate to use_namespaces
This gives us a place to put helper functions that act on entire
tables, not just individual records.

This should be a backwards compatible change, except in places we're
manually using result class names.
2021-08-26 12:37:19 -04:00
Graham Christensen
c7c322545d
Merge pull request #992 from DeterminateSystems/sql/fixup-comment
hydra.sql: Update comment on regeneration
2021-08-06 14:54:12 -04:00
Graham Christensen
4169f22231 update-dbix.pl: correct indentation 2021-08-06 14:40:57 -04:00
Graham Christensen
5bd8dc171b hydra.sql: Update comment on regeneration
We no longer need to generate the hydra-postgres.sql document,
that is a relic from when sqlite was also supported.
2021-08-06 14:40:34 -04:00
Damien Diederen
df7dab1291 GitInput: Include deepClone option in the cache key
Without this commit, two jobsets using the same repository as input,
but different `deepClone` options, end up incorrectly sharing the same
"checkout" for a given (`uri`, `branch`, `revision`) tuple.  The
presence or absence of `.git` is determined by the jobset execution
order.

This patch adds the missing `isDeepClone` boolean to the cache key.

The database upgrade script empties the `CachedGitInputs` table, as we
don't know if existing checkouts are deep clones.  Unfortunately, this
generally forces rebuilds even for correct `deepClone` checkouts, as
the binary contents of `.git` are not deterministic.

Fixes #510
2021-06-19 17:37:40 +02:00
Graham Christensen
a9e4ede006 SQL: create better indexes for builds based on the jobset id
These are primarily used by the jobsetOverview renders.
2021-06-01 11:23:22 -04:00
Ismaël Bouya
339a09f2e4
Fix check in jobsets
The current check happening in jobsets is incorrect.
The wanted constraint is stated as follow :
- If type is 0 (legacy), then the flake field should be null, and
  both nixExprInput and nixExprPath should be non-null
- If type is 1 (flake), then the flake field should be non-null, and
  both nixExprInput and nixExprPath should be null

The current version will not catch (i.e. it will accept) situations
where you have for instance :
type = 1, nixExprPath null, nixExprInput non-null, flake non-null

This commit fixes that.

I split(ted) that into two constraints, to make it more readable and
easier to extend if a new type appears in the future.

The complete query could be instead :
( type = 0
  AND nixExprInput IS NOT NULL AND nixExprPath IS NOT NULL AND flake IS NULL )
OR ( type = 1
  AND nixExprInput IS NULL AND nixExprPath IS NULL AND flake IS NOT NULL )

(but an "OR" cannot be split, hence the other formulation)
2021-02-03 22:14:53 +01:00
Graham Christensen
f1e75c8bff
Move evaluation errors from evaluations to EvaluationErrors, a new table
DBIx likes to eagerly select all columns without a way to really tell
it so. Therefore, this splits this one large column in to its own
table.

I'd also like to make "jobsets" use this table too, but that is on hold
to stop the bleeding caused by the extreme amount of traffic this is
causing.
2021-02-01 21:33:14 -05:00
Graham Christensen
ac3e8a4a59
jobsetevals: refer to jobset by ID 2021-01-26 11:50:37 -05:00
Graham Christensen
bf674a9653
hydra.sql: embed some in-line docs about schema changes 2021-01-26 11:50:36 -05:00
Graham Christensen
dc5a0d59c5
sql: Stop loading SQL if an error occurs
Otherwise we may go ahead and create DBIx classes for a half-loaded schema.
2021-01-26 11:50:32 -05:00
Graham Christensen
9516b256f1
Normalize nixexpr{input,path} from builds to jobsetevals.
Duplicating this data on every record of the builds table cost
approximately 4G of duplication.

Note that the database migration included took about 4h45m on an
untuned server which uses very slow rotational disks in a RAID5 setup,
with not a lot of RAM. I imagine in production it might take an hour
or two, but not 4. If this should become a chunked migration, I can do
that.

Note: Because of the question about chunked migrations, I have NOT
YET tested this migration thoroughly enough for merge.
2021-01-22 09:10:18 -05:00
Graham Christensen
d9989b7fa1
Schema: add errorMsg, errorTime to JobsetEvals 2021-01-21 13:10:41 -05:00
Graham Christensen
bc4b96d053
BuildOutputs: index path with HASH
Looking at AWS' Performance Insights for a Hydra instance, I found
the hydra-queue-runner's query:

    select id, buildStatus, releaseName, closureSize, size
    from Builds b
    join BuildOutputs o on b.id = o.build
    where
      finished = ?
      and (buildStatus = ? or buildStatus = ?)
      and path = $1

was the slowest query by at least 10x. Running an explain on this
showed why:

hydra=> explain select id, buildStatus, releaseName, closureSize, size
    from Builds b join BuildOutputs o on b.id = o.build where
    finished = 1 and (buildStatus = 0 or buildStatus = 6) and
    path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';

                                                     QUERY PLAN
    ------------------------------------------------------------------------
     Gather  (cost=1000.43..33718.98 rows=2 width=56)
       Workers Planned: 2
       ->  Nested Loop  (cost=0.43..32718.78 rows=1 width=56)
             ->  Parallel Seq Scan on buildoutputs o  (cost=0.00..32710.32
                                                       rows=1
                                                        width=4)
                   Filter: (path = '/nix/store/s93kh...snip...'::text)
             ->  Index Scan using indexbuildsonjobsetidfinishedid on builds b
                                            (cost=0.43..8.45 rows=1 width=56)
                   Index Cond: ((id = o.build) AND (finished = 1))
                   Filter: ((buildstatus = 0) OR (buildstatus = 6))
    (8 rows)

A paralell sequential scan is definitely better than a sequential scan, but the
cost ranging from 0 to 32710 is not great. Looking at the table, I saw the `path`
column is completely unindex:

    hydra=> \d buildoutputs
                Table "public.buildoutputs"
    Column |  Type   | Collation | Nullable | Default
    --------+---------+-----------+----------+---------
    build  | integer |           | not null |
    name   | text    |           | not null |
    path   | text    |           | not null |
    Indexes:
        "buildoutputs_pkey" PRIMARY KEY, btree (build, name)
    Foreign-key constraints:
        "buildoutputs_build_fkey" FOREIGN KEY (build) REFERENCES builds(id)
            ON DELETE CASCADE

Since we always do exact matches on the path and don't care about ordering,
and since the path column is very high cardinality a `hash` index is a
good candidate. Note that I did test a btree index and it performed
similarly well, but slightly worse.

After creating the index (this took about 10 seconds) on a test database:

    create index IndexBuildOutputsPath on BuildOutputs using hash(path);

We get a *significantly* reduced cost:

    hydra=> explain select id, buildStatus, releaseName, closureSize, size
    hydra->     from Builds b join BuildOutputs o on b.id = o.build where
    hydra->     finished = 1 and (buildStatus = 0 or buildStatus = 6) and
    hydra->     path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
                                                QUERY PLAN
    -------------------------------------------------------------------------------------------------------
    Nested Loop  (cost=0.43..41.41 rows=2 width=56)
    ->  Index Scan using buildoutputs_path_hash on buildoutputs o  (cost=0.00..16.05 rows=3 width=4)
            Index Cond: (path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1'::text)
    ->  Index Scan using indexbuildsonjobsetidfinishedid on builds b  (cost=0.43..8.45 rows=1 width=56)
            Index Cond: ((id = o.build) AND (finished = 1))
            Filter: ((buildstatus = 0) OR (buildstatus = 6))
    (6 rows)

For direct comparison, the overall query plan was changed:

    From: Gather      (cost=1000.43..33718.98 rows=2 width=56)
    To:   Nested Loop (cost=   0.43.....41.41 rows=2 width=56)

and the query plan for buildoutputs changed from a maximum cost of
32,710 down to 16.

In practical terms, the query's planning and execution time was reduced:

Before (ms) | Try 1   | Try 2   | Try 3
------------+---------+---------+--------
Planning    |   0.898 |   0.416 |   0.383
Execution   | 138.644 | 172.331 | 375.585

After (ms)  | Try 1   | Try 2   | Try 3
------------+---------+---------+--------
Planning    |   0.298 |   0.290 |   0.296
Execution   | 219.625 |   0.035 |   0.034
2021-01-18 11:28:05 -05:00
Jelle Besseling
bbd4891133
Implement GitHub logins
Requires the following configuration options
enable_github_login = 1
github_client_id
github_client_secret
Or github_client_secret_file which points to a file with the secret
2020-12-28 14:37:03 +01:00
Eelco Dolstra
87317812a8 Fix some broken indices
These indices basically did nothing since they put "id" first.

In particular this makes /job/.../all much faster.
2020-10-28 14:30:44 +01:00
Eelco Dolstra
58a8b1c91c
Keep the SHA-1 column in existing installations 2020-07-28 11:47:44 +02:00
Eelco Dolstra
d4e4be4fd1
Remove SHA-1 hash from BuildProducts
SHA-1 is deprecated and it will be expensive to compute with the
streaming NAR handler.
2020-07-27 18:24:10 +02:00
Nikola Knezevic
3acdd21569 Remove references to hydra-postgresql.sql
As of https://github.com/NixOS/hydra/pull/737 (removal of sqlite
dependency), the only supported database is Postgresql.

This change removes all references to hydra-postgresql.sql file. This
file is generated using a cpp on hydra.sql, but doesn't differ from
hydra.sql at all.
2020-06-05 13:42:56 +02:00
Eelco Dolstra
8adb433e3b
Remove the Jobs table
This table has been superfluous for a long time.
2020-05-27 20:09:36 +02:00
Nikola Knezevic
e9922c460e Add missing SQL upgrade script for NOT NULL on type
`type` column in `Jobsets` is defined as NOT NULL. However, the original upgrade
script adding this column ommited the constraint.
2020-05-18 10:59:55 +02:00
Eelco Dolstra
96a514c169
Remove the "releases" feature
We haven't used this in many years (it was really only used for nix
and patchelf releases).
2020-05-06 12:39:21 +02:00
efcbc08686
Get rid of dependency to SQLite
SQLite isn't properly supported by Hydra for a few years now[1], but
Hydra still depends on it. Apart from a slightly bigger closure this can
cause confusion by users since Hydra picks up SQLite rather than
PostgreSQL by default if HYDRA_DBI isn't configured properly[2]

[1] 78974abb69
[2] https://logs.nix.samueldr.com/nixos-dev/2020-04-10#3297342;
2020-04-16 00:42:40 +02:00
Graham Christensen
5fae9d96a2
hydra-evaluator: add a 'ONE_AT_A_TIME' evaluator style
In the past, jobsets which are automatically evaluated are evaluated
regularly, on a schedule. This schedule means a new evaluation is
created every checkInterval seconds (assuming something changed.)

This model works well for architectures where our build farm can
easily keep up with demand.

This commit adds a new type of evaluation, called ONE_AT_A_TIME, which
only schedules a new evaluation if the previous evaluation of the
jobset has no unfinished builds.

This model of evaluation lets us have 'low-tier' architectures.

For example, we could now have a jobset for ARMv7l builds, where
the buildfarm only has a single, underpowered ARMv7l builder.
Configuring that jobset as ONE_AT_A_TIME will create an evaluation
and then won't schedule another evaluation until every job of
the existing evaluation is complete.

This way, the cache will have a complete collection of pre-built
software for some commits, but the underpowered architecture will
never become backlogged in ancient revisions.
2020-03-03 19:28:44 -05:00
Graham Christensen
027668f0db
hydra.sql: add an index for slow queries in production
These queries used to use (jobset, project) based indexes,
and the addition of jobset_id makes most of those indexes
unusable now.
2020-02-11 12:52:28 -05:00
Graham Christensen
834793468f
fixup: d'oh, make the migrations from #710 part-2 sequential 2020-02-11 08:36:14 -05:00
Graham Christensen
2637a7ad76
Builds: index literally what latest-finished queries 2020-02-11 07:06:21 -05:00
Graham Christensen
8ef08f1385
Builds.jobset_id: make not-null 2020-02-11 07:06:20 -05:00
Graham Christensen
2cdcc7f188
Jobs.jobset_id: make not-null 2020-02-11 07:06:17 -05:00
Graham Christensen
ddf00fa627
Builds: add a nullable jobset_id foreign key to Jobsets.
Also, adds an explicitly named "builds" accessor to the Jobsets
Schema object, which uses the project/jobset name.
2020-02-10 11:43:02 -05:00