Commit graph

7 commits

Author SHA1 Message Date
Sergei Trofimovich 975b0b52e7 ca: add sqlite index on RealisationsRefs(realisationReference)
Without the change any CA deletion triggers linear scan on large
RealisationsRefs table:

    sqlite>.eqp full
    sqlite> delete from RealisationsRefs where realisationReference IN ( select id from Realisations where outputPath = 1234567890 );
    QUERY PLAN
    |--SCAN RealisationsRefs
    `--LIST SUBQUERY 1
       `--SEARCH Realisations USING COVERING INDEX IndexRealisationsRefsOnOutputPath (outputPath=?)

With the change it gets turned into a lookup:

    sqlite> CREATE INDEX IndexRealisationsRefsRealisationReference on RealisationsRefs(realisationReference);
    sqlite> delete from RealisationsRefs where realisationReference IN ( select id from Realisations where outputPath = 1234567890 );
    QUERY PLAN
    |--SEARCH RealisationsRefs USING INDEX IndexRealisationsRefsRealisationReference (realisationReference=?)
    `--LIST SUBQUERY 1
       `--SEARCH Realisations USING COVERING INDEX IndexRealisationsRefsOnOutputPath (outputPath=?)
2022-04-21 10:06:39 +02:00
regnat 86d7a11c6b Make sure to delete all the realisation refs
Deleting just one will only work in the test cases where I didn’t bother
creating too many of them :p
2022-04-21 10:06:39 +02:00
regnat 92656da0b9 Fix the gc with indirect self-references via the realisations
If the derivation `foo` depends on `bar`, and they both have the same
output path (because they are CA derivations), then this output path
will depend both on the realisation of `foo` and of `bar`, which
themselves depend on each other.
This confuses SQLite which isn’t able to automatically solve this
diamond dependency scheme.

Help it by adding a trigger to delete all the references between the
relevant realisations.

Fix #5320
2022-04-21 10:06:39 +02:00
Sergei Trofimovich edfc5b2f12 ca-specific-schema.sql: add index on RealisationsRefs(referrer) and (outputPath)
For a typical desktop system (~2K packages) we can easily get 100K
entries in RealisationsRefs. Without indices query for RealisationsRefs
requires linear scan.

RealisationsRefs(referrer)
--------------------------

Inefficiency is seen as a 100% CPU load of nix-daemon for the following
scenario:

    $ nix edit -f . bash # add unused environment variable, like FOO="1"
    # populate RealisationsRefs, build fresh system
    $ nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'
    $ nix edit -f . bash # add unused environment variable, like FOO="2"
    $ time nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'

In this case `bash `will be rebuilt a few times and then rest of CPU
time is spent on scanning RealisationsRefs table (about 5 CPU-minutes
on my machine).

Before the change:

    $ time nix build -f nixos system ... # step 4 above
    real    34m3,613s
    user    0m5,232s
    sys     0m0,758s

Of all this time about 29.5 minutes are taken by nix-daemon's CPU time.

After the change:

    $ time nix build -f nixos system ... # step 4 above
    real    4m50,061s
    user    0m5,038s
    sys     0m0,677s

Of all this time about 1 minute is taken by nix-daemon's CPU time.
Most of the time is spent polling for non-existent realisations on
cache-nixos.org.

Realisations(outputPath)
------------------------

After running CA system for two weeks I got ~1M entries in Realisations
table. `nix-collect-garbage` became very slow (seemingly 100 path deletions
per second). It happens due to a slow cascading delete from Realisations
triggered by deletion from ValidPaths.

The fix is to add an index on primary key from ValidPaths(id) that
triggers cascading deletions.

Before the change:
    $ time nix-collect-garbage -d --max-freed 100G
    <interrupted before finish, took too long>
    real    23m32.411s
    user    17m49.679s
    sys     4m50.609s

Most of time was spent in re-scanning Realisations table on each path deletion.

After the change:
    $ time nix-collect-garbage -d --max-freed 100G

    real    8m43.226s
    user    6m16.317s
    sys     1m40.188s

Time is spent scanning sqlite indices and in kernel when unlinking directories.
2021-11-10 08:32:05 +00:00
regnat eca6ff06d6 Store the realisation deps on the local store 2021-05-26 16:59:09 +02:00
regnat 826877cabf Add some logic for signing realisations
Not exposed anywhere, but built realisations are now signed (and this
should be forwarded when copy-ing them around)
2021-03-15 16:34:49 +01:00
regnat 3ac9d74eb1 Rework the db schema for derivation outputs
Add a new table for tracking the derivation output mappings.

We used to hijack the `DerivationOutputs` table for that, but (despite its
name), it isn't a really good fit:

- Its entries depend on the drv being a valid path, making it play badly with
  garbage collection and preventing us to copy a drv output without copying
  the whole drv closure too;
- It dosen't guaranty that the output path exists;

By using a different table, we can experiment with a different schema better
suited for tracking the output mappings of CA derivations.
(incidentally, this also fixes #4138)
2020-12-11 20:41:32 +01:00