ca-specific-schema.sql: add index on RealisationsRefs(referrer) and (outputPath)

For a typical desktop system (~2K packages) we can easily get 100K
entries in RealisationsRefs. Without indices query for RealisationsRefs
requires linear scan.

RealisationsRefs(referrer)
--------------------------

Inefficiency is seen as a 100% CPU load of nix-daemon for the following
scenario:

    $ nix edit -f . bash # add unused environment variable, like FOO="1"
    # populate RealisationsRefs, build fresh system
    $ nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'
    $ nix edit -f . bash # add unused environment variable, like FOO="2"
    $ time nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'

In this case `bash `will be rebuilt a few times and then rest of CPU
time is spent on scanning RealisationsRefs table (about 5 CPU-minutes
on my machine).

Before the change:

    $ time nix build -f nixos system ... # step 4 above
    real    34m3,613s
    user    0m5,232s
    sys     0m0,758s

Of all this time about 29.5 minutes are taken by nix-daemon's CPU time.

After the change:

    $ time nix build -f nixos system ... # step 4 above
    real    4m50,061s
    user    0m5,038s
    sys     0m0,677s

Of all this time about 1 minute is taken by nix-daemon's CPU time.
Most of the time is spent polling for non-existent realisations on
cache-nixos.org.

Realisations(outputPath)
------------------------

After running CA system for two weeks I got ~1M entries in Realisations
table. `nix-collect-garbage` became very slow (seemingly 100 path deletions
per second). It happens due to a slow cascading delete from Realisations
triggered by deletion from ValidPaths.

The fix is to add an index on primary key from ValidPaths(id) that
triggers cascading deletions.

Before the change:
    $ time nix-collect-garbage -d --max-freed 100G
    <interrupted before finish, took too long>
    real    23m32.411s
    user    17m49.679s
    sys     4m50.609s

Most of time was spent in re-scanning Realisations table on each path deletion.

After the change:
    $ time nix-collect-garbage -d --max-freed 100G

    real    8m43.226s
    user    6m16.317s
    sys     1m40.188s

Time is spent scanning sqlite indices and in kernel when unlinking directories.
This commit is contained in:
Sergei Trofimovich 2021-10-09 18:47:05 +01:00
parent 67179472df
commit edfc5b2f12
2 changed files with 17 additions and 1 deletions

View file

@ -19,3 +19,8 @@ create table if not exists RealisationsRefs (
foreign key (referrer) references Realisations(id) on delete cascade,
foreign key (realisationReference) references Realisations(id) on delete restrict
);
-- used by QueryRealisationReferences
create index if not exists IndexRealisationsRefs on RealisationsRefs(referrer);
-- used by cascade deletion when ValidPaths is deleted
create index if not exists IndexRealisationsRefsOnOutputPath on Realisations(outputPath);

View file

@ -79,7 +79,7 @@ int getSchema(Path schemaPath)
void migrateCASchema(SQLite& db, Path schemaPath, AutoCloseFD& lockFd)
{
const int nixCASchemaVersion = 2;
const int nixCASchemaVersion = 3;
int curCASchema = getSchema(schemaPath);
if (curCASchema != nixCASchemaVersion) {
if (curCASchema > nixCASchemaVersion) {
@ -130,6 +130,17 @@ void migrateCASchema(SQLite& db, Path schemaPath, AutoCloseFD& lockFd)
txn.commit();
}
if (curCASchema < 3) {
SQLiteTxn txn(db);
// Apply new indices added in this schema update.
db.exec(R"(
-- used by QueryRealisationReferences
create index if not exists IndexRealisationsRefs on RealisationsRefs(referrer);
-- used by cascade deletion when ValidPaths is deleted
create index if not exists IndexRealisationsRefsOnOutputPath on Realisations(outputPath);
)");
txn.commit();
}
writeFile(schemaPath, fmt("%d", nixCASchemaVersion));
lockFile(lockFd.get(), ltRead, true);
}