fix(local-store): invalidate phantom referrers at garbage collection time

Sometimes, a path can disappear from the `ValidPaths` table (I have 23
such cases on my 1.4TB Nix store).

When this occurs and you try to run a garbage collection,
`queryReferrers` will report no referrer because it's performing a
*RIGHT JOIN* between `Refs` and `ValidPaths`, finally, when you issue
the deletion SQL statement, this will throw an uncaught exception from
SQLite side regarding a foreign key violation because `reference` in
`Refs` is a foreign key to `ValidPaths` (which we are trying to delete).

Why can this happen?

Two reasons:

* `PRAGMA foreign_keys=off;` will disable deletion on cascade.
* Trigger recursion *limits*, a deletion on cascade is a *trigger*, when
  a delete is issued and it triggers a bunch of deletion on cascade,
  there's a documented limit by SQLite: https://www.sqlite.org/limits.html#max_trigger_depth

  > Recursion limit on foreign key actions. The SQLITE_MAX_TRIGGER_DEPTH
  > and SQLITE_LIMIT_TRIGGER_DEPTH settings determine the maximum
  > allowable depth of trigger program recursion. For the purposes of
  > these limits, foreign key actions are considered trigger programs. The
  > PRAGMA recursive_triggers setting does not affect the operation of
  > foreign key actions. It is not possible to disable recursive foreign
  > key actions.

As I do not see easy ways to solve the root cause, garbage collection
should be self-healing in that regards, so I propose to invalidate
phantom referrers as we go.

As part of a work improving the consistency of the SQLite database, it
would make sense to count how many times this happen and try to find
ways to reproduce this issue.

Change-Id: I055a8a1d8c0e44d4388a411abe8e5a5e385f7b55
Signed-off-by: Raito Bezarius <raito@lix.systems>
This commit is contained in:
raito 2024-09-14 19:50:08 +02:00
parent b2fc007811
commit 2a7a6cb85a
5 changed files with 112 additions and 0 deletions

View file

@ -62,6 +62,8 @@ struct LocalStore::State::Stmts {
SQLiteStmt QueryReferences;
SQLiteStmt QueryReferrers;
SQLiteStmt InvalidatePath;
SQLiteStmt InvalidatePhantomReferrers;
SQLiteStmt QueryPhantomReferrers;
SQLiteStmt AddDerivationOutput;
SQLiteStmt RegisterRealisedOutput;
SQLiteStmt UpdateRealisedOutput;
@ -384,6 +386,10 @@ LocalStore::LocalStore(const Params & params)
"select path from Refs join ValidPaths on referrer = id where reference = (select id from ValidPaths where path = ?);");
state->stmts->InvalidatePath.create(state->db,
"delete from ValidPaths where path = ?;");
state->stmts->InvalidatePhantomReferrers.create(state->db,
"delete from Refs where referrer IN (select referrer from Refs left join ValidPaths on referrer = id where reference = (select id from ValidPaths where path = ?));");
state->stmts->QueryPhantomReferrers.create(state->db,
"select referrer from Refs left join ValidPaths on referrer = id where reference = (select id from ValidPaths where path = ?);");
state->stmts->AddDerivationOutput.create(state->db,
"insert or replace into DerivationOutputs (drv, id, path) values (?, ?, ?);");
state->stmts->QueryValidDerivers.create(state->db,
@ -1522,6 +1528,18 @@ void LocalStore::invalidatePathChecked(const StorePath & path)
if (!referrers.empty())
throw PathInUse("cannot delete path '%s' because it is in use by %s",
printStorePath(path), showPaths(referrers));
// Note: `queryReferrers` will only return *valid* referrers.
// i.e. referrer for which there is a *ValidPath* row in the SQLite database.
// In the unfortunate situation where a valid path is removed but its corresponding `Refs` are not removed (*), we better just invalidate all these phantom referrers,
// otherwise we will create a foreign key violation when we actually try to invalidate paths.
//
// (*) : yes, there's a "ON DELETE CASCADE" on the referrer foreign key.
// Unfortunately, in practice, it doesn't ensure integrity over large SQLite databases.
if (hasPhantomReferrers(*state, path)) {
warn("'%s' has phantom referrers (disappeared referrers from the valid path table)", printStorePath(path));
invalidatePhantomReferrers(*state, path);
}
invalidatePath(*state, path);
}
@ -1529,6 +1547,24 @@ void LocalStore::invalidatePathChecked(const StorePath & path)
});
}
bool LocalStore::hasPhantomReferrers(State & state, const StorePath & path)
{
return retrySQLite<bool>([&]() -> bool {
debug("checking for phantom referrers for '%s'", printStorePath(path));
auto useQueryPhantomReferrers(state.stmts->QueryPhantomReferrers.use()(printStorePath(path)));
return useQueryPhantomReferrers.next();
});
}
void LocalStore::invalidatePhantomReferrers(State & state, const StorePath & path)
{
retrySQLite<void>([&]() {
debug("invalidating phantom referrers to '%s'", printStorePath(path));
state.stmts->InvalidatePhantomReferrers.use()(printStorePath(path)).exec();
});
}
bool LocalStore::verifyStore(bool checkContents, RepairFlag repair)
{

View file

@ -322,6 +322,14 @@ private:
* Delete a path from the Nix store.
*/
void invalidatePathChecked(const StorePath & path);
/**
* Check if there's phantom referrers for a certain path in the Nix SQLite database
*/
bool hasPhantomReferrers(State & state, const StorePath & path);
/**
* Invalidate all phantom referrers from the Nix SQLite database.
*/
void invalidatePhantomReferrers(State & state, const StorePath & path);
void verifyPath(const StorePath & path, const StorePathSet & store,
StorePathSet & done, StorePathSet & validPaths, RepairFlag repair, bool & errors);

View file

@ -16,6 +16,7 @@ fi
export NIX_LOCALSTATE_DIR=$TEST_ROOT/var
export NIX_LOG_DIR=$TEST_ROOT/var/log/nix
export NIX_STATE_DIR=$TEST_ROOT/var/nix
export NIX_SQLITE_DATABASE=$NIX_STATE_DIR/db/db.sqlite
export NIX_CONF_DIR=$TEST_ROOT/etc
export NIX_DAEMON_SOCKET_PATH=$TEST_ROOT/dSocket
unset NIX_USER_CONF_FILES
@ -164,6 +165,10 @@ requireDaemonNewerThan () {
isDaemonNewer "$1" || skipTest "Daemon is too old"
}
requireSqliteDatabase() {
[[ -f "$NIX_SQLITE_DATABASE" ]] || skipTest "SQLite database is not used for this store implementation"
}
canUseSandbox() {
[[ ${_canUseSandbox-} ]]
}

View file

@ -76,6 +76,7 @@ functional_tests_scripts = [
'flakes/flake-registry.sh',
'flakes/subdir-flake.sh',
'gc.sh',
'phantom-referrers-gc.sh',
'nix-collect-garbage-d.sh',
'nix-collect-garbage-dry-run.sh',
'remote-store.sh',

View file

@ -0,0 +1,62 @@
source common.sh
startDaemon
requireDaemonNewerThan "2.92.0"
requireSqliteDatabase
clearStore
depOutPath=$(nix-build --no-out-link -E '
with import ./config.nix;
mkDerivation {
name = "phantom";
outputs = [ "out" ];
buildCommand = "
echo i will become a phantom soon > $out
";
}
')
finalOutPath=$(nix-build --no-out-link -E '
with import ./config.nix;
let dep = mkDerivation {
name = "phantom";
outputs = [ "out" ];
buildCommand = "
echo i will become a phantom soon > $out
";
}; in
mkDerivation {
name = "phantom-gc";
outputs = [ "out" ];
buildCommand = "
echo UNUSED: ${dep} > $out
";
}
')
echo "displaying all valid paths"
sqlite3 "$NIX_SQLITE_DATABASE" <<EOF
select * from validpaths;
EOF
echo "displaying the relevant IDs..."
sqlite3 "$NIX_SQLITE_DATABASE" <<EOF
select r.referrer, r.reference from Refs r join ValidPaths vp on r.referrer = vp.id where path = '$finalOutPath';
EOF
echo "corrupting the SQLite database manually..."
sqlite3 "$NIX_SQLITE_DATABASE" <<EOF
pragma foreign_keys = off;
delete from ValidPaths where path = '$finalOutPath';
select * from Refs;
EOF
restartDaemon
# expect this to work and maybe warn about phantom referrers
expectStderr 0 nix-collect-garbage -vvvv | grepQuiet 'phantom referrers'