nix-store --delete scans full .links/ directory on every invocation #758

Open
opened 2025-03-24 15:13:07 +00:00 by profpatsch · 6 comments

Describe the bug

When running nix-store --delete, even on a single store path, it will always scan the full .links directory for any hardlinks it can remove (i.e. entries whose link count has dropped to 1 because no store path references them anymore).

Lines 897 to 941 of lix/libstore/gc.cc at commit f0fd789 (https://git.lix.systems/lix-project/lix/src/commit/f0fd789f68a6b91d5052c582bd9fbc75cfcb5e54/lix/libstore/gc.cc#L897-L941):
```cpp
/* Unlink all files in /nix/store/.links that have a link count of 1,
   which indicates that there are no other links and so they can be
   safely deleted. FIXME: race condition with optimisePath(): we
   might see a link count of 1 just before optimisePath() increases
   the link count. */
if (options.action == GCOptions::gcDeleteDead || deleteSpecific) {
    printInfo("deleting unused links...");

    AutoCloseDir dir(opendir(linksDir.c_str()));
    if (!dir) throw SysError("opening directory '%1%'", linksDir);

    int64_t actualSize = 0, unsharedSize = 0;

    struct dirent * dirent;
    while (errno = 0, dirent = readdir(dir.get())) {
        checkInterrupt();
        std::string name = dirent->d_name;
        if (name == "." || name == "..") continue;
        Path path = linksDir + "/" + name;

        auto st = lstat(path);

        if (st.st_nlink != 1) {
            actualSize += st.st_size;
            unsharedSize += (st.st_nlink - 1) * st.st_size;
            continue;
        }

        printMsg(lvlTalkative, "deleting unused link '%1%'", path);

        if (unlink(path.c_str()) == -1)
            throw SysError("deleting '%1%'", path);

        /* Do not account for deleted file here. Rely on deletePath()
           accounting. */
    }

    struct stat st;
    if (stat(linksDir.c_str(), &st) == -1)
        throw SysError("statting '%1%'", linksDir);
    int64_t overhead = st.st_blocks * 512ULL;

    printInfo("note: currently hard linking saves %.2f MiB",
        ((unsharedSize - actualSize - overhead) / (1024.0 * 1024.0)));
}
```

Steps To Reproduce

  1. Create a store path with nix-store --add.
  2. Remove the store path with nix-store --delete and check the strace: it takes time linear in the number of files in .links, about 1.5 million files on my developer machine.

Expected behavior

The --delete operation should take more or less constant time, independent of the number of entries in .links.

nix --version output

nix --version
nix (Lix, like Nix) 2.91.1

Additional context

This is somewhere in between a bug report and a request to improve the optimization feature.

My first idea was “some kind of bloom filter”, but really, we know which store paths we are deleting, and the links directory is hashed over the inodes, so shouldn’t we just be able to check exactly the files we are deleting? Then the operation should never have to take longer than O(n), where n is the number of files in the store paths to delete.

Owner

There's still the small challenge of reference-scanning the entire system, though; you need to know it's not alive in any process environment or whatever. But yeah, this sounds suboptimal.

Author

Even if that makes cleanup less than optimal, it could be argued that a full scan should only be necessary for operations like nix-store --gc, which guarantees optimal cleanup of the links; everything else can be a best-effort operation.

Owner

> shouldn’t we just be able to check exactly the files we are deleting?

That requires hashing every single file we're about to delete, and that can be much more expensive than stat()ing all the .links entries. Hardlink-based deduplication of store objects is unsound; if your goal is to have quick single-path deletes, you're better off disabling automatic store optimization and running something like duperemove to deduplicate data properly. (We have plans to eventually add reflink-based deduplication to Lix, but it's very far down the list.)
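
For concreteness, here is roughly what that targeted approach would have to do. This is a sketch, not Lix code; hashFileForLinks and the .links naming scheme are simplified stand-ins (real entries are named by base-32 SHA-256 of the contents). Every regular file in the doomed store path is read in full to find its .links entry, and an entry can only go once its link count drops to 1.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the real content hash; here just std::hash
// over the raw bytes, purely for illustration.
std::string hashFileForLinks(const fs::path & file)
{
    std::ifstream in(file, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return std::to_string(std::hash<std::string>{}(bytes));
}

// Phase 1: while the store path still exists, re-hash every regular
// file to compute the .links entry it maps to. This is the expensive
// part: the whole contents of the path must be read.
std::vector<fs::path> collectLinkCandidates(const fs::path & storePath,
                                            const fs::path & linksDir)
{
    std::vector<fs::path> candidates;
    for (auto & entry : fs::recursive_directory_iterator(storePath))
        if (entry.is_regular_file())
            candidates.push_back(linksDir / hashFileForLinks(entry.path()));
    return candidates;
}

// Phase 2: after the store path has been deleted, unlink any candidate
// whose link count has dropped to 1. This is O(files in the deleted
// path) instead of O(entries in .links).
void deleteUnusedLinks(const std::vector<fs::path> & candidates)
{
    for (auto & link : candidates) {
        struct stat st;
        if (lstat(link.c_str(), &st) == -1) continue; // never optimised
        if (st.st_nlink != 1) continue;               // still shared
        unlink(link.c_str());
    }
}
```

The race with optimisePath() noted in the FIXME in the excerpt above would still apply here.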

Author

Ah, I didn’t read far enough into the code to see that it’s actually hashing the files.

fwiw, even just listing .links/ on my machine with ls -f >/dev/null takes about 1 second, and nix takes about 5 seconds to do all the stats. And I’m pretty sure mine is not a very involved setup; my store is maybe 200 GB.

I see the problem that fully hashing arbitrary store paths can be pretty expensive in the worst case.

Author

Using a counting bloom filter (https://en.wikipedia.org/wiki/Counting_Bloom_filter) could be a solution, but as you said it’s probably not worth the conceptual complexity this would introduce.

Edit: really not sure about this, don’t quote me :P
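
For illustration: a counting Bloom filter keeps a small counter per bucket instead of a single bit, so deletions can decrement what insertions incremented. A minimal sketch, with arbitrary, untuned parameters:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Each of k hash functions bumps a small counter, and remove()
// decrements them again, which a plain Bloom filter cannot do.
class CountingBloomFilter {
    static constexpr std::size_t kBuckets = 1 << 20;
    static constexpr int kHashes = 4;
    std::vector<std::uint8_t> counters = std::vector<std::uint8_t>(kBuckets, 0);

    std::size_t bucket(const std::string & key, int i) const {
        // Cheap double hashing; a real implementation would mix better.
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = std::hash<std::string>{}(key + "#");
        return (h1 + static_cast<std::size_t>(i) * h2) % kBuckets;
    }

public:
    void insert(const std::string & key) {
        for (int i = 0; i < kHashes; ++i) {
            auto & c = counters[bucket(key, i)];
            if (c < 255) ++c; // saturate instead of overflowing
        }
    }

    void remove(const std::string & key) {
        for (int i = 0; i < kHashes; ++i) {
            auto & c = counters[bucket(key, i)];
            if (c > 0 && c < 255) --c; // never decrement a saturated counter
        }
    }

    // May report false positives, never false negatives.
    bool mayContain(const std::string & key) const {
        for (int i = 0; i < kHashes; ++i)
            if (counters[bucket(key, i)] == 0) return false;
        return true;
    }
};
```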

Owner

We're not touching this part of the code to fix something that's already unsound; if anything, we'll be moving to a reflink-based system that has none of the soundness problems and doesn't need a hardlink directory either. Also note that Bloom filters are the wrong tool for this job, since we still need to know the hash of the link we're trying to delete.
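
For context, the reflink primitive on Linux is the FICLONE ioctl; a minimal sketch, assuming a copy-on-write filesystem and not representing any planned Lix interface:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <linux/fs.h>   // FICLONE
#include <sys/ioctl.h>
#include <unistd.h>

// Make dst a reflink (copy-on-write clone) of src. Unlike a hardlink,
// dst is an independent inode sharing only data blocks, so deleting
// either file needs no link-count bookkeeping and no .links directory.
// Only works on filesystems with clone support (btrfs, XFS, bcachefs);
// ext4, for example, returns EOPNOTSUPP.
bool reflink(const char * src, const char * dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0) { perror("open src"); return false; }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0444);
    if (out < 0) { perror("open dst"); close(in); return false; }

    bool ok = ioctl(out, FICLONE, in) == 0;
    if (!ok) perror("ioctl(FICLONE)");

    close(in);
    close(out);
    return ok;
}
```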
