nix collect-garbage is kind of slow at deleting paths #418

Open
opened 2024-06-24 23:50:57 +00:00 by jade · 4 comments
Owner

I bet it's entirely single threaded. We should look into that.

I bet it's entirely single threaded. We should look into that.
jade added the
performance
label 2024-06-24 23:50:57 +00:00
Author
Owner

Redesign idea that would avoid a lot of rearchitecture (since the current gc preserves invariants like that paths always have their dependencies): run the GC in two parts: one thread, which is exactly identical to the current GC except it replaces the recursive-delete operation with another which moves paths into a garbage bin, say, /nix/store/.garbage, and adds it to a deleter queue. Then the deleter queue is consumed by a thread pool that does not care about ordering.

The deleter queue is also initially populated with the current contents of the .garbage directory in case it is interrupted.

By doing it this way, we get:

  • atomicity of deleting paths (i.e. i suspect that currently you can corrupt a store path by half-deleting it and then interrupting that)
  • wayyyy faster GC since we can dispatch deletion to a thread pool
Redesign idea that would avoid a lot of rearchitecture (since the current gc preserves invariants like that paths always have their dependencies): run the GC in two parts: one thread, which is exactly identical to the current GC except it replaces the recursive-delete operation with another which moves paths into a garbage bin, say, `/nix/store/.garbage`, and adds it to a deleter queue. Then the deleter queue is consumed by a thread pool that does not care about ordering. The deleter queue is also initially populated with the current contents of the `.garbage` directory in case it is interrupted. By doing it this way, we get: - atomicity of deleting paths (i.e. i suspect that currently you can corrupt a store path by half-deleting it and then interrupting that) - wayyyy faster GC since we can dispatch deletion to a thread pool
Owner

Bit worried about the effect this will have on IOPs used, since iirc nix-collect-garbage can already consume a problematic amount on it's own and it's not necessarily easily tuned by end users.

I suppose adding a config parameter for maximum paths removed in parallel may be enough, but it's worth some consideration first.

Bit worried about the effect this will have on IOPs used, since iirc `nix-collect-garbage` can already consume a problematic amount on it's own and it's not necessarily easily tuned by end users. I suppose adding a config parameter for maximum paths removed in parallel may be enough, but it's worth some consideration first.
Owner

This shouldn't have a major effect in that regard — it's not parallelizing, just not blocking deletion by store paths being live from not having been deleted yet, and renames are cheap. But it's worth taking a look at before and after, yeah.

This *shouldn't* have a major effect in that regard — it's not parallelizing, just not blocking deletion by store paths being live from not having been deleted yet, and renames are cheap. But it's worth taking a look at before and after, yeah.
Author
Owner

For anyone looking at this in the future, I have wip code for this here: https://git.lix.systems/jade/lix/src/branch/jade/fast-gc

For anyone looking at this in the future, I have wip code for this here: https://git.lix.systems/jade/lix/src/branch/jade/fast-gc
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: lix-project/lix#418
No description provided.