KillMode=process may leave subdaemons which hold locks to paths a user wants to build later #563

Open
opened 2024-10-30 18:36:28 +00:00 by raito · 0 comments
Owner

Describe the bug

Consider a bug in Lix that causes a subdaemon to deadlock somewhere, e.g. while downloading or building paths.
Assume that the subdaemon holds a lock on path A via flock.

Build a new Lix, install it on your system, and restart nix-daemon. Observe that the subdaemon did not go away.

Retry what you wanted to do, i.e. rebuild path A: it hangs again, but this time the new subdaemon is blocked on the flock call because the previous subdaemon still holds the flock on path A.
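
To illustrate why the new subdaemon blocks, here is a minimal sketch of the flock contention in Python, not Lix's actual locking code. The `try_lock` helper and the `path-A.lock` file name are made up for this example, and the non-blocking `LOCK_NB` flag is used only so the demonstration returns instead of hanging as a blocking flock would.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and try to take an exclusive flock without blocking.

    Returns the open file descriptor on success, or None if another open
    file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # A blocking flock() would hang here instead of failing.
        os.close(fd)
        return None
    return fd


# Hypothetical stand-in for the lock protecting path A.
lock_path = os.path.join(tempfile.mkdtemp(), "path-A.lock")

stale = try_lock(lock_path)  # plays the deadlocked, leftover subdaemon
fresh = try_lock(lock_path)  # plays the new subdaemon: the lock is taken
print("leftover holds lock:", stale is not None)
print("new subdaemon got lock:", fresh is not None)

os.close(stale)              # the leftover holder finally goes away
fresh = try_lock(lock_path)  # only now can the lock be acquired
print("after holder exits:", fresh is not None)
```

The lock is only released once the holder closes its file descriptor or exits, which is why a subdaemon that survives the restart keeps path A blocked.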

Steps To Reproduce

  1. Introduce a bug that will deadlock your daemon during a build of path A.
  2. Install a new Lix and restart nix-daemon with it (KillMode=process will leave the subdaemons around)
  3. Rebuild path A.

How to recognize this?

  1. Find the PID of your Nix client which is deadlocked.
  2. Find the PID of its Nix subdaemon counterpart (it shows up as nix-daemon $CLIENT_PID in the process list)
  3. sudo strace -fp $DAEMON_PID
  4. If you observe flock syscalls, you may be running into an instance of this issue.
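
As a complement to the strace steps above, the process still holding the flock can be looked up in `/proc/locks` on Linux. The following is a rough sketch, assuming the line layout described in proc(5) (index, lock type, mode, access, PID, device and inode); `flock_holders` is a hypothetical helper and the sample lines are made up.

```python
def flock_holders(proc_locks_text):
    """Return (pid, 'major:minor:inode') for each held FLOCK entry."""
    holders = []
    for line in proc_locks_text.splitlines():
        fields = line.split()
        # Entries for blocked waiters carry a "->" marker after the index;
        # they are waiting for the lock, not holding it, so skip them.
        if len(fields) < 6 or fields[1] == "->":
            continue
        if fields[1] == "FLOCK":
            holders.append((int(fields[4]), fields[5]))
    return holders


# Made-up sample in the assumed /proc/locks layout.
sample = (
    "1: FLOCK  ADVISORY  WRITE 4242 00:2a:1337 0 EOF\n"
    "1: -> FLOCK  ADVISORY  WRITE 5000 00:2a:1337 0 EOF\n"
    "2: POSIX  ADVISORY  READ 77 08:01:99 0 EOF\n"
)
for pid, dev_inode in flock_holders(sample):
    print(pid, dev_inode)
```

On a real system, pass the contents of `/proc/locks` instead of the sample. If a reported PID belongs to a nix-daemon process that predates the restart, that is likely the leftover subdaemon.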

Expected behavior

Rebuilding path A should work.

Additional context

Fixing this would probably require moving to KillMode=mixed and progressively removing subdaemons à la inetd.

raito added the bug, ux, E/requires rearchitecture, and E/hard labels 2024-10-30 18:37:04 +00:00
Reference: lix-project/lix#563