System V IPC objects not cleaned up with Darwin Sandbox #691

Open
opened 2025-02-22 11:35:53 +00:00 by wolfgangwalther · 6 comments

Describe the bug

The darwin sandbox does not clean up System V IPC objects.

Steps To Reproduce

On a darwin system:

  1. Run:
```
nix-build --no-link -E 'with import <nixpkgs> {};
stdenv.mkDerivation {
  name = "ipc";
  dontUnpack = true;
  doCheck = true;
  nativeCheckInputs = [ postgresqlTestHook postgresql ];
  checkPhase = "runHook preCheck; sleep 1000";
}'
```
  2. Cancel the sleeping build.
  3. Run ipcs -ma and see the left-over shared memory segment belonging to a build user.

Expected behavior

ipcs -ma should not return any left-over shared memory segments.

nix --version output

I only have access to the nix-community darwin builder, so I can't test with "lix on darwin". I am pretty sure lix is affected too, though.

Additional context

More details about my analysis:

  • https://github.com/NixOS/nixpkgs/issues/371242#issuecomment-2672697582

The Linux sandbox mentions IPC cleanup explicitly:

  • https://git.lix.systems/lix-project/lix/src/commit/148f4eefe9136e722752c0f73633ea271f271278/lix/libstore/platform/linux.cc#L870-L874

    - The IPC namespace prevents the builder from communicating
      with outside processes using SysV IPC mechanisms (shared
      memory, message queues, semaphores). It also ensures
      that all IPC objects are destroyed when the builder
      exits.

Various tickets which are all caused by this:

  • https://github.com/NixOS/nixpkgs/issues/371242
  • https://github.com/NixOS/nixpkgs/pull/371463#issuecomment-2575975419
  • https://github.com/NixOS/nixpkgs/issues/198495
  • https://discourse.nixos.org/t/nixbld-leaving-around-shared-memory-segments/30043
  • and some more cases where PostgreSQL-related tests are disabled for Darwin

Reported for CppNix in https://github.com/NixOS/nix/issues/12548. I am not sure about the process here, whether lix becomes aware of bugs raised in CppNix or whether raising them here as well is appropriate.

Just realized that, to be able to hit the exact bug here, you'd probably need to backport https://github.com/NixOS/nix/pull/10878 first, which apparently hasn't happened yet, but should be easy to do?

Owner

that sandbox change got (relatively soft-) rejected here because it's an effectively deprecated feature on macOS that allows random communication between derivations. you might be able to find it, someone filed a bug requesting said port.

thank you for raising this bug. do we have any knowledge of how we could actually clean up those ipc objects left by dead processes in practice? like, which APIs exist for this? this is one of the most neglected parts of macOS so I'm not optimistic of it being nice to fix.
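
On the question of which APIs exist: the System V call for destroying a shared memory segment is `shmctl` with `IPC_RMID`, which is also what `ipcrm` boils down to. The following is only a minimal sketch of that call, driven from Python via `ctypes`; the helper names `create_private_segment` and `remove_segment` are made up for illustration, and the constant values are assumed to match `<sys/ipc.h>` on both Linux and macOS. It does not address the harder part raised above, namely finding out which segment IDs a dead build left behind.

```python
import ctypes
import os

# Load the C library already mapped into this process (Linux and macOS).
libc = ctypes.CDLL(None, use_errno=True)
libc.shmget.argtypes = [ctypes.c_int, ctypes.c_size_t, ctypes.c_int]
libc.shmget.restype = ctypes.c_int
libc.shmctl.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p]
libc.shmctl.restype = ctypes.c_int

# Values from <sys/ipc.h>; assumed to be the same on Linux and macOS.
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
IPC_RMID = 0


def _check(result):
    """Turn a -1 return value into an OSError carrying errno."""
    if result == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def create_private_segment(size):
    """Create a new shared memory segment (hypothetical helper) and return its ID."""
    return _check(libc.shmget(IPC_PRIVATE, size, IPC_CREAT | 0o600))


def remove_segment(shmid):
    """Mark the segment for destruction; it goes away once nothing is attached."""
    _check(libc.shmctl(shmid, IPC_RMID, None))
```

A segment created this way outlives the creating process unless `remove_segment` is called on it, which is the leak described in this issue. The caller needs to be the segment's owner or creator, or be privileged, for the removal to succeed.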

Owner

cc @lilyball about macos


> do we have any knowledge of how we could actually clean up those ipc objects left by dead processes in practice? like, which APIs exist for this?

I can list them with ipcs and remove them with ipcrm ;). I don't really know about the APIs, just want to point out that there could also be another angle at this, discussed in https://github.com/NixOS/nixpkgs/issues/371242#issuecomment-2676179689.

TLDR: Why doesn't PostgreSQL running inside the sandbox get a chance to clean up after itself? It seems to be killed immediately, but maybe there could be a way to stop those processes a bit "nicer", giving them time to clean up?

Of course, doing this at the sandbox level would be much better, because it would also handle badly behaving programs.

Edit: "Time for cleanup" kind of rings a bell with #678... where the shutdown is also not entirely clean.
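
The "nicer" stop suggested above usually means sending SIGTERM first and only falling back to SIGKILL after a grace period. The snippet below is a generic sketch of that pattern using Python's `subprocess` module; it is not how Lix or CppNix terminate builders, and the function name `stop_gracefully` and the default grace period are assumptions chosen for illustration.

```python
import subprocess


def stop_gracefully(proc, grace_seconds=5.0):
    """Ask a child to exit with SIGTERM, escalating to SIGKILL after a timeout.

    Returns the child's exit status as reported by Popen.wait().
    """
    proc.terminate()  # sends SIGTERM on POSIX, so the child can clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL; the child gets no further chance to clean up
        return proc.wait()
```

A process that handles SIGTERM can release its own IPC objects within the grace period. A process that ignores the signal is still killed, so cleanup at the sandbox level would remain necessary for that case.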


FTR: Just learned about systemd's RemoveIPC setting, which does the same: https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#RemoveIPC=

Just mentioning this, because it shows that others need to deal with the same thing.

Owner

Hmmm. That's very interesting if only because maybe the code might be portable. There are a bunch of edge cases (how do you make sure only your own ipc objects get removed, etc) that seem hard if we're going to be implementing something from scratch so that's very helpful.

(not saying this is a priority for me personally though; i don't expect this issue to be fixed in the next couple of months without some help from someone outside the lix team)
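
For the edge case of removing only your own IPC objects, one conceivable rule is to select segments by owner UID and skip anything that still has processes attached. The sketch below only shows that selection step on already-collected data. The `ShmInfo` record is hypothetical; its fields mirror `shm_perm.uid` and `shm_nattch` from `struct shmid_ds`, and how to enumerate segments on macOS is deliberately left open. It also assumes that a build user runs at most one build at a time, so that ownership identifies the build.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ShmInfo:
    """Hypothetical record for one shared memory segment."""
    shmid: int      # segment ID, as used by shmctl
    owner_uid: int  # corresponds to shm_perm.uid
    nattch: int     # corresponds to shm_nattch (current attach count)


def removable_segments(segments: Iterable[ShmInfo], build_uid: int) -> List[int]:
    """Return IDs of segments owned by build_uid that nothing is attached to."""
    return [
        s.shmid
        for s in segments
        if s.owner_uid == build_uid and s.nattch == 0
    ]
```

For example, with segments `ShmInfo(1, 351, 0)`, `ShmInfo(2, 351, 2)` and `ShmInfo(3, 501, 0)` and a made-up build UID of 351, only ID 1 is selected. Requiring `nattch == 0` is a conservative choice for this sketch; after all builder processes are dead, nothing should be attached anyway.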

Reference: lix-project/lix#691