darwin: work around PROC_PIDLISTFDS on processes with no fds

This has been causing various seemingly spurious CI failures, as well as
failures for people running tests on beta builds.

lix> ++(nix-collect-garbage-dry-run.sh:20) nix-store --gc --print-dead
lix> ++(nix-collect-garbage-dry-run.sh:20) wc -l
lix> finding garbage collector roots...
lix> error: Listing pid 87261 file descriptors: Undefined error: 0
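
There is no real way to write a proper test for this, other than to
start a process like the following:

    #include <unistd.h>

    int main(void) {
        /* Close every inherited descriptor so the process has no open fds. */
        for (int i = 0; i < 1000; ++i) {
            close(i);
        }
        /* Stay alive long enough for the GC to inspect this process. */
        sleep(10000);
    }

and then let Lix's gc look at it.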

I have relatively high confidence this *will* fix the problem, since I
have manually confirmed that the behaviour of the libproc call is
as-unexpected, and it would perfectly explain the observed symptom.

Fixes: #446
Change-Id: I67669b98377af17895644b3bafdf42fc33abd076
parent 529eed74c4
commit 1437d3df15

doc/manual/rl-next/haunted-gc-macos.md: 15 lines added (normal file)
@@ -0,0 +1,15 @@
+---
+synopsis: "Fix unexpectedly-successful GC failures on macOS"
+cls: 1723
+issues: fj#446
+credits: jade
+category: Fixes
+---
+
+Has the following happened to you on macOS? This failure has been successfully eliminated, thanks to our successful deployment of advanced successful-failure detection technology (it's just `if (failed && errno == 0)`. Patent pending<sup>not really</sup>):
+
+```
+$ nix-store --gc --print-dead
+finding garbage collector roots...
+error: Listing pid 87261 file descriptors: Undefined error: 0
+```

@@ -56,12 +56,27 @@ void DarwinLocalStore::findPlatformRoots(UncheckedRoots & unchecked)
         while (fdBufSize > fds.size() * sizeof(struct proc_fdinfo)) {
             // Reserve some extra size so we don't fail too much
             fds.resize((fdBufSize + fdBufSize / 8) / sizeof(struct proc_fdinfo));
+            errno = 0;
             fdBufSize = proc_pidinfo(
                 pid, PROC_PIDLISTFDS, 0, fds.data(), fds.size() * sizeof(struct proc_fdinfo)
             );
+
+            // errno == 0???! Yes, seriously. This is because macOS has a
+            // broken syscall wrapper for proc_pidinfo that has no way of
+            // dealing with the system call successfully returning 0. It
+            // takes the -1 error result from the errno-setting syscall
+            // wrapper and turns it into a 0 result. But what if the system
+            // call actually returns 0? Then you get an errno of success.
+            //
+            // https://github.com/apple-opensource/xnu/blob/4f43d4276fc6a87f2461a3ab18287e4a2e5a1cc0/libsyscall/wrappers/libproc/libproc.c#L100-L110
+            // https://git.lix.systems/lix-project/lix/issues/446#issuecomment-5483
+            // FB14695751
             if (fdBufSize <= 0) {
-                throw SysError("Listing pid %1% file descriptors", pid);
+                if (errno == 0) {
+                    break;
+                } else {
+                    throw SysError("Listing pid %1% file descriptors", pid);
+                }
             }
         }
         fds.resize(fdBufSize / sizeof(struct proc_fdinfo));
 
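
The errno check in the hunk above can be tried out without a Mac. The following is only a sketch that imitates the behaviour the code comment describes; it does not call libproc. `fakeRawSyscall`, `fakeWrapper`, `listFds` and `ListResult` are hypothetical names invented for this illustration, and `std::system_error` stands in for Lix's `SysError`.

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical stand-in for the raw system call: returns -1 and sets errno
// on failure, otherwise the number of bytes it would have written (maybe 0).
static int fakeRawSyscall(int bytesAvailable, int failWith)
{
    if (failWith != 0) {
        errno = failWith;
        return -1;
    }
    return bytesAvailable;
}

// Hypothetical stand-in for the wrapper described in the comment: it folds
// the -1 error result into 0, so "failed" and "wrote 0 bytes" look the same.
static int fakeWrapper(int bytesAvailable, int failWith)
{
    int ret = fakeRawSyscall(bytesAvailable, failWith);
    return ret == -1 ? 0 : ret;
}

enum class ListResult { NoFds, HasFds };

// Same shape as the fix: clear errno first, then treat "<= 0 with errno
// still 0" as a successful call on a process that has no descriptors.
static ListResult listFds(int bytesAvailable, int failWith)
{
    errno = 0;
    int n = fakeWrapper(bytesAvailable, failWith);
    if (n <= 0) {
        if (errno == 0) {
            return ListResult::NoFds;
        }
        throw std::system_error(errno, std::generic_category(), "listing file descriptors");
    }
    return ListResult::HasFds;
}
```

Clearing `errno` before the call matters: without it, a stale value from an earlier failure would make a genuine "no fds" result look like an error.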