Improve error message when no tun module is available #1079
Reference
lix-project/lix#1079
Is your feature request related to a problem? Please describe.
When locking kernel modules (security.lockKernelModules = true), the tun module cannot be automatically loaded by Lix. Builds break (as expected), but the error message does not make the root cause clear:
The journal helps a bit but it's still not fully clear:
This is even more confusing since the file exists. The actual solution (boot.kernelModules = [ "tun" ];) can only be guessed from the word "tun".
Describe the solution you'd like
A clear and concise error like
Unable to open the tun device - is the tun kernel module loaded?
Describe alternatives you've considered
Live with it
Additional context
the messages here are unfortunately not directly under our control since pasta is doing the tun operations. we currently can't capture pasta output very well due to subprocess management restrictions, and if we could we'd have to string-match the error messages and live with that being brittle.
one thing we could do is unconditionally check whether /dev/net/tun is accessible if network setup fails, but that too can have some weird error conditions. probably the best option is to somehow capture pasta output and report it as an addendum to "sandbox network setup timed out" and a dedicated check for the module.
@dasj FYI, if the file hadn't existed, the error message would have been "Failed to open() /dev/net/tun: No such file or directory". "No such device" indicates that the "device special file" exists but the kernel driver that backs it hasn't been loaded.
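The distinction described above (ENOENT when the device node is missing, ENODEV when the node exists but no driver backs it) could drive a dedicated check after network setup fails. The following is a minimal Python sketch of that idea, not Lix code: the function names explain_tun_error and check_tun are hypothetical, and the ENODEV wording simply reuses the message requested in this issue.

```python
import errno
import os

TUN_PATH = "/dev/net/tun"


def explain_tun_error(err: int, path: str = TUN_PATH) -> str:
    """Map an errno from opening the tun device to a human-readable hint."""
    if err == errno.ENOENT:
        # The device node itself is missing.
        return f"{path} does not exist"
    if err == errno.ENODEV:
        # The node exists, but no kernel driver is registered behind it.
        return "Unable to open the tun device - is the tun kernel module loaded?"
    if err in (errno.EACCES, errno.EPERM):
        return f"not permitted to open {path}"
    # Anything else: fall back to the plain OS error text.
    return f"cannot open {path}: {os.strerror(err)}"


def check_tun(path: str = TUN_PATH):
    """Return None if the device can be opened, otherwise an explanation."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as e:
        return explain_tun_error(e.errno, path)
    os.close(fd)
    return None
```

Such a check only inspects the device node from the calling process; as noted above, it can still hit unusual error conditions and does not replace capturing pasta's own output.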
@pennae When this happens, the pasta process exits but the parent nix process doesn't notice.
The <defunct> marker is how ps indicates a zombie process in this mode. The build job cannot succeed at this point, but it hangs around until either a timeout expires, or I kill -9 process 39776 (the nix process that's a sibling of the pasta zombie).
A feasible UI improvement, therefore, should be to make the parent nix process pay attention to exit notifications for all of its child processes. The errors would still be unhelpful but at least the build would fail quickly instead of hanging. I don't know enough about the guts of Lix to make a patch but I expect there is a place where something's calling waitpid(specific_child, &status, 0) and that needs to be changed to a loop calling waitpid(-1, &status, 0) until there aren't any more children to wait for.
this is currently completely impossible to do without locking up the daemon or spawning one thread per child due to the absolute state of the world we have inherited. doing this is absolutely on the books for the future, but it is a rather invasive change. the world we inherited is a purely synchronous system that relies on guessing correctly to not lock up a lot of the time, we've been moving this to a fully asynchronous system based on capnproto/kj but haven't gotten to the point where we can replace the subprocess management system quite yet.
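For readers unfamiliar with the pattern, the waitpid(-1, ...) loop proposed above can be sketched in a few lines. This is a standalone Python illustration under stated assumptions, not how Lix manages subprocesses: reap_all_children is a hypothetical name, and os.waitstatus_to_exitcode requires Python 3.9 or later.

```python
import os


def reap_all_children() -> dict:
    """Wait for every child of this process; return {pid: exit_code}.

    Each os.waitpid(-1, 0) call blocks until *any* child changes state,
    so no exit goes unnoticed. The loop ends when no children remain.
    """
    results = {}
    while True:
        try:
            pid, status = os.waitpid(-1, 0)
        except ChildProcessError:
            # ECHILD: there are no more children to wait for.
            break
        # Negative values mean the child was killed by that signal number.
        results[pid] = os.waitstatus_to_exitcode(status)
    return results
```

Note that every call here blocks the calling thread, which is exactly the concern raised in the reply above: a synchronous process that must also do other work cannot sit in this loop without risking a lock-up.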
Recovery instructions, for anyone hitting this issue:
1. […] root if you haven't already.
2. Reboot with systemd.unit=rescue.target appended to the kernel command line. You'll be prompted for the root password.
3. modprobe tun; you have interrupted the boot process early enough that this should succeed despite security.lockKernelModules.
4. exit the rescue shell. The system will come up the rest of the way to normal operation, but with the tun module loaded.
5. nix build and nixos-rebuild boot should now work again. Make sure to add boot.kernelModules = [ "tun" ] to your system configuration before rebuilding.
I do understand how difficult such changes can be to make and there's no hurry from my end.
you can also temporarily disable pasta by setting pasta-path = in the configuration file or --pasta-path "" on the command line. the latter will require running the rebuild command as root, potentially with a NIX_REMOTE=local environment variable to not build via the daemon (depending on the lix version being used).