Improve error message when no tun module is available #1079

Open
opened 2025-12-16 14:32:40 +00:00 by dasj · 6 comments

When locking kernel modules (security.lockKernelModules = true), the tun module cannot be automatically loaded by Lix.
Builds break (as expected), but the error message does not make the root cause clear:

error:
     … while setting up the build environment
     error: sandbox network setup timed out, please check daemon logs for possible error output.

The journal helps a bit but it's still not fully clear:

Dec 16 06:08:10 localhost pasta[391525]: Failed to open() /dev/net/tun
Dec 16 06:08:10 localhost nix-daemon[391525]: Failed to open() /dev/net/tun: No such device
Dec 16 06:08:10 localhost pasta[391525]: : No such device
Dec 16 06:08:10 localhost nix-daemon[391482]: Failed to set up tap device in namespace
Dec 16 06:08:10 localhost pasta[391482]: Failed to set up tap device in namespace

This is even more confusing since the file exists. The actual solution (boot.kernelModules = [ "tun" ];) can only be guessed from the word "tun".

Describe the solution you'd like

A clear and concise error like Unable to open the tun device - is the tun kernel module loaded?

Describe alternatives you've considered

Live with it

Additional context

Owner

the messages here are unfortunately not directly under our control since pasta is doing the tun operations. we currently can't capture pasta output very well due to subprocess management restrictions, and if we could we'd have to string-match the error messages and live with that being brittle.

one thing we could do is unconditionally check whether /dev/net/tun is accessible if network setup fails, but that too can have some weird error conditions. probably the best option is to somehow capture pasta output and report it as an addendum to "sandbox network setup timed out" and add a dedicated check for the module.


@dasj FYI, if the file hadn't existed, the error message would have been "Failed to open() /dev/net/tun: No such file or directory". "No such device" indicates that the "device special file" exists but the kernel driver that backs it hasn't been loaded.

@pennae When this happens, the pasta process exits but the parent nix process doesn't notice.

# ps -ef --forest | tail -n5
root       39567       1  0 14:15 ?        00:00:00 /nix/store/rlq03x4cwf8zn73hxaxnx0zn5q9kifls-bash-5.3p3/bin/bash /nix/store/
root       39614   39567  0 14:15 ?        00:00:00  \_ /nix/store/3lll9y925zz9393sa59h653xik66srjb-python3-3.13.9/bin/python3.
root       39615   39614  6 14:15 ?        00:00:09      \_ nix --extra-experimental-features nix-command flakes build --print-
nixbld10   39776   39615  2 14:16 ?        00:00:03          \_ nix --extra-experimental-features nix-command flakes build --pr
nixbld10   39777   39615  0 14:16 ?        00:00:00          \_ [pasta] <defunct>

The <defunct> marker is how ps indicates a zombie process in this mode. The build job cannot succeed at this point, but it hangs around until either a timeout expires, or I kill -9 process 39776 (the nix process that's a sibling of the pasta zombie).

A feasible UI improvement, therefore, should be to make the parent nix process pay attention to exit notifications for all of its child processes. The errors would still be unhelpful but at least the build would fail quickly instead of hanging. I don't know enough about the guts of Lix to make a patch but I expect there is a place where something's calling waitpid(specific_child, &status, 0) and that needs to be changed to a loop calling waitpid(-1, &status, 0) until there aren't any more children to wait for.
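As a rough illustration of the loop I have in mind, here is a self-contained C sketch. The function name `reap_all_children` is hypothetical and none of this is taken from Lix; it only shows the generic `waitpid(-1, ...)` pattern. Note that it blocks until every child has exited, so it says nothing about how this would fit into a daemon that has other work to service.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child of the calling process, blocking until none remain.
 * Returns the number of children reaped, or -1 on an unexpected error. */
static int reap_all_children(void)
{
    int reaped = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0); /* -1: wait for any child */
        if (pid < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal, retry */
            if (errno == ECHILD)
                return reaped; /* no children left to wait for */
            return -1;
        }
        reaped++;
        if (WIFEXITED(status))
            fprintf(stderr, "child %d exited with status %d\n",
                    (int)pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            fprintf(stderr, "child %d killed by signal %d\n",
                    (int)pid, WTERMSIG(status));
    }
}
```

With something like this, a helper such as pasta exiting early would be reported as soon as it dies instead of lingering as a zombie until the timeout.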

Owner

> A feasible UI improvement, therefore, should be to make the parent nix process pay attention to exit notifications for all of its child processes.

this is currently completely impossible to do without locking up the daemon or spawning one thread per child due to the absolute state of the world we have inherited. doing this is absolutely on the books for the future, but it is a rather invasive change. the world we inherited is a purely synchronous system that, a lot of the time, relies on guessing correctly to not lock up. we've been moving this to a fully asynchronous system based on capnproto/kj but haven't gotten to the point where we can replace the subprocess management system quite yet.


Recovery instructions, for anyone hitting this issue:

  1. Set a password for root if you haven't already.
  2. Boot the system with systemd.unit=rescue.target appended to the kernel command line. You'll be prompted for the root password.
  3. After you enter the password, you'll be dropped into a shell running as root.
  4. Issue the command modprobe tun; you have interrupted the boot process early enough that this should succeed despite security.lockKernelModules.
  5. exit the rescue shell. The system will come up the rest of the way to normal operation, but with the tun module loaded.
  6. nix build and nixos-rebuild boot should now work again. Make sure to add boot.kernelModules = [ "tun" ] to your system configuration before rebuilding.

> > make the parent nix process pay attention to exit notifications for all of its child processes.
>
> this is currently completely impossible to do without locking up the daemon or spawning one thread per child due to the absolute state of the world we have inherited. doing this is absolutely on the books for the future, but it is a rather invasive change.

I do understand how difficult such changes can be to make and there's no hurry from my end.

Owner

you can also temporarily disable pasta by setting pasta-path = in the configuration file or --pasta-path "" on the command line. the latter will require running the rebuild command as root, potentially with a NIX_REMOTE=local environment variable to not build via the daemon (depending on the lix version being used)

raito added this to the 2.95 milestone 2025-12-22 08:26:39 +00:00
Reference
lix-project/lix#1079