SSH ForceCommand directive breaks ssh store #830

Closed
opened 2025-05-12 17:27:59 +00:00 by tcmal · 3 comments

Describe the bug

When using an SSH store, if the user you're connecting as has ForceCommand /.../nix-store, Lix will fail with a cryptic error.

Encountered when using nix.sshServe (https://github.com/NixOS/nixpkgs/blob/nixos-24.11/nixos/modules/services/misc/nix-ssh-serve.nix) on the remote builder.

Steps To Reproduce

  1. Set up a remote builder with something like:
  nix.sshServe = {
    enable = true;
    write = true;
    keys = [
      lib.our.pubKeys.builder
    ];
  };
  2. From a different machine, run nix store ping --store ssh://nix-ssh@<ip of first machine>
  3. Ping fails with log:
Store URL: ssh://nix-ssh@builder.aria.rip
error: serialised integer 7022364302122705765 is too large for type 'j'
warning: SSH to 'nix-ssh@builder.aria.rip' failed, stdout first line: ''
error: failed to start SSH connection to 'nix-ssh@builder.aria.rip'

Expected behavior

Successful store ping.

nix --version output

On both machines, nix (Lix, like Nix) 2.94.0-dev-pre20250511-e4b48ca

Additional context

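The integer it returns corresponds to the bytes echo sta, which probably comes from the remote daemon trying to deserialise these bytes (https://git.lix.systems/lix-project/lix/src/commit/f92235e1d2ebe7f5fa8048543c4f59661d52f787/lix/libstore/ssh.cc#L86). Possibly f92235e1 (https://git.lix.systems/lix-project/lix/commit/f92235e1d2ebe7f5fa8048543c4f59661d52f787) is related, and going back to -oPermitLocalCommand=yes -oLocalCommand=echo started, as was done before, would be needed.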
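
For anyone wanting to check the "echo sta" claim, here is a small sketch that decodes the integer from the error message. It assumes the value was read as an unsigned 64-bit little-endian integer, and that 'j' in the error is the Itanium C++ ABI mangled name for a 32-bit unsigned int; both are my reading of the error, not something confirmed from the Lix source. The names `value`, `raw`, `handshake` and `UINT_MAX` are only for illustration.

```python
# Integer taken verbatim from the error message in the log above.
value = 7022364302122705765

# Assumption: the wire format stores integers as 8 bytes, little-endian.
raw = value.to_bytes(8, byteorder="little")
print(raw)

# Assumption: the client writes "echo started\n" before the real command,
# so the first 8 bytes a ForceCommand'd nix-store sees are "echo sta".
handshake = b"echo started\n"
first_word = int.from_bytes(handshake[:8], byteorder="little")
print(first_word == value)

# Assumption: type 'j' is a 32-bit unsigned int, so the value cannot fit.
UINT_MAX = 2**32 - 1
print(value > UINT_MAX)
```

So the "too large" error is just the remote side interpreting the start of the shell handshake as a protocol integer.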

You can work around this by not using the nix.sshServe module, and dropping the ForceCommand, but it would be preferable to not need this.

Owner

yes, that's an unfortunate consequence of the connection sharing fixes: previously ssh connections would not work at all in some cases if the ssh config configured multiplexing, either crashing during connection setup or not opening a connection at all. we now require the remote to be able to run something that looks enough like a posix shell to first run an echo started, and then the command that does the actual remote processing.

sadly this is not a lix bug and cannot be fixed without reintroducing old bugs (like #644) :(

Owner

this nixos discourse thread (https://discourse.nixos.org/t/wrapper-to-restrict-builder-access-through-ssh-worth-upstreaming/25834) has a description of how to do this properly with the old (buggy) ssh behavior. for the new one, the script must read lines from stdin instead of parsing the original command, and it also needs to allow an echo started command to run successfully. to function, the nixos module must use this script instead of executing lix binaries directly.

in that thread @winter mentioned wanting to write an updated version; that version should be used for the nixos module as well.
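
a rough sketch of the shape such a wrapper could take, written in Python purely for illustration (the script in the linked thread is not reproduced here). Everything named below is hypothetical: `ALLOWED_COMMANDS`, `read_line`, `classify` and `serve` are made-up names, and the exact command lines Lix sends over stdin are an assumption that would need checking against the real client, including whether `echo started` arrives on its own line.

```python
import os

# Hypothetical allow-list. The exact command lines sent by the client are an
# assumption; adjust them to whatever the real client actually sends.
ALLOWED_COMMANDS = {
    "nix-store --serve --write": ["nix-store", "--serve", "--write"],
    "nix-daemon --stdio": ["nix-daemon", "--stdio"],
}


def read_line(fd: int) -> str:
    """Read one line from fd a byte at a time.

    Unbuffered on purpose: bytes after the newline belong to the store
    protocol and must be left for the exec'd process to read.
    """
    buf = bytearray()
    while True:
        b = os.read(fd, 1)
        if not b or b == b"\n":
            break
        buf += b
    return buf.decode("utf-8", errors="replace")


def classify(line: str):
    """Decide what to do with one line received on stdin."""
    line = line.strip()
    if line == "echo started":
        return ("reply", "started")
    if line in ALLOWED_COMMANDS:
        return ("exec", ALLOWED_COMMANDS[line])
    return ("reject", None)


def serve(fd_in: int = 0, fd_out: int = 1) -> None:
    """Answer the handshake, then replace this process with an allowed command."""
    while True:
        action, arg = classify(read_line(fd_in))
        if action == "reply":
            os.write(fd_out, (arg + "\n").encode())
        elif action == "exec":
            os.execvp(arg[0], arg)  # does not return on success
        else:
            raise SystemExit(1)  # unknown command or end of input
```

a real wrapper would call `serve()` at the end of the script and be the target of the `ForceCommand`. the byte-at-a-time read is the important design point: a buffered reader could swallow protocol bytes that the exec'd binary needs.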


At least for NixOS users there is a nixpkgs patch in the (above) linked thread now (everyone else can still take the generated script as a template though, I guess).
Most people won't really be able to do much with that patch I guess (I'm probably the odd one here using IFD to patch nixpkgs before evaluation), but it exists and could be upstreamed.

@tcmal if that patch solves the issue for you too (if you have a way of easily testing that, I don't fault you if it's too much work), feel free to comment in the Discourse, if I get some positive feedback I'll put upstreaming that thing on my todolist (instead of keeping it around out-of-tree).

Reference: lix-project/lix#830