builder user on our semi-ephemeral netbooted x86_64-linux builders should have a static UID #224

Open
opened 2025-06-10 15:49:30 +00:00 by emilylange · 1 comment
Owner

Our netbooted `x86_64-linux` Hydra builders have a persistent local SSD mounted at `/mnt` that is owned by a user called `builder`.

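When a builder reboots, the UID of this user can change. For example, I've seen it jump from `993` to `999`. The existing files on `/mnt` are still owned by the old UID, so the `builder` user is no longer allowed to change their permissions, which causes Hydra to fail with: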

```
Jun 09 22:24:20 build-coord hydra-queue-runner[1670]: possibly transient failure building ‘/nix/store/38v4q9n1wgsi48zmpqg271wv2yx8m7i1-linux-6.12.30.drv’ on ‘ssh://hydra-wob01-big-parallel-bm-10?remote-store=/mnt&cores=20’: error: cannot connect to ‘ssh://hydra-wob01-big-parallel-bm-10?remote-store=/mnt&cores=20’: error: could not set permissions on '/mnt/nix/var/nix/profiles/per-user' to 755: Operation not permitted
Jun 09 22:48:36 build-coord hydra-queue-runner[1670]: possibly transient failure building ‘/nix/store/nhpgw1hl7j36f5z1d8542yw5wir1asmd-mdbook-0.4.49-vendor.drv’ on ‘ssh://hydra-wob01-bm-10?remote-store=/mnt&cores=8’: error: cannot connect to ‘ssh://hydra-wob01-bm-10?remote-store=/mnt&cores=8’: error: could not set permissions on '/mnt/nix/var/nix/profiles/per-user' to 755: Operation not permitted
Jun 10 00:10:25 build-coord hydra-queue-runner[1670]: possibly transient failure building ‘/nix/store/jrkk5dnwvl0p6xqv13lvrsr6d0k7zfn6-source.drv’ on ‘ssh://hydra-wob01-bm-10?remote-store=/mnt&cores=8’: error: cannot connect to ‘ssh://hydra-wob01-bm-10?remote-store=/mnt&cores=8’: error: could not set permissions on '/mnt/nix/var/nix/profiles/per-user' to 755: Operation not permitted
Jun 10 00:50:45 build-coord hydra-queue-runner[1670]: possibly transient failure building ‘/nix/store/jrkk5dnwvl0p6xqv13lvrsr6d0k7zfn6-source.drv’ on ‘ssh://hydra-wob01-bm-10?remote-store=/mnt&cores=8’: error: cannot connect to ‘ssh://hydra-wob01-bm-10?remote-store=/mnt&cores=8’: error: could not set permissions on '/mnt/nix/var/nix/profiles/per-user' to 755: Operation not permitted
```

We should set `uid` in https://git.lix.systems/the-distro/infra/src/commit/589ef165ab3cb55fa348e3a2783b97bc2b0b1a95/services/baremetal/builders/default.nix#L14-L24, which currently reads:

```nix
users.users.builder = {
  isSystemUser = true;
  group = "nogroup";
  home = "/var/empty";
  shell = "/bin/sh";
  openssh.authorizedKeys.keys = [
    # Do not hardcode Hydra's public key, selectively
    # add the keys of the coordinators that require us.
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAvUT9YBig9LQPHgypIBHQuC32XqDKxlFZ2CfgDi0ZKx"
  ];
};
```

After that, we should `chown` the store once (`chown --from=993 999 ...`).
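A minimal sketch of the `uid` change, assuming we keep `999`, the UID that `chown --from=993 999 ...` migrates the store to. The number itself is an assumption for us to settle: NixOS assigns system users without a fixed `uid` dynamically and records the choice in state under `/var/lib/nixos`, which on a netbooted builder probably does not survive a reboot. `999` also falls inside the range NixOS hands out to dynamically allocated system users, so it is worth checking that no other user on the builders already holds it.

```nix
{
  # Merged by the module system with the existing definition in
  # services/baremetal/builders/default.nix; only the new attribute is shown.
  # 999 is an assumed value (the target owner of `chown --from=993 999 ...`).
  users.users.builder.uid = 999;
}
```

Because NixOS merges attribute definitions across modules, this can live next to the existing block or be added to it directly.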
Member

I think it would be good to define fixed UIDs for everything. In my personal infra I use this module, which also keeps UIDs consistent across all machines:

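```nix
{ config, lib, ... }:
let
  cfg = config.users.common-ids;

  # Shared user/group IDs; the thousands digit encodes the "type" of account.
  uids = {
    # People
    # ...
    linus = 2005;
    # ...

    # Machines
    sosiego = 3001;
    sol = 3002;
    # ...

    # Services
    acme = 4999;
    node-exporter = 4993;
    # ...

    # To match static uid assignments from nixos/modules/misc/ids.nix
    polkituser = 28;
    systemd-coredump = 151;
  };
in {
  options.users = {
    common-ids = {
      priority = lib.mkOption {
        type = lib.types.int;
        description = "Priority with which to set common user IDs";
        # 100 is the priority of a plain definition; a larger number
        # (e.g. 1000, as used by lib.mkDefault) lets other definitions win.
        default = 100;
      };
    };

    # Re-declaring these options with a submodule type merges this module into
    # NixOS's own per-user/per-group submodule, so every entry whose name
    # appears in `uids` gets its ID filled in.
    users = lib.mkOption {
      type = lib.types.attrsOf (lib.types.submodule ({ name, ... }: {
        uid = lib.mkIf (uids ? ${name}) (lib.mkOverride cfg.priority uids.${name});
      }));
    };
    groups = lib.mkOption {
      type = lib.types.attrsOf (lib.types.submodule ({ name, ... }: {
        gid = lib.mkIf (uids ? ${name}) (lib.mkOverride cfg.priority uids.${name});
      }));
    };
  };
}
```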

I use IDs from 2000 upwards to reduce the likelihood of collisions with other mechanisms that assign UIDs, and use the thousands digit to distinguish the various "types" of user (people in 2xxx, machines in 3xxx, services in 4xxx).
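As a sketch of how the `builder` user from this issue could fit that scheme: the file name `common-ids.nix`, the table entry `builder = 4100;`, and the choice of the 4xxx (services) block are all hypothetical, not part of the module as posted.

```nix
# Hypothetical host module. Assumes the module above is saved as
# ./common-ids.nix and that `builder = 4100;` (a placeholder ID) has been
# added to its `uids` table.
{ ... }:
{
  imports = [ ./common-ids.nix ];

  # No explicit `uid` here: the common-ids module fills in 4100 because the
  # attribute name `builder` matches an entry in `uids`. The group is
  # unaffected since "nogroup" is not in the table.
  users.users.builder = {
    isSystemUser = true;
    group = "nogroup";
  };
}
```

The advantage over pinning the number inside the builders module is that every machine importing the shared table agrees on the same ID.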

At some point I'll probably add an assertion that rejects any `null` UIDs in my configs, because I suspect the stateful assignment logic will keep producing surprises like this one.
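A minimal sketch of such an assertion as a standalone NixOS module. It assumes that checking `users.users.<name>.uid` and `users.groups.<name>.gid` for `null` is sufficient; on most configurations it will initially flag many system users and groups created by other modules until each has been given a fixed ID.

```nix
{ config, lib, ... }:
let
  # Names of users/groups whose IDs would otherwise be assigned statefully
  # during activation.
  usersWithoutUid =
    lib.attrNames (lib.filterAttrs (_: user: user.uid == null) config.users.users);
  groupsWithoutGid =
    lib.attrNames (lib.filterAttrs (_: group: group.gid == null) config.users.groups);
in
{
  assertions = [
    {
      assertion = usersWithoutUid == [ ];
      message = "Users without a static uid: ${lib.concatStringsSep ", " usersWithoutUid}";
    }
    {
      assertion = groupsWithoutGid == [ ];
      message = "Groups without a static gid: ${lib.concatStringsSep ", " groupsWithoutGid}";
    }
  ];
}
```

Failing assertions abort evaluation with the listed names, so a missing ID shows up at build time instead of as a permission error after a reboot.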

Maybe we want this for forkos infra too?

Reference: the-distro/infra#224