installCheckPhase fails when building lix with a forgejo action #310

Open
opened 2024-05-12 20:32:26 +00:00 by tom-hubrecht · 5 comments
Member

Describe the bug

When running the CI for my infra on podman, building lix fails at the `installCheckPhase`, with 12 failed checks.
Some of the failures are due to #157, but others are not:

```
86/154 lix:installcheck / functional-build-remote-content-addressed-floating          FAIL             0.39s   exit status 1
94/154 lix:installcheck / functional-build-remote-trustless-should-fail-0             FAIL             0.28s   exit status 100
107/154 lix:installcheck / functional-linux-sandbox                                    FAIL             0.25s   exit status 100
108/154 lix:installcheck / functional-supplementary-groups                             FAIL             0.28s   exit status 1
112/154 lix:installcheck / functional-shell                                            FAIL             0.63s   exit status 1
145/154 lix:installcheck / functional-nested-sandboxing                                FAIL             0.29s   exit status 1
```

I added the logs of these tests.
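To separate the failures that trace back to #157 from the rest, it can help to pull the failing test names out of the meson summary mechanically. A minimal Python sketch, assuming the summary lines keep the `N/M suite / name  FAIL  time  exit status K` shape shown above; `parse_meson_failures` and `Failure` are hypothetical names, not part of Lix's tooling:

```python
import re
from typing import NamedTuple


class Failure(NamedTuple):
    suite: str
    name: str
    exit_status: int


# Matches summary lines such as:
# "112/154 lix:installcheck / functional-shell   FAIL   0.63s   exit status 1"
LINE_RE = re.compile(
    r"^\s*\d+/\d+\s+(?P<suite>\S+)\s+/\s+(?P<name>\S+)\s+FAIL\s+"
    r"\S+\s+exit status (?P<code>\d+)\s*$"
)


def parse_meson_failures(log: str) -> list[Failure]:
    """Return one Failure per FAIL line; OK/SKIP and other lines are ignored."""
    failures = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            failures.append(Failure(m["suite"], m["name"], int(m["code"])))
    return failures
```

Grouping the result by `exit_status` (1 vs 100) is one quick way to see which failures share a cause.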

Steps To Reproduce

I admit that these errors may not be easy to reproduce, given the complicated build process: it uses https://git.hubrecht.ovh/hubrecht/nix-modules/src/branch/main/services/forgejo-nix-runners/default.nix to run nix builds inside podman containers through Forgejo Actions.

Expected behavior

The tests should not fail.

nix --version output

Additional context


tom-hubrecht added the bug label 2024-05-12 20:32:26 +00:00
qyriad added the testing and Area/build-packaging labels 2024-05-13 03:13:10 +00:00
Author
Member

I managed to capture the full logs of the installChecks failure

Owner
```
2024-05-15T18:38:33.6368732Z compute01 | build-remote-input> error: executing '/nix/store/l85ia9bwz4v0k09v716sx2f2cjar3jvp-busybox': No such file or directory
2024-05-15T18:38:33.6369018Z compute01 | error: build of '/nix/store/dl7xsh6s6a96z1cvpx4n10wxjfmrif1s-build-remote-input-1.drv' on 'ssh://localhost?remote-store=/tmp/nix-build-nix-2.90.0-lix.drv-0/nix-test/build-remote-input-addressed/machine1?system-features=foo' failed: builder for '/nix/store/dl7xsh6s6a96z1cvpx4n10wxjfmrif1s-build-remote-input-1.drv' failed with exit code 1
2024-05-15T18:38:33.6369438Z compute01 | error: builder for '/nix/store/dl7xsh6s6a96z1cvpx4n10wxjfmrif1s-build-remote-input-1.drv' failed with exit code 1
2024-05-15T18:38:33.6369635Z compute01 | error: 1 dependencies of derivation '/nix/store/yfcf7n7nv6ayfrvs6w002v9fpjxblshj-build-remote.drv' failed to build
2024-05-15T18:38:33.6369886Z compute01 | +++(build-remote.sh:29) onError
2024-05-15T18:38:33.6370201Z compute01 | +++(/tmp/nix-build-nix-2.90.0-lix.drv-0/source/build/tests/functional/common/vars-and-functions.sh:235) set +x
2024-05-15T18:38:33.6370422Z compute01 | build-remote-input-addressed.sh: test failed at:
2024-05-15T18:38:33.6370603Z compute01 |   source in build-remote.sh:29
2024-05-15T18:38:33.6370773Z compute01 |   main in build-remote-input-addressed.sh:5
```
Owner

So when it ssh's into localhost it then can't find busybox? wat

Owner

Worth noting that ssh to localhost *does not invoke ssh*; instead it invokes `bash -c`. Perhaps that interacts somehow.
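To make that concrete: for `ssh://localhost`, no `ssh` process is started, and the command runs through `bash -c` on the same machine. The "remote" side therefore inherits the caller's environment instead of getting a fresh login session, which is one way the two paths can differ. A minimal Python sketch of that dispatch, based only on the behaviour stated in this comment. `remote_command_argv` and `run_remote` are hypothetical names. Lix's real implementation is C++ and passes extra ssh options that are not shown here.

```python
import subprocess


def remote_command_argv(host: str, command: str) -> list[str]:
    """Build the argv used to run `command` on `host` (hypothetical helper).

    Mirrors the behaviour described in the thread: for "localhost" no ssh
    process is spawned; the command is handed straight to `bash -c`.
    """
    if host == "localhost":
        return ["bash", "-c", command]
    return ["ssh", host, command]


def run_remote(host: str, command: str) -> str:
    # With host == "localhost" the child inherits this process's environment
    # (PATH, TMPDIR, ...) rather than whatever an sshd login would provide.
    result = subprocess.run(
        remote_command_argv(host, command),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```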

jade self-assigned this 2024-05-25 22:00:30 +00:00
Owner

Reproducer split out from the nixos stuff:

Set `virtualisation.podman.enable = true;` and make xonsh available.

Dump this in `sandbox.xsh`:

```
#!/usr/bin/env xonsh
import textwrap
import tempfile
import xonsh
import os
import json

$RAISE_SUBPROC_ERROR = True
$XONSH_SHOW_TRACEBACK = True

def build_image_inner():
    mkdir -p etc/nix

    busybox = nix_build('with import <nixpkgs> {}; pkgs.pkgsStatic.busybox')

    cp -R @(busybox)/bin tools

    mkdir -m 1777 -p tmp

    mkdir -p nix/store

    echo 'root:x:0:0:System administrator:/root:/bin/bash' > etc/passwd
    echo 'root:x:0:' > etc/group

    # Create an unprivileged user that we can use also without the run-as-user.sh script
    groupid = $(id -g nixuser).strip()
    userid = $(id -u nixuser).strip()
    cwd = os.getcwd()
    groupadd --prefix @(cwd) --gid @(groupid) nixuser
    emptypassword='$6$1ero.LwbisiU.h3D$GGmnmECbPotJoPQ5eoSTD6tTjKnSWZcjHoVTkxFLZP17W9hRi/XkmCiAMOfWruUwy8gMjINrBMNODc7cYEo4K.'
    useradd --prefix @(cwd) -p @(emptypassword) -m -d /tmp -u @(userid) -g @(groupid) -G nixuser nixuser

    nix_config = textwrap.dedent('''
    accept-flake-config = true
    experimental-features = nix-command flakes
    build-users-group = nixuser
    ''')

    echo @(nix_config) > etc/nix/nix.conf

    nsswitch_conf = textwrap.dedent('''
    passwd:    files mymachines systemd
    group:     files mymachines systemd
    shadow:    files

    hosts:     files mymachines dns myhostname
    networks:  files

    ethers:    files
    services:  files
    protocols: files
    rpc:       files
    ''')
    echo @(nsswitch_conf) > etc/nsswitch.conf

    # Link usr/bin/env
    mkdir -p usr/bin
    ln -s /bin/env usr/bin/env

    # list the content as it will be imported into the container
    tar -cv . | tar -tvf -
    proc = !(tar -cv . --group=0 --owner=0 | podman import - forgejo-nix-runner)
    return proc.out.strip()


def mk_store(dest, paths):
    echo f"[+] Performing an installation of Nix at {dest}."

    # Path.exists is a method: without the call, the check is always truthy
    # and the directory would never be created.
    if not dest.exists():
        echo f"[+] Directory {dest} does not exist, creating it."
        dest.mkdir(mode=0o755)

    nix-store --export @(paths) | nix-store --import --store @(dest.absolute())


def build_image():
    with tempfile.TemporaryDirectory() as d:
        with xonsh.tools.chdir(d):
            return build_image_inner()

def nix_build(expr, args={}):
    args = [x for name, val in args.items() for x in ['--argstr', name, val]]
    return $(nix build \
        --impure \
        --no-link \
        --print-out-paths \
        --print-build-logs \
        --expr @(expr) @(args)
    ).strip()

def nix_eval(expr, args={}):
    args = [x for name, val in args.items() for x in ['--argstr', name, val]]
    return json.loads($(nix eval \
        --impure \
        --json \
        --expr @(expr) @(args)
    ).strip())

def get_transitive_closure(paths):
    return $(nix-store --query -R @(paths)).strip().split('\n')

def store_deps():
    deps = '''
    let
      pkgs = import <nixpkgs> { };
      deps = [
        pkgs.bashInteractive
        pkgs.coreutils
        pkgs.curl
        pkgs.findutils
        pkgs.gawk
        pkgs.git
        pkgs.gnugrep
        pkgs.jq
        pkgs.lix
        pkgs.nodejs
        pkgs.openssh
      ];
    in

    pkgs.runCommand "store-deps" { } ''
      mkdir -p $out/bin
      for dir in ${builtins.toString deps}; do
        for bin in "$dir"/bin/*; do
          ln -s "$bin" "$out/bin/$(basename "$bin")"
        done
      done

      # Add SSL CA certs
      mkdir -p $out/etc/ssl/certs
      cp -a "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" $out/etc/ssl/certs/ca-bundle.crt
    ''
    '''
    return nix_build(deps)

def run_podman():
    deps = store_deps()
    closure = get_transitive_closure(deps)
    mk_store(p'box', closure)

    nixpkgs_path = nix_eval('(import <nixpkgs> {}).path')
    cwd = os.getcwd()

    podman_args = [
        "run", "--rm",
        "-e", "PATH=/tools:/bin",
        "-e", "NIX_BUILD_SHELL=/bin/bash",
        "-e", "PAGER=cat",
        "-e", f"NIX_PATH=nixpkgs={nixpkgs_path}",
        "-e", "SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt",
        "-v", "./box:/persist",
        "-v", "./box/nix:/nix",
        "-v", f"{deps}/bin:/bin",
        "-v", f"{deps}/etc/ssl:/etc/ssl",
        "-v", ".:/lix-src",
        "--device=/dev/kvm",
        # "--user=nixuser",
        "-it", "localhost/forgejo-nix-runner", "/tools/sh",
    ]
    podman @(podman_args)
```

Then in xonsh:

```
source ./sandbox.xsh ; build_image() ; run_podman()
```

Inside the container:

```
rm -rf build
nix develop .#native-clangStdenvPackages
just setup
just install
just test
```

This will explode in the same way as described above, I think. FWIW: your container setup is pretty busted and has bad perms on /tmp among other things.
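On the `/tmp` permissions point: a shared temporary directory is normally mode `1777`, meaning the sticky bit plus `rwx` for everyone. That is what `mkdir -m 1777 -p tmp` in the script asks for. Below is a small Python check you could run inside the container to see whether that mode survived the image build. `has_tmp_like_mode` is a hypothetical helper, and it assumes `1777` is the wanted mode.

```python
import os
import stat


def has_tmp_like_mode(path: str = "/tmp") -> bool:
    """True if `path` is a directory with mode 1777 (sticky, world-writable)."""
    st = os.stat(path)
    return stat.S_ISDIR(st.st_mode) and stat.S_IMODE(st.st_mode) == 0o1777
```

Inside the container, `oct(stat.S_IMODE(os.stat("/tmp").st_mode))` shows the actual mode when the check returns `False`.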

The tests complain that `/nix/store/l85ia9bwz4v0k09v716sx2f2cjar3jvp-busybox` doesn't exist. That path is a copy of `/nix/store/svsnsvyz3h7vxkcssalms7c5y4pq5n5h-busybox-static-x86_64-unknown-linux-musl-1.36.1/bin/busybox`, made via the `$busybox` variable in the tests. That variable becomes `--arg busybox /nix/store/svsnsvyz3h7vxkcssalms7c5y4pq5n5h-busybox-static-x86_64-unknown-linux-musl-1.36.1/bin/busybox`, which is a path literal, and that literal is what causes the copy.
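The name part of the missing path matches that explanation. When Nix copies a path literal into the store, the result is `/nix/store/<hash>-<name>`, where `<name>` is the last component of the original path. A small Python sketch that checks this for the two paths quoted above. Both helper names are hypothetical, and the sketch only compares names; it does not recompute the hash.

```python
from pathlib import PurePosixPath


def literal_store_name(literal: str) -> str:
    # A copied path literal is named after its last path component.
    return PurePosixPath(literal).name


def store_path_name(store_path: str) -> str:
    # "/nix/store/<32-char hash>-<name>" -> "<name>"
    base = PurePosixPath(store_path).name
    hash_part, sep, name = base.partition("-")
    if not sep or len(hash_part) != 32:
        raise ValueError(f"not a store path: {store_path}")
    return name


source = ("/nix/store/svsnsvyz3h7vxkcssalms7c5y4pq5n5h-busybox-static-"
          "x86_64-unknown-linux-musl-1.36.1/bin/busybox")
missing = "/nix/store/l85ia9bwz4v0k09v716sx2f2cjar3jvp-busybox"
```

Both `literal_store_name(source)` and `store_path_name(missing)` give `busybox`. This supports the missing path being the copy of the literal rather than some unrelated store path.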

I don't think that that copy is busted, so maybe it got deleted????? idk!!!

I don't know why the copy doesn't appear to actually happen.

I've attached test logs from this container. You can view the junit one with https://github.com/lukejpreston/xunit-viewer/ which is slightly nicer.
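For a quick look without a viewer, the JUnit file can also be read directly. A minimal Python sketch using only the standard library. It assumes the usual JUnit XML layout: `<testsuite>` elements containing `<testcase>` elements, with a `<failure>` or `<error>` child on failed cases. `failed_testcases` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def failed_testcases(junit_xml: str) -> list[tuple[str, str]]:
    """Return (classname, name) for every testcase with a failure or error child."""
    root = ET.fromstring(junit_xml)
    failed = []
    # iter() walks the whole tree, so this works whether the root element
    # is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append((case.get("classname", ""), case.get("name", "")))
    return failed
```

Usage: `failed_testcases(open(path).read())`, where `path` points at the attached JUnit file. Meson normally writes it as `meson-logs/testlog.junit.xml`.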

At this point I don't have many more spoons to fight this issue.

jade removed their assignment 2024-05-26 02:07:18 +00:00
Reference: lix-project/lix#310