nix offline detection is applying to --refresh when that should be fatal, breaking system.autoUpgrade #286
Reference: lix-project/lix#286
Describe the bug
autoUpgrade service doesn't fail when steps within the process have errors.
nix seems not to be exiting with an error visible to the caller.
nixos-rebuild seems to be swallowing them.
As well as simply not doing the intended job of upgrading, this can actually cause configuration to go backwards.
Steps To Reproduce
[…] network-online.target but this is not meaningful after a resume, unfortunately.
[…] --refresh argument is given with a flake, it will use the previously-cached fetch from the last run, which should be considered stale and invalid. The build proceeds anyway.
[…] /etc/nixos/flake.nix for example), the autoupgrade service will build and switch to the older revision, effectively rolling back unexpectedly.
Expected behavior
Issues and errors, such as lack of network connectivity for an upgrade, should be considered as errors for the rebuild, and cause the service to fail (so it can optionally then be configured to retry with a delay).
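One way to get that behaviour from the outside is a pre-flight step that turns a failed connectivity probe into a non-zero exit status. The following is a minimal sketch, not part of nix or nixos-rebuild: the function name `require_online` is hypothetical, and the probe command is whatever the caller considers a meaningful reachability test.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of nix or nixos-rebuild).
# Runs the given probe command and reports failure through the exit
# status, so that a supervisor such as systemd can see it.
require_online() {
    # "$@" is the probe command supplied by the caller.
    if "$@" >/dev/null 2>&1; then
        return 0
    fi
    echo "pre-flight check failed: $*" >&2
    return 1
}

# Example use (assumed probe and remote, not executed here):
#   require_online git ls-remote ssh://soft-serve:23231/geek/nixos flake || exit 1
```

Run before the rebuild, a failing probe makes the unit fail instead of silently continuing with cached data, which in turn lets a retry policy take effect.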
In particular, at step 5, the --refresh argument should consider cached copies of the flake source as invalid (as documented) and refuse to use them. The errors in the log, reported as "fatal", should therefore be fatal.
Screenshots
In the below log, wifi was disabled. The autoUpgrade service is configured with a git+ssh:// flake repo.
Without --refresh in the options list, the ssh errors don't appear, presumably because the 'network-dependent features' have been disabled. With --refresh they're tried anyway but the errors are ignored.
Speculation
After pondering on this for a while, I'm becoming more convinced that the issue is nix itself:
[…] --offline had been passed explicitly, based on some auto-detection of connectivity
[…] --refresh, that was passed explicitly
Additional context
Full config, including workaround using a preStart job that will fail in a way systemd can see, and another to prevent rollback of 'dirty' changes when hacking:
It also seems to rebuild and switch when there's full network connectivity but no new revisions are fetched, regardless of whether this is because (without --refresh) the content is still within TTL, or simply no new revisions are found on the git repo. I don't think this is necessary.
It might be helpful to have an option that's the inverse of the --offline that seems to be getting detected, something like --require-online, such that it can bail directly from this autodetection before even getting to the other steps. But it should still bail on those other errors, and the failure to update with --refresh, and it should very definitely not roll back by building and switching to a stale revision.
Originally at https://github.com/NixOS/nixpkgs/issues/274146
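The "no new revisions" and "failed fetch" cases can be told apart with a small guard run before the rebuild. This is only a sketch under assumptions: `should_rebuild` is a hypothetical helper, and how the deployed and fetched revisions are obtained is left to the caller. It does not by itself detect a cached revision that is older than the deployed one; that would need an ancestry check against the repository.

```shell
#!/bin/sh
# Hypothetical guard (not part of nix or nixos-rebuild).
# $1: revision currently deployed
# $2: revision just resolved for the flake (empty if the fetch failed)
# Returns 0 to proceed, 1 when there is nothing new, 2 when the fetch failed.
should_rebuild() {
    current_rev=$1
    fetched_rev=$2
    # An empty fetched revision means no fresh data: treat as an error.
    [ -n "$fetched_rev" ] || return 2
    # Same revision as the one deployed: skip the rebuild.
    [ "$current_rev" != "$fetched_rev" ] || return 1
    return 0
}
```

Keeping the two refusal cases on distinct exit codes lets a wrapper treat "nothing new" as success while still failing the service when the fetch did not happen.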
Title changed from "nix offline detection is masking errors, breaking system.autoUpgrade" to "nix offline detection is applying to --refresh when that should be fatal, breaking system.autoUpgrade".
Smaller repro, invoking nixos-rebuild build --flake git+ssh://soft-serve:23231/geek/nixos?ref=flake --refresh 3 times:
[…] --refresh
This also confirms the issue still persists in lix as of now.