By default, Nix sets the "cores" setting to the number of CPUs which are
physically present on the machine. If cgroups are used to limit the CPU
and memory consumption of a large Nix build, the OOM killer may be
invoked.
For example, consider a GitLab CI pipeline which builds a large software
package. The GitLab runner spawns a container whose CPU is limited to 4
cores and whose memory is limited to 16 GiB. If the underlying machine
has 64 cores, Nix will invoke the build with -j64. In many cases, that
level of parallelism will invoke the OOM killer and the build will
completely fail.
This change sets the default value of "cores" to be
ceil(cpu_quota / cpu_period), with a fallback to
std:🧵:hardware_concurrency() if cgroups v2 is not detected.
The workaround for "Some distros patch Linux" mentioned in
local-derivation-goal.cc will not help in the `--option
sandbox-fallback false` case. To provide the user more helpful
guidance on how to get the sandbox working, let's check to see if the
`/proc` node created by the aforementioned patch is present and
configured in a way that will cause us problems. If so, give the user
a suggestion for how to troubleshoot the problem.
local-derivation-goal.cc contains a comment stating that "Some distros
patch Linux to not allow unprivileged user namespaces." Let's give a
pointer to a common version of this patch for those who want more
details about this failure mode.
This commit causes nix to `warn()` if sandbox setup has failed and
`/proc/self/ns/user` does not exist. This is usually a sign that the
kernel was compiled without `CONFIG_USER_NS=y`, which is required for
sandboxing.
This commit uses `warn()` to notify the user if sandbox setup fails
with errno==EPERM and /proc/sys/user/max_user_namespaces is missing or
zero, since that is at least part of the reason why sandbox setup
failed.
Note that `echo -n 0 > /proc/sys/user/max_user_namespaces` or
equivalent at boot time has been the recommended mitigation for
several Linux LPE vulnerabilities over the past few years. Many users
have applied this mitigation and then forgotten that they have done
so.
The failure modes for nix's sandboxing setup are pretty complicated.
When nix is unable to set up the sandbox, let's provide more detail
about what went wrong. Specifically:
* Make sure the error message includes the word "sandbox" so the user
knows that the failure was related to sandboxing.
* If `--option sandbox-fallback false` was provided, and removing it
would have allowed further attempts to make progress, let the user
know.
I recently got fairly confused why the following expression didn't have
any effect
{
description = "Foobar";
inputs.sops-nix = {
url = github:mic92/sops-nix;
inputs.nixpkgs_22_05.follows = "nixpkgs";
};
}
until I found out that the input was called `nixpkgs-22_05` (please note
the dash vs. underscore).
IMHO it's not a good idea to not throw an error in that case and
probably leave end-users rather confused, so I implemented a small check
for that which basically checks whether `follows`-declaration from
overrides actually have corresponding inputs in the transitive flake.
In fact this was done by accident already in our own test-suite where
the removal of a `follows` was apparently forgotten[1].
Since the key of the `std::map` that holds the `overrides` is a vector
and we have to find the last element of each vector (i.e. the override)
this has to be done with a for loop in O(n) complexity with `n` being
the total amount of overrides (which shouldn't be that large though).
Please note that this doesn't work with nested expressions, i.e.
inputs.fenix.inputs.nixpkgs.follows = "...";
which is a known problem[2].
For the expression demonstrated above, an error like this will be
thrown:
error: sops-nix has a `follows'-declaration for a non-existant input nixpkgs_22_05!
[1] 2664a216e5
[2] https://github.com/NixOS/nix/issues/5790
Defers completion of flake inputs until the whole command line is parsed
so that we know what flakes we need to complete the inputs of.
Previously, `nix build flake --update-input <Tab>` always behaved like
`nix build . --update-input <Tab>`.
Prevents errors when running with UBSan:
/nix/store/j5vhrywqmz1ixwhsmmjjxa85fpwryzh0-gcc-11.3.0/include/c++/11.3.0/bits/stl_pair.h:353:4: runtime error: load of value 229, which is not a valid value for type 'AttrType'
Specifically, if we're not root and the daemon socket does not exist,
then we use ~/.local/share/nix/root as a chroot store. This enables
non-root users to download nix-static and have it work out of the box,
e.g.
ubuntu@ip-10-13-1-146:~$ ~/nix run nixpkgs#hello
warning: '/nix' does not exists, so Nix will use '/home/ubuntu/.local/share/nix/root' as a chroot store
Hello, world!
With this, Nix will write a copy of the sandbox shell to /bin/sh in
the sandbox rather than bind-mounting it from the host filesystem.
This makes /bin/sh work out of the box with nix-static, i.e. you no
longer get
/nix/store/qa36xhc5gpf42l3z1a8m1lysi40l9p7s-bootstrap-stage4-stdenv-linux/setup: ./configure: /bin/sh: bad interpreter: No such file or directory
This allows changes to nix-cache-info to be picked up by existing
clients. Previously, the only way for this to happen would be for
clients to delete binary-cache-v6.sqlite, which is quite awkward for
users.
On the other hand, updates to nix-cache-info should be pretty rare,
hence the choice of a fairly long TTL. Configurability is probably not
useful enough to warrant implementing it.
Allow `nix build flake1 flake2 --update-input <Tab>` to complete the
inputs of both flakes.
Also do tilde expansion so that `nix build ~/flake --update-input <Tab>`
works.
Useful because a default `sudo` on darwin doesn't clear `$HOME`, so things like `sudo nix-channel --list`
will surprisingly return the USER'S channels, rather than `root`'s.
Other counterintuitive outcomes can be seen in this PR description:
https://github.com/NixOS/nix/pull/6622
Basically an attempt to resume fixing #5543 for a breakage introduced
earlier[1]. Basically, when evaluating an older `nixpkgs` with
`nix-shell` the following error occurs:
λ ma27 [~] → nix-shell -I nixpkgs=channel:nixos-18.03 -p nix
error: anonymous function at /nix/store/zakqwc529rb6xcj8pwixjsxscvlx9fbi-source/pkgs/top-level/default.nix:20:1 called with unexpected argument 'inNixShell'
at /nix/store/zakqwc529rb6xcj8pwixjsxscvlx9fbi-source/pkgs/top-level/impure.nix:82:1:
81|
82| import ./. (builtins.removeAttrs args [ "system" "platform" ] // {
| ^
83| inherit config overlays crossSystem;
This is a problem because one of the main selling points of Nix is that
you can evaluate any old Nix expression and still get the same result
(which also means that it *still evaluates*). In fact we're deprecating,
but not removing a lot of stuff for that reason such as unquoted URLs[2]
or `builtins.toPath`. However this property was essentially thrown away
here.
The change is rather simple: check if `inNixShell` is specified in the
formals of an auto-called function. This means that
{ inNixShell ? false }:
builtins.trace inNixShell
(with import <nixpkgs> { }; makeShell { name = "foo"; })
will show `trace: true` while
args@{ ... }:
builtins.trace args.inNixShell
(with import <nixpkgs> { }; makeShell { name = "foo"; })
will throw the following error:
error: attribute 'inNixShell' missing
This is explicitly needed because the function in
`pkgs/top-level/impure.nix` of e.g. NixOS 18.03 has an ellipsis[3], but
passes the attribute-set on to another lambda with formals that doesn't
have an ellipsis anymore (hence the error from above). This was perhaps
a mistake, but we can't fix it anymore. This also means that there's
AFAICS no proper way to check if the attr-set that's passed to the Nix
code via `EvalState::autoCallFunction` is eventually passed to a lambda
with formals where `inNixShell` is missing.
However, this fix comes with a certain price. Essentially every
`shell.nix` that assumes `inNixShell` to be passed to the formals even
without explicitly specifying it would break with this[4]. However I think
that this is ugly, but preferable:
* Nix 2.3 was declared stable by NixOS up until recently (well, it still
is as long as 21.11 is alive), so most people might not have even
noticed that feature.
* We're talking about a way shorter time-span with this change being
in the wild, so the fallout should be smaller IMHO.
[1] 9d612c393a
[2] https://github.com/NixOS/rfcs/pull/45#issuecomment-488232537
[3] https://github.com/NixOS/nixpkgs/blob/release-18.03/pkgs/top-level/impure.nix#L75
[4] See e.g. the second expression in this commit-message or the changes
for `tests/ca/nix-shell.sh`.
Overrides for inputs with flake=false were non-sticky, since they
changed the `original` in `flake.lock`. This fixes it, by using the same
locked original for both flake and non-flake inputs.
nixos/nix#6290 introduced a regex pattern to account for tags when
resolving sourcehut refs. nixos/nix#4638 reafactored the code,
accidentally treating the pattern as a regular string, causing all
non-HEAD ref resolving to break.
This fixes the regression and adds more test cases to avoid future
breakage.
The manpage for `getgrouplist` says:
> If the number of groups of which user is a member is less than or
> equal to *ngroups, then the value *ngroups is returned.
>
> If the user is a member of more than *ngroups groups, then
> getgrouplist() returns -1. In this case, the value returned in
> *ngroups can be used to resize the buffer passed to a further
> call getgrouplist().
In our original code, however, we allocated a list of size `10` and, if
`getgrouplist` returned `-1` threw an exception. In practice, this
caused the code to fail for any user belonging to more than 10 groups.
While unusual for single-user systems, large companies commonly have a
huge number of POSIX groups users belong to, causing this issue to crop
up and make multi-user Nix unusable in such settings.
The fix is relatively simple, when `getgrouplist` fails, it stores the
real number of GIDs in `ngroups`, so we must resize our list and retry.
Only then, if it errors once more, we can raise an exception.
This should be backported to, at least, 2.9.x.
Bring back the possibility to copy CA paths with no reference (like the
outputs of FO derivations or stuff imported at eval time) between stores
that have a different prefix.
If a package's attribute path, description or name contains matches for any of the
regexes specified via `-e` or `--exclude` that package is excluded from
the final output.
Currently nix-build prints the "printMissing" information by default,
nix build doesn’t.
People generally don‘t notice this because the standard log-format of
nix build would not display the printMissing
output long enough to perceive the information.
This addresses https://github.com/NixOS/nix/issues/6561
To quote Eelco in #5867:
> Unfortunately we can't do
>
> evalSettings.pureEval.setDefault(false);
>
> because then we have to do the same in main.cc (where
> pureEval is set to true), and that would allow pure-eval
> to be disabled globally from nix.conf.
Instead, a command should specify that it should be impure by
default. Then, `evalSettings.pureEval` will be set to `false;` unless
it's overridden by e.g. a CLI flag.
In that case it's IMHO OK to be (theoretically) able to override
`pure-eval` via `nix.conf` because it doesn't have an effect on commands
where `forceImpureByDefault` returns `false` (i.e. everything where pure
eval actually matters).
Closes#5867
The git fetcher code used to dereference the (potentially empty) `ref`
input attribute. This was magically working, probably because the
compiler somehow outsmarted us, but is now blowing up with newer nixpkgs
versions.
Fix that by not trying to access this field while we don't know for sure
that it has been defined.
Fix#6554
Without the change llvm build fails on this week's gcc-13 snapshot as:
src/libutil/json.cc: In function 'void nix::toJSON(std::ostream&, const char*, const char*)':
src/libutil/json.cc:33:22: error: 'uint16_t' was not declared in this scope
33 | put(hex[(uint16_t(*i) >> 12) & 0xf]);
| ^~~~~~~~
src/libutil/json.cc:5:1: note: 'uint16_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
4 | #include <cstring>
+++ |+#include <cstdint>
5 |
This solves the error
error: cannot connect to socket at '/nix/var/nix/daemon-socket/socket': Connection refused
on build farm systems that are loaded but operating normally.
I've seen this happen on an M1 mac running a loaded hercules-ci-agent.
Hercules CI uses multiple worker processes, which may connect to
the Nix daemon around the same time. It's not unthinkable that
the Nix daemon listening process isn't scheduled until after 6
workers try to connect, especially on a system under load with
many workers.
Is the increase safe?
The number is the number of connections that the kernel will buffer
while the listening process hasn't `accept`-ed them yet.
It did not - and will not - restrict the total number of daemon
forks that a client can create.
History
The number 5 has remained unchanged since the introduction in
nix-worker with 0130ef88ea in 2006.