Support for transparently substituting git mirrors (mapping between URLs pointing to the same git resource) #444

Open

opened 2024-07-11 20:19:32 +00:00 by crabdancing · 5 comments

crabdancing commented

2024-07-11 20:19:32 +00:00

Is your feature request related to a problem? Please describe.

Behind networks with slow internet access, or IP addresses that would be flagged/throttled by services like GitHub, downloading updates for nixpkgs and other repos can be surprisingly slow. It's also unfortunate that so much Nix infrastructure relies on unfree services like GitHub.

Describe the solution you'd like

The ability to transparently substitute a git repo, such that it's fetching from a local repo instead. It is possible, in principle, to replace the flake inputs for everything, but the inputs of those inputs then seem to still rely on GitHub if upstream is using GitHub.

The idea here would be, e.g., if upstream is using github:NixOS/nixpkgs/326341, it could be transparently substituted with git.myserver.com/mirrors/nixpkgs?ref=326341. Any locked input would behave the same, fetching the same exact hash.

Describe alternatives you've considered

I'm currently just forking repos and hard-changing their inputs. There are also mechanisms that can imperatively override inputs, such as the --override-input flag (https://siraben.dev/2022/02/13/nix-flake-hacks.html), but they only map a given input name to a URL, instead of one URL to another URL. This means they cannot work recursively, as input names are not constrained and the same resource might be called something else depending on which flake you're working with.

Additional context

jade commented

2024-07-12 18:46:39 +00:00

Owner

Have you tried flake registry abuses? If I understand correctly, although the registry add command now blocks this on main, I believe you can make a registry entry for github:nixos/nixpkgs/nixos-unstable. However this doesn't entirely help because I think you'd have to re-lock the inputs or something.

crabdancing commented

2024-07-12 21:56:31 +00:00

Author

Even setting aside locking issues, some repos have commit-specific stuff in their flake inputs. For example, flake-parts (https://github.com/hercules-ci/flake-parts/blob/main/flake.nix). So I would need an entire extra bit of infrastructure just to track their inputs and adjust my "registry abuse" mapping to get it to pull from the local repo instead.

What's needed for an actually robust 'sources substitution mechanism' is some way of mapping the git repo root, e.g. https://github.com/NixOS/nixpkgs/ -> https://git.myserver.com/mirrors/nixpkgs/. Protocol-specific mappings seem entirely desirable, since different protocols may impose different requirements (e.g., port overrides).

This means e.g. https://github.com/NixOS/nixpkgs/archive/5daf0514482af3f97abaefc78a6606365c9108e2.tar.gz transparently becomes https://git.myserver.com/mirrors/nixpkgs/archive/5daf0514482af3f97abaefc78a6606365c9108e2.tar.gz. (I tested this transform, and it works perfectly with a forgejo backend as long as the mirror is sufficiently up-to-date). Or, e.g., you could map git+ssh://git.someserver.com/myuser/nixrepo to git+ssh://some-other-server.com:2222/othernixrepo. Anything after the fact, e.g., pinning ?ref=* should be kept identical for reproducible results (lest we accidentally substitute one commit with another).
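As a rough illustration of the repo-root mapping described above, here is a minimal Python sketch. It is not how Nix or Lix resolves inputs; `rewrite_url`, `_is_under`, and the `MIRRORS` table are hypothetical names made up for this example. The only thing it tries to show is that the matched root is swapped while everything after it (archive path, `?ref=...`) is kept byte-for-byte.

```python
from typing import Mapping


def _is_under(url: str, root: str) -> bool:
    """True if `url` is `root` itself or something beneath it.

    The boundary check stops ".../nixrepo" from also matching ".../nixrepo2".
    """
    if not url.startswith(root):
        return False
    rest = url[len(root):]
    return root.endswith("/") or rest == "" or rest[0] in "/?#"


def rewrite_url(url: str, mirrors: Mapping[str, str]) -> str:
    """Swap the longest matching repo root for its mirror; keep the suffix as-is."""
    matches = [root for root in mirrors if _is_under(url, root)]
    if not matches:
        return url  # no mapping configured: fetch from the original location
    best = max(matches, key=len)
    return mirrors[best] + url[len(best):]


# Hypothetical mapping table, using the example hosts from this comment.
MIRRORS = {
    "https://github.com/NixOS/nixpkgs/": "https://git.myserver.com/mirrors/nixpkgs/",
    "git+ssh://git.someserver.com/myuser/nixrepo":
        "git+ssh://some-other-server.com:2222/othernixrepo",
}
```

Because only the prefix changes, a pinned revision or `?ref=` in the original URL ends up unchanged in the rewritten one, which is the property that keeps locked inputs reproducible.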

I think the ideal solution to this, if implemented, should allow:

1) imperatively switching repo roots (e.g., via CLI or env var).
2) persistently, declaratively mapping repo roots.
3) deliberately mapping source repos to 'null' or otherwise invalid URLs, e.g., to prevent a known-compromised or otherwise malicious git repo from being loaded as a dependency of the flake tree.
A failover mechanism (substituting one git repo source with another) would be excellent too, for robustness (i.e., if git.server1.com breaks -- it tries git.server2.com) but not strictly necessary.
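The 'null' mapping and the failover idea could be sketched together like this in Python. Everything here is an assumption for illustration: `POLICY`, `BlockedSource`, `candidates`, and `fetch_with_failover` are hypothetical names, `compromised.example` is a placeholder host, and the `fetch` callable stands in for whatever actually downloads a source.

```python
from typing import Callable, List, Mapping, Optional, Sequence


class BlockedSource(Exception):
    """Raised when a source root is deliberately mapped to nothing."""


def candidates(url: str, policy: Mapping[str, Optional[Sequence[str]]]) -> List[str]:
    """Return the URLs to try, in order, for `url` under `policy`."""
    for root, targets in policy.items():
        if url.startswith(root):
            if not targets:  # None or empty: this root is blocked
                raise BlockedSource(url)
            suffix = url[len(root):]  # keep path, rev and ?ref= untouched
            return [target + suffix for target in targets]
    return [url]  # no rule for this URL: use it unchanged


def fetch_with_failover(
    url: str,
    policy: Mapping[str, Optional[Sequence[str]]],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Try each candidate in turn; re-raise the last error if all fail."""
    last_error: Optional[OSError] = None
    for candidate in candidates(url, policy):
        try:
            return fetch(candidate)
        except OSError as err:  # treat I/O errors as "try the next mirror"
            last_error = err
    assert last_error is not None  # candidates() never returns an empty list
    raise last_error


# Hypothetical policy: fall back from server1 to server2, and block one root.
POLICY = {
    "https://git.server1.com/": ["https://git.server1.com/", "https://git.server2.com/"],
    "https://compromised.example/": None,
}
```

Blocking raises an explicit error rather than silently skipping the input, so a compromised dependency fails the evaluation instead of disappearing from the tree unnoticed.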

Side notes:

It would be best to resolve the flake registry entry first, and then do the transparent mapping. E.g., a flake like nixpkgs/nixos-24.05 input would resolve to github:NixOS/nixpkgs/nixos-24.05, which then resolves to https://github.com/NixOS/nixpkgs/nixos-24.05, which is then mapped to https://git.myserver.com/mirrors/nixpkgs/nixos-24.05.

IDK if anyone would actually want to work on this, but I thought it would be nice to put the idea out there, as I was already frustrated with how much NixOS architecture is entangled with GitHub, on account of upstream development being done there -- and I thought some other people might feel the same way. :)

toastal commented

2024-08-03 05:46:37 +00:00

This throttling issue happens regularly for me in Asia, but why transparent? Why not extend Flake input.url = string to input.urls = string list & users supply their mirrors explicitly? The precedent would be the fetchers like fetchzip that take a url or urls which allow multiple mirrors (not sure if it is a race or how the resolution is done, but it does work). This would also then be more generic than Git. I rely on tarballs for my inputs a lot since I like using Darcs & Pijul which are not supported as Flake inputs (unlike fetch* from Nixpkgs), but you can get a tarball archive.

crabdancing commented

2024-08-03 18:30:14 +00:00

Author

@toastal

Transparent because if it's not transparent and global, the user does not control which mirrors are used -- upstream flakes do. You can in principle fork a flake if where it fetches from is a problem, but a flake dependency of a flake dependency of a flake dependency? What if flake-parts does not feel like adding the particular mirror that you need? What if you're working on a corporate or personal intranet and are setting up your own mirror? By making it transparently substitute inputs, the user can actually control where the resources are fetched from in niche cases, without sacrificing reproducibility, and without putting the onus on upstream (which may or may not cooperate with various use cases).

toastal commented

2024-08-04 04:18:37 +00:00

That makes sense & it means my request is different.