Generic eval service #4

Open
ma27 wants to merge 71 commits from ma27/evolive:generic-eval-service into main
Owner

The goal is to have something that can substitute `hydra-eval-jobset` shelling out to `nix-eval-jobs`, including a way to push drvs into a remote store (S3 only at the moment).
ma27 added 71 commits 2026-01-04 13:17:26 +00:00
This essentially mimics the traditional Hydra way of evaluating stuff: a
"jobset" consists of a number of inputs and the file to evaluate is a
path relative to a selected input. This is the first step towards making
this useful for Hydra.

So far, the following differences exist:

* We accept only libfetchers-style URLs for git, not Hydra-style URLs
  (i.e. `git-url branch`). The former can be trivially constructed given
  the latter though.

* Only git inputs are accepted so far. Also, the libfetchers support is
  far from complete; e.g. submodules are not implemented yet.

* Hydra determines `revCount` et al. and passes them to the evaluation,
  which is important e.g. for nixpkgs. This is still missing here.

Before, the identifier of an evaluation was the nixpkgs revision, but this
is not unique anymore. Hence, we generate a fingerprint that is a hash
of all input URLs, input revisions, names and the input to evaluate.
This hash is used for the log-file and in all messages to identify the
evaluation in the server logs.
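The fingerprint described above could look like this minimal sketch (the helper name and the `name`/`url`/`rev` field layout are illustrative assumptions, not evolive's actual code):

```python
import hashlib

def evaluation_fingerprint(inputs: list[dict[str, str]], file_to_evaluate: str) -> str:
    """Derive a stable identifier from all inputs plus the evaluated file.

    Sorting by input name keeps the hash independent of input ordering.
    """
    h = hashlib.sha256()
    for inp in sorted(inputs, key=lambda i: i["name"]):
        for field in ("name", "url", "rev"):
            h.update(inp[field].encode())
            h.update(b"\0")  # separator so "ab"+"c" != "a"+"bc"
    h.update(file_to_evaluate.encode())
    return h.hexdigest()
```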
There's no place where two consecutive control messages are supposed to
be returned, so it doesn't make sense to wrap those in a list.
The old build system was dated, both in terms of nixpkgs tooling and in
terms of Python tooling. Since it's only me working on it, let's turn it
into another playground for uv+uv2nix to find out whether this is
something I want to do in more projects.
* A bit more reformat than I would like since I fixed pre-commit after
  the first commit.

* Fix parquet (doesn't seem to work with dicts, though this is just a
  temporary measure)

But on the bright side, we have a single entrypoint for everything now.
Yay!
This is a fatal error, i.e. nothing that we could cache (we cannot even
fingerprint the evaluation), so let's stop early in this case.
Apparently, doing a git fetch of a ref doesn't make the ref available in
the checkout. Hence, we now resolve it on the remote side and then only
work with the rev instead of the ref.
Introduce a function `execute_git_command` that's not part of `GitRepo`,
since it's needed e.g. for resolving refs to revs on the remote
repository, which happens before any `GitRepo` has been instantiated.

Also, use `create_subprocess_exec`: we don't have to worry about
shell injection if we don't use a shell.
This isn't as easy as I hoped and I'm wondering how bad this really is.
We not only need `git`, but also other stuff such as plain Nix
expressions, dicts (I figured that's a nicer option than Nix code in a
lot of cases) and further VCS inputs in the future.

This is essentially the git/fetching part abstracted away from before.
Usually, you don't want to run too many evaluations at once.
Since there's no standard way of implementing a file-backed semaphore
shared between multiple processes, we'll just allow one evaluation
per worker.
This should be reconsidered eventually.
After a bit of research, I don't think there's a general-purpose way
of obtaining metadata such as rev count for shallow clones.

I think it's still valuable to keep shallow=false here since the target
audience is nixpkgs evaluations, but you may still want to have full
metadata, e.g. for "release" evaluations. I'm not sure if it's a wise
choice to use the same repository for this, but we'll see about that.
The idea is to have a websocket connection that is left open while
streaming the response from evolive's evaluation endpoint. Whenever you
get a new derivation, you request its transfer on this websocket
endpoint.

This is long-lived to avoid retransmissions as much as possible: if a
path or a .drv was already transferred, don't do this again.

Internally, this does the following:

* compute the drv's closure via the daemon socket: hopefully this gets
  replaced by an RPC call soon, because this implementation is pretty
  awful. Still better than shelling out for it, though.

* Dump a path to a NAR by hand. This is done this way on purpose:
  streaming NARs from the daemon is a bad idea (it requires parsing the
  NAR to know when the stream ends), and we support a local store
  anyway. While we could do chroot stores under a different prefix in
  the future, experience has shown that evaluation on any more or less
  remote store is incredibly slow and thus irrelevant for this project.
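The session-level deduplication could be sketched like this (class and method names are invented for illustration; the actual websocket plumbing and NAR upload are elided):

```python
class TransferSession:
    """One instance lives as long as the websocket session, so paths
    transferred earlier in the session are never sent again."""

    def __init__(self) -> None:
        self._done: set[str] = set()

    async def request_transfer(self, drv_path: str, closure_of) -> list[str]:
        # `closure_of` stands in for the daemon-socket closure query;
        # only paths this session hasn't transferred yet get uploaded.
        todo = [p for p in await closure_of(drv_path) if p not in self._done]
        self._done.update(todo)
        # ... the actual NAR dump + upload of `todo` would happen here ...
        return todo
```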
That way information like the store-path is included and import is
hopefully easier.
Yielding from within keeps it alive for the rest of the session.
If something goes wrong very badly, error out early instead of waiting
for the timeout when the task finally gets awaited. The latter is still
useful as a last resort, but it's better to give this feedback as early
as possible.
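The fail-fast behaviour can be sketched with `asyncio.wait(..., return_when=FIRST_EXCEPTION)`; this is an illustrative pattern, not the actual code:

```python
import asyncio

async def run_until_first_error(coros) -> None:
    """Run all coroutines, but surface the first exception immediately
    instead of only noticing it when each task is awaited later."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for t in pending:
        t.cancel()
    for t in done:
        t.result()  # re-raises the first failure, if any
```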
Next step will be to add tasks for all the drvs to serve. However, we
still have to send some messages continuously, so we lock here.
Upload performance now matches the Nix CLI!
The overhead is negligible: it costs an additional 2s on a full nixpkgs
evaluation and far less on smaller ones.
* No locking: the synchronisation has a major overhead (I've seen
  slowdowns of up to 100% compared to the time it takes now). If we do
  upload one thing twice, that's most likely less bad than the overall
  overhead of synchronisation.

* Remove the semaphore: on my first attempt yesterday I observed the
  upload jobs hanging and suspected this might be a problem with too
  many tasks, hence the semaphore. As it turns out, this was probably
  wrong; uploading still works fine.
The problem with running `nix copy` itself is that shelling out for
each store path would become pretty slow over time. Also, the entire
logic is handled on the client side of Nix, so we cannot leverage a
worker operation for it. Hence, this now essentially does the following
for each instantiated job J:

* computeFSClosure(J).
* upload paths in topological generations such that the S3 store remains
  consistent.

Additionally, a few small improvements were added:

* Cap the number of concurrent uploads to S3 itself and the number of
  instantiated jobs being uploaded. Each job gets its own Nix daemon
  client, hence the locking isn't necessary anymore.

* Cache existence of paths in a local Redis.

* Use uvloop as runtime.

Also cache machine id per evaluation now.
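The "topological generations" upload order might look like the following sketch (the function name and the shape of the `refs` mapping are assumptions):

```python
def topological_generations(refs: dict[str, set[str]]) -> list[list[str]]:
    """Group store paths so that each generation only depends on earlier
    ones; generations are uploaded one after another, with the paths
    inside a generation going up concurrently, so the S3 store never
    contains a path before its references."""
    # only consider references inside the closure; ignore self-references
    remaining = {p: (set(r) & set(refs)) - {p} for p, r in refs.items()}
    generations: list[list[str]] = []
    while remaining:
        ready = [p for p, deps in remaining.items() if not deps]
        if not ready:
            raise ValueError("reference cycle in closure")
        generations.append(ready)
        for p in ready:
            del remaining[p]
        for deps in remaining.values():
            deps.difference_update(ready)
    return generations
```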
@ -16,0 +86,4 @@
shallow clones. `shallow` must be explicitly turned off.
- Paths are marked as uploaded before the actual job starts to prevent duplicated uploads. Also,
the in-memory cache isn't flushed immediately into Rest. This has two implications:
Owner

Redis
@ -0,0 +20,4 @@
'';
};
lixEvalJobsPackage = lib.mkOption {
Owner

interpreterPackages (like kernelPackages) ?
@ -47,0 +297,4 @@
def write_parquet() -> None:
df.write_parquet(
write_side,
metadata={
Owner

probably architecture/OS makes sense as well
raito approved these changes 2026-01-22 23:54:10 +00:00

Reference
the-distro/evolive!4