RFE: revision of hydra-eval-jobset 'build' type attr filter #31
The architecture doc lists `BuildInputs` as "probably obsolute [sic]". hydra-eval-jobset, however, does provide the means to filter by attrs on that table.
Given that the table is obsolete (empty on my Hydra, to be specific), it would be cool if we could change that to filter on the jobsetevalinputs instead (by name and revision, for example). This would allow running a job only when it's triggered with the same revision of a repository.
For me that would mean being able to ensure that a diff (nix-diff/nvd) runs against the exact same revision of the base repository when only additional inputs changed.
Tell me if you want me to create this issue upstream instead (but given that constituents were recently reintroduced in this repo I'm not sure where to ask for features anymore ^^")
OK, disclaimer: this is a feature of Hydra that I actually haven't used (as far as I can remember).
First of all, do I understand the code correctly that the idea is to use a previous Hydra build from any project (which is read out of the input name with the beautiful `parseJobName` in `hydra-eval-jobset`, right?). And the BuildInputs are just additional things to restrict the previous build, correct?

I do agree that it's probably obsolete: not only is it empty here as well (also on our work Hydra), but I also didn't find any code writing to this table. I'd probably check when the last code writing to it got dropped, but I'd be in favor of yeeting it.
So the idea would be to use the syntax from above, but to filter "Previous Hydra builds" by their inputs? I.e. the hydra-eval-jobset code would have to join to jobsetevalinputs (I don't know off the top of my head how many relations need to be resolved for this) and then filter over that, right?
OK, I'm kinda curious about what your setup looks like: do you have an nvd job diffing two closures and just check its logs whenever needed? Or what does this look like?
I'll be extremely verbose here for a bit, just to make sure we're both on the same page.
The `fetchBuildInputs` routine fetches another Hydra build as an input for the current job, yes. When you add an input in Hydra, it's that tuple of "name", "type", and "source" (like `nixpkgs`, `git`, and `https://example.com/repo branch-name`).

The type `build` is the relevant one here, not to be confused with `eval` and `sysbuild`. I have no clue what `sysbuild` does, to be honest, because I can't for the life of me figure out which build exactly it's querying, but it seems to query builds too (i.e. one single output of one eval of a jobset). `eval`, on the other hand, gives you an attrset of all the outputs of an evaluation.

With that said, `build` fetches a build, like `sysbuild` does, but with different criteria. The criteria are the return value of `parseJobName`, which gets the "source" mentioned above as its parameter. It has that funny `project:jobset:build [attrs]` syntax, where it takes the latest `build` of any evaluation of `jobset` of `project` which matches `attrs`.

Since `attrs` filters the "buildinputs", I assume they used to be the specific revisions of repositories passed into a build (the stuff that gets passed to `nix-eval-jobs`, I guess?), which is empty nowadays. Given that the build input is now equal to the input of the specific evaluation (which gets passed to `hydra-eval-jobs`) of a jobset (since the same inputs yield a different build, which then states that the result was "cached from build $number"), it would be nice to filter on those same inputs, but at the evaluation level.

To give a concrete example: if my `.jobset` pulls in a repository of some client-server software and its refs, it could in theory set the `attrs` in a way that includes the specific revision of a ref. If you have jobsets Client, Server, and Integration Test, where Integration Test uses the outputs of the other two (to ensure that the exact build artifacts which are shipped are used in the test, rather than evaluating the thing by itself), then you want both Client and Server to be builds of the same revision.
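For reference, here is a rough Python approximation of the `project:jobset:job [attrs]` grammar described above, written for readability rather than copied from Hydra (the real `parseJobName` is Perl). Assumptions: names consist of word characters or dashes, project and jobset can be omitted, and attribute values are anything except a double quote, so the `mysource="something?"` form used later in this thread parses. `JobSpec` and `parse_job_spec` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class JobSpec:
    """Parsed form of `project:jobset:job [name="value" ...]`."""
    project: Optional[str]
    jobset: Optional[str]
    job: str
    attrs: Dict[str, str] = field(default_factory=dict)


# Assumptions: names are word characters or dashes, project and jobset are
# optional, and attribute values may contain anything except a double quote.
_SPEC = re.compile(
    r"""^(?:(?:(?P<project>[\w-]+):)?(?P<jobset>[\w-]+):)?
        (?P<job>[\w-]+)\s*
        (?:\[(?P<attrs>[^\]]*)\])?$""",
    re.VERBOSE,
)
_ATTR = re.compile(r'\s*([\w-]+)\s*=\s*"([^"]*)"\s*')


def parse_job_spec(spec: str) -> JobSpec:
    m = _SPEC.match(spec.strip())
    if m is None:
        raise ValueError(f"invalid job specifier: {spec!r}")
    attrs: Dict[str, str] = {}
    body = (m.group("attrs") or "").strip()
    pos = 0
    # Consume `name="value"` pairs one after another until the body is used up.
    while pos < len(body):
        a = _ATTR.match(body, pos)
        if a is None:
            raise ValueError(f"invalid attribute list in {spec!r}")
        attrs[a.group(1)] = a.group(2)
        pos = a.end()
    return JobSpec(m.group("project"), m.group("jobset"), m.group("job"), attrs)
```

For example, `parse_job_spec('myproject:myjobset:myjob [mysource="something?"]')` yields project `myproject`, jobset `myjobset`, job `myjob` and attrs `{"mysource": "something?"}`.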
Now if the attrs could pin that revision, it would avoid the case where one of Client and Server finishes before the other and Integration Test kicks off with an old version of Server but the current version of Client. That is both a waste of resources and a potential pipeline failure, causing dismay among devs who then have to ignore half the builds in the jobset.
Note that two evals which evaluate to the same derivation will still cause two builds to be created, with one stating "cached build of $other", which probably explains why inputs were tied to builds instead of evals earlier.
Currently this queries the builds (taking the most recent one) and filters them by "finished successfully", by the reference to the jobset (provided as input), and by the job (the name of the build within one eval of a jobset). It also has a subquery which uses the attrs to filter the joined build inputs, i.e. the presumably obsolete and empty table.
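To make the failure mode concrete, here is a small Python model of the lookup as described above. It assumes "most recent" means the highest build id and that each attr turns into an existence check on BuildInputs rows; `Build` and `latest_matching_build` are hypothetical names, not Hydra code. In this model, with an empty BuildInputs table a spec without attrs still resolves, but any non-empty `[attrs]` can never match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Build:
    id: int
    jobset: str
    job: str
    finished_ok: bool


def latest_matching_build(builds: List[Build], build_inputs: Set[Tuple[int, str, str]],
                          jobset: str, job: str, attrs: Dict[str, str]) -> Optional[Build]:
    """Newest successful build of `job` in `jobset` whose build inputs match every attr.

    `build_inputs` stands in for the BuildInputs table as (build id, name, value) rows.
    """
    candidates = [b for b in builds
                  if b.finished_ok and b.jobset == jobset and b.job == job]
    for b in sorted(candidates, key=lambda c: c.id, reverse=True):
        # Each attr behaves like an EXISTS check against the BuildInputs rows.
        if all((b.id, name, value) in build_inputs for name, value in attrs.items()):
            return b
    return None
```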
This syntax does not currently match the requirements for getting at the jobsetevalinputs (tbf, looking at it, I'm confused how it met the buildinputs ones), since the input looks like `myproject:myjobset:myjob [mysource="something?"]`, which could be used to match "something?" against the revision in the jobsetevalinputs. However, it'd be cool if it were generic on that, since not every input is a git repository.

In any case, it would require joining from the builds, via the table that covers the m:n relationship between builds and jobsetevals, to the jobsetevals. (Given the above-mentioned difference that presumably made buildinputs obsolete, that relationship is probably actually 1:n and wouldn't require that table, but I'm not sure.) The jobsetevals should be directly joinable to the jobsetevalinputs, since that is a 1:n relationship if memory serves (I had a brief dive into that DB when Hydra refused to delete a project due to foreign key constraints, so I know something, but I'm not a reliable source of course). There it would use the key in the `attrs` syntax as the jobsetevalinput "name" and the value to filter the columns somehow (such as by checking value == revision, with the above-mentioned "not everything is a git repo" caveat).

I used the client-server application example here because it's clearer, but you can find the full explanation of what I'm doing in particular over in #33.
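A sketch of the proposed filter against a toy SQLite schema loosely modelled on Hydra's builds, jobsetevalmembers and jobsetevalinputs tables (columns simplified; the exact names and the use of a `revision` column are assumptions, not Hydra's real DDL). Each attr key is matched against the input name and its value against the revision, which bakes in the git-centric assumption mentioned above. One design choice worth noting: all attrs must match inputs of the same eval, rather than each attr being satisfied by whichever eval the build happens to belong to.

```python
import sqlite3
from typing import Dict, List, Optional

# Toy schema loosely modelled on Hydra's tables; the column set is heavily
# simplified and the exact names are assumptions, not Hydra's real DDL.
SCHEMA = """
CREATE TABLE builds (id INTEGER PRIMARY KEY, jobset TEXT, job TEXT, finished_ok INTEGER);
CREATE TABLE jobsetevalmembers (eval INTEGER, build INTEGER);
CREATE TABLE jobsetevalinputs (eval INTEGER, name TEXT, revision TEXT);
"""


def find_build(db: sqlite3.Connection, jobset: str, job: str,
               attrs: Dict[str, str]) -> Optional[int]:
    """Newest successful build of `job` belonging to an eval whose inputs match `attrs`."""
    sql = ("SELECT b.id FROM builds b"
           " WHERE b.jobset = ? AND b.job = ? AND b.finished_ok = 1")
    params: List[str] = [jobset, job]
    if attrs:
        # All attrs must be satisfied by the inputs of one and the same eval,
        # so the per-attr checks are nested inside a single membership EXISTS.
        sql += " AND EXISTS (SELECT 1 FROM jobsetevalmembers m WHERE m.build = b.id"
        for name, revision in attrs.items():
            sql += (" AND EXISTS (SELECT 1 FROM jobsetevalinputs i"
                    " WHERE i.eval = m.eval AND i.name = ? AND i.revision = ?)")
            params += [name, revision]
        sql += ")"
    row = db.execute(sql + " ORDER BY b.id DESC LIMIT 1", params).fetchone()
    return row[0] if row else None
```

Parameters are bound with `?` placeholders rather than interpolated into the query string, so attr values need no separate sanitising step.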
The short version is: I have several NixOS builds in Hydra (i.e. my laptop, the lot of the servers, etc.) and I run both nvd and nix-diff over the entirety of the nodes/machines/NixOS instances/whatever you call them (they're containers and physical machines, headless ones as well as the laptop I'm typing this on, so "servers" can be misleading).
This only happens for select branches. I have my main branch, which builds normally (well… my setup already does a reverse flake-parts and IFD to literally patch nixpkgs, so "normal" is a stretch, I guess). Then I have "next", which pulls in the current nixpkgs/nixos-24.11 branch, so I can see if any upstream changes broke something (I also use this branch to update my machines without running the eval on them: they just curl the latest output, `nix build --profile` that directly into the system profile, and switch to it). Another branch called head does the same for all inputs, meaning I can see when changes in the Lix repo break Hydra builds (#32). And there's the good old staging branch that I can manually push to and get a diff of my changes.

So before running `nix flake update --commit-lock-file`, I can first check the current head-to-main nix-diff (I would link it here, but I accidentally did bad things to my nix cache S3 bucket after it started failing due to the issue above) to see if some unexpected config changes happened. I pull in the agenix module, for instance, so if agenix suddenly had some changes to file paths that need my attention before pulling that in, I can see that; or in the nvd diff I can see whether Lix gets a minor version update, so I can check the blog for what's new.

Similarly, when I make changes I can push to staging to get the diff to main, since both (usually) use the same dependencies in their inputs, so I can see what my changes cause in the derivation. When changing something in `services.nginx`, for example, I can verify that the configuration file ends up the way I expect.

However, when I work on multiple branches at the same time, like noticing a misconfiguration while working on staging, pushing the fix to main, and rebasing staging, I may end up with a diff that was built against staging from before the rebase and main from after the push. I then get the reverse of the commit I pushed in the diff, which is a) confusing, and b) an unnecessary diff, which can be a bit costly: by nature it has to depend on the .drv files for diffing and therefore has to push more than just the build outputs into my bucket (which is slooooow with a geographically distributed Ceph cluster running on HDDs >.<).
OK, I think I understand now.
So the main objective is to essentially have multiple jobsets that use the exact same revisions for some of the inputs, right? (Not all of them, to regularly build against upstream changes in e.g. Lix AFAIU). And the way you specify the inputs (ref, or maybe even a rev?) is by automating this within declarative jobsets.
Personally, I think I'd prefer to have some more structured data for configurations like this, but this would mean larger changes to the schema and the UI, and I don't even have a good idea of what it should look like.
On top of that, this behavior isn't documented, the currently used table is empty, and I don't know how many Lix Hydra instances are out there (except for ours and hydra.forkos.org).
So, assuming my understanding is correct, I'd say we can do it, but merging may have to wait until we've figured out something better than what we currently have regarding a changelog. I won't do actual Hydra releases (just maintenance for Lix branches), but it would be nice if people didn't have to dig through the commit history to find out about breakage.
OT: I must say the `parseJobName` function is still my favorite piece of Perl code so far ;)

@ma27 wrote in #31 (comment):
nit: multiple jobsets that use the exact same revision, and an additional jobset that depends on jobs from two or more of the others, where the revision of the jobinput of the specific job is equal (yes, the length difference between your definition and my emphasized text reflects the proportional increase in SQL query complexity).
Given that I have a list of the repo's refs as the input for a generated declarative project, I can technically already generate `A[rev=abcdef]` and `B[rev=abcdef]` by injecting the rev of the ref into the jobspec. However, I cannot have `C[A[rev=abcdef], B[rev=abcdef]]`, since specifying that extra bit of info for `build` type inputs is precisely what `[attrs]` used to do.

The same ref would not work, since refs get updated (read: I push to branches, and I'm not just testing tags), and only by pinning the jobinput of the job that is pulled in as a jobinput via `build` to a rev can I ensure that a push to a branch creates atomic jobset updates. To use the above syntax: `C[A[ref=main], B[ref=main]]` is still racy whenever main updates to a new rev.

Though I think I recently learned that it's not racy on "build finished" but racy on either "eval finished" or maybe even "inputs updated". That is a lot less racy, but still racy, since type `build`, as far as I know (could be wrong), is polled and therefore not atomic on an update of the .jobset of a declarative generated project. If it is atomic, then this whole problem goes away, along with a metric ton of annoying Nix boilerplate.
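Roughly what the rev injection could look like when generating a declarative project: a Python sketch that emits jobset C with two `build` inputs pinned through the `[attrs]` suffix. Everything here is hypothetical (project, jobset and job names, the `src` input name, both helper functions), and only the fields relevant to this discussion are shown; a real declarative spec needs more, such as the Nix expression input and the check interval. It also only helps once `[attrs]` filters a table that is actually populated.

```python
import json
from typing import Dict


def build_input(project: str, jobset: str, job: str, attrs: Dict[str, str]) -> dict:
    """A `build`-type input pinned via the `[attrs]` suffix (hypothetical helper)."""
    suffix = " ".join(f'{k}="{v}"' for k, v in sorted(attrs.items()))
    value = f"{project}:{jobset}:{job}" + (f" [{suffix}]" if suffix else "")
    return {"type": "build", "value": value, "emailresponsible": False}


def integration_jobset(project: str, ref: str, rev: str) -> dict:
    """Jobset C that consumes A and B, both pinned to the same revision of `ref`."""
    pin = {"src": rev}  # assumption: the shared repository input is called "src"
    return {
        "description": f"integration test for {ref} @ {rev}",
        "inputs": {
            "client": build_input(project, f"client-{ref}", "tarball", pin),
            "server": build_input(project, f"server-{ref}", "tarball", pin),
        },
    }


if __name__ == "__main__":
    spec = {"integration-main": integration_jobset("myproject", "main", "abcdef")}
    print(json.dumps(spec, indent=2))
```

Regenerating this spec on every push changes the pinned rev of both inputs in one update, which is the atomicity the paragraph above is after, provided the `build` inputs are re-resolved against that single update.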
It would be even nicer if one could use `C[A[input1[ref=main, rev=[B[input1[rev]]]]], B[input1[ref=main]]]`, since it allows polled inputs. (For me that is not much of a concern, since the only polled input on A is the primary repo: I have the flake-like "this exact set of inputs" build an A, and the "poll these inputs instead" on B, which allows the whole diff in C. So I only ever have a single input for the entirety of A, thus it is feasible to inject the revs on that. With multiple refs I'd have to pull in another input on .jobset per repo, which… possible, but eww.)

For my specific feature request here, which basically revives a non-functional feature, it'd be enough to revive it with static rev support; the rev-by-reference would be overkill and rings "this is a new feature and not just a different SQL query" bells.

@ma27 wrote in #31 (comment):
That's not a problem for me.
I'll happily overlay the heck out of Hydra, and while I could get the SQL part done myself, I would struggle with the Perl part, since the parsing would have to change ever so slightly.
Any code with "hm, maybe I should stop programming [insert language here] before it's too late..." is *chef's kiss*
I'm not sure if I follow what this is: is C a job here that has the jobs A & B as input?
Being able to do `A[rev=B[rev]]` (read: job input A needs to have the same rev as job input B) would already be kinda nice; that way you could "pin" a jobset eval to a single revision without using declarative jobsets, right?

As a follow-up, it may also be nice to do something similar for eval inputs, right? After skimming through hydra-eval-jobset, this isn't possible yet (but my brain feels a bit like a sieve right now, so I may be wrong on that).
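The `A[rev=B[rev]]` idea boils down to picking builds whose shared input resolved to the same revision. Here is a minimal in-memory Python sketch, assuming each candidate build comes with the input revisions of its eval; `pinned_pair` and the tuple layout are hypothetical, and a real implementation would more likely be a join in SQL. It picks the newest A build that has any B partner, together with the newest such partner.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical candidate: (build id, {input name: revision}) taken from the build's eval.
Candidate = Tuple[int, Dict[str, str]]


def pinned_pair(a: List[Candidate], b: List[Candidate],
                input_name: str) -> Optional[Tuple[int, int]]:
    """Newest (A, B) build pair whose `input_name` revisions agree, i.e. A[rev=B[rev]]."""
    # Index B's newest build per revision of the shared input.
    newest_b: Dict[str, int] = {}
    for build_id, revs in b:
        rev = revs.get(input_name)
        if rev is not None and build_id > newest_b.get(rev, -1):
            newest_b[rev] = build_id
    # Walk A from newest to oldest and take the first one that has a partner in B.
    for build_id, revs in sorted(a, key=lambda c: c[0], reverse=True):
        rev = revs.get(input_name)
        if rev in newest_b:
            return build_id, newest_b[rev]
    return None
```

If the newest Client has no Server counterpart yet, this falls back to an older matching pair instead of mixing revisions, which is the behaviour the Integration Test example asks for.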
Anyways: yeah, I think what you're suggesting is a first step in the right direction, with my suggestions being potential follow-ups. 👍
WDYT?
Correct. I am really struggling to write this down in a way that makes sense.
Well, technically you can nail down any input to a revision even without declarative jobsets[citation needed], it just becomes a pain to update since you'd have to hard-code it into every single one.
However it would make the whole thing a lot more ergonomic.
So if I understand you correctly then yes, it'd be a neat thing to have.
I can't think of a reason why it wouldn't be a good feature, but at the same time I am struggling to think of the use-case in general right now. So FWIW you're probably on the right track about adding this as a potential feature, I just can't confirm it outright.
Sieve brain club! (turns out switching back and forth between heavily typed Rust spaghetti code and Nix is making my head hurt)
I agree on pretty much everything in this context so yeah.
Just one side note: it'd be great if the syntax/way of specifying revs for dependencies and all that could be somewhat unified between the different input types (unlike the current system, where eval, build, and sysbuild effectively each have their own parser without sharing code).
So before implementing any one syntax it might help to sit down for a minute or two and think about whether the same syntax works for the other input types and maybe adjust if there's an easy fix for it.
But that's just a bonus after all.
fwiw, that's why I didn't consider this a viable option ;)
I think this only allows a project/jobset/job triple anyways? Not sure if this actually needs more advanced rules?
Anyways, is there anything we'll have to discuss on that?
Btw, I found out about 748c3409b4, which is another strong indication that the attrsToSQL part was never updated.