WIP: Bring back constituents to Hydra #17

Draft
ma27 wants to merge 1 commit from ma27/nix-eval-jobs:bring-back-constituents into main
Member

This change is part of an ongoing discussion (lix-project/hydra#11) on whether or not to bring back constituents to Hydra. This needs support from the evaluator (previously hydra-eval-jobs, now nix-eval-jobs), hence this PR. I'll do my best to explain the current state and implications, my intent so far is to gather some more opinions on that topic.

Note: yes, the code is pretty ugly atm, I mostly hacked it together to have a proof of concept and to have the feature back working on my Hydra. Obviously, this will be revised if the decision is to bring it back.


Constituents are right now used for e.g. release jobs[1] where you specify other jobs as dependencies. Either as derivations or as strings being the job names, the evaluator will then rewrite the derivation of the release job to depend on the derivations the job names point to. This is a nasty hack, but brings a big performance boost[2]. Given that this is for jobs in general, it might even be useful for other CI systems.

In fact, the CppNix Hydra is also unsure how to proceed[5] and I also haven't gotten a definitive answer in the #hydra channel on Matrix.

This patch adds the feature back behind a CLI flag. It keeps the streaming nature of nix-eval-jobs for everything except those aggregates[3] which potentially need a rewrite at the end.

The behavior still has its problems that I started to fix as part of my dayjob[4]: if you have aggregate jobs depending on other aggregate jobs, the rewrites may happen in the wrong order and thus it's not guaranteed that aggregates only pass if their (transitive) constituents do. Also, having strings for a lot of stuff may be a little hairy, for that I implemented globbing. We may want to consider porting that or parts of that here as well.

So, my questions are:

  • would you consider this something worth adding back to nix-eval-jobs?
  • if we'd add this behavior (or something similar): what would need to be overhauled in your opinion?

cc @vriska @raito @lheckemann @delroth @jade (I probably forgot other people, feel free to chime in regardless)

[1] cc6e93a617/nixos/release-small.nix (L100)
[2] This - among higher parallelization with lower memory thresholds for each jobs - reduced the jobset eval time on my employer's Hydra from 15min to ~50s.
[3] I think this is fine given that jobsets usually have many constituents and very few aggregates.
[4] https://github.com/flyingcircusio/hydra/pull/1
[5] https://github.com/NixOS/hydra/pull/1421#discussion_r1844974067

This change is part of an ongoing discussion (https://git.lix.systems/lix-project/hydra/issues/11) on whether or not to bring back constituents to Hydra. This needs support from the evaluator (previously hydra-eval-jobs, now nix-eval-jobs), hence this PR. I'll do my best to explain the current state and implications, my intent so far is to gather some more opinions on that topic. __Note:__ yes, the code is pretty ugly atm, I mostly hacked it together to have a proof of concept and to have the feature back working on my Hydra. Obviously, this will be revised if the decision is to bring it back. --- Constituents are right now used for e.g. release jobs[1] where you specify other jobs as dependencies. Either as derivations or as strings being the job names, the evaluator will then rewrite the derivation of the release job to depend on the derivations the job names point to. This is a nasty hack, but brings a big performance boost[2]. Given that this is for jobs in general, it _might_ even be useful for other CI systems. In fact, the CppNix Hydra is also unsure how to proceed[5] and I also haven't gotten a definitive answer in the #hydra channel on Matrix. This patch adds the feature back behind a CLI flag. It keeps the streaming nature of nix-eval-jobs for everything except those aggregates[3] which potentially need a rewrite at the end. The behavior still has its problems that I started to fix as part of my dayjob[4]: if you have aggregate jobs depending on other aggregate jobs, the rewrites may happen in the wrong order and thus it's not guaranteed that aggregates only pass if their (transitive) constituents do. Also, having strings for a lot of stuff may be a little hairy, for that I implemented globbing. We may want to consider porting that or parts of that here as well. So, my questions are: * would you consider this something worth adding back to nix-eval-jobs? * if we'd add this behavior (or something similar): what would need to be overhauled in your opinion? cc @vriska @raito @lheckemann @delroth @jade (I probably forgot other people, feel free to chime in regardless) [1] https://github.com/NixOS/nixpkgs/blob/cc6e93a6178bee76311e20ef3621f2254028469e/nixos/release-small.nix#L100 [2] This - among higher parallelization with lower memory thresholds for each jobs - reduced the jobset eval time on my employer's Hydra from 15min to ~50s. [3] I think this is fine given that jobsets usually have _many_ constituents and very few aggregates. [4] https://github.com/flyingcircusio/hydra/pull/1 [5] https://github.com/NixOS/hydra/pull/1421#discussion_r1844974067
ma27 added 2 commits 2024-11-23 10:29:55 +00:00
Flake lock file updates:

• Updated input 'lix':
    'c859d03013.tar.gz?narHash=sha256-bq21I1EjXJa/s5Rra9J9ot2NkPCnI0F5uNPurwYLdpE%3D&rev=c859d03013712b349d82ee6223948d6d03e63a8d' (2024-11-15)
  → '66f6dbda32.tar.gz?narHash=sha256-H7GN4%2B%2Ba4vE49SUNojZx%2BFSk4mmpb2ifJUtJMJHProI%3D&rev=66f6dbda32959dd5cf3a9aaba15af72d037ab7ff' (2024-11-20)
• Updated input 'nix-github-actions':
    'github:nix-community/nix-github-actions/e04df33f62cdcf93d73e9a04142464753a16db67' (2024-10-24)
  → 'github:nix-community/nix-github-actions/7b5f051df789b6b20d259924d349a9ba3319b226' (2024-11-18)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/035d434d48f4375ac5d3a620954cf5fda7dd7c36' (2024-11-15)
  → 'github:NixOS/nixpkgs/df94f897ffe1af1bcd60cb68697c5d8e6431346e' (2024-11-22)
• Updated input 'treefmt-nix':
    'github:numtide/treefmt-nix/746901bb8dba96d154b66492a29f5db0693dbfcc' (2024-10-30)
  → 'github:numtide/treefmt-nix/705df92694af7093dfbb27109ce16d828a79155f' (2024-11-22)
The part about finding `_hydraAggregate`/`constituents` is basically
derived from `hydra-eval-jobs`, however the part about
`namedConstituents` has been changed: we still stream out jobs when they
appear, however we suppress this for aggregate jobs.

These jobs are post-processed at the end, i.e. if `namedConstituents`
exist, these will be mapped to the drvPath of the other jobs. Then, the
drv will be rewritten to contain the drvPath of said jobs[1] and the
JSON containing the rewritten `drvPath` will be printed out.

[1] This was an optimization to reduce the memory footprint of
    evaluating e.g. the `tested` job in nixpkgs.
Owner

Personally I am pretty hard in the "the ends justify the means" on this one, since it exposes surface that only people using Hydra (cough and thus accepting breakage risk because hydra is jank) care about, and does not introduce new bad API that has not yet been invented. That said, this code is pretty bad and probably could use a cleanup and maybe some light description of what a constituent is even for above the code for it.

It would be helpful if the n-e-j tests were easier to run; our main stability strategy for it seems to be of exercising the code pretty hard in production and not editing it often.

Personally I am pretty hard in the "the ends justify the means" on this one, since it exposes surface that only people using Hydra (*cough* and thus accepting breakage risk because hydra is jank) care about, and does not introduce new bad API that has not yet been invented. That said, this code is pretty bad and probably could use a cleanup and maybe some light description of what a constituent is even for above the code for it. It would be helpful if the n-e-j tests were easier to run; our main stability strategy for it seems to be of exercising the code pretty hard in production and not editing it often.
ma27 added 1 commit 2024-12-02 20:43:09 +00:00
Author
Member

That said, this code is pretty bad and probably could use a cleanup and maybe some light description of what a constituent is even for above the code for it.

It absolutely is! This will be revised before I consider this acceptable to be merged. This was mostly a proof of concept to get the feature back working on my end and to start the discussion.

> That said, this code is pretty bad and probably could use a cleanup and maybe some light description of what a constituent is even for above the code for it. It absolutely is! This will be revised before I consider this acceptable to be merged. This was mostly a proof of concept to get the feature back working on my end and to start the discussion.
raito approved these changes 2024-12-14 13:55:53 +00:00
raito left a comment
Owner

Cursory check of the code; LGTM. It doesn't shock me that we are bringing back this feature.

Cursory check of the code; LGTM. It doesn't shock me that we are bringing back this feature.
Owner

(but I +1 Jade remark, I guess I'm just too engrossed in all of these terms of what is a constituent, etc.)

(but I +1 Jade remark, I guess I'm just too engrossed in all of these terms of what is a constituent, etc.)
ma27 force-pushed bring-back-constituents from 9c5c88d57b to ac27a3f9d5 2024-12-18 14:51:20 +00:00 Compare
This pull request is marked as a work in progress.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u bring-back-constituents:ma27-bring-back-constituents
git checkout ma27-bring-back-constituents

Merge

Merge the changes and update on Forgejo.
git checkout main
git merge --no-ff ma27-bring-back-constituents
git checkout main
git merge --ff-only ma27-bring-back-constituents
git checkout ma27-bring-back-constituents
git rebase main
git checkout main
git merge --no-ff ma27-bring-back-constituents
git checkout main
git merge --squash ma27-bring-back-constituents
git checkout main
git merge --ff-only ma27-bring-back-constituents
git checkout main
git merge ma27-bring-back-constituents
git push origin main
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: lix-project/nix-eval-jobs#17
No description provided.