Meson test timeout is too short #890

Open
opened 2025-06-29 17:24:52 +00:00 by emilazy · 3 comments
Member

`functional-flakes-flakes` just took 239.18 s on my ageing MacBook Pro, and ran over the 300 s timeout several times before that.

In Nixpkgs, we pass `--timeout-multiplier=0` to all Meson `checkPhase`s by default, because it’s easy for tests to run into timeouts on busy machines and redundant to the overall Nix build timeout. The Nixpkgs Lix derivation should probably match that for its `installCheckPhase`, and I think the flake package too. But even for builds outside of a derivation it seems suboptimal, so I would also potentially suggest raising the timeout or disabling it entirely on the Meson level.
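For context on the flag: according to Meson's documentation, `meson test` multiplies each test's `timeout` by `--timeout-multiplier`, and a multiplier of 0 or less disables timeouts. The sketch below is a simplified Python model of that rule, not Meson's code. It uses the 239.18 s run and the 300 s timeout from this report. The 1.5x slowdown is a hypothetical stand-in for a busier builder.

```python
from typing import Optional


def effective_timeout(timeout_s: float, multiplier: float) -> Optional[float]:
    """Return the effective per-test timeout in seconds, or None for "no timeout".

    Simplified model (an assumption, not Meson's source) of the documented rules:
    a test timeout <= 0 means no timeout, and --timeout-multiplier <= 0 disables it.
    """
    if timeout_s <= 0 or multiplier <= 0:
        return None
    return timeout_s * multiplier


def times_out(duration_s: float, timeout_s: float, multiplier: float) -> bool:
    """True if a test that runs for duration_s would be killed by the timeout."""
    limit = effective_timeout(timeout_s, multiplier)
    return limit is not None and duration_s > limit


# 239.18 s is the measured functional-flakes-flakes run; 300 s is the current timeout.
print(times_out(239.18, 300, 1))        # False: passes with about 60 s to spare
print(times_out(239.18 * 1.5, 300, 1))  # True: a hypothetical 1.5x slower machine fails
print(times_out(239.18 * 1.5, 300, 0))  # False: multiplier 0 disables the timeout
```

The model shows why a moderately slower machine turns a slow test into a failing one. With a multiplier of 0, only the outer Nix build timeout still applies.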

Owner

disagree; we should instead figure out *why* the flake tests run so fuckass long on darwin. nix-eval-jobs regularly spends 20x or more as much time on its darwin flake test suite as on its linux test suite, and nobody has figured out why yet. this is definitely a darwin problem, but the problem isn't that the timeouts are too short as such

Author
Member

I agree that’s a good idea, but I don’t think there’s a good reason to force people to restart their build from scratch repeatedly while it remains unfixed. If timeouts are there to catch tests that are genuinely hanging, they should be set meaningfully higher than any existing test could be expected to take, even if that amount of time is itself a problem. For interactive Meson usage the penalty is less severe. If broken tests hang often enough that we want to minimize the time spent waiting to find out, the timeout could be raised only for specific tests and only on Darwin.

(I think that in general timeouts within Nix builds are a bad idea, since anything that cares about builds not hanging forever will set an outer timeout, and builders tend to be much more heavily loaded than workstations. They turn slow builds into flaky builds; having to restart a bunch of Hydra jobs because one specific test went over its timeout is painful. That’s why Nixpkgs started passing `--timeout-multiplier=0` globally by default; it’s just that the Lix derivation runs its tests in a custom `installCheckPhase` rather than a `checkPhase`, so the default doesn’t get applied.)
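A per-test, Darwin-only override could look roughly like the following Python sketch. The override table, the 1200 s value, and the 2x headroom rule of thumb are hypothetical illustrations. They are not Lix's actual configuration.

```python
import sys
from typing import Dict

# Hypothetical values for illustration only; not Lix's real settings.
DEFAULT_TIMEOUT_S = 300
# Hypothetical per-test overrides that apply only on macOS ("darwin").
DARWIN_OVERRIDES_S: Dict[str, int] = {
    "functional-flakes-flakes": 1200,
}


def timeout_for(test_name: str, platform: str = sys.platform) -> int:
    """Pick a timeout, raising it only for listed tests and only on Darwin."""
    if platform == "darwin":
        return DARWIN_OVERRIDES_S.get(test_name, DEFAULT_TIMEOUT_S)
    return DEFAULT_TIMEOUT_S


def has_headroom(observed_max_s: float, timeout_s: float, factor: float = 2.0) -> bool:
    """Hypothetical rule of thumb: the timeout should be at least `factor` times
    the slowest observed run, so slow-but-healthy runs don't become failures."""
    return timeout_s >= factor * observed_max_s


print(timeout_for("functional-flakes-flakes", "darwin"))  # 1200
print(timeout_for("functional-flakes-flakes", "linux"))   # 300
print(has_headroom(239.18, 300))   # False: the current timeout is too tight
print(has_headroom(239.18, 1200))  # True
```

In the actual build, this would more likely be expressed in `meson.build`, by passing a different `timeout:` to `test()` depending on `host_machine.system()`.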

Owner

we absolutely can increase the test timeout temporarily, but we *must* investigate why the tests are taking so much longer on weird platforms to begin with. whatever it is might be causing some glaring ux issues too :/

Reference: lix-project/lix#890