WIP: hydra-queue-runner: do not assume reservation is available when build step throws unexpectedly #1

Closed
raito wants to merge 1 commit from raito/hydra:rsv-nullpo into main
Owner

reservation might be null when an uncaught exception has been thrown.
We copy the data we need to report the error.

Not sure if this is the right way to go about this fix.

raito added 1 commit 2024-07-08 15:05:07 +00:00
`reservation` might be null when an uncaught exception has been thrown.
We copy the data we need to report the error.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Author
Owner

I don't think it passes all the tests, one is failing (flakiness, idk?).

Author
Owner

And it's the LDAP test…

Member

Yeah that looks correct - a while back I made it so machines drop their reservations early to not hog build slots while copying stuff around, and I'm guessing we've been encountering exceptions that happen on the "copying outputs" step?

Member

(See 5c3e508e55d99b6d9e50aee48b8bf514e15ad006)
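
The failure mode discussed above can be sketched in isolation. This is a minimal illustration, not the actual Hydra code: `Machine`, `Reservation`, the one-argument `doBuildStep`, and `runStep` are hypothetical stand-ins, and the reservation is assumed to be a smart pointer that the build step may reset before it throws. The point is only that anything the `catch` block needs is copied out before the call.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; these are not Hydra's real types.
struct Machine { std::string sshName; };
struct Reservation { std::shared_ptr<Machine> machine; };

// Simulates a build step that drops its reservation early (to free the
// build slot) and then fails in a later phase such as copying outputs.
void doBuildStep(std::unique_ptr<Reservation> & reservation)
{
    reservation.reset();
    throw std::runtime_error("copying outputs failed");
}

// Returns the message that would be logged for this step.
std::string runStep(std::unique_ptr<Reservation> & reservation)
{
    // Copy what error reporting needs while the reservation is still valid.
    const std::string machineSshName = reservation->machine->sshName;
    try {
        doBuildStep(reservation);
    } catch (std::exception & e) {
        if (!reservation)
            // The reservation is gone, so only the copied name is safe to use.
            return "machine '" + machineSshName
                + "' had already released its reservation: " + e.what();
        return "step failed on '" + reservation->machine->sshName + "': " + e.what();
    }
    return "ok";
}
```

Calling `runStep` with a reservation for a machine named, say, `builder1` leaves the pointer null and still yields a message containing the machine name, where dereferencing `reservation` in the handler would have been undefined behaviour.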
delroth reviewed 2024-07-08 19:21:37 +00:00
@@ -40,2 +41,4 @@
            res = doBuildStep(destStore, reservation, activeStep);
        } catch (std::exception & e) {
            if (!reservation) {
                printMsg(lvlError, "machine '%s' has been released unexpectedly", machineSshName);
Member

Not unexpectedly. See my other comments.

Author
Owner

Yep, makes sense. I'll change it, maybe to `lvlTalkative`?
raito marked this conversation as resolved
raito force-pushed rsv-nullpo from a8a5dea411 to 1f9d8dc0a6 2024-07-09 01:12:42 +00:00 Compare
lukegb approved these changes 2024-07-09 01:16:08 +00:00
Author
Owner

The LDAP test still fails deterministically, and I have no idea why.
Member

Fixed in fb9e29d4.

delroth closed this pull request 2024-07-13 04:12:52 +00:00
Some checks failed
Test / tests (pull_request) Has been cancelled

Pull request closed


Reference: lix-project/hydra#1