hydra/t/queue-runner/build-locally-with-substitutable-path.t
Rick van Schijndel ef619eca99
t: increase timeouts for slow commands with high load
We've seen many failures on ofborg; a lot of them ultimately appear to come
down to a timeout being hit, resulting in something like this:

Failure executing slapadd -F /<path>/slap.d -b dc=example -l /<path>/load.ldif.

Hopefully this resolves it for most cases.
I've done some endurance testing and this helps a lot.
Some other commands also regularly time out under high load:

- hydra-init
- hydra-create-user
- nix-store --delete

This should address most issues with tests randomly failing.
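
The failure mode described above can be reproduced in miniature. What follows is a minimal sketch, not Hydra's actual helper: `run_with_timeout` is a hypothetical name, and the "slow command" is a stand-in that just sleeps. It uses the `timeout` parameter of Python's `subprocess.run` to show how a command that is merely slow gets reported as a failure when the limit is too tight, and passes once the limit is raised.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, timed_out).

    Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
        return result.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising this exception.
        return None, True


# Stand-in for a command that is slow under load: it sleeps for 0.5 s.
slow_cmd = [sys.executable, "-c", "import time; time.sleep(0.5)"]

print(run_with_timeout(slow_cmd, 0.1))  # limit shorter than the command needs
print(run_with_timeout(slow_cmd, 15))   # generous limit
```

With the tight limit the command is killed even though nothing is wrong with it, which is the kind of spurious failure a loaded test machine produces.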

Used the following script for endurance testing:

```
import os
import subprocess

run_counter = 0
fail_counter = 0

while True:
    try:
        run_counter += 1
        print(f"Starting run {run_counter}")
        # Copy the environment so the parent process's os.environ is not mutated.
        env = os.environ.copy()
        env["YATH_JOB_COUNT"] = "20"
        result = subprocess.run(["perl", "t/test.pl"], env=env)
        if result.returncode != 0:
            fail_counter += 1
        print(f"Finish run {run_counter}, total fail count: {fail_counter}")
    except KeyboardInterrupt:
        print(f"Finished {run_counter} runs with {fail_counter} fails")
        break
```

In case someone else wants to do it on their system :).
Note that YATH_JOB_COUNT may need to be adjusted, loosely based on your
core count.
I only have 4 cores (8 threads), so for others higher numbers might
yield better results in flushing out unstable tests.
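
One way to pick a starting value is to scale the job count with the number of logical CPUs. The sketch below is only an illustration: `suggest_job_count` is a hypothetical helper, and the default factor of 2.5 is an assumption that simply mirrors the 20 jobs on 8 threads mentioned above, not a tuned recommendation.

```python
import os


def suggest_job_count(logical_cpus, factor=2.5):
    """Scale the job count with the CPU count (the factor is an assumption)."""
    return max(1, round(logical_cpus * factor))


cpus = os.cpu_count() or 1  # os.cpu_count() can return None
env = os.environ.copy()
env["YATH_JOB_COUNT"] = str(suggest_job_count(cpus))
print(f"{cpus} logical CPUs -> YATH_JOB_COUNT={env['YATH_JOB_COUNT']}")
```

The resulting `env` can be passed to `subprocess.run(..., env=env)` in place of the hard-coded `"20"`.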
2024-08-11 16:08:09 +02:00

55 lines
1.9 KiB
Perl

use strict;
use warnings;
use Setup;
use Data::Dumper;
use Test2::V0;
use Hydra::Helper::Exec;

my $ctx = test_context(
    use_external_destination_store => 1
);

require Hydra::Helper::Nix;

# This test is regarding https://github.com/NixOS/hydra/pull/1126
#
# A hydra instance was regularly failing to build derivations with:
#
#   possibly transient failure building /nix/store/X.drv on localhost:
#   dependency '/nix/store/Y' of '/nix/store/Y.drv' does not exist,
#   and substitution is disabled
#
# However it would only fail when building on localhost, and it would only
# fail if the build output was already in the binary cache.
#
# This test replicates this scenario by having two jobs, underlyingJob and
# dependentJob. dependentJob depends on underlyingJob. We first build
# underlyingJob and copy it to an external cache. Then forcefully delete
# the output of underlyingJob, and build dependentJob. In order to pass
# it must either rebuild underlyingJob or fetch it from the cache.

subtest "Building, caching, and then garbage collecting the underlying job" => sub {
    my $builds = $ctx->makeAndEvaluateJobset(
        expression => "dependencies/underlyingOnly.nix",
        build => 1
    );

    my $path = $builds->{"underlyingJob"}->buildoutputs->find({ name => "out" })->path;

    ok(unlink(Hydra::Helper::Nix::gcRootFor($path)), "Unlinking the GC root for underlying Dependency succeeds");

    (my $ret, my $stdout, my $stderr) = captureStdoutStderr(15, "nix-store", "--delete", $path);
    is($ret, 0, "Deleting the underlying dependency should succeed");
};

subtest "Building the dependent job should now succeed, even though we're missing a local dependency" => sub {
    my $builds = $ctx->makeAndEvaluateJobset(
        expression => "dependencies/dependentOnly.nix"
    );

    ok(runBuild($builds->{"dependentJob"}), "building the job should succeed");
};

done_testing;