We've seen many fails on ofborg, at lot of them ultimately appear to come down to
a timeout being hit, resulting in something like this:
Failure executing slapadd -F /<path>/slap.d -b dc=example -l /<path>/load.ldif.
Hopefully this resolves it for most cases.
I've done some endurance testing and this helps a lot.
some other commands also regularly time-out with high load:
- hydra-init
- hydra-create-user
- nix-store --delete
This should address most issues with tests randomly failing.
Used the following script for endurance testing:
```
import os
import subprocess
run_counter = 0
fail_counter = 0
while True:
try:
run_counter += 1
print(f"Starting run {run_counter}")
env = os.environ
env["YATH_JOB_COUNT"] = "20"
result = subprocess.run(["perl", "t/test.pl"], env=env)
if (result.returncode != 0):
fail_counter += 1
print(f"Finish run {run_counter}, total fail count: {fail_counter}")
except KeyboardInterrupt:
print(f"Finished {run_counter} runs with {fail_counter} fails")
break
```
In case someone else wants to do it on their system :).
Note that YATH_JOB_COUNT may need to be changed loosely based on your
cores.
I only have 4 cores (8 threads), so for others higher numbers might
yield better results in hashing out unstable tests.
There are some known regressions regarding local testing setups - since
everything was kinda half written with the expectation that build dir =
source dir (which should not be true anymore). But everything builds and
the test suite runs fine, after several hours spent debugging random
crashes in libpqxx with MALLOC_PERTURB_...
This is implement in an extremely hacky way due to poor DBIx feature
support. Ideally, what we'd need is a way to tell DBIx to ignore the
errormsg column unless explicitly requested, and to automatically add a
computed 'errormsg IS NULL' column in others. Since it does not support
that, this commit instead hacks some support via method overrides while
taking care to not break anything obvious.
* Let tests themselves intentionally leak temp dir
By default Yath will clean up temporary files, so the result is the
same. But `--keep-dirs` can be passed to `yath test` telling Yath to
*not* clean them up instead. This is very useful for debugging.
* Update t/lib/HydraTestContext.pm
Co-authored-by: Cole Helbling <cole.e.helbling@outlook.com>
This is necessary because jobset and project names are not allowed to
begin with a digit, and yet the generated jobset and project names would
do just that.
Not the most elegant solution, but it works.
This might, hopefully, I don't know, possibly force the
database to live a little while longer and *reduce* but not
eliminate errors around stopping the database before we lose all
our DB::PG handles to it.
At the moment, the jobset object is unlikely to actually retrieve the
evaluation error output, because it isn't refreshed after
hydra-eval-jobsets is run.
Explicitly calling DBIx::Class::Row->discard_changes causes any updated
data to be refreshed, at the cost of losing any not-yet committed
changes to the row.
Set `dest_store` in the test hydra config, so that the testsuite ensures
that the distinction between the local store and the destination store
is properly taken into account.
Fix#938
By moving the tests subdirectory to t, we gain the ability to run `yath
test` with no arguments from inside `nix develop` in the root of the
the repo.
(`nix develop` is necessary in order to set the proper env vars for
`yath` to find our test libraries.)