functional2/lang infrastructure #825
Goal is to build the infrastructure for migrating the functional/lang tests to functional2. A prerequisite is to have our testlib be able to do snapshot testing (#595).
Here's my list of requirements for migrating the functional/lang tests while improving developer experience and staying sufficiently future-proof to handle the nix-lang2 tests.
Note: I am clearly prioritizing a devX improvement over an easy migration path. I am willing to eat the manual migration if that is the price for having nice things.
there are currently multiple sets of tests which check the parser/eval with a number of input files, all for the same feature/issue/bug, using the same parser and the same flags. It would be nice to have something similar without much overhead, i.e. to define a set of input and output files for one "test group" and run the parser/eval/eval+parser on them.

examples are: parse-fail-dup-attr and eval-okay-scope
The fundamental issue is that the current test runner is files-only, which strongly favors large files that test many things at once. @pennae suggested a test flow inspired by filecheck, similar to our current repl-characterization tests. This would have the advantage of being able to quickly add lots of small tests with little overhead. The downside is that, depending on the details of the implementation, this kind of testing does not fare well with the large input and output files we currently have. Some of our tests could theoretically be split up, but that would be non-trivial manual work. Therefore I'd like to keep that approach as merely supplemental to the one currently proposed in CL 3146.
About the parse-fail-dup-attr-$n tests, my current suggestion would be to expand our prototype to support multiple in files (in-$n.nix etc.), where all runners configured for that test are run on all input files.

Also, I'm increasingly in favor of having a small yaml or toml file instead of the __init__.py, since we currently do not need the full expressivity of the Python language. Proposed format:
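A minimal sketch of what such a per-test TOML file could look like; the table and key names (parse-fail, eval-okay, files, flags) are illustrative assumptions, not the exact format proposed here:

```toml
# test.toml — illustrative sketch only; table and key names are assumptions
[parse-fail]                      # one table per runner to run for this test group
files = ["in-1.nix", "in-2.nix"]  # input files this runner is applied to
flags = []                        # extra CLI flags for the runner invocation

[eval-okay]
files = ["in-1.nix", "in-2.nix"]
flags = []
```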
Where the output files would be test-name.out.exp, or test-name-n.out.exp in the case of multiple in files.

Moreover, as a shorthand, that toml file may be omitted if there is only one input file and the output files are named after the test type. For example, with in.nix and parse-fail.err.exp in a test folder, this would imply the configuration sketched right after this paragraph. This way, we don't have to clutter most folders with an extra file, and most current tests have an easy migration path (just rename and be done), while still providing a much more powerful framework for the tests that need it (no symlink hacks anymore!)
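Reusing the illustrative key names from the sketch above, a folder containing only in.nix and parse-fail.err.exp would be treated as if the following had been written (again an assumption, not the exact proposed expansion):

```toml
# implied configuration for a folder with just in.nix and parse-fail.err.exp
# (illustrative key names, matching the sketch above)
[parse-fail]
files = ["in.nix"]
flags = []
```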
there must be an authoritative list of test cases somewhere the runner can actually see it. the current tests suffer from the run-from-install-dir problem where meson will happily leave old tests in place in the install dir, and since the runner only lists the install dir to find its test cases it'll also happily continue running a deleted test. (if that authoritative list is the dir listing of tests/functional2/lang rather than a configured or installed version of that, we'd already be on much firmer ground)

we're still very fond of the filecheck syntax for this, changed to be a tiny bit shebangy. imagine an in.nix:
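One possible sketch of such a shebang-ish prelude, using the # KIND(tag, flags) directive shape suggested by the regex further down in the thread; the tags, flags, and the expression itself are made up for illustration:

```nix
# PARSE-GOOD(parse)
# EVAL-GOOD(eval)
# EVAL-GOOD(depr, --some-deprecation-flag)
# EVAL-FAIL(other-depr, --some-other-deprecation-flag)
# the prelude is entirely comments; the actual test expression starts below
let greeting = "hello"; in { inherit greeting; }
```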
we'd wish that to be accompanied by four sets of expectations ({,other-}depr.{out,err} parse.{out,err} eval.{out,err}) and the test runner to run the same file four times with three different arguments. the actual filecheck thing of running an ordered list of regexes over the output could also be done, but that loses fidelity (because stdout and stderr must be merged) on top of requiring tests to be split. that would handle the vast majority of test cases we have today with minimal boilerplate, and if the runner lists the source directory rather than the configured directory we get many conveniences for free

My assumption so far has been that the test files themselves are the authoritative listing, and that with the new framework we don't have to worry about any "install" shenanigans or build directory impurities anymore. I'd be against having an explicit enumeration if avoidable.
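For illustration (not the actual functional2 collection code), a pytest-style collector where the source directory listing itself is the authoritative test list could look roughly like this; the paths and helper names are hypothetical:

```python
# hypothetical sketch: discover test cases by listing the source tree next to this
# file, so a deleted test directory immediately disappears from the test list
from pathlib import Path
import pytest

LANG_DIR = Path(__file__).parent  # assumed location: tests/functional2/lang/

def _collect_case_dirs() -> list[Path]:
    # every subdirectory containing an in.nix (or in-$n.nix) is one test case
    return sorted({p.parent for p in LANG_DIR.glob("*/in*.nix")})

@pytest.mark.parametrize("case_dir", _collect_case_dirs(), ids=lambda p: p.name)
def test_lang_case(case_dir: Path) -> None:
    ...  # run the runners configured for this case and compare against *.exp files
```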
I'm not strictly against having the test description inline with the test input; however, it comes with a new set of questions to resolve:
yeah, that's the ideal case. if we can make that happen we absolutely should

will probably need a parser, but that can be extremely simple: ^# (?<kind>(?:PARSE|EVAL)-(?:GOOD|FAIL))\((?<tag>[a-zA-Z0-9_+-]+)(?:,(?<flags>.*))?\)$, with the flags running through a shlex.split
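A minimal sketch of such a parser in Python (the language of the functional2 testlib), with the named groups translated to Python's (?P<...>) syntax; the Directive class and the exact prelude-termination rule are assumptions:

```python
import re
import shlex
from dataclasses import dataclass, field

# hypothetical shape for one parsed directive, e.g. "# EVAL-FAIL(depr, --some-flag)"
@dataclass
class Directive:
    kind: str                    # "PARSE-GOOD", "PARSE-FAIL", "EVAL-GOOD" or "EVAL-FAIL"
    tag: str                     # name used to select the expectation files
    flags: list[str] = field(default_factory=list)  # extra CLI flags for this run

DIRECTIVE_RE = re.compile(
    r"^# (?P<kind>(?:PARSE|EVAL)-(?:GOOD|FAIL))\((?P<tag>[a-zA-Z0-9_+-]+)(?:,(?P<flags>.*))?\)$"
)

def parse_prelude(source: str) -> list[Directive]:
    """Collect directives from the leading comment block of a test file."""
    directives: list[Directive] = []
    for line in source.splitlines():
        if not line.startswith("#"):
            break  # the prelude (leading comments) has ended
        m = DIRECTIVE_RE.match(line)
        if m is not None:
            flags = shlex.split(m.group("flags") or "")
            directives.append(Directive(m.group("kind"), m.group("tag"), flags))
    return directives
```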
at the prelude. having it start elsewhere messes with the semantics of ./., and the prelude being entirely comments shouldn't hurt (as long as we have a stable single-line comment syntax)

that's not easy in a filecheck-like setting this simplified. there's no reason not to have both a test list file for matrix testing and embedded metadata for simple tests though (and "simple" tests, especially once we start splitting the huge tests we have today, are probably the more numerous)
I'd like to hear your opinion on inferring the runner information for the simple test cases. These are in the majority anyways, and doing this would mean that migration would be just a simple rename.
#836

Done and implemented 🎉