functional2/lang infrastructure #825

Closed
opened 2025-05-10 18:15:28 +00:00 by piegames · 10 comments
Member

Goal is to build the infrastructure for migrating the functional/lang tests to functional2. A prerequisite is to have our testlib be able to do snapshot testing (#595).

Here's my list of requirements in order to be able to migrate functional/lang tests while providing improvements in developer experience as well as being sufficiently future-proof to handle the nix-lang2 tests.

  • Be able to group tests into named folders instead of having a flat structure like we currently do
  • Be able to update all .exp files upon test failure
  • Be files-first: All test inputs and outputs should go through files. Having tests defined with inline strings in Python might be a nice bonus, but it's not necessary and shouldn't be the default.
  • Be able to run parse, eval or parse+eval tests on the same Nix file
    • Implied: don't have "parse-fail" or "eval-okay" in the input file name
  • Be able to run tests with different flags on the same Nix file
  • Be able to have multiple versions of a test. Scenario: I change a language feature (old or new), I want to have a Nix file with and without it, and test each with and without the feature flag
  • Have the same output post-processing as functional/lang (e.g. JSON to YAML conversion, path sanitization, etc.), so that most tests can be migrated with a simple file move

Note: I am clearly prioritizing a devX improvement over an easy migration path. I am willing to eat the manual migration if that is the price for having nice things.
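As a concrete illustration of the "update all .exp files upon test failure" point above, here is a minimal sketch of an accept mode gated on an environment variable. The `_NIX_TEST_ACCEPT` name mirrors the existing functional/lang convention; whether functional2 reuses that name, or this shape at all, is an assumption:

```python
import os
from pathlib import Path

def assert_matches_exp(actual: str, exp_file: Path) -> None:
    """Compare output against a .exp snapshot, optionally accepting new output."""
    expected = exp_file.read_text() if exp_file.exists() else ""
    if actual == expected:
        return
    if os.environ.get("_NIX_TEST_ACCEPT") == "1":
        # Accept mode: overwrite the snapshot with the actual output instead of failing.
        exp_file.write_text(actual)
    else:
        raise AssertionError(f"{exp_file} does not match the actual output")
```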

piegames added this to the functional2 project 2025-05-10 18:15:28 +00:00
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in [cl/3146](https://gerrit.lix.systems/c/lix/+/3146) ("functional2: add framework for langtests")
  • commit message in [cl/3149](https://gerrit.lix.systems/c/lix/+/3149) ("tests/functional2: add framework for lang tests")

there are currently multiple sets of tests which check the parser/eval on a number of input files, all for the same feature/issue/bug, using the same parser and the same flags.
It would be nice to have something similar without much overhead, i.e. define a set of input and output files for one "test group" to run the parser/eval/eval+parser on.

examples are:
`parse-fail-dup-attr` and `eval-okay-scope`

Author
Member

The fundamental issue is that the current test runner is files-only, which strongly prefers having larger files which test many things at once. @pennae suggested a test flow inspired by checkfile, similar to our current repl-characterization tests. This would have the advantage of being able to quickly add lots of small tests with little overhead. The downside is that, depending on the details of the implementation, this kind of testing does not fare well with the large input and output files we currently have. Some of our tests could theoretically be split up, but that would be non-trivial manual work. Therefore I'd like to keep that approach as only supplemental to the one currently proposed in CL 3146.

About the `parse-fail-dup-attr-$n` tests, my current suggestion would be to expand our prototype to support multiple `in` files (`in-$n.nix` etc.), where all runners configured for that test are run on all input files.
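For illustration, a minimal sketch of that multi-input idea, assuming a per-test directory containing either a single `in.nix` or several `in-*.nix` files; the helper names and the `runners` shape are hypothetical, not the CL 3146 API:

```python
from pathlib import Path

def input_files(test_dir: Path) -> list[Path]:
    """Return the single in.nix, or all in-$n.nix inputs if there are several."""
    single = test_dir / "in.nix"
    if single.exists():
        return [single]
    return sorted(test_dir.glob("in-*.nix"))

def run_all(test_dir: Path, runners) -> None:
    # Every runner configured for this test runs on every input file.
    for nix_file in input_files(test_dir):
        for runner in runners:
            runner(nix_file)
```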

Author
Member

Also, I'm increasingly in favor of having a small YAML or TOML file instead of the `__init__.py`, since we currently do not need the full expressivity of the Python language.

Proposed format:

```toml
[test-name]
type = "eval-fail"
flags = []
```

Where the output files would be `test-name.out.exp`, or `test-name-n.out.exp` in the case of multiple `in` files.

Moreover, as a shorthand, that TOML file may be omitted if there is only one input file and the output files are named after the test type. For example, with `in.nix` and `parse-fail.err.exp` in a test folder, this would imply the following configuration:

```toml
[parse-fail]
type = "parse-fail"
flags = []
```

This way, we don't have to clutter most folders with an extra file, and most current tests have an easy migration path (just rename and be done), while still providing a much more powerful framework for the tests that need it (no symlink hacks anymore!)
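A rough sketch of what loading such a file, with the shorthand fallback inferred from the expectation file names, could look like. The `test.toml` filename, the field names, and the `LangTest` type are assumptions for illustration, not a spec:

```python
import tomllib  # Python 3.11+
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class LangTest:
    name: str
    type: str                       # e.g. "parse-fail", "eval-okay", ...
    flags: list[str] = field(default_factory=list)

def load_tests(test_dir: Path) -> list[LangTest]:
    config = test_dir / "test.toml"
    if config.exists():
        # Explicit configuration: one TOML table per test.
        data = tomllib.loads(config.read_text())
        return [LangTest(name, tbl["type"], tbl.get("flags", []))
                for name, tbl in data.items()]
    # Shorthand: no TOML file; expectation files named after the test type,
    # e.g. parse-fail.err.exp implies a test of type "parse-fail" with no flags.
    return [LangTest(exp.name.split(".")[0], exp.name.split(".")[0])
            for exp in sorted(test_dir.glob("*.exp"))]
```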

Owner

there must be an authoritative list of test cases *somewhere* the runner can actually see it. the current tests suffer from the run-from-install-dir problem where meson will happily leave old tests in place in the install dir, and since the runner only lists the install dir to find its test cases it'll also happily continue running a deleted test. (if that authoritative list is the dir listing of `tests/functional2/lang` rather than a configured or installed version of that we'd already be on much firmer ground)

we're still very fond of the filecheck syntax for this, changed to be a tiny bit shebangy. imagine an `in.nix`:

```
# PARSE-FAIL(depr)
# PARSE-FAIL(other-depr, --extra-experimental-features cr-line-endings)
# PARSE-GOOD(parse, --extra-experimental-features ancient-let)
# EVAL-GOOD(eval, --extra-experimental-features ancient-let)
let { body = 1; }
```

we'd wish that to be accompanied by four sets of expectations (`{,other-}depr.{out,err} parse.{out,err} eval.{out,err}`) and the test runner to run the same file four times with three different sets of arguments. the *actual* filecheck thing of running an ordered list of regexes over the output could also be done, but that loses fidelity (because stdout and stderr must be merged) on top of requiring tests to be split. that would handle the vast majority of test cases we have today with minimal boilerplate, and if the runner lists the *source* directory rather than the *configured* directory we get many conveniences for free
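For concreteness, a rough sketch of what running one such directive could look like, assuming the runner shells out to `nix-instantiate` and compares against per-tag `<tag>.out` / `<tag>.err` files; the function name, the strict-eval invocation, and the expectation-file naming are assumptions, not the actual design:

```python
import subprocess
from pathlib import Path

def run_directive(test_dir: Path, kind: str, tag: str, flags: list[str]) -> None:
    # PARSE-* directives only parse; EVAL-* directives evaluate strictly.
    mode = ["--parse"] if kind.startswith("PARSE") else ["--eval", "--strict"]
    proc = subprocess.run(
        ["nix-instantiate", *mode, *flags, str(test_dir / "in.nix")],
        capture_output=True, text=True,
    )
    # *-FAIL directives expect a non-zero exit code, *-GOOD a zero one.
    assert (proc.returncode != 0) == kind.endswith("-FAIL")
    # Compare against the per-tag expectation files, where present.
    for stream, output in (("out", proc.stdout), ("err", proc.stderr)):
        exp = test_dir / f"{tag}.{stream}"
        if exp.exists():
            assert output == exp.read_text()
```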

Author
Member

My assumption so far has been that the test files themselves *are* the authoritative listing, and that with the new framework we don't have to worry about any "install" shenanigans or build directory impurities anymore. I'd be against having an explicit enumeration if avoidable.

I'm not strictly against having the test description inline with the test input, however, it comes with a new set of questions to resolve:

  • Are there any libraries to parse it or will it require a hand-rolled parser?
  • Does the input Nix code start after the prelude or at the start of the file?
  • What about multiple input files, as discussed above?
Owner

> My assumption so far has been that the test files themselves are the authoritative listing, and that with the new framework we don't have to worry about any "install" shenanigans or build directory impurities anymore.

yeah, that's the ideal case. if we can make that happen we absolutely should

> Are there any libraries to parse it or will it require a hand-rolled parser?

will probably need a parser, but that can be extremely simple: `^# (?P<kind>(?:PARSE|EVAL)-(?:GOOD|FAIL))\((?P<tag>[a-zA-Z0-9_+-]+)(?:,(?P<flags>.*))?\)$` with the flags run through a `shlex.split`
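For reference, a minimal Python sketch of such a prelude parser using that regex (as cleaned up above) plus `shlex.split`; the function name and the rule that the prelude ends at the first non-directive line are assumptions:

```python
import re
import shlex

DIRECTIVE = re.compile(
    r"^# (?P<kind>(?:PARSE|EVAL)-(?:GOOD|FAIL))"
    r"\((?P<tag>[a-zA-Z0-9_+-]+)(?:,(?P<flags>.*))?\)$"
)

def parse_prelude(source: str):
    """Yield (kind, tag, flags) for each directive line at the top of a test file."""
    for line in source.splitlines():
        m = DIRECTIVE.match(line)
        if not m:
            break                   # the prelude ends at the first non-directive line
        yield m["kind"], m["tag"], shlex.split(m["flags"] or "")

# e.g. "# PARSE-GOOD(parse, --extra-experimental-features ancient-let)" yields
# ("PARSE-GOOD", "parse", ["--extra-experimental-features", "ancient-let"])
```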

> Does the input Nix code start after the prelude or at the start of the file?

after the prelude, within the same file: the whole file, prelude included, is the input. having the code start elsewhere messes with the semantics of `./.`, and the prelude being entirely comments shouldn't hurt (as long as we have a stable single-line comment syntax)

> What about multiple input files, as discussed above?

that's not easy in a filecheck-like setting as simplified as this. there's no reason not to have *both* a test list file for matrix testing and embedded metadata for simple tests though (and "simple" tests, especially once we start splitting the huge tests we have today, are probably the more numerous)

Author
Member

I'd like to hear your opinion on inferring the runner information for the simple test cases. These are the majority anyway, and doing this would mean that migration would be just a simple rename.


> meson will happily leave old tests in place in the install dir

the internal implementation wouldn't change much when using the toml instead of the init, aka the framework would still copy each file individually into a per-test directory and check it there, with no residual files being left over on a change/deletion
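A minimal sketch of that copy-into-a-fresh-directory approach as a pytest fixture; the fixture name and directory layout are assumptions, not the actual functional2 code:

```python
import shutil
from pathlib import Path

import pytest

@pytest.fixture
def lang_test_dir(request, tmp_path: Path) -> Path:
    """Copy the test's source directory into a fresh per-test working directory."""
    source_dir = Path(request.path).parent      # directory the test was collected from
    work_dir = tmp_path / source_dir.name       # fresh copy per test run, no stale files
    shutil.copytree(source_dir, work_dir)
    return work_dir
```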

Author
Member

Done and implemented 🎉
