Consider replacing pytest in functional2 #594


Open

opened 2024-12-05 20:16:43 +00:00 by jade · 4 comments

jade commented

2024-12-05 20:16:43 +00:00

Owner

I am getting bad vibes from pytest to be honest, especially with #593. Among other things, it could be a lot faster, particularly startup-time wise, and I am mildly worried that we might want to have the ability to do threading or asyncio driven parallelism in the future for faster testing which it entirely cannot do. The startup times at present are not great and spawning a pile of workers is a super bad design for something which does not have the normal python reason for being slow or the normal python reasons to need a process-pool solution.

It also does a lot of magic, such as automatically loading plugins based on PYTHONPATH (proximate cause of #593). I also don't really like the way that the fixtures interact with type annotations. I would rather the fixtures be defined by the type annotations, which makes it significantly more obvious what they are and improves IDE behaviour substantially. Overall it is extremely complex and this complexity seems to be causing some hauntings.
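A minimal sketch of what "fixtures defined by the type annotations" could look like, using only the standard library. Every name here (`TempDir`, `Store`, `PROVIDERS`, `resolve`, `call_with_fixtures`) is hypothetical and not part of functional2 or pytest; teardown is left out to keep it short.

```python
import tempfile
import typing
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TempDir:
    """Hypothetical fixture type: a per-test scratch directory."""
    path: Path


@dataclass
class Store:
    """Hypothetical fixture type that itself depends on TempDir."""
    root: Path


def make_temp_dir() -> TempDir:
    return TempDir(Path(tempfile.mkdtemp()))


def make_store(tmp: TempDir) -> Store:
    root = tmp.path / "store"
    root.mkdir()
    return Store(root)


# Providers are keyed by the type they produce, not by a parameter name.
PROVIDERS = {TempDir: make_temp_dir, Store: make_store}


def resolve(wanted, cache):
    """Build the fixture of the given type, or reuse the memoized one."""
    if wanted not in cache:
        cache[wanted] = call_with_fixtures(PROVIDERS[wanted], cache)
    return cache[wanted]


def call_with_fixtures(func, cache=None):
    """Call func, filling each parameter from its type annotation."""
    cache = {} if cache is None else cache
    hints = typing.get_type_hints(func)
    hints.pop("return", None)
    kwargs = {name: resolve(tp, cache) for name, tp in hints.items()}
    return func(**kwargs)


def check_store_lives_in_temp_dir(store: Store, tmp: TempDir) -> bool:
    # Parameter names are free-form; only the annotations matter.
    return store.root.parent == tmp.path
```

Because the lookup key is an ordinary class, go-to-definition on `Store` in the test signature lands on the fixture type, and a type checker sees the real parameter types.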

Maybe we should replace it with something like [chromium's expect_tests](https://chromium.googlesource.com/infra/testing/expect_tests/) or @puck's nix conformance suite runner or something written from scratch.

This is not to say that functional2 should not be used; there is no urgency to actually execute on a replacement since the test-facing interface is both good and fairly abstract, and we can just migrate the code if we decide we want something different. It's more important that we have decent tests at the moment IMO.

Does anyone have any thoughts or opinions on this? We are in no rush to actually make any changes here.

jade added the Context RFD label 2024-12-05 20:16:43 +00:00

jade commented

2024-12-10 19:12:06 +00:00

Author

Owner

@cobaltcause on matrix:

> fwiw, i find maintaining pytest test suites to be very difficult because of fixtures; they seem to make people make weird choices that become really hard to untangle later when things change that would be less of a problem with normal functions, and also they don't cooperate with lsp servers at all (at least, not the ones i know of)
> IME, go-to-references and go-to-definition don't work, there's no guarantee that the fixture argument type signature match the type signature of the fixture definition, there's nothing to statically check the existence of a fixture, especially with things like autouse and such

i::jade concur with this and it is a major beef i have with pytest. in practice the fixtures exist primarily to avoid context manager rightward drift, which could be replaced with a thing for combining multiple context managers. it also exists to deal with having shared per-test temp directory or so flexibly (which is fair enough). basically a test has to be able to set shared things in its context which can reuse the memoized versions of other things in the context and this is inevitably going to be somewhat ugly even if you force explicit data passing.
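One existing "thing for combining multiple context managers" is `contextlib.ExitStack` from the standard library. The sketch below assumes two made-up resources (`scratch_dir` and `fake_daemon` are hypothetical, not functional2 code) and shows them being entered without any extra indentation per resource:

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def scratch_dir():
    """Hypothetical per-test temp directory."""
    with tempfile.TemporaryDirectory() as name:
        yield Path(name)


@contextlib.contextmanager
def fake_daemon(state_dir, log):
    """Hypothetical stand-in for a helper process that needs a directory."""
    log.append("start")
    try:
        yield "daemon"
    finally:
        log.append("stop")


def run_test(log):
    with contextlib.ExitStack() as stack:
        # Each enter_context call registers teardown; no rightward drift.
        tmp = stack.enter_context(scratch_dir())
        daemon = stack.enter_context(fake_daemon(tmp, log))
        assert tmp.is_dir()
        log.append(f"body using {daemon}")
    # Leaving the block tears everything down in reverse order of entry.
    return tmp
```

This covers the lifecycle half of what fixtures do; it does not by itself give the shared, memoized per-test state described above.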


pennae commented

2024-12-10 19:18:17 +00:00

Owner

> in practice the fixtures exist primarily to avoid context manager rightward drift

but with statements can totally instantiate multiple context managers from a simple list of them as though they were nested? if fixtures are mainly around to turn `with foo() as a, bar() as b:` into `def nonsense(foo, bar):`, then what even is the point? (confused noises)
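For reference, a small self-contained check that the comma form of `with` behaves like the nested form. `tracked` is a hypothetical helper written only for this illustration:

```python
import contextlib


@contextlib.contextmanager
def tracked(name, log):
    """Record when this context manager is entered and exited."""
    log.append(f"enter {name}")
    try:
        yield name
    finally:
        log.append(f"exit {name}")


def flat(log):
    # One with statement, two context managers, one indentation level.
    with tracked("foo", log) as a, tracked("bar", log) as b:
        log.append(f"body {a} {b}")


def nested(log):
    # The equivalent nested spelling, one indentation level per manager.
    with tracked("foo", log) as a:
        with tracked("bar", log) as b:
            log.append(f"body {a} {b}")
```

Both functions append the same sequence: enter foo, enter bar, the body, exit bar, exit foo. Since Python 3.10 the managers can also be wrapped in parentheses to split them across lines.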


jade commented

2024-12-10 20:51:00 +00:00

Author

Owner


So they both manage life cycle and also allow sharing fixtures between tests (and between fixtures in one test). The latter is an important reason why it would be nice to still have fixtures in some form: being able to ask the ambient environment what the temp dir is for the test without having to explicitly give it is a very nice thing as it avoids having to change a pile of call sites if you need another bit of such state for a fixture. It's the classical dependency injection framework thing.

However, we could implement this differently: tests could be given a TestContext from which you can `with ctx.fixture(Nix) as nix, ctx.fixture(BinaryCacheServer) as bcs`. Or do that to fixtures. There could be some amount of explicitness to remove the worst of the magic.
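A rough sketch of how such a `TestContext` might work, under stated assumptions: only the names `TestContext`, `ctx.fixture`, `Nix` and `BinaryCacheServer` come from the comment above. The `register` method, the factory functions, `TempDir`, and the memoization strategy are all invented here for illustration and are not an existing functional2 design.

```python
import contextlib
import tempfile
from pathlib import Path


class TempDir:
    def __init__(self, path):
        self.path = path


class Nix:
    """Hypothetical stand-in; only records where its state would live."""
    def __init__(self, home):
        self.home = home


class BinaryCacheServer:
    """Hypothetical stand-in; only records where its data would live."""
    def __init__(self, root):
        self.root = root


class TestContext:
    def __init__(self):
        self._factories = {}  # fixture type -> factory taking the context
        self._live = {}       # fixture type -> instance currently alive

    def register(self, kind, factory):
        self._factories[kind] = factory

    @contextlib.contextmanager
    def fixture(self, kind):
        if kind in self._live:
            # Already alive in this test: reuse it. The first requester
            # owns the lifetime, so nothing is torn down here.
            yield self._live[kind]
            return
        with self._factories[kind](self) as value:
            self._live[kind] = value
            try:
                yield value
            finally:
                del self._live[kind]


@contextlib.contextmanager
def temp_dir_factory(ctx):
    with tempfile.TemporaryDirectory() as name:
        yield TempDir(Path(name))


@contextlib.contextmanager
def nix_factory(ctx):
    # Fixtures ask the context for what they need, explicitly and by type.
    with ctx.fixture(TempDir) as tmp:
        yield Nix(tmp.path / "nix-home")


@contextlib.contextmanager
def binary_cache_factory(ctx):
    with ctx.fixture(TempDir) as tmp:
        yield BinaryCacheServer(tmp.path / "cache")


def make_context():
    ctx = TestContext()
    ctx.register(TempDir, temp_dir_factory)
    ctx.register(Nix, nix_factory)
    ctx.register(BinaryCacheServer, binary_cache_factory)
    return ctx


def example():
    ctx = make_context()
    with ctx.fixture(Nix) as nix, ctx.fixture(BinaryCacheServer) as bcs:
        # Both fixtures were built on the same memoized TempDir.
        return nix.home.parent == bcs.root.parent
```

In this sketch the temp dir is never named at the call site, so adding another piece of shared state only touches the factories that need it. Reuse relies on `with` blocks nesting, so the first requester is always the last to exit.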

kfearsoff commented

2024-12-11 08:05:16 +00:00

Member

> if fixtures are mainly around to turn `with foo() as a, bar() as b:` into `def nonsense(foo, bar):`, then what even is the point? (confused noises)

Well, that's the point! Unlike the `with` statement, functions are composable in Python: you can take them as arguments to functions and pass them as arguments to other functions. This lets you do Dependency Injection and have tests and fixtures depend on fixtures that are only defined in one place.

Also, personally, I've always struggled writing the code that is needed to use `with` in Python; I assume that's especially true for people who have less experience with statically typed languages that have interfaces and stuff. But that's less of a technical and more of a, uhh, cultural issue I guess :)

I saw mentions of Rust in Matrix, so I just had to chime in and put my own 2 cents here. So here goes.

Regarding pytest: yeah, aside from turning `with foo() as a` into `def nonsense(a)`, it also does a whole bunch of ad-hoc things related to state management which I just don't love.

For the Rust side, it's weird. Rust has a lot of cool testing libraries for snapshot testing, regression testing, property testing and all kinds of other weird things, but from what I've seen testing in Rust often tends to fall into one of: a lot of repetition, homegrown approaches, or macro abuse.

So I guess I'm in favor of rewriting the Python test runner. I'd rather a homegrown test runner than a homegrown Rust test suite.
