Investigate QoS classes on Darwin #717

Open
opened 2025-03-06 03:54:39 +00:00 by jade · 3 comments
Owner

emilazy pointed out to me that we might have a mistake similar to one Bazel used to have on macOS, caused by incorrect QoS classes at process startup time. https://jmmv.dev/2019/03/macos-threads-qos-and-bazel.html

I have not looked into this, but I can guarantee that our code as written today does not consider it.

Note that this would probably have to go with a project to remove the fork()s from the Lix daemon, but, thankfully, we want that anyway for other reasons.

Author
Owner
Link for further debugging: https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheTaskLevel.html#//apple_ref/doc/uid/TP40013929-CH35-SW10
Author
Owner

Update on looking at this: the Lix daemon is running at Utility QoS class (I think), as are its child processes. It's not obvious to me what the correct QoS class is for "rendering a video" or other batch jobs kicked off by a user that they're actively watching (or not).

Maybe we could set the Lix daemon to a higher class than its children (e.g. User-Initiated) and that might make stuff go faster. But I am not sure what is being starved/slow and by whom.
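
For anyone who wants to check what class a process actually ends up in, here is a minimal probe sketch in Python via ctypes. It assumes the `qos_class_self()` and `pthread_set_qos_class_self_np()` calls from Apple's `<pthread/qos.h>` and the constant values from `<sys/qos.h>`. The helper names (`qos_name`, `current_qos`, `set_qos`) are made up for illustration and are not anything in Lix. Whether a per-thread request can lift whatever launchd applies to the daemon is not established in this thread, so treat the setter as an experiment, not a fix.

```python
import ctypes
import sys

# QoS class constants as defined in <sys/qos.h> on macOS.
QOS_CLASSES = {
    0x21: "user-interactive",
    0x19: "user-initiated",
    0x15: "default",
    0x11: "utility",
    0x09: "background",
    0x00: "unspecified",
}


def qos_name(value):
    """Map a raw qos_class_t value to a readable name (hypothetical helper)."""
    return QOS_CLASSES.get(value, f"unknown (0x{value:02x})")


def current_qos():
    """Return the calling thread's QoS class name, or None when not on macOS."""
    if sys.platform != "darwin":
        return None
    libc = ctypes.CDLL(None)  # symbols already loaded into this process
    libc.qos_class_self.restype = ctypes.c_uint
    libc.qos_class_self.argtypes = []
    return qos_name(libc.qos_class_self())


def set_qos(value):
    """Request a QoS class for the calling thread.

    Returns the call's integer result (0 on success), or None when not on macOS.
    """
    if sys.platform != "darwin":
        return None
    libc = ctypes.CDLL(None)
    fn = libc.pthread_set_qos_class_self_np
    fn.restype = ctypes.c_int
    fn.argtypes = [ctypes.c_uint, ctypes.c_int]
    return fn(value, 0)  # relative priority 0 within the class


if __name__ == "__main__":
    print("QoS at startup:", current_qos())
```

Running this once from a terminal and once from a launchd job would give a direct comparison of the two starting classes, without touching the daemon itself.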

Author
Owner

QoS class utility (launchd, default):

lix> Full log written to /nix/var/nix/builds/nix-build-lix-2.94.0-dev-pre20250727-2d01098.drv-0/b/source/build/meson-logs/testlog.txt
lix> installCheckPhase completed in 1 minutes 54 seconds
/nix/store/43cz7wimzcxpk4fc88mwvmiqh9fd5m0i-lix-2.94.0-dev-pre20250727-2d01098-dev
/nix/store/lyrj23vsy491kscr7322y9y8wizry6jl-lix-2.94.0-dev-pre20250727-2d01098-doc
/nix/store/19jn0drzqy8a7lg6z20fpw8zd2c0fxax-lix-2.94.0-dev-pre20250727-2d01098
nb   0.09s user 0.05s system 0% cpu 4:59.22 total

QoS class default (jade "ran it in her terminal lol"):

lix> Full log written to /nix/var/nix/builds/nix-build-lix-2.94.0-dev-pre20250727-2d01098.drv-0/b/source/build/meson-logs/testlog.txt
lix> installCheckPhase completed in 59 seconds
/nix/store/43cz7wimzcxpk4fc88mwvmiqh9fd5m0i-lix-2.94.0-dev-pre20250727-2d01098-dev
/nix/store/lyrj23vsy491kscr7322y9y8wizry6jl-lix-2.94.0-dev-pre20250727-2d01098-doc
/nix/store/19jn0drzqy8a7lg6z20fpw8zd2c0fxax-lix-2.94.0-dev-pre20250727-2d01098
nb   0.07s user 0.04s system 0% cpu 3:39.75 total

That's a Lix derivation build that finishes over a minute faster just from setting a higher QoS class. Note that this seems to affect the test suite the most, which is probably because of delays getting piles of child processes onto run queues or something? I dunno.
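
To put numbers on the two runs above, here is a small arithmetic sketch that uses only the figures from the pasted logs (the `total` wall-clock times and the installCheckPhase durations). The `to_seconds` helper is a made-up name for this example.

```python
def to_seconds(stamp):
    """Convert an 'M:SS.ss' wall-clock string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Wall-clock totals from the two runs.
utility_total = to_seconds("4:59.22")
default_total = to_seconds("3:39.75")

# installCheckPhase durations from the two runs, in seconds.
utility_check = 1 * 60 + 54
default_check = 59

total_saved = utility_total - default_total
check_saved = utility_check - default_check

print(f"total saved: {total_saved:.2f}s")
print(f"installCheckPhase saved: {check_saved}s")
print(f"share of the difference: {check_saved / total_saved:.0%}")
```

So installCheckPhase accounts for 55 of the roughly 79 seconds saved, which fits the guess that the test suite is hit hardest. This is a single run per class, so it is only a rough indication.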

Either way there's hella smoke here and someone should go looking for the fire (can we just set the daemon to Default and not children? Is it a logging problem (#935)?).

Reference: lix-project/lix#717