Investigate QoS classes on Darwin #717

Open
opened 2025-03-06 03:54:39 +00:00 by jade · 5 comments
Owner

emilazy pointed out to me that we might have a mistake similar to the one Bazel used to have on macOS, caused by incorrect QoS classes at process startup time. https://jmmv.dev/2019/03/macos-threads-qos-and-bazel.html

I have not looked into this, but I can guarantee that the code we have today does not account for it.

Note that this would probably have to go with a project to remove the fork()s from the Lix daemon, but, thankfully, we want that anyway for other reasons.

Author
Owner
Link for further debugging: https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheTaskLevel.html#//apple_ref/doc/uid/TP40013929-CH35-SW10
Author
Owner

Update on looking at this: the Lix daemon is running at Utility QoS class (I think), as are its child processes. It's not obvious to me what the correct QoS class is for "rendering a video" or other batch jobs kicked off by a user that they're actively watching (or not).

Maybe we could set the Lix daemon to a higher class than its children (e.g. User-Initiated) and that might make stuff go faster. But I am not sure what is being starved/slow and by whom.
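
For checking which QoS class a process actually ends up in, here is a minimal sketch that asks Darwin for the calling thread's class via `qos_class_self()` from `<pthread/qos.h>`. The helper names `qos_class_name` and `current_qos_class` are made up for illustration, and this only reports on the Python process running it, not on the Lix daemon or its children; it is a way to poke at the mechanism, not a diagnosis.

```python
import ctypes
import sys

# Numeric values of qos_class_t as defined in Darwin's <sys/qos.h>.
QOS_CLASS_NAMES = {
    0x21: "User-Interactive",
    0x19: "User-Initiated",
    0x15: "Default",
    0x11: "Utility",
    0x09: "Background",
    0x00: "Unspecified",
}


def qos_class_name(value):
    """Map a raw qos_class_t value to a readable name (hypothetical helper)."""
    return QOS_CLASS_NAMES.get(value, f"unknown (0x{value:02x})")


def current_qos_class():
    """Return the calling thread's QoS class name, or None when not on Darwin."""
    if sys.platform != "darwin":
        return None
    # dlopen(NULL): look the symbol up among the libraries already loaded.
    libsystem = ctypes.CDLL(None)
    try:
        qos_class_self = libsystem.qos_class_self
    except AttributeError:
        return None
    qos_class_self.restype = ctypes.c_uint
    qos_class_self.argtypes = []
    return qos_class_name(qos_class_self())


if __name__ == "__main__":
    print(current_qos_class())
```

Running this from a terminal and from a launchd job would show what each context reports for a fresh process. Whether a clamp applied from outside the process is reflected in this value is not something the sketch establishes.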

Author
Owner

QoS class utility (launchd, default)

lix> Full log written to /nix/var/nix/builds/nix-build-lix-2.94.0-dev-pre20250727-2d01098.drv-0/b/source/build/meson-logs/testlog.txt
lix> installCheckPhase completed in 1 minutes 54 seconds
/nix/store/43cz7wimzcxpk4fc88mwvmiqh9fd5m0i-lix-2.94.0-dev-pre20250727-2d01098-dev
/nix/store/lyrj23vsy491kscr7322y9y8wizry6jl-lix-2.94.0-dev-pre20250727-2d01098-doc
/nix/store/19jn0drzqy8a7lg6z20fpw8zd2c0fxax-lix-2.94.0-dev-pre20250727-2d01098
nb   0.09s user 0.05s system 0% cpu 4:59.22 total

QoS class default (jade "ran it in her terminal lol"):

lix> Full log written to /nix/var/nix/builds/nix-build-lix-2.94.0-dev-pre20250727-2d01098.drv-0/b/source/build/meson-logs/testlog.txt
lix> installCheckPhase completed in 59 seconds
/nix/store/43cz7wimzcxpk4fc88mwvmiqh9fd5m0i-lix-2.94.0-dev-pre20250727-2d01098-dev
/nix/store/lyrj23vsy491kscr7322y9y8wizry6jl-lix-2.94.0-dev-pre20250727-2d01098-doc
/nix/store/19jn0drzqy8a7lg6z20fpw8zd2c0fxax-lix-2.94.0-dev-pre20250727-2d01098
nb   0.07s user 0.04s system 0% cpu 3:39.75 total

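That's a Lix derivation build roughly 80 seconds faster overall (4:59 vs. 3:39 wall clock), with installCheckPhase alone dropping from 1 minute 54 seconds to 59 seconds, just by setting a higher QoS class. Note that this seems to affect the test suite the most, which is probably because of delays getting piles of child processes onto run queues or something? I dunno.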

Either way there's hella smoke here and someone should go looking for the fire (can we just set the daemon to Default and not children? Is it a logging problem (#935)?).
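
To put numbers on the two runs above, here is a small sketch of the arithmetic using only the figures from the logs. The helper `to_seconds` is a made-up name, and the "share" figure is just the installCheckPhase difference divided by the total wall-clock difference, which assumes the rest of the build is comparable between runs.

```python
def to_seconds(stamp):
    """Convert an "M:SS.ss" wall-clock stamp (as printed by `time`) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Wall-clock totals from the two runs.
utility_total = to_seconds("4:59.22")  # daemon under launchd, Utility
default_total = to_seconds("3:39.75")  # daemon started from a terminal, Default

# installCheckPhase durations from the same logs, in seconds.
utility_check = 1 * 60 + 54
default_check = 59

total_saved = utility_total - default_total
check_saved = utility_check - default_check
check_share = check_saved / total_saved

print(f"total saved: {total_saved:.2f}s")
print(f"installCheckPhase saved: {check_saved}s")
print(f"share of the saving from installCheckPhase: {check_share:.0%}")
```

By this reckoning the test suite accounts for a bit over two thirds of the difference, which fits the observation that it is the most affected part.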

pennae added this to the 2.95 milestone 2025-12-01 14:51:31 +00:00
Author
Owner

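Running `taskpolicy -c utility nix-daemon --daemon` in a terminal: full performance; not a QoS class problem!

![image](/attachments/4ef54521-414c-47e8-a283-6178c9cb9ec9)

Running the daemon through launchd (also Utility class):

![image](/attachments/f2ba702a-67c1-449c-afb8-ef8d5174e788)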

So this is more complex than just changing the daemon configuration in the way I was thinking.

This article seems to say that if you don't use Apple's proprietary IPC, you get whacked by performance penalties. https://developer.apple.com/documentation/apple-silicon/tuning-your-code-s-performance-for-apple-silicon

Author
Owner

I have cracked it. CL incoming.
