Perf regressions around CL 1626: Expr returning Value & using new Value constructors #792

Open
opened 2025-04-07 20:00:19 +00:00 by fireyfly · 5 comments
Member

So I took a stab at trying to narrow down the perf regressions in CL 1626.. and the tldr is that while the hit to eval seems related to the topic of the CL (returning Value rather than using out-parameters), the hit to nix search seems to be more related to the move to newer Value constructors (somehow? I'm confused too)

Since the latter is kinda orthogonal to the CL topic and relates to constructors already merged and in the codebase, I thought it makes sense to create an issue for this rather than continue the conversation in Gerrit.. and also since it seems more convenient for longer-form comments

Benchmark runs

I rebased CL 1626 against latest lix HEAD (@ 2ef4b69) and did some preliminary benchmark runs and looks at it.. and after a bit I realised we're kinda doing two things in the CL: move to return a Value, and at the same time also use some of the new constructors instead of the old mkFoo functions. Since we're trying to track down confusing perf behaviour, I thought it might not hurt to split the latter out into its own commit.

I attached patches for both of these (the CL rebased and split into only the eval-returns-value part, and then the new constructors as a separate step atop that)

Here's the benchmark results (walltime + icount) for all three versions (baseline, return value, return value + new ctors), trying to run them on an as-idle-as-possible system.

bench-1 walltime (relative)
----------------------------------------
           result-baseline result-returns-value result-returns-value-new-ctors
search               1.000                1.063                          1.145
rebuild              1.000                0.937                          1.056
rebuild_lh          *1.000              **1.099                          1.133
parse                1.000                0.964                          0.978
* first run significantly slower; ** statistical outliers detected
(NB: with chrt call commented out)

bench-2 walltime (relative)
----------------------------------------
           result-baseline result-returns-value result-returns-value-new-ctors
search               1.000                1.061                          1.193
rebuild              1.000                1.050                          1.054
rebuild_lh           1.000                0.994                          1.080
parse                1.000                0.984                          0.928
(NB: with chrt call commented out)

perf-1 icount (relative)
----------------------------------------
           result-baseline result-returns-value result-returns-value-new-ctors
search               1.000                1.000                          1.100
rebuild              1.000                1.069                          1.070
rebuild_lh           1.000                1.002                          1.003
parse                1.000                1.000                          1.000

perf-2 icount (relative)
----------------------------------------
           result-baseline result-returns-value result-returns-value-new-ctors
search               1.000                1.010                          1.121
rebuild              1.000                1.069                          1.066
rebuild_lh           1.000                1.002                          1.003
parse                1.000                1.000                          1.000

...which looks a lot like the new-constructors change is responsible for (most of) the search perf hit?

the other notable thing to me is how the large-heap rebuild seems mostly unaffected, which I guess would seem to imply most of the hit to the rebuild eval stems from additional GC allocations? but I'm not really sure why this would be the case from the change to returning Values on the stack

At this point I also build the different testsubjects with temporary files kept (clang save-temps) and tried to diff the generated assembly between returns-value and returns-value-new-ctors.. figuring such a large diff has to stand out. the only thing I noticed is the diffs for ExprList and ExprConcatStrings were quite involved; the other functions all seemed to have mostly-trivial differences. building a version with those two functions reverted still shows same performance as returns-value-new-ctors though, so no dice there either

At this point I dunno what to try next, but I thought the split CL and perf difference between them was curious enough, and maybe it prompts some insights for someone with more C++/lix codebase familiarity?

So I took a stab at trying to narrow down the perf regressions in [CL 1626][cl].. and the tldr is that while the hit to eval seems related to the topic of the CL (returning Value rather than using out-parameters), the hit to `nix search` seems to be more related to the move to newer Value constructors (somehow? I'm confused too) [cl]: https://gerrit.lix.systems/c/lix/+/1626 Since the latter is kinda orthogonal to the CL topic and relates to constructors already merged and in the codebase, I thought it makes sense to create an issue for this rather than continue the conversation in Gerrit.. and also since it seems more convenient for longer-form comments ## Benchmark runs I rebased CL 1626 against latest lix HEAD (@ 2ef4b69) and did some preliminary benchmark runs and looks at it.. and after a bit I realised we're kinda doing two things in the CL: move to return a Value, and at the same time also use some of the new constructors instead of the old mkFoo functions. Since we're trying to track down confusing perf behaviour, I thought it might not hurt to split the latter out into its own commit. I attached patches for both of these (the CL rebased and split into only the eval-returns-value part, and then the new constructors as a separate step atop that) Here's the benchmark results (walltime + icount) for all three versions (baseline, return value, return value + new ctors), trying to run them on an as-idle-as-possible system. ``` bench-1 walltime (relative) ---------------------------------------- result-baseline result-returns-value result-returns-value-new-ctors search 1.000 1.063 1.145 rebuild 1.000 0.937 1.056 rebuild_lh *1.000 **1.099 1.133 parse 1.000 0.964 0.978 * first run significantly slower; ** statistical outliers detected (NB: with chrt call commented out) bench-2 walltime (relative) ---------------------------------------- result-baseline result-returns-value result-returns-value-new-ctors search 1.000 1.061 1.193 rebuild 1.000 1.050 1.054 rebuild_lh 1.000 0.994 1.080 parse 1.000 0.984 0.928 (NB: with chrt call commented out) perf-1 icount (relative) ---------------------------------------- result-baseline result-returns-value result-returns-value-new-ctors search 1.000 1.000 1.100 rebuild 1.000 1.069 1.070 rebuild_lh 1.000 1.002 1.003 parse 1.000 1.000 1.000 perf-2 icount (relative) ---------------------------------------- result-baseline result-returns-value result-returns-value-new-ctors search 1.000 1.010 1.121 rebuild 1.000 1.069 1.066 rebuild_lh 1.000 1.002 1.003 parse 1.000 1.000 1.000 ``` ...which looks a lot like the new-constructors change is responsible for (most of) the search perf hit? the other notable thing to me is how the large-heap rebuild seems mostly unaffected, which I guess would seem to imply most of the hit to the rebuild eval stems from additional GC allocations? but I'm not really sure why this would be the case from the change to returning `Value`s on the stack At this point I also build the different testsubjects with temporary files kept (clang `save-temps`) and tried to diff the generated assembly between returns-value and returns-value-new-ctors.. figuring such a large diff has to stand out. the only thing I noticed is the diffs for `ExprList` and `ExprConcatStrings` were quite involved; the other functions all seemed to have mostly-trivial differences. building a version with those two functions reverted still shows same performance as `returns-value-new-ctors` though, so no dice there either At this point I dunno what to try next, but I thought the split CL and perf difference between them was curious enough, and maybe it prompts some insights for someone with more C++/lix codebase familiarity?
Member

This issue was mentioned on Gerrit on the following CLs:

  • comment in cl/1626 ("libexpr: migrate Expr::eval to return a Value")
  • commit message in cl/2971 ("libexpr: pad boolean Value case to a full word")
<!-- GERRIT_LINKBOT: {"cls": [{"backlink": "https://gerrit.lix.systems/c/lix/+/1626", "number": 1626, "kind": "comment"}, {"backlink": "https://gerrit.lix.systems/c/lix/+/2971", "number": 2971, "kind": "commit message"}], "cl_meta": {"1626": {"change_title": "libexpr: migrate Expr::eval to return a Value"}, "2971": {"change_title": "libexpr: pad boolean Value case to a full word"}}} --> This issue was mentioned on Gerrit on the following CLs: * comment in [cl/1626](https://gerrit.lix.systems/c/lix/+/1626) ("libexpr: migrate Expr::eval to return a Value") * commit message in [cl/2971](https://gerrit.lix.systems/c/lix/+/2971) ("libexpr: pad boolean Value case to a full word")
Owner

that this only affects search and rebuild means it's a gc collection symptom, not extra allocations. which also means that it's caused by differences in memory contents caused by the code changes. so we went through the new constructors one by one and checked how they affect memory contents, which turned up that the modern boolean constructor does not fully clear its first data word (it only writes the low byte, the others remain indeterminate). this in turn means that whatever was previously in the (mostly stack) memory that we're not fully overwriting remains a valid pointer in many cases, which causes the gc to collect less, which causes the gc to run more, which wrecks performance.

something similar happens when returning values instead of using out references, but that's not something we can mitigate as easily—nor should we, possibly, since the new gc we've been thinking about would work better with the current out-ref pattern. we can probably make the constructor usage change and drop all the mk${type} methods on Value, but the return type changes should be shelved for now.

modified   lix/libexpr/value.hh
@@ -538,7 +538,11 @@ public:
 
         NixInt integer;
         struct {
-            bool boolean;
+            struct {
+                uintptr_t b;
+                operator bool() const { return b; }
+                void operator=(bool o) { b = o; }
+            } boolean;
             uintptr_t _bool_pad;
         };
that this only affects search and rebuild means it's a gc *collection* symptom, not extra allocations. which also means that it's caused by differences in memory contents caused by the code changes. so we went through the new constructors one by one and checked how they affect memory contents, which turned up that the modern boolean constructor does not fully clear its first data word (it only writes the low byte, the others remain indeterminate). this in turn means that whatever was previously in the (mostly stack) memory that we're not fully overwriting remains a valid pointer in many cases, which causes the gc to collect less, which causes the gc to run more, which wrecks performance. something similar happens when returning values instead of using out references, but that's not something we can mitigate as easily—nor should we, possibly, since the new gc we've been thinking about would work better with the current out-ref pattern. we can *probably* make the constructor usage change and drop all the `mk${type}` methods on Value, but the return type changes should be shelved for now. ``` modified lix/libexpr/value.hh @@ -538,7 +538,11 @@ public: NixInt integer; struct { - bool boolean; + struct { + uintptr_t b; + operator bool() const { return b; } + void operator=(bool o) { b = o; } + } boolean; uintptr_t _bool_pad; }; ```
Author
Member

oh! that makes a lot of sense now that I'm reading your explanation of the problem.. but I'm slightly annoyed at myself for not noticing that when looking at the assembly diffs yesterday (also sorry for the somewhat-more-verbose-than-I'd-intended report, heh)

will you submit a CL with that diff?

thanks for the heads-up re the return-type changes--in that case I guess we'd keep the default constructors for now in Value v; eval(..., &v); style cases

oh! that makes a lot of sense now that I'm reading your explanation of the problem.. but I'm slightly annoyed at myself for not noticing that when looking at the assembly diffs yesterday (also sorry for the somewhat-more-verbose-than-I'd-intended report, heh) will you submit a CL with that diff? thanks for the heads-up re the return-type changes--in that case I guess we'd keep the default constructors for now in `Value v; eval(..., &v);` style cases
Owner

will you submit a CL with that diff?

probably not, feel free to take it and make a ctor-only cl (and maybe even remove the mkX methods, if possible)

> will you submit a CL with that diff? probably not, feel free to take it and make a ctor-only cl (and maybe even remove the `mkX` methods, if possible)
Author
Member

oki, I'll see what I can cook up then, and continue on some half-baked changes I stashed

oki, I'll see what I can cook up then, and continue on some half-baked changes I stashed
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: lix-project/lix#792
No description provided.