bulk-allocate Value instances in the evaluator

calling GC_malloc for each value is significantly more expensive than
allocating a bunch of values at once with GC_malloc_many. "a bunch" here
is a GC block size, ie 16KiB or less.

this gives a 1.5% performance boost when evaluating our nixos system.

tested with

nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'

 # on master

  Time (mean ± σ):      3.335 s ±  0.007 s    [User: 2.774 s, System: 0.293 s]
  Range (min … max):    3.315 s …  3.347 s    50 runs

 # with this change

  Time (mean ± σ):      3.288 s ±  0.006 s    [User: 2.728 s, System: 0.292 s]
  Range (min … max):    3.274 s …  3.307 s    50 runs
This commit is contained in:
pennae 2021-12-20 13:28:54 +01:00
parent 6e6e998930
commit 09b245690a
2 changed files with 19 additions and 1 deletions

View file

@ -821,8 +821,23 @@ inline Value * EvalState::lookupVar(Env * env, const ExprVar & var, bool noEval)
Value * EvalState::allocValue()
{
/* We use the boehm batch allocator to speed up allocations of Values (of which there are many).
GC_malloc_many returns a linked list of objects of the given size, where the first word
of each object is also the pointer to the next object in the list. This also means that we
have to explicitly clear the first word of every object we take. */
if (!valueAllocCache) {
valueAllocCache = GC_malloc_many(sizeof(Value));
if (!valueAllocCache) throw std::bad_alloc();
}
/* GC_NEXT is a convenience macro for accessing the first word of an object.
Take the first list item, advance the list to the next item, and clear the next pointer. */
void * p = valueAllocCache;
GC_PTR_STORE_AND_DIRTY(&valueAllocCache, GC_NEXT(p));
GC_NEXT(p) = nullptr;
nrValues++;
auto v = (Value *) allocBytes(sizeof(Value));
auto v = (Value *) p;
return v;
}

View file

@ -133,6 +133,9 @@ private:
/* Cache used by prim_match(). */
std::shared_ptr<RegexCache> regexCache;
/* Allocation cache for GC'd Value objects. */
void * valueAllocCache = nullptr;
public:
EvalState(