hydra/doc/architecture.md
fricklerhandwerk 0803634a41 add architecture notes
meeting notes from @edolstra giving a one-hour tour of the code
2022-04-06 09:01:10 +02:00

5 KiB
Raw Permalink Blame History

This is a rough overview from informal discussions and explanations of inner workings of Hydra. You can use it as a guide to navigate the codebase or ask questions.

Architecture

Components

  • Postgres database
    • configuration
    • build queue
      • what is already built
      • what is going to build
  • hydra-server
    • Perl, Catalyst
    • web frontend
  • hydra-evaluator
    • Perl, C++
    • fetches repositories
    • evaluates job sets
      • pointers to a repository
    • adds builds to the queue
  • hydra-queue-runner
    • C++
    • monitors the queue
    • executes build steps
    • uploads build results
      • copy to a Nix store
  • Nix store
    • contains .drvs
    • populated by hydra-evaluator
    • read by hydra-queue-runner
  • destination Nix store
    • can be a binary cache
    • e.g. [cache.nixos.org](http://cache.nixos.org) or the same store again (for small Hydra instances)
  • plugin architecture
    • extend evaluator for new kinds of repositories
      • e.g. fetch from git

Database Schema

https://github.com/NixOS/hydra/blob/master/src/sql/hydra.sql

  • Jobsets
    • populated by calling Nix evaluator
    • every Nix derivation in release.nix is a Job
    • flake
      • URL to flake, if job is from a flake
      • single-point of configuration for flake builds
      • flake itself contains pointers to dependencies
      • for other builds we need more configuration data
  • JobsetInputs
    • more configuration for a Job
  • JobsetInputAlts
    • historical, where you could have more than one alternative for each input
    • it would have done the cross product of all possibilities
    • not used any more, as now every input is unique
    • originally that was to have alternative values for the system parameter
      • x86-linux, x86_64-darwin
      • turned out not to be a good idea, as job set names did not uniquely identify output
  • Builds
    • queue: scheduled and finished builds
    • instance of a Job
    • corresponds to a top-level derivation
      • can have many dependencies that dont have a corresponding build
      • dependencies represented as BuildSteps
    • a Job is all the builds with a particular name, e.g.
      • git.x86_64-linux is a job
      • there maybe be multiple builds for that job
        • build ID: just an auto-increment number
    • building one thing can actually cause many (hundreds of) derivations to be built
    • for queued builds, the drv has to be present in the store
      • otherwise build will fail, e.g. after garbage collection
  • BuildSteps
    • corresponds to a derivation or substitution
    • are reused through the Nix store
    • may be duplicated for unique derivations due to how they relate to Jobs
  • BuildStepOutputs
    • corresponds directly to derivation outputs
      • out, dev, ...
  • BuildProducts
    • not a Nix concept
    • populated from a special file $out/nix-support/hydra-build-producs
    • used to scrape parts of build results out to the web frontend
      • e.g. manuals, ISO images, etc.
  • BuildMetrics
    • scrapes data from magic location, similar to BuildProducts to show fancy graphs
      • e.g. test coverage, build times, CPU utilization for build
    • $out/nix-support/hydra-metrics
  • BuildInputs
    • probably obsolute
  • JobsetEvalMembers
    • joins evaluations with jobs
    • huge table, 10ks of entries for one nixpkgs evaluation
    • can be imagined as a subset of the eval cache
      • could in principle use the eval cache

release.nix

  • hydra-specific convention to describe the build
  • should evaluate to an attribute set that contains derivations
  • hydra considers every attribute in that set a job
  • every job needs a unique name
    • if you want to build for multiple platforms, you need to reflect that in the name
  • hydra does a deep traversal of the attribute set
    • just evaluating the names may take half an hour

FAQ

Can we imagine Hydra to be a persistence layer for the build graph?

  • partially, it lacks a lot of information
    • does not keep edges of the build graph

How does Hydra relate to nix build?

  • reimplements the top level Nix build loop, scheduling, etc.
  • Hydra has to persist build results
  • Hydra has more sophisticated remote build execution and scheduling than Nix

Is it conceptually possible to unify Hydras capabilities with regular Nix?

  • Nix does not have any scheduling, it just traverses the build graph
  • Hydra has scheduling in terms of job set priorities, tracks how much of a job set it has worked on
    • makes sure jobs dont starve each other
  • Nix cannot dynamically add build jobs at runtime
    • RFC 92 should enable that
    • internally it is already possible, but there is no interface to do that
  • Hydra queue runner is a long running process
    • Nix takes a static set of jobs, working it off at once