explain store directory

2022-06-09 11:07:50 +02:00 · 2022-06-09 11:07:50 +02:00 · fa7ad4593d
parent f632816cba
commit fa7ad4593d
4 changed files with 94 additions and 110 deletions
--- a/doc/manual/src/SUMMARY.md.in
+++ b/doc/manual/src/SUMMARY.md.in
@@ -17,8 +17,10 @@
  - [Upgrading Nix](installation/upgrading.md)
 - [Architecture](architecture/architecture.md)
  - [Store](architecture/store/store.md)
-    - [Store Object](architecture/store/objects.md)
-    - [Store Path](architecture/store/paths.md)
+    - [Store Path](architecture/store/paths.md)
+    - [Digest](architecture/store/paths.md#digest)
+    - [Input Addressing](architecture/store/paths.md#input-addressing)
+    - [Content Addressing](architecture/store/paths.md#content-addressing)
 - [Package Management](package-management/package-management.md)
  - [Basic Package Management](package-management/basic-package-mgmt.md)
  - [Profiles](package-management/profiles.md)
--- a/doc/manual/src/architecture/store/objects.md
+++ b/doc/manual/src/architecture/store/objects.md
@@ -1,48 +0,0 @@
-# Store Object
-
-Nix organizes the data it manages into *store objects*.
-A store object is the pair of
-
-  - a [file system object](#file-system-object)
-  - a set of [references](#reference) to store objects.
-
-We call a store object's outermost file system object the *root*.
-
-```haskell
-data StoreOject = StoreObject {
-  root       :: FileSystemObject
-, references :: Set StoreObject
-}
-```
-
-## File system object {#file-system-object}
-
-The Nix store uses a simple file system model.
-
-Every file system object is one of the following:
- - File: an executable flag, and arbitrary data for contents
- - Directory: mapping of names to child file system objects
- - [Symbolic link](https://en.m.wikipedia.org/wiki/Symbolic_link): may point anywhere.
-
-```haskell
-data FileSystemObject
-  = File { isExecutable :: Bool, contents :: Bytes }
-  | Directory { entries ::  Map FileName FileSystemObject }
-  | SymLink { target :: Path }
-```
-
-A bare file or symlink can be a root file system object.
-
-Symlinks pointing outside of their own root, or to a store object without a matching reference, are allowed, but might not function as intended.
-
-### Reference scanning
-
-While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness.
-Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed.
-
-However, having references match store paths in files is not enforced by the data model:
-Store objects could have excess or incomplete references with respect to store paths found in their file contents.
-
-Scanning files therefore allows reliably capturing run time dependencies without declaring them explicitly.
-Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
-
--- a/doc/manual/src/architecture/store/paths.md
+++ b/doc/manual/src/architecture/store/paths.md
@@ -1,78 +1,103 @@
 # Store Path

-A store path is a pair of a 20-byte digest and a name.
+Nix implements [references](store.md#reference) to [store objects](store.md#store-object) as *store paths*.

-## String representation
+Store paths are pairs of

-A store path is rendered as the concatenation of
+- a 20-byte [digest](#digest) for identification
+- a symbolic name for people to read.

-  - a store directory
-
-  - a path-separator (`/`)
-
-  - the digest rendered as Base-32 (20 arbitrary bytes becomes 32 ASCII chars)
-
-  - a hyphen (`-`)
-
-  - the name
-
-Let's take the store path from the very beginning of this manual as an example:
-
-    /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
-
-This parses like so:
-
-    /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
-    ^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^
-    store dir  digest                           name
-
-We then can discard the store dir to recover the conceptual pair that is a store path:
+Example:

    {
      digest: "b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z",
      name:   "firefox-33.1",
    }

-### Where did the "store directory" come from?
+It is rendered to a file system path as the concatenation of

-If you notice, the above references a "store directory", but that is *not* part of the definition of a store path.
-We can discard it when parsing, but what about when printing?
-We need to get a store directory from *somewhere*.
+  - [store directory](#store-directory)
+  - path-separator (`/`)
+  - [digest](#digest) rendered in [base-32](https://en.m.wikipedia.org/wiki/Base32) (20 arbitrary bytes become 32 ASCII characters)
+  - hyphen (`-`)
+  - name

-The answer is, the store directory is a property of the store that contains the store path.
-The explanation for this is simple enough: a store is notionally mounted as a directory at some location, and the store object's root file system likewise mounted at this path within that directory.
+Example:

-This does, however, mean the string representation of a store path is not derived just from the store path itself, but is in fact "context dependent".
+      /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
+      |--------| |------------------------------| |----------|
+    store directory            digest                 name

-## The digest
+## Store Directory {#store-directory}

-The calculation of the digest is quite complicated for historical reasons.
-The details of the algorithms will be discussed later once more concepts have been introduced.
-For now, we just concern ourselves with the *key properties* of those algorithms.
+Every [store](./store.md) has a store directory.
+
+If the store has a [file system representation](./store.md#files-and-processes), this directory contains the store’s [file system objects](#file-system-object), which can be addressed by [store paths](#store-path).
+
+This means a store path is not just derived from the referenced store object itself, but depends on the store the store object is in.

 ::: {.note}
-**Historical note** The 20 byte restriction is because originally a digests were SHA-1 hashes.
-This is no longer true, but longer hashes and other information are still boiled down to 20 bytes.
+The store directory defaults to `/nix/store`, but is in principle arbitrary.
 :::

-Store paths are either *content-addressed* or *input-addressed*.
+It is important which store a given store object belongs to:
+Files in the store object can contain store paths, and processes may read these paths.
+Nix can only guarantee [referential integrity](store.md#closure) if store paths do not cross store boundaries.
+
+Therefore one can only copy store objects if
+
+- the source and target stores' directories match
+
+  or
+
+- the store object in question has no references, that is, contains no store paths.
+
+To move a store object to a store with a different store directory, it has to be rebuilt, together with all its dependencies.
+It is in general not enough to replace the store directory string in file contents, as this may break internal offsets or content hashes.
+
+## Digest {#digest}
+
+In a [store path](#store-path), the [digest][digest] is the output of a [cryptographic hash function][hash] of either all *inputs* involved in building the referenced store object or its actual *contents*.
+
+Store objects are therefore said to be either [input-addressed](#input-addressing) or [content-addressed](#content-addressing).

 ::: {.note}
-The former is a standard term used elsewhere.
-The later is our own creation to evoke a contrast with content addressing.
+**Historical note**: The 20-byte restriction exists because digests were originally [SHA-1][sha-1] hashes.
+This is no longer true, but longer hashes and other information are still compressed to 20 bytes for compatibility.
 :::

-Content addressing means that the store path digest ultimately derives from referred store object's contents, namely its file system objects and references.
-There is more than one *method* of content-addressing, however.
-Still, if one does know the content addressing schema that was used,
-(or guesses, there isn't that many yet!)
-one can recalculate the store path and thus verify the store object.
+[digest]: https://en.m.wiktionary.org/wiki/digest#Noun
+[hash]: https://en.m.wikipedia.org/wiki/Cryptographic_hash_function
+[sha-1]: https://en.m.wikipedia.org/wiki/SHA-1

-Input addressing means that the store path digest derives from how the store path was produced, namely the "inputs" and plan that it was built from.
-Store paths of this sort can *not* be validated from the content of the store object.
-Rather, the store object might come with the store path it expects to be referred to by, and a signature of that path, the contents of the store path, and other metadata.
-The signature indicates that someone is vouching for the store object really being the results of a plan with that digest.

-While metadata is included in the digest calculation explaining which method it was calculated by, this only serves to thwart pre-image attacks.
-That metadata is scrambled with everything else so that it is difficult to tell how a given store path was produced short of a brute-force search.
-In the parlance of referencing schemes, this means that store paths are not "self-describing".
+### Reference scanning
+
+While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness.
+Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed.
+
+However, having references match store paths in files is not enforced by the data model:
+Store objects could have excess or incomplete references with respect to store paths found in their file contents.
+
+Scanning files therefore allows reliably capturing run time dependencies without declaring them explicitly.
+Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
+
+## Input Addressing {#input-addressing}
+
+Input addressing means that the digest derives from how the store object was produced, namely its build inputs and build plan.
+
+To compute the hash of a store object one needs a deterministic serialisation, i.e., a binary string representation which only changes if the store object changes.
+
+Nix has a custom serialisation format called Nix Archive (NAR).
+
+Store object references of this sort can *not* be validated from the content of the store object.
+Rather, a cryptographic signature has to be used to indicate that someone is vouching for the store object really being produced from a build plan with that digest.
+
+## Content Addressing {#content-addressing}
+
+Content addressing means that the digest derives from the store object's contents, namely its file system objects and references.
+If one knows content addressing was used, one can recalculate the reference and thus verify the store object.
+
+Content addressing is currently only used for the special cases of source files and "fixed-output derivations", where the contents of a store object are known in advance.
+Content addressing of build results is still an [experimental feature subject to some restrictions](https://github.com/tweag/rfcs/blob/cas-rfc/rfcs/0062-content-addressed-paths.md).
+
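Note on the rendering rule added in paths.md: the concatenation "store directory, `/`, digest, `-`, name" can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Nix's implementation. The names `StorePath`, `render`, and `parse` are hypothetical, the digest is treated as an opaque 32-character string (no base-32 decoding is attempted), and `/opt/store` is an invented second store directory.

```python
from dataclasses import dataclass

# 20 bytes = 160 bits; at 5 bits per base-32 character that is 32 characters.
DIGEST_CHARS = 32


@dataclass(frozen=True)
class StorePath:
    """Hypothetical model of the conceptual pair described in paths.md."""
    digest: str  # base-32 rendering of the 20-byte digest, kept opaque here
    name: str    # symbolic name for people to read


def render(store_dir: str, path: StorePath) -> str:
    """Concatenate store directory, '/', digest, '-', and name."""
    return f"{store_dir}/{path.digest}-{path.name}"


def parse(store_dir: str, fs_path: str) -> StorePath:
    """Strip the store directory, then split the rest into digest and name."""
    prefix = store_dir + "/"
    if not fs_path.startswith(prefix):
        raise ValueError(f"{fs_path!r} is not inside {store_dir!r}")
    base = fs_path[len(prefix):]
    digest = base[:DIGEST_CHARS]
    separator = base[DIGEST_CHARS:DIGEST_CHARS + 1]
    name = base[DIGEST_CHARS + 1:]
    if len(digest) != DIGEST_CHARS or separator != "-" or not name or "/" in name:
        raise ValueError(f"{base!r} does not look like <digest>-<name>")
    return StorePath(digest, name)


if __name__ == "__main__":
    firefox = parse(
        "/nix/store",
        "/nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1",
    )
    print(firefox)
    # The same pair renders differently under another (invented) store directory.
    print(render("/opt/store", firefox))
```

Because `render` takes the store directory as a separate argument, the sketch makes the point of the new "Store Directory" section visible: the file system path depends on the store, while the digest and name pair does not.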
--- a/doc/manual/src/architecture/store/store.md
+++ b/doc/manual/src/architecture/store/store.md
@@ -67,18 +67,19 @@ As it keeps track of references, it can [garbage-collect][garbage-collection] un
    [ store ] --> collect garbage --> [ store' ]


-## Closure
+## Closure {#closure}

-Nix stores have the *closure property*: for each store object in the store, all the store objects it references must also be in the store.
+Nix stores ensure [referential integrity][referential-integrity]: for each store object in the store, all the store objects it references must also be in the store.

-Adding, building, copying and deleting store objects must be done in a way that obeys this property:
+The set of all store objects reachable by following references from a given initial set of store objects is called a *closure*.
+
+Adding, building, copying and deleting store objects must be done in a way that preserves referential integrity:

 - A newly added store object cannot have references, unless it is a build task.

 - Build results must only refer to store objects in the closure of the build inputs.

  Building a store object will add appropriate references, according to the build task.
-  These references can only come from declared build inputs.

 - Store objects being copied must refer to objects already in the destination store.

@@ -86,16 +87,15 @@ Adding, building, copying and deleting store objects must be done in a way that

 - We can only safely delete store objects which are not reachable from any reference still in use.

-  Garbage collection will delete those store objects that cannot be reached from any reference in use.
-
  <!-- more details in section on garbage collection, link to it once it exists -->

+[referential-integrity]: https://en.m.wikipedia.org/wiki/Referential_integrity
 [garbage-collection]: https://en.m.wikipedia.org/wiki/Garbage_collection_(computer_science)
 [immutable-object]: https://en.m.wikipedia.org/wiki/Immutable_object
 [opaque-data-type]: https://en.m.wikipedia.org/wiki/Opaque_data_type
 [unique-identifier]: https://en.m.wikipedia.org/wiki/Unique_identifier

-## Files and Processes
+## Files and Processes {#files-and-processes}

 Nix maps between its store model and the [Unix paradigm][unix-paradigm] of [files and processes][file-descriptor], by encoding immutable store objects and opaque identifiers as file system primitives: files and directories, and paths.
 That allows processes to resolve references contained in files and thus access the contents of store objects.
@@ -103,11 +103,16 @@ That allows processes to resolve references contained in files and thus access t
 Store objects are therefore implemented as the pair of

  - a [file system object](fso.md) for data
-  - a set of *store paths* for references.
+  - a set of [store paths](paths.md) for references.

 [unix-paradigm]: https://en.m.wikipedia.org/wiki/Everything_is_a_file
 [file-descriptor]: https://en.m.wikipedia.org/wiki/File_descriptor

+The following diagram shows a radical simplification of how Nix interacts with the operating system:
+It uses files as build inputs, and build outputs are files again.
+On the operating system, files are either "dead" data, or "live" as processes, which in turn operate on files, or can bring them to life.
+A build function also amounts to an operating system process (not depicted).
+
 ```
 +-----------------------------------------------------------------+
 | Nix                                                             |