From fa7ad4593d09d04afe1d215d2da09be02c2e2836 Mon Sep 17 00:00:00 2001 From: Valentin Gagarin Date: Thu, 9 Jun 2022 11:07:50 +0200 Subject: [PATCH] explain store directory --- doc/manual/src/SUMMARY.md.in | 6 +- doc/manual/src/architecture/store/objects.md | 48 ------- doc/manual/src/architecture/store/paths.md | 129 +++++++++++-------- doc/manual/src/architecture/store/store.md | 21 +-- 4 files changed, 94 insertions(+), 110 deletions(-) delete mode 100644 doc/manual/src/architecture/store/objects.md diff --git a/doc/manual/src/SUMMARY.md.in b/doc/manual/src/SUMMARY.md.in index 5e2d26bf7..997d75444 100644 --- a/doc/manual/src/SUMMARY.md.in +++ b/doc/manual/src/SUMMARY.md.in @@ -17,8 +17,10 @@ - [Upgrading Nix](installation/upgrading.md) - [Architecture](architecture/architecture.md) - [Store](architecture/store/store.md) - - [Store Object](architecture/store/objects.md) - - [Store Path](architecture/store/paths.md) + - [Store Path](architecture/store/path.md) + - [Digest](architecture/store/path.md#digest) + - [Input Addressing](architecture/store/path.md#input-addressing) + - [Content Addressing](architecture/store/path.md#content-addressing) - [Package Management](package-management/package-management.md) - [Basic Package Management](package-management/basic-package-mgmt.md) - [Profiles](package-management/profiles.md) diff --git a/doc/manual/src/architecture/store/objects.md b/doc/manual/src/architecture/store/objects.md deleted file mode 100644 index 8ab0b9368..000000000 --- a/doc/manual/src/architecture/store/objects.md +++ /dev/null @@ -1,48 +0,0 @@ -# Store Object - -Nix organizes the data it manages into *store objects*. -A store object is the pair of - - - a [file system object](#file-system-object) - - a set of [references](#reference) to store objects. - -We call a store object's outermost file system object the *root*. - -```haskell -data StoreOject = StoreObject { - root :: FileSystemObject -, references :: Set StoreObject -} -``` - -## File system object {#file-system-object} - -The Nix store uses a simple file system model. - -Every file system object is one of the following: - - File: an executable flag, and arbitrary data for contents - - Directory: mapping of names to child file system objects - - [Symbolic link](https://en.m.wikipedia.org/wiki/Symbolic_link): may point anywhere. - -```haskell -data FileSystemObject - = File { isExecutable :: Bool, contents :: Bytes } - | Directory { entries :: Map FileName FileSystemObject } - | SymLink { target :: Path } -``` - -A bare file or symlink can be a root file system object. - -Symlinks pointing outside of their own root, or to a store object without a matching reference, are allowed, but might not function as intended. - -### Reference scanning - -While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness. -Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed. - -However, having references match store paths in files is not enforced by the data model: -Store objects could have excess or incomplete references with respect to store paths found in their file contents. - -Scanning files therefore allows reliably capturing run time dependencies without declaring them explicitly. -Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation. - diff --git a/doc/manual/src/architecture/store/paths.md b/doc/manual/src/architecture/store/paths.md index 402e55e69..4867e7fd3 100644 --- a/doc/manual/src/architecture/store/paths.md +++ b/doc/manual/src/architecture/store/paths.md @@ -1,78 +1,103 @@ # Store Path -A store path is a pair of a 20-byte digest and a name. +Nix implements [references](store.md#reference) to [store objects](store.md#store-object) as *store paths*. -## String representation +Store paths are pairs of -A store path is rendered as the concatenation of +- a 20-byte [digest](#digest) for identification +- a symbolic name for people to read. - - a store directory - - - a path-separator (`/`) - - - the digest rendered as Base-32 (20 arbitrary bytes becomes 32 ASCII chars) - - - a hyphen (`-`) - - - the name - -Let's take the store path from the very beginning of this manual as an example: - - /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1 - -This parses like so: - - /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1 - ^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^ - store dir digest name - -We then can discard the store dir to recover the conceptual pair that is a store path: +Example: { digest: "b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z", name: "firefox-33.1", } -### Where did the "store directory" come from? +It is rendered to a file system path as the concatenation of -If you notice, the above references a "store directory", but that is *not* part of the definition of a store path. -We can discard it when parsing, but what about when printing? -We need to get a store directory from *somewhere*. + - [store directory](#store-directory) + - path-separator (`/`) + - [digest](#digest) rendered in [base-32](https://en.m.wikipedia.org/wiki/Base32) (20 arbitrary bytes become 32 ASCII characters) + - hyphen (`-`) + - name -The answer is, the store directory is a property of the store that contains the store path. -The explanation for this is simple enough: a store is notionally mounted as a directory at some location, and the store object's root file system likewise mounted at this path within that directory. +Example: -This does, however, mean the string representation of a store path is not derived just from the store path itself, but is in fact "context dependent". + /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1 + |--------| |------------------------------| |----------| + store directory digest name -## The digest +## Store Directory {#store-directory} -The calculation of the digest is quite complicated for historical reasons. -The details of the algorithms will be discussed later once more concepts have been introduced. -For now, we just concern ourselves with the *key properties* of those algorithms. +Every [store](./store.md) has a store directory. + +If the store has a [file system representation](./store.md#files-and-processes), this directory contains the store’s [file system objects](#file-system-object), which can be addressed by [store paths](#store-path). + +This means a store path is not just derived from the referenced store object itself, but depends on the store the store object is in. ::: {.note} -**Historical note** The 20 byte restriction is because originally a digests were SHA-1 hashes. -This is no longer true, but longer hashes and other information are still boiled down to 20 bytes. +The store directory defaults to `/nix/store`, but is in principle arbitrary. ::: -Store paths are either *content-addressed* or *input-addressed*. +It is important which store a given store object belongs to: +Files in the store object can contain store paths, and processes may read these paths. +Nix can only guarantee [referential integrity](store.md#closure) if store paths do not cross store boundaries. + +Therefore one can only copy store objects if + +- the source and target stores' directories match + + or + +- the store object in question has no references, that is, contains no store paths. + +To move a store object to a store with a different store directory, it has to be rebuilt, together with all its dependencies. +It is in general not enough to replace the store directory string in file contents, as this may break internal offsets or content hashes. + +# Digest {#digest} + +In a [store path](#store-path), the [digest][digest] is the output of a [cryptographic hash function][hash] of either all *inputs* involved in building the referenced store object or its actual *contents*. + +Store objects are therefore said to be either [input-addressed](#input-addressing) or [content-addressed](#content-addressing). ::: {.note} -The former is a standard term used elsewhere. -The later is our own creation to evoke a contrast with content addressing. +**Historical note**: The 20 byte restriction is because originally digests were [SHA-1][sha-1] hashes. +This is no longer true, but longer hashes and other information are still truncated to 20 bytes for compatibility. ::: -Content addressing means that the store path digest ultimately derives from referred store object's contents, namely its file system objects and references. -There is more than one *method* of content-addressing, however. -Still, if one does know the content addressing schema that was used, -(or guesses, there isn't that many yet!) -one can recalculate the store path and thus verify the store object. +[digest]: https://en.m.wiktionary.org/wiki/digest#Noun +[hash]: https://en.m.wikipedia.org/wiki/Cryptographic_hash_function +[sha-1]: https://en.m.wikipedia.org/wiki/SHA-1 -Input addressing means that the store path digest derives from how the store path was produced, namely the "inputs" and plan that it was built from. -Store paths of this sort can *not* be validated from the content of the store object. -Rather, the store object might come with the store path it expects to be referred to by, and a signature of that path, the contents of the store path, and other metadata. -The signature indicates that someone is vouching for the store object really being the results of a plan with that digest. -While metadata is included in the digest calculation explaining which method it was calculated by, this only serves to thwart pre-image attacks. -That metadata is scrambled with everything else so that it is difficult to tell how a given store path was produced short of a brute-force search. -In the parlance of referencing schemes, this means that store paths are not "self-describing". +### Reference scanning + +While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness. +Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed. + +However, having references match store paths in files is not enforced by the data model: +Store objects could have excess or incomplete references with respect to store paths found in their file contents. + +Scanning files therefore allows reliably capturing run time dependencies without declaring them explicitly. +Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation. + +## Input Addressing {#input-addressing} + +Input addressing means that the digest derives from how the store object was produced, namely its build inputs and build plan. + +To compute the hash of a store object one needs a deterministic serialisation, i.e., a binary string representation which only changes if the store object changes. + +Nix has a custom serialisation format called Nix Archive (NAR) + +Store object references of this sort can *not* be validated from the content of the store object. +Rather, a cryptographic signature has to be used to indicate that someone is vouching for the store object really being produced from a build plan with that digest. + +## Content Addressing {#content-addressing} + +Content addressing means that the digest derives from the store object's contents, namely its file system objects and references. +If one knows content addressing was used, one can recalculate the reference and thus verify the store object. + +Content addressing is currently only used for the special cases of source files and "fixed-output derivations", where the contents of a store object are known in advance. +Content addressing of build results is still an [experimental feature subject to some restrictions](https://github.com/tweag/rfcs/blob/cas-rfc/rfcs/0062-content-addressed-paths.md). + diff --git a/doc/manual/src/architecture/store/store.md b/doc/manual/src/architecture/store/store.md index 68bdadc4a..21a876f75 100644 --- a/doc/manual/src/architecture/store/store.md +++ b/doc/manual/src/architecture/store/store.md @@ -67,18 +67,19 @@ As it keeps track of references, it can [garbage-collect][garbage-collection] un [ store ] --> collect garbage --> [ store' ] -## Closure +## Closure {#closure} -Nix stores have the *closure property*: for each store object in the store, all the store objects it references must also be in the store. +Nix stores ensure [referential integrity][referential-integrity]: for each store object in the store, all the store objects it references must also be in the store. -Adding, building, copying and deleting store objects must be done in a way that obeys this property: +The set of all store objects reachable by following references from a given initial set of store objects is called a *closure*. + +Adding, building, copying and deleting store objects must be done in a way that preserves referential integrity: - A newly added store object cannot have references, unless it is a build task. - Build results must only refer to store objects in the closure of the build inputs. Building a store object will add appropriate references, according to the build task. - These references can only come from declared build inputs. - Store objects being copied must refer to objects already in the destination store. @@ -86,16 +87,15 @@ Adding, building, copying and deleting store objects must be done in a way that - We can only safely delete store objects which are not reachable from any reference still in use. - Garbage collection will delete those store objects that cannot be reached from any reference in use. - +[referential-integrity]: https://en.m.wikipedia.org/wiki/Referential_integrity [garbage-collection]: https://en.m.wikipedia.org/wiki/Garbage_collection_(computer_science) [immutable-object]: https://en.m.wikipedia.org/wiki/Immutable_object [opaque-data-type]: https://en.m.wikipedia.org/wiki/Opaque_data_type [unique-identifier]: https://en.m.wikipedia.org/wiki/Unique_identifier -## Files and Processes +## Files and Processes {#files-and-processes} Nix maps between its store model and the [Unix paradigm][unix-paradigm] of [files and processes][file-descriptor], by encoding immutable store objects and opaque identifiers as file system primitives: files and directories, and paths. That allows processes to resolve references contained in files and thus access the contents of store objects. @@ -103,11 +103,16 @@ That allows processes to resolve references contained in files and thus access t Store objects are therefore implemented as the pair of - a [file system object](fso.md) for data - - a set of *store paths* for references. + - a set of [store paths](paths.md) for references. [unix-paradigm]: https://en.m.wikipedia.org/wiki/Everything_is_a_file [file-descriptor]: https://en.m.wikipedia.org/wiki/File_descriptor +The following diagram shows a radical simplification of how Nix interacts with the operating system: +It uses files as build inputs, and build outputs are files again. +On the operating system, files are either "dead" data, or "live" as processes, which in turn operate on files, or can bring them to life. +A build function also amounts to an operating system process (not depicted). + ``` +-----------------------------------------------------------------+ | Nix |