explain store directory
parent
f632816cba
commit
fa7ad4593d
4 changed files with 94 additions and 110 deletions
|
@@ -17,8 +17,10 @@
|
|||
- [Upgrading Nix](installation/upgrading.md)
|
||||
- [Architecture](architecture/architecture.md)
|
||||
- [Store](architecture/store/store.md)
|
||||
- [Store Object](architecture/store/objects.md)
|
||||
- [Store Path](architecture/store/paths.md)
|
||||
- [Store Path](architecture/store/path.md)
|
||||
- [Digest](architecture/store/path.md#digest)
|
||||
- [Input Addressing](architecture/store/path.md#input-addressing)
|
||||
- [Content Addressing](architecture/store/path.md#content-addressing)
|
||||
- [Package Management](package-management/package-management.md)
|
||||
- [Basic Package Management](package-management/basic-package-mgmt.md)
|
||||
- [Profiles](package-management/profiles.md)
|
||||
|
|
|
@@ -1,48 +0,0 @@
|
|||
# Store Object
|
||||
|
||||
Nix organizes the data it manages into *store objects*.
|
||||
A store object is the pair of
|
||||
|
||||
- a [file system object](#file-system-object)
|
||||
- a set of [references](#reference) to store objects.
|
||||
|
||||
We call a store object's outermost file system object the *root*.
|
||||
|
||||
```haskell
|
||||
data StoreObject = StoreObject {
|
||||
root :: FileSystemObject
|
||||
, references :: Set StoreObject
|
||||
}
|
||||
```
|
||||
|
||||
## File system object {#file-system-object}
|
||||
|
||||
The Nix store uses a simple file system model.
|
||||
|
||||
Every file system object is one of the following:
|
||||
- File: an executable flag, and arbitrary data for contents
|
||||
- Directory: mapping of names to child file system objects
|
||||
- [Symbolic link](https://en.m.wikipedia.org/wiki/Symbolic_link): may point anywhere.
|
||||
|
||||
```haskell
|
||||
data FileSystemObject
|
||||
= File { isExecutable :: Bool, contents :: Bytes }
|
||||
| Directory { entries :: Map FileName FileSystemObject }
|
||||
| SymLink { target :: Path }
|
||||
```
|
||||
|
||||
A bare file or symlink can be a root file system object.
|
||||
|
||||
Symlinks pointing outside of their own root, or to a store object without a matching reference, are allowed, but might not function as intended.
|
||||
|
||||
### Reference scanning
|
||||
|
||||
While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness.
|
||||
Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed.
|
||||
|
||||
However, having references match store paths in files is not enforced by the data model:
|
||||
Store objects could have excess or incomplete references with respect to store paths found in their file contents.
|
||||
|
||||
Scanning files therefore allows reliably capturing run time dependencies without declaring them explicitly.
|
||||
Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
|
||||
|
|
@@ -1,78 +1,103 @@
|
|||
# Store Path
|
||||
|
||||
A store path is a pair of a 20-byte digest and a name.
|
||||
Nix implements [references](store.md#reference) to [store objects](store.md#store-object) as *store paths*.
|
||||
|
||||
## String representation
|
||||
Store paths are pairs of
|
||||
|
||||
A store path is rendered as the concatenation of
|
||||
- a 20-byte [digest](#digest) for identification
|
||||
- a symbolic name for people to read.
|
||||
|
||||
- a store directory
|
||||
|
||||
- a path-separator (`/`)
|
||||
|
||||
- the digest rendered as Base-32 (20 arbitrary bytes become 32 ASCII characters)
|
||||
|
||||
- a hyphen (`-`)
|
||||
|
||||
- the name
|
||||
|
||||
Let's take the store path from the very beginning of this manual as an example:
|
||||
|
||||
/nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
|
||||
|
||||
This parses like so:
|
||||
|
||||
/nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
|
||||
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^
|
||||
store dir digest name
|
||||
|
||||
We can then discard the store dir to recover the conceptual pair that is a store path:
|
||||
Example:
|
||||
|
||||
{
|
||||
digest: "b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z",
|
||||
name: "firefox-33.1",
|
||||
}
|
||||
|
||||
### Where did the "store directory" come from?
|
||||
It is rendered to a file system path as the concatenation of
|
||||
|
||||
Note that the above refers to a "store directory", but that is *not* part of the definition of a store path.
|
||||
We can discard it when parsing, but what about when printing?
|
||||
We need to get a store directory from *somewhere*.
|
||||
- [store directory](#store-directory)
|
||||
- path-separator (`/`)
|
||||
- [digest](#digest) rendered in [base-32](https://en.m.wikipedia.org/wiki/Base32) (20 arbitrary bytes become 32 ASCII characters)
|
||||
- hyphen (`-`)
|
||||
- name
|
||||
|
||||
The answer is, the store directory is a property of the store that contains the store path.
|
||||
The explanation for this is simple enough: a store is notionally mounted as a directory at some location, and the store object's root file system object is likewise mounted at this path within that directory.
|
||||
Example:
|
||||
|
||||
This does, however, mean the string representation of a store path is not derived just from the store path itself, but is in fact "context dependent".
|
||||
/nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
|
||||
|--------| |------------------------------| |----------|
|
||||
store directory digest name
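
The rendering rule above can be sketched as a pair of functions. This is a minimal illustration, not Nix's implementation: the names `StoreDir`, `renderStorePath`, `parseStorePath`, and `base32Length` are hypothetical, and the digest is kept in its already-rendered base-32 form rather than as 20 raw bytes.

```haskell
import Data.List (stripPrefix)

-- A store path as the conceptual pair of digest and name.
-- Assumption: the digest is stored as its 32-character base-32 rendering.
data StorePath = StorePath
  { digest :: String
  , name   :: String
  } deriving (Eq, Show)

-- The store directory belongs to the store, not to the store path.
type StoreDir = FilePath

-- Characters needed to render n bytes at 5 bits per character, rounded up.
base32Length :: Int -> Int
base32Length n = (n * 8 + 4) `div` 5

-- Concatenate store directory, "/", digest, "-", and name.
renderStorePath :: StoreDir -> StorePath -> FilePath
renderStorePath dir (StorePath d n) = dir ++ "/" ++ d ++ "-" ++ n

-- Discard the store directory and split the rest into digest and name.
parseStorePath :: StoreDir -> FilePath -> Maybe StorePath
parseStorePath dir path = do
  rest <- stripPrefix (dir ++ "/") path
  let (d, rest') = splitAt (base32Length 20) rest
  case rest' of
    '-' : n | not (null n) -> Just (StorePath d n)
    _                      -> Nothing
```

Note that both functions take the store directory as an argument: the same pair of digest and name renders differently for stores with different store directories.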
|
||||
|
||||
## The digest
|
||||
## Store Directory {#store-directory}
|
||||
|
||||
The calculation of the digest is quite complicated for historical reasons.
|
||||
The details of the algorithms will be discussed later once more concepts have been introduced.
|
||||
For now, we just concern ourselves with the *key properties* of those algorithms.
|
||||
Every [store](./store.md) has a store directory.
|
||||
|
||||
If the store has a [file system representation](./store.md#files-and-processes), this directory contains the store’s [file system objects](#file-system-object), which can be addressed by [store paths](#store-path).
|
||||
|
||||
This means a store path is not just derived from the referenced store object itself, but depends on the store the store object is in.
|
||||
|
||||
::: {.note}
|
||||
**Historical note** The 20-byte restriction exists because digests were originally SHA-1 hashes.
|
||||
This is no longer true, but longer hashes and other information are still boiled down to 20 bytes.
|
||||
The store directory defaults to `/nix/store`, but is in principle arbitrary.
|
||||
:::
|
||||
|
||||
Store paths are either *content-addressed* or *input-addressed*.
|
||||
It is important which store a given store object belongs to:
|
||||
Files in the store object can contain store paths, and processes may read these paths.
|
||||
Nix can only guarantee [referential integrity](store.md#closure) if store paths do not cross store boundaries.
|
||||
|
||||
Therefore one can only copy store objects if
|
||||
|
||||
- the source and target stores' directories match
|
||||
|
||||
or
|
||||
|
||||
- the store object in question has no references, that is, contains no store paths.
|
||||
|
||||
To move a store object to a store with a different store directory, it has to be rebuilt, together with all its dependencies.
|
||||
It is in general not enough to replace the store directory string in file contents, as this may break internal offsets or content hashes.
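
The copying rule above amounts to a simple predicate. The following is a sketch under simplifying assumptions: `StoreObject`, `objectName`, and `canCopyVerbatim` are hypothetical names, and references are modelled as a plain list of rendered store paths.

```haskell
-- Hypothetical, simplified types for illustration only.
type StoreDir = FilePath

data StoreObject = StoreObject
  { objectName :: String
  , references :: [FilePath]  -- rendered store paths this object refers to
  }

-- A store object can be copied unchanged if the store directories match,
-- or if it has no references, that is, contains no store paths.
canCopyVerbatim :: StoreDir -> StoreDir -> StoreObject -> Bool
canCopyVerbatim source target obj =
  source == target || null (references obj)
```

When the predicate is false, the text above applies: the object has to be rebuilt in the target store together with its dependencies.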
|
||||
|
||||
# Digest {#digest}
|
||||
|
||||
In a [store path](#store-path), the [digest][digest] is the output of a [cryptographic hash function][hash] of either all *inputs* involved in building the referenced store object or its actual *contents*.
|
||||
|
||||
Store objects are therefore said to be either [input-addressed](#input-addressing) or [content-addressed](#content-addressing).
|
||||
|
||||
::: {.note}
|
||||
The former is a standard term used elsewhere.
|
||||
The latter is our own creation to evoke a contrast with content addressing.
|
||||
**Historical note**: The 20-byte restriction exists because digests were originally [SHA-1][sha-1] hashes.
|
||||
This is no longer true, but longer hashes and other information are still truncated to 20 bytes for compatibility.
|
||||
:::
|
||||
|
||||
Content addressing means that the store path digest ultimately derives from the referred store object's contents, namely its file system objects and references.
|
||||
There is more than one *method* of content-addressing, however.
|
||||
Still, if one does know the content-addressing scheme that was used,
|
||||
(or guesses, there aren't that many yet!)
|
||||
one can recalculate the store path and thus verify the store object.
|
||||
[digest]: https://en.m.wiktionary.org/wiki/digest#Noun
|
||||
[hash]: https://en.m.wikipedia.org/wiki/Cryptographic_hash_function
|
||||
[sha-1]: https://en.m.wikipedia.org/wiki/SHA-1
|
||||
|
||||
Input addressing means that the store path digest derives from how the store path was produced, namely the "inputs" and plan that it was built from.
|
||||
Store paths of this sort can *not* be validated from the content of the store object.
|
||||
Rather, the store object might come with the store path it expects to be referred to by, and a signature of that path, the contents of the store path, and other metadata.
|
||||
The signature indicates that someone is vouching for the store object really being the result of a plan with that digest.
|
||||
|
||||
While metadata is included in the digest calculation explaining which method it was calculated by, this only serves to thwart pre-image attacks.
|
||||
That metadata is scrambled with everything else so that it is difficult to tell how a given store path was produced short of a brute-force search.
|
||||
In the parlance of referencing schemes, this means that store paths are not "self-describing".
|
||||
### Reference scanning
|
||||
|
||||
While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness.
|
||||
Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed.
|
||||
|
||||
However, having references match store paths in files is not enforced by the data model:
|
||||
Store objects could have excess or incomplete references with respect to store paths found in their file contents.
|
||||
|
||||
Scanning files therefore allows reliably capturing run time dependencies without declaring them explicitly.
|
||||
Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
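
Reference scanning can be illustrated with a small filter. This is a sketch that assumes one plausible strategy: given a list of candidate store paths (for example, the build inputs), keep those whose rendered digest occurs in the file contents. The names `scanReferences` and `StorePath` are hypothetical, and file contents are modelled as a `String` rather than raw bytes.

```haskell
import Data.List (isInfixOf)

-- Candidate store path; the digest is assumed to be in its base-32 rendering.
data StorePath = StorePath
  { digest :: String
  , name   :: String
  } deriving (Eq, Show)

-- Keep the candidates whose digest appears anywhere in the contents.
-- Searching for the digest alone is an assumption made for this sketch.
scanReferences :: [StorePath] -> String -> [StorePath]
scanReferences candidates contents =
  [ p | p <- candidates, digest p `isInfixOf` contents ]
```

Restricting the search to known candidates keeps the scan from reporting strings that merely look like store paths.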
|
||||
|
||||
## Input Addressing {#input-addressing}
|
||||
|
||||
Input addressing means that the digest derives from how the store object was produced, namely its build inputs and build plan.
|
||||
|
||||
To compute the hash of a store object one needs a deterministic serialisation, i.e., a binary string representation which only changes if the store object changes.
|
||||
|
||||
Nix has a custom serialisation format called Nix Archive (NAR).
|
||||
|
||||
Store object references of this sort can *not* be validated from the content of the store object.
|
||||
Rather, a cryptographic signature has to be used to indicate that someone is vouching for the store object really being produced from a build plan with that digest.
|
||||
|
||||
## Content Addressing {#content-addressing}
|
||||
|
||||
Content addressing means that the digest derives from the store object's contents, namely its file system objects and references.
|
||||
If one knows content addressing was used, one can recalculate the reference and thus verify the store object.
|
||||
|
||||
Content addressing is currently only used for the special cases of source files and "fixed-output derivations", where the contents of a store object are known in advance.
|
||||
Content addressing of build results is still an [experimental feature subject to some restrictions](https://github.com/tweag/rfcs/blob/cas-rfc/rfcs/0062-content-addressed-paths.md).
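
Verifying a content-addressed store object means recomputing the digest from the object and comparing it with the claimed one. The sketch below only shows that shape: `Method`, `verify`, and `toyMethod` are hypothetical names, contents and digests are plain strings, and no real Nix hashing scheme is implemented.

```haskell
-- Hypothetical, simplified model for illustration only.
type Digest = String

data StoreObject = StoreObject
  { contents   :: String
  , references :: [String]
  }

-- The content-addressing method is passed in as a function,
-- because the digest depends on which method was used.
type Method = StoreObject -> Digest

-- Recompute the digest and compare it with the claimed one.
verify :: Method -> Digest -> StoreObject -> Bool
verify method claimed obj = method obj == claimed

-- Toy stand-in for a real method, NOT cryptographic:
-- it exists only so that 'verify' can be exercised.
toyMethod :: Method
toyMethod obj = show (length (contents obj), map length (references obj))
```

An input-addressed store object cannot be checked this way, since its digest is not a function of the object's contents.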
|
||||
|
||||
|
|
|
@@ -67,18 +67,19 @@ As it keeps track of references, it can [garbage-collect][garbage-collection] un
|
|||
[ store ] --> collect garbage --> [ store' ]
|
||||
|
||||
|
||||
## Closure
|
||||
## Closure {#closure}
|
||||
|
||||
Nix stores have the *closure property*: for each store object in the store, all the store objects it references must also be in the store.
|
||||
Nix stores ensure [referential integrity][referential-integrity]: for each store object in the store, all the store objects it references must also be in the store.
|
||||
|
||||
Adding, building, copying and deleting store objects must be done in a way that obeys this property:
|
||||
The set of all store objects reachable by following references from a given initial set of store objects is called a *closure*.
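
Closure and referential integrity can both be expressed over a simple graph model. This is a sketch under stated assumptions: a store is modelled as a map from store path to the set of store paths it references, and `Store`, `closure`, and `hasReferentialIntegrity` are hypothetical names. It uses `Data.Map` and `Data.Set` from the `containers` package.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical model: each store path maps to the paths it references.
type StorePath = String
type Store = Map.Map StorePath (Set.Set StorePath)

-- All store paths reachable from the initial set by following references.
closure :: Store -> Set.Set StorePath -> Set.Set StorePath
closure store = go Set.empty . Set.toList
  where
    go seen [] = seen
    go seen (p : rest)
      | p `Set.member` seen = go seen rest
      | otherwise =
          let refs = Map.findWithDefault Set.empty p store
          in go (Set.insert p seen) (Set.toList refs ++ rest)

-- Referential integrity: every reference points at an object in the store.
hasReferentialIntegrity :: Store -> Bool
hasReferentialIntegrity store =
  all (`Map.member` store) (concatMap Set.toList (Map.elems store))
```

In a store with referential integrity, the closure of any set of its store paths is itself contained in the store.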
|
||||
|
||||
Adding, building, copying and deleting store objects must be done in a way that preserves referential integrity:
|
||||
|
||||
- A newly added store object cannot have references, unless it is a build task.
|
||||
|
||||
- Build results must only refer to store objects in the closure of the build inputs.
|
||||
|
||||
Building a store object will add appropriate references, according to the build task.
|
||||
These references can only come from declared build inputs.
|
||||
|
||||
- Store objects being copied must refer to objects already in the destination store.
|
||||
|
||||
|
@@ -86,16 +87,15 @@ Adding, building, copying and deleting store objects must be done in a way that
|
|||
|
||||
- We can only safely delete store objects which are not reachable from any reference still in use.
|
||||
|
||||
Garbage collection will delete those store objects that cannot be reached from any reference in use.
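
Garbage collection in this sense keeps exactly what is reachable from the references in use. The following sketch assumes the same simplified graph model of a store (a map from store path to referenced store paths); `reachable`, `collectGarbage`, and the `inUse` parameter are hypothetical names, not part of Nix.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical model: each store path maps to the paths it references.
type StorePath = String
type Store = Map.Map StorePath (Set.Set StorePath)

-- Grow the set by one step of references until it stops changing.
reachable :: Store -> Set.Set StorePath -> Set.Set StorePath
reachable store paths
  | next == paths = paths
  | otherwise     = reachable store next
  where
    next = Set.union paths
             (Set.unions [ Map.findWithDefault Set.empty p store
                         | p <- Set.toList paths ])

-- Keep only the store objects reachable from the references in use.
collectGarbage :: Store -> Set.Set StorePath -> Store
collectGarbage store inUse = Map.restrictKeys store (reachable store inUse)
```

Because everything a kept object references is also kept, the resulting store still has referential integrity if the original one did.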
|
||||
|
||||
<!-- more details in section on garbage collection, link to it once it exists -->
|
||||
|
||||
[referential-integrity]: https://en.m.wikipedia.org/wiki/Referential_integrity
|
||||
[garbage-collection]: https://en.m.wikipedia.org/wiki/Garbage_collection_(computer_science)
|
||||
[immutable-object]: https://en.m.wikipedia.org/wiki/Immutable_object
|
||||
[opaque-data-type]: https://en.m.wikipedia.org/wiki/Opaque_data_type
|
||||
[unique-identifier]: https://en.m.wikipedia.org/wiki/Unique_identifier
|
||||
|
||||
## Files and Processes
|
||||
## Files and Processes {#files-and-processes}
|
||||
|
||||
Nix maps between its store model and the [Unix paradigm][unix-paradigm] of [files and processes][file-descriptor], by encoding immutable store objects and opaque identifiers as file system primitives: files and directories, and paths.
|
||||
That allows processes to resolve references contained in files and thus access the contents of store objects.
|
||||
|
@@ -103,11 +103,16 @@ That allows processes to resolve references contained in files and thus access t
|
|||
Store objects are therefore implemented as the pair of
|
||||
|
||||
- a [file system object](fso.md) for data
|
||||
- a set of *store paths* for references.
|
||||
- a set of [store paths](paths.md) for references.
|
||||
|
||||
[unix-paradigm]: https://en.m.wikipedia.org/wiki/Everything_is_a_file
|
||||
[file-descriptor]: https://en.m.wikipedia.org/wiki/File_descriptor
|
||||
|
||||
The following diagram shows a radical simplification of how Nix interacts with the operating system:
|
||||
It uses files as build inputs, and build outputs are files again.
|
||||
On the operating system, files are either "dead" data, or "live" as processes, which in turn operate on files, or can bring them to life.
|
||||
A build function also amounts to an operating system process (not depicted).
|
||||
|
||||
```
|
||||
+-----------------------------------------------------------------+
|
||||
| Nix |
|
||||
|
|