diff --git a/README.md b/README.md
index e056966..7aa618d 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ And yes, it works on macOS too!
 ## Goals
 
 - **Multi-Tenancy**: Create a private cache for yourself, and one for friends and co-workers. Tenants are mutually untrusting and cannot pollute the views of other caches.
-- **Global Deduplication**: Individual caches (tenants) are simply restricted views of the content-addressed global cache. When paths are uploaded, a mapping is created to grant the local cache access to the global NAR.
+- **Global Deduplication**: Individual caches (tenants) are simply restricted views of the content-addressed NAR Store and Chunk Store. When paths are uploaded, a mapping is created to grant the local cache access to the global NAR.
 - **Managed Signing**: Signing is done on-the-fly by the server when store paths are fetched. The user pushing store paths does not have access to the signing key.
 - **Scalabilty**: Attic can be easily replicated. It's designed to be deployed to serverless platforms like fly.io but also works nicely in a single-machine setup.
 - **Garbage Collection**: Unused store paths can be garbage-collected in an LRU manner.
diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index 10cc8d9..5c5688b 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -4,6 +4,7 @@
 - [Tutorial](./tutorial.md)
 - [User Guide](./user-guide/README.md)
 - [Admin Guide](./admin-guide/README.md)
+    - [Chunking](./admin-guide/chunking.md)
 - [FAQs](./faqs.md)
 - [Reference](./reference/README.md)
     - [attic](./reference/attic-cli.md)
diff --git a/book/src/admin-guide/chunking.md b/book/src/admin-guide/chunking.md
new file mode 100644
index 0000000..f7de582
--- /dev/null
+++ b/book/src/admin-guide/chunking.md
@@ -0,0 +1,39 @@
+# Chunking
+
+Attic uses the [FastCDC algorithm](https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia) to split uploaded NARs into chunks for deduplication.
+There are four main parameters that control chunking in Attic:
+
+- `nar-size-threshold`: The minimum NAR size to trigger chunking
+    - When set to 0, chunking is disabled entirely for newly-uploaded NARs
+    - When set to 1, all newly-uploaded NARs are chunked
+- `min-size`: The preferred minimum size of a chunk, in bytes
+- `avg-size`: The preferred average size of a chunk, in bytes
+- `max-size`: The preferred maximum size of a chunk, in bytes
+
+## Configuration
+
+When upgrading from an older version without support for chunking, you must include the new `[chunking]` section:
+
+```toml
+# Data chunking
+#
+# Warning: If you change any of the values here, it will be
+# difficult to reuse existing chunks for newly-uploaded NARs
+# since the cutpoints will be different. As a result, the
+# deduplication ratio will suffer for a while after the change.
+[chunking]
+# The minimum NAR size to trigger chunking
+#
+# If 0, chunking is disabled entirely for newly-uploaded NARs.
+# If 1, all newly-uploaded NARs are chunked.
+nar-size-threshold = 131072 # chunk files that are 128 KiB or larger
+
+# The preferred minimum size of a chunk, in bytes
+min-size = 65536 # 64 KiB
+
+# The preferred average size of a chunk, in bytes
+avg-size = 131072 # 128 KiB
+
+# The preferred maximum size of a chunk, in bytes
+max-size = 262144 # 256 KiB
+```
diff --git a/book/src/faqs.md b/book/src/faqs.md
index 5c8c2b8..d67372b 100644
--- a/book/src/faqs.md
+++ b/book/src/faqs.md
@@ -18,12 +18,13 @@ The difference is that instead of having the upload streamed to the storage back
 Once the NAR hash is confirmed, a mapping is created to grant the local cache access to the global NAR.
 The global deduplication behavior is transparent to the client.
 
-In the future, schemes to prove data possession without fully uploading the file may be supported.
+This requirement may be disabled by setting `require-proof-of-possession` to false in the configuration.
+When disabled, uploads of NARs that already exist in the Global NAR Store will immediately succeed.
 
 ## What happens if a user uploads a path with incorrect/malicious metadata?
 
 They will only pollute their own cache.
-Path metadata (store path, references, deriver, etc.) are associated with the local cache and the global cache only contains content-addressed NARs that are "context-free."
+Path metadata (store path, references, deriver, etc.) are associated with the local cache and the global cache only contains content-addressed NARs and chunks that are "context-free."
 
 ## How is authentication handled?
 
@@ -31,40 +32,55 @@ Authentication is done via signed JWTs containing the allowed permissions.
 Each instance of `atticd --mode api-server` is stateless.
 This design may be revisited later, with option for a more stateful method of authentication.
 
-## How is compression handled?
-
-Uploaded NARs are compressed on the server before being streamed to the storage backend.
-We use the hash of the _uncompressed NAR_ to perform global deduplication.
-
-```
-                    ┌───────────────────────────────────►NAR Hash
-                    │
-                    │
-                    ├───────────────────────────────────►NAR Size
-                    │
-              ┌─────┴────┐  ┌──────────┐  ┌───────────┐
- NAR Stream──►│NAR Hasher├─►│Compressor├─►│File Hasher├─►File Stream─►S3
-              └──────────┘  └──────────┘  └─────┬─────┘
-                                                │
-                                                ├───────►File Hash
-                                                │
-                                                │
-                                                └───────►File Size
-```
-
-At first glance, performing compression on the client and deduplicating the result may sound appealing, but this approach isn't without problems:
-
-1. Different compression algorithms and levels naturally lead to different results which can't be deduplicated
-2. Even with the same compression algorithm, the results are often non-deterministic (number of compression threads, library version, etc.)
-
-When we perform compression on the server and use the hashes of uncompressed NARs for lookups, non-determinism of compression is no longer a problem since we only compress once.
-
-On the other hand, performing compression on the server leads to additional CPU usage, increasing compute costs and the need to scale.
-Such design decisions are to be revisited later.
-
 ## On what granularity is deduplication done?
 
-Currently, global deduplication is done on the level of NAR files.
-File or chunk-level deduplication (e.g., casync) may be added later.
-It remains to be seen how NAR reassembly can be done in a user-friendly yet economical manner.
-On compute services, outbound traffic often isn't free while several S3-compatible storage services provide free egress (e.g., [Cloudflare R2](https://developers.cloudflare.com/r2/platform/pricing/)).
+Global deduplication is done on two levels: NAR files and chunks.
+During an upload, the NAR file is split into chunks using the [FastCDC algorithm](https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf).
+Identical chunks are only stored once in the storage backend.
+If an identical NAR exists in the Global NAR Store, chunking is skipped and the NAR is directly deduplicated.
+
+During a download, `atticd` reassembles the entire NAR from constituent chunks by streaming from the storage backend.
+
+Data chunking is optional and can be disabled entirely for NARs smaller than a threshold.
+When chunking is disabled, all new NARs are uploaded as a single chunk and NAR-level deduplication is still in effect.
+
+## Why chunk NARs instead of individual files?
+
+In the current design, chunking is applied to the entire uncompressed NAR file instead of individual constituent files in the NAR.
+Big NARs that benefit the most from chunk-based deduplication (e.g., VSCode, Zoom) often have hundreds or thousands of small files.
+During NAR reassembly, it's often uneconomical or impractical to fetch thousands of files to reconstruct the NAR in a scalable way.
+By chunking the entire NAR, it's possible to configure the average chunk size to a larger value, ignoring file boundaries and lumping small files together.
+This is also the approach [`casync`](https://github.com/systemd/casync) has taken.
+
+You may have heard that [the Tvix store protocol](https://flokli.de/posts/2022-06-30-store-protocol/) chunks individual files instead of the NAR.
+The design of Attic is driven by the desire to effectively utilize existing platforms with practical limitations, while looking forward to the future.
+
+## What happens if a chunk is corrupt/missing?
+
+When a chunk is deleted from the database, all dependent `.narinfo` and `.nar` will become unavailable (503).
+However, this can be recovered from automatically when any NAR containing the chunk is uploaded.
+
+At the moment, Attic cannot automatically detect when a chunk is corrupt or missing.
+Correctly distinguishing between transient and persistent failures is difficult.
+The `atticadm` utility will have the functionality to kill/delete bad chunks.
+
+## How is compression handled?
+
+Uploaded NARs are chunked then compressed on the server before being streamed to the storage backend.
+On the chunk level, we use the hash of the _uncompressed chunk_ to perform global deduplication.
+
+```
+                        ┌───────────────────────────────────►Chunk Hash
+                        │
+                        │
+                        ├───────────────────────────────────►Chunk Size
+                        │
+                ┌───────┴────┐  ┌──────────┐  ┌───────────┐
+ Chunk Stream──►│Chunk Hasher├─►│Compressor├─►│File Hasher├─►File Stream─►S3
+                └────────────┘  └──────────┘  └─────┬─────┘
+                                                    │
+                                                    ├───────►File Hash
+                                                    │
+                                                    │
+                                                    └───────►File Size
+```
diff --git a/book/src/introduction.md b/book/src/introduction.md
index d6f12c3..8f5ad3b 100644
--- a/book/src/introduction.md
+++ b/book/src/introduction.md
@@ -17,7 +17,7 @@ Attic is still an early prototype and is looking for more testers. Want to jump
 ## Goals
 
 - **Multi-Tenancy**: Create a private cache for yourself, and one for friends and co-workers. Tenants are mutually untrusting and cannot pollute the views of other caches.
-- **Global Deduplication**: Individual caches (tenants) are simply restricted views of the content-addressed global cache. When paths are uploaded, a mapping is created to grant the local cache access to the global NAR.
+- **Global Deduplication**: Individual caches (tenants) are simply restricted views of the content-addressed NAR Store and Chunk Store. When paths are uploaded, a mapping is created to grant the local cache access to the global NAR.
 - **Managed Signing**: Signing is done on-the-fly by the server when store paths are fetched. The user pushing store paths does not have access to the signing key.
-- **High Availability**: Attic can be easily replicated. It's designed to be deployed to serverless platforms like fly.io but also works nicely in a single-machine setup.
+- **Scalability**: Attic can be easily replicated. It's designed to be deployed to serverless platforms like fly.io but also works nicely in a single-machine setup.
 - **Garbage Collection**: Unused store paths can be garbage-collected in an LRU manner.
diff --git a/book/src/tutorial.md b/book/src/tutorial.md
index e69dc21..e81fe35 100644
--- a/book/src/tutorial.md
+++ b/book/src/tutorial.md
@@ -174,10 +174,11 @@ Binary Cache Endpoint: http://localhost:8080/hello
 Retention Period: Global Default
 ```
 
-Because of Attic's global deduplication, garbage collection actually happens on two levels:
+Because of Attic's global deduplication, garbage collection actually happens on three levels:
 
 1. **Local Cache**: When an object is garbage collected, only the mapping between the metadata in the local cache and the NAR in the global cache gets deleted. The local cache loses access to the NAR, but the storage isn't freed.
-2. **Global Cache**: Orphan NARs not referenced by any local cache then become eligible for deletion. This time the storage space is actually freed and subsequent uploads of the same NAR will actually trigger an upload to the storage backend.
+2. **Global NAR Store**: Orphan NARs not referenced by any local cache then become eligible for deletion.
+3. **Global Chunk Store**: Finally, orphan chunks not referenced by any NAR become eligible for deletion. This time the storage space is actually freed and subsequent uploads of the same chunk will actually trigger an upload to the storage backend.
 
 ## Summary
 