kill aws-sdk-cpp with fire #272
Reference: lix-project/lix#272
No description provided.
challenge accepted. i would even go so far as saying we could just take the aws sdk code for credential files, vendor it, then toss the entire rest in the bin, if that's all we need.
see also #246
please... cross-compiling aws-cpp-sdk from linux to freebsd is a huge pain
Related: using the curl support to add s3 capability https://gerrit.lix.systems/c/lix/+/926
https://github.com/rusoto/rusoto/tree/master/rusoto/credential maybe of interest? could we replace this (actually very complex) creds code with statically linking more rust into our process?
We discussed this topic yesterday with @puck, and one thing that came up is that entirely replacing aws-sdk-cpp with a custom implementation is likely out of the question. In particular, we know some users rely on features like "fetching instance credentials from IMDS for EC2 machines", and... yeah, you can reimplement that, but it starts getting hairy. Long term, Lix could make the decision to drop those features, but this is something that should be carefully assessed.
(Rusoto implements that too, but only if you also depend on stuff like Hyper, and uh do we really want another HTTP client statically linked? Doubtful that it would do much good to the compile times / dependency chains at this point.)
IMO what we should however do is:
s3://mycache?creds-file=/run/secrets/creds.txt). This has other benefits, like not relying on the ambient environment (IMO a pretty bad misfeature in most cases...).
The combination of (1) and (2) allows for Lix builds without aws-sdk-cpp that can still do basic S3 usage, especially in non-AWS environments.
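To make the explicit-credentials idea a bit more concrete, here is a minimal Python sketch of how a store URL such as the one above could be split into a bucket name and its options. The function name `parse_s3_cache_url` is hypothetical, and `creds-file` is only the parameter name proposed in this comment, not an existing Lix setting.

```python
from urllib.parse import urlsplit, parse_qs


def parse_s3_cache_url(url):
    """Split an s3:// cache URL into (bucket, options).

    Hypothetical helper for illustration; 'creds-file' is the option
    name proposed in this thread, not an existing setting.
    """
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url}")
    # parse_qs returns a list per key; keep the last value given.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return parts.netloc, options


bucket, options = parse_s3_cache_url("s3://mycache?creds-file=/run/secrets/creds.txt")
# An explicit path means nothing has to be looked up in the ambient environment.
creds_file = options.get("creds-file")
```

With this shape, a missing `creds-file` option simply yields `None`, and the caller can decide whether to fall back to any other lookup at all.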
I'm writing a short design to describe all that and make sure we agree on the details and unless that conflicts with anyone else's plans I'll probably work on it in the following days/weeks.
AWS authentication is really complex and feature-rich. There's SSO, MFA, IMDS, IAM... aws-vault does a good job describing those. Maintaining that ourselves is impossible, vendoring is problematic long-term, and including aws-sdk-cpp is overkill.
I'd like to see a detailed proposal. But roughly speaking, I think we should support three auth schemas:
The one that puzzles me is number 3. With AWS S3, it's quite easy: you can just export AWS env vars and run your S3 queries. Not sure if it works for other S3-compatible solutions.
Here is a concrete proposal for how we can delete basically everything except for a dependency on aws-c-auth for the credential chain: https://github.com/NixOS/nix/issues/13084#issue-3018269771
Is your feature request related to a problem?
The s3-binary-cache-store (especially substitution) is extremely buggy. Meanwhile our http substituter is not buggy and way more battle-tested
Proposed solution
Use http-binary-cache-store to talk to S3 directly.
libcurl has aws-sigv4 authentication built in these days: https://curl.se/libcurl/c/CURLOPT_AWS_SIGV4.html
So we can simply use the existing FileTransfer implementation to push to and pull from S3, as S3 is simply REST semantics that map to what http-binary-cache-store already does.
The only thing that we need to keep is the AWS credential chain to actually fetch the credentials to pass to curl, but for that we only need to depend on https://github.com/awslabs/aws-crt-cpp or even smaller https://github.com/awslabs/aws-c-auth
This also solves the problem of people complaining that we link against aws-cpp-sdk, as aws-c-auth is a way lighter dependency.
aws-c-auth suffers from the same problem as https://github.com/NixOS/nix/issues/5947 but now we only need one library to enable BYO_CRYPTO instead of a whole bunch of them. So it makes things significantly easier.
Something like this in filetransfer should work. We already special case s3:// URLs in FileTransfer so we can use that to do the following instead of calling out to the S3 SDK:
Now all the HTTP PUT/GET/POST/GET operations should work as expected.
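As a rough illustration of the translation step described here (not the snippet from the original issue), the sketch below maps an s3:// URL to an HTTPS URL and builds the provider string that libcurl's `CURLOPT_AWS_SIGV4` expects (`provider1:provider2:region:service`). The function name `s3_to_https`, the `region` and `endpoint` query options, and the default region are assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs


def s3_to_https(url, default_region="us-east-1"):
    """Map an s3:// URL to (https_url, sigv4_param).

    Hypothetical helper: the 'region' and 'endpoint' options and the
    default region are assumptions for this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url}")
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    region = options.get("region", default_region)
    bucket = parts.netloc
    key = parts.path.lstrip("/")
    endpoint = options.get("endpoint")
    if endpoint:
        # Path-style addressing, as commonly used by S3-compatible servers.
        https_url = f"https://{endpoint}/{bucket}/{key}"
    else:
        # Virtual-hosted-style addressing on AWS itself.
        https_url = f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
    # Value for CURLOPT_AWS_SIGV4; the access key and secret would be
    # handed to curl separately (CURLOPT_USERPWD as "key:secret").
    sigv4_param = f"aws:amz:{region}:s3"
    return https_url, sigv4_param
```

The transfer code would then request `https_url` as usual, with curl signing each request using the credentials obtained from the credential chain.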
Alternative solutions
Fix all the weird bugs with the current S3 substituter
@delroth wrote in #272 (comment):
'Interestingly', running lix & lix-hydra 2.93 against garage, I seem to see ~10 second delays as aws-sdk tries to contact 169.254.169.254 (apparently for IMDS), unless I set:
systemd.services.hydra-server.serviceConfig.Environment = [ "AWS_EC2_METADATA_DISABLED=true" ]
This feels non-obvious, and another reason to try to kill as much of aws-sdk as possible?
That's the part of the SDK that I don't want to kill. Did you configure your Lix to have credentials for garage correctly? If there are no env vars with AWS credentials then AWS will try contacting the IMDS for credentials. But only if it couldn't find credentials in the first place.
Note that both the nix daemon needs access to the env vars
That's the part of the SDK that I don't want to kill. As I rely on IMDS at work :) Did you configure your Lix to have credentials for garage correctly? If there are no env vars with AWS credentials then AWS will try contacting the IMDS for credentials. But only if it couldn't find credentials in the first place.
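The lookup order described here can be sketched as follows. This is a deliberately simplified model, assuming only three steps (environment variables, the shared credentials file, then IMDS); the real SDK chain has more providers, and the function name `find_credentials_source` is hypothetical. It makes no network calls and only reports which source would be used.

```python
import configparser


def find_credentials_source(env, credentials_file):
    """Return which source a simplified credential chain would use.

    Simplified model for illustration: environment variables first,
    then the shared credentials file, and IMDS only as a last resort.
    """
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    profile = env.get("AWS_PROFILE", "default")
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file is silently ignored
    if parser.has_section(profile) and parser.has_option(profile, "aws_access_key_id"):
        return "credentials-file"

    # IMDS is only reached when nothing else matched; this is the step
    # that AWS_EC2_METADATA_DISABLED=true switches off.
    if env.get("AWS_EC2_METADATA_DISABLED", "").lower() == "true":
        return None
    return "imds"
```

In this model, a process that cannot read any credentials file and has no credentials in its environment falls through to the IMDS step, which matches the delay described above when no metadata service is reachable.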
Yeah I saw that, but I figured if that was the only bit of the SDK left standing it might be easier to selectively disable it. Pretty sure the creds must be set correctly, as garage doesn't allow anonymous access? I don't remember seeing this problem ~12 months ago, I wonder if the SDK behaviour has changed in a relatively recent version.
Edit: yeah, creds are set in /var/lib/hydra/queue-runner/.aws/credentials, /root/.aws/credentials (and /var/lib/hydra/.aws/credentials, but I don't think that's needed.) Hmm, now wondering if the search order is different if creds are set with an env var? Edit2: no :(
nix copy --to s3://... very slow, hangs #945
https://github.com/NixOS/nix/pull/13752