Backups #42

Open
opened 2024-07-09 01:22:19 +00:00 by raito · 6 comments
Owner

It would be great to have a Bagel backup NixOS module, e.g. `bagel.backups.postgresDatabases = [ "hydra" ];`, that collects everything and ships it to multiple backup locations, etc.

I don't know whether we want to use filesystem-level features to ship snapshots somewhere; I'm not very convinced. Note, however, that all my provided VMs can ship ZFS snapshots of the whole VM's dataset anywhere, so we could replicate that design if we use VMs as the final backup target for our VMs.

For now, let's start somewhere anyway:

  • back up the Hydra DB "efficiently" (`-Fc` or similar)
  • back up the databases of other important services
  • set up a target location (can be delroth's S3 or another VM for me).
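
For reference, here is a rough sketch of what such a module's interface might look like; option names, types, and defaults are placeholders, not a decided design:

```nix
# Hypothetical interface sketch for the proposed bagel.backups module.
# Nothing here is final; names, types, and defaults are placeholders.
{ lib, ... }:

{
  options.bagel.backups = {
    postgresDatabases = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [ ];
      example = [ "hydra" ];
      description = "PostgreSQL databases to dump (pg_dump -Fc or similar) on a schedule.";
    };

    targets = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [ ];
      description = "Backup destinations (S3 buckets, other VMs, ...) to ship the dumps to.";
    };
  };
}
```

The actual mechanism behind those options (pg_dump, filesystem snapshots, restic, ...) is exactly what's up for discussion.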

As an ext4 enjoyer myself, I use a filesystem-only approach for my (database) backups, based on https://www.postgresql.org/docs/current/continuous-archiving.html and restic.

The gist is that you configure Postgres to copy away WAL archives while your backup is running (a rough sketch follows the list):

  • tell Postgres to start a backup (this writes an initial WAL entry with the currently uncommitted state)
  • run a restic job to change-detect and back up the Postgres state directory
  • stop the database backup (this again writes uncommitted state)
  • run a second restic job to back up the WAL archives created in the process
  • remove the WAL archives
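
Not my actual implementation (that is the Rust-driven setup mentioned below), but a minimal sketch of the same flow as a NixOS service, assuming PostgreSQL >= 15 (which provides `pg_backup_start`/`pg_backup_stop`), restic credentials in a hypothetical environment file, and a made-up WAL archive path that `archive_command` would have to write into:

```nix
# Minimal sketch only; assumes PostgreSQL >= 15 and restic credentials in
# /run/secrets/restic-env (hypothetical path). Error handling, excludes and
# retention are omitted.
{ pkgs, ... }:

{
  systemd.services.postgres-fs-backup = {
    startAt = "daily";
    path = [ pkgs.postgresql pkgs.restic ];
    serviceConfig = {
      User = "postgres";
      EnvironmentFile = "/run/secrets/restic-env"; # RESTIC_REPOSITORY, RESTIC_PASSWORD
    };
    script = ''
      # A single psql session keeps the backup open across the restic run;
      # \! shells out from inside that session.
      psql -v ON_ERROR_STOP=1 <<'SQL'
      SELECT pg_backup_start('restic', true);
      \! restic backup /var/lib/postgresql
      -- A real implementation must also persist the backup_label that
      -- pg_backup_stop() returns, next to the base backup.
      SELECT * FROM pg_backup_stop();
      \! restic backup /var/backup/wal-archive
      \! rm -f /var/backup/wal-archive/*
      SQL
    '';
  };
}
```

The one non-obvious trick is the single heredoc session: `pg_backup_start` and `pg_backup_stop` have to run on the same connection, which is exactly the part plain `psql -c` one-liners can't do.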

This should be pretty much ready to go with a bit of elbow grease, copying from my personal infra. That setup is a bit over-engineered, however (it is pretty much permissionless and driven by some Rust, since bash didn't cut it for keeping psql connections open); I can simplify it if that's wished for.

Some goals when designing this initially were keeping writes to a minimum, avoiding locking the database, always backing up consistent database snapshots, and being able to leverage restic's built-in filesystem change detection.

Feel free to rate my setup :D

Author
Owner

The only problem I see with this sort of backup is the disk space overhead. How large are backups with this approach, and how do they compare to `pg_dump -Fc` or `pg_basebackup -Ft`?


That's a bit complicated to answer; I'd have to compare the different variants and their delta sizes between backups for a while.

My database is currently around 23.3GB on disk, for which restic reports a total stored size of 6.6GB over 58.3k blobs.

The previous snapshot was 22.9GB (6.5GB with 57.4k blobs on disk); compared to the most recent snapshot, ~13k blobs changed (Files: 71 new, 2340 changed, 19632 unmodified), which amounts to a 5.9GB delta uncompressed and 1.762GiB stored.

What this tells me is that the data I'm backing up is somewhat stable; only about a quarter of it changes between snapshots, which is reasonable given that I run 3 Synapses and 5 Mastodons. I don't know exactly how the custom format behaves here, but I could imagine it being less stable for small changes.

Additionally, restic reports a compression ratio of 3.5x with 71.4% space savings for both of my snapshots. For emily's backups, which use pg_dumpall with SQL output, it's 2.9x and 65.9%.

Author
Owner

Yes, but you didn't compare it to `pg_dump -Fc`, did you? My point is that Hydra(.nixos.org) is 1TB on disk but 60-70GB after `pg_dump -Fc`. So the question is: once we have grown a large database and are doing snapshots, are we going to waste disk space when we could instead do slower backups with better disk space consumption?


Not over time, no.

`pg_dump -Fc` over all DBs in my Postgres is 3.8G, so that compares to the 6.6G of compressed restic data (the restic backup doesn't include the directories listed as unnecessary in https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP-DATA).

Owner

Postgres on h.n.o. is currently 302GB on-disk.

  • `pg_dump -Fc hydra` takes 4-5 hours and comes in at 81GB
  • `pg_dump -Ft hydra` takes 75 minutes, but is not compressed and as such comes in at 314GB
  • TODO: `pg_basebackup`, which requires a replication config (a rough NixOS sketch follows below)
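
For the `pg_basebackup` TODO, the NixOS side would presumably be something like the following untested sketch; on modern PostgreSQL the server defaults likely already allow WAL senders, so the pg_hba entry is the part that actually matters:

```nix
# Untested sketch of what pg_basebackup needs on the NixOS side. On modern
# PostgreSQL the defaults (wal_level = replica, max_wal_senders = 10) already
# permit WAL senders, so the missing piece is probably just this pg_hba entry.
{ lib, ... }:

{
  # Appended to pg_hba.conf: allow the postgres superuser to open
  # replication connections over the local socket.
  services.postgresql.authentication = lib.mkAfter ''
    local replication postgres peer
  '';
}
```

`pg_basebackup -U postgres -D /some/target -Ft -z` should then work locally; whether it actually beats `pg_dump -Fc` on runtime and size is the measurement still to be done.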