The problem #
I had an old fly.io postgres database that I could no longer connect to, for whatever reason (too old maybe), and data inside that I wanted to rescue.
First, I tried to diagnose the database, and connect in many different ways to see if I could get at the data:
$ fly apps list
NAME OWNER STATUS PLATFORM LATEST DEPLOY
mydb personal deployed machines
shy-field-1671 personal deployed machines
$ fly postgres connect -a mydb
Error: no active leader found
$ fly postgres config show -a mydb
Error: no active leader found
$ fly postgres failover -a mydb
Error: failover is not available for standalone postgres
$ fly postgres restart -a mydb
Error: no active leader found
$ fly status --app mydb
ID STATE ROLE REGION CHECKS IMAGE CREATED UPDATED
148e271cdd0d89 started unknown ams 2 total, 2 passing flyio/postgres-standalone:14.1 (v0.0.7) 2023-01-18T20:24:44Z 2023-08-13T14:46:37Z
❯ fly machine status 148e271cdd0d89 --app mydb
Machine ID: 148e271cdd0d89
Instance ID: 01H7QKQNZD4ZBGHP8PSFPQYJYH
State: started
**VM**
ID = 148e271cdd0d89
Instance ID = 01H7QKQNZD4ZBGHP8PSFPQYJYH
State = started
Image = flyio/postgres-standalone:14.1 (v0.0.7)
Name = young-river-6979
Private IP = fdaa:0:2072:a7b:23c8:2948:4c51:2
Region = ams
Process Group = app
CPU Kind = shared
vCPUs = 1
Memory = 256
Created = 2023-01-18T20:24:44Z
Updated = 2023-08-13T14:46:37Z
Entrypoint =
Command =
Volume = vol_ke628r6pepjvwmnp
**Event Logs**
STATE EVENT SOURCE TIMESTAMP INFO
started start flyd 2023-08-13T15:46:37.636+01:00
starting start user 2023-08-13T15:46:37.225+01:00
stopped exit flyd 2023-08-13T15:34:21.321+01:00 exit_code=0,oom_killed=false,requested_stop=true
stopping stop user 2023-08-13T15:34:19.823+01:00
started start flyd 2023-08-13T15:10:29.282+01:00
$ fly checks list --app mydb
Health Checks for mydb
NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT
-------*---------*----------------*------------------*--------------------------------------------------------------------------
pg | passing | 148e271cdd0d89 | Jan 2 2024 19:39 | [✓] connections: 7 used, 3 reserved, 100 max (3.39ms)
-------*---------*----------------*------------------*--------------------------------------------------------------------------
vm | passing | 148e271cdd0d89 | Jan 2 2024 19:39 | [✓] checkDisk: 9.15 GB (93.9%) free space on /data/ (40.44µs)
| | | | [✓] checkLoad: load averages: 0.00 0.00 0.00 (62.78µs)
| | | | [✓] memory: system spent 0s of the last 60s waiting on memory (31.44µs)
| | | | [✓] cpu: system spent 0s of the last 60s waiting on cpu (21.98µs)
| | | | [✓] io: system spent 0s of the last 60s waiting on io (21.26µs)
-------*---------*----------------*------------------*--------------------------------------------------------------------------
So it LOOKS healthy... but this no active leader
message is really bad. Yikes.
Could I try creating a new postgresdb from a snapshot?
❯ fly volumes snapshots list vol_ke628r6pepjvwmnp
**Snapshots**
ID STATUS SIZE CREATED AT
vs_ejDYvgm1z0GXYsL created 28480078 14 hours ago
vs_Yj5YwKj0mz1YRhzX created 28510195 1 day ago
vs_lqM91DxPBnpl5uXo created 28514186 2 days ago
vs_kZoR7j7XjMbJBh7m created 28503069 3 days ago
vs_8x4zNOZ59lalXSZX created 28496591 4 days ago
vs_YVyoj8nbayzgJS6N created 28505782 5 days ago
❯ fly postgres create --snapshot-id vs_ejDYvgm1z0GXYsL
**?** **Choose an app name (leave blank to generate one):** mydb-restored-2024
automatically selected personal organization: Aaron Kelly
Some regions require a Launch plan or higher (bom, fra).
See https://fly.io/plans to set up a plan.
**?** **Select region:** Amsterdam, Netherlands (ams)
**?** **Select configuration:** Development - Single node, 1x shared CPU, 256MB RAM, 1GB disk
**?** **Scale single node pg to zero after one hour?** Yes
Creating postgres cluster in organization personal
Creating app...
Setting secrets on app mydb-restored-2024...
Restoring 1 of 1 machines with image flyio/postgres-standalone:14.1@sha256:ca27c53b81cae713e67d7ced87a4289961db4a81e382b09aaf42ea53032791eb
Error: failed to create volume: failed to create volume: Restore volume size must be at least 10GB (Request ID: 01HN5BS7FC83JWTB0PQ92WX2TK-ams)
No joy.
How about forking a volume?
$ fly postgres create --fork-from mydb
**?** **Choose an app name (leave blank to generate one):** mydb-restored-2024
automatically selected personal organization: Aaron Kelly
Error: Failed to resolve the volume associated with the primary instance. See `fly pg create --help` for more information
$ fly volumes list --app mydb
ID STATE NAME SIZE REGION ZONE ENCRYPTED ATTACHED VM CREATED AT
vol_ke628r6pepjvwmnp created pg_data 10GB ams ac80 false 148e271cdd0d89 2 years ago
vol_d7xkrkynxpo4w2q9 created pg_data 10GB ord bddf false 5 months ago
$ fly postgres create --fork-from mydb:vol_d7xkrkynxpo4w2q9
**?** **Choose an app name (leave blank to generate one):** mydb-restored-2024
automatically selected personal organization: Aaron Kelly
**?** **Select VM size:** shared-cpu-1x - CPU Kind: Shared, vCPUs: 1 Memory: 256MB
Creating postgres cluster in organization personal
Creating app...
Setting secrets on app mydb-restored-2024...
Provisioning 1 of 1 machines with image flyio/postgres:14.6@sha256:3c25db96357a78e827ca7dbbf4963089bdfcd48fb05a40cd82cfff5c61fb7710
Waiting for machine to start...
Machine 784e662b66d068 is created
==> Monitoring health checks
Waiting for **784e662b66d068** to become healthy (started, 3/3)
Postgres cluster mydb-restored-2024 created
Username: postgres
Password: UsvWa7FHbUhuuPn
Hostname: mydb-restored-2024.internal
Proxy port: 5432
Postgres port: 5433
Connection string: postgres://postgres:UsvWa7FHbUhuuPn@mydb-restored-2024.internal:5432
_Save your credentials in a secure place -- you won't be able to see them again!_
**Connect to postgres**
Any app within the Aaron Kelly organization can connect to this Postgres using the above connection string
Now that you've set up Postgres, here's what you need to understand: https://fly.io/docs/postgres/getting-started/what-you-should-know/
$ fly pg connect --app mydb-restored-2024
Connecting to fdaa:0:2072:a7b:111:952:a3c7:2... complete
psql (14.6 (Debian 14.6-1.pgdg110+1))
Type "help" for help.
postgres=# \dt
Did not find any relations.
Ehhh not good.
Looking at the logs on https://fly.io/apps/mydb-restored-2024/monitoring :
2024-01-27T11:51:13.904 app[784e90dc440568] ams [info] exporter | INFO[0088] Established new database connection to "fdaa:0:2072:a7b:13b:1bf2:88ea:2:5432". source="postgres_exporter.go:970"
2024-01-27T11:51:13.910 app[784e90dc440568] ams [info] flypg | 2024-01-27 11:51:13.908 UTC [537] FATAL: password authentication failed for user "flypgadmin"
2024-01-27T11:51:13.910 app[784e90dc440568] ams [info] flypg | 2024-01-27 11:51:13.908 UTC [537] DETAIL: Connection matched pg_hba.conf line 3: "host all all ::0/0 md5"
2024-01-27T11:51:14.445 app[784e90dc440568] ams [info] flypg | 2024-01-27 11:51:14.444 UTC [539] FATAL: password authentication failed for user "postgres"
2024-01-27T11:51:14.445 app[784e90dc440568] ams [info] flypg | 2024-01-27 11:51:14.444 UTC [539] DETAIL: Connection matched pg_hba.conf line 3: "host all all ::0/0 md5"
2024-01-27T11:51:14.445 app[784e90dc440568] ams [info] Failed to create required users: failed to connect to `host=fdaa:0:2072:a7b:13b:1bf2:88ea:2 user=postgres database=postgres`: failed SASL auth (FATAL: password authentication failed for user "postgres" (SQLSTATE 28P01))
Very sad
The solution #
Oh well, so I suppose I'm now in data rescue mode.
Could I just ssh to the volume and then create a dump of the database?
Lets see if I can connect, and see my data:
fly ssh console --app mydb
Connecting to fdaa:0:2072:a7b:23c8:2948:4c51:2... complete
root@148e271cdd0d89:/# su postgres
postgres@148e271cdd0d89:/$ psql
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------------+----------+----------
public | appliances | table | postgres
public | appliances_id_seq | sequence | postgres
public | ketones | table | postgres
public | ketones_id_seq | sequence | postgres
public | weight | table | postgres
public | weight_id_seq | sequence | postgres
(6 rows)
There it is! Now lets exit the psql
program and take a dump of the db...
postgres-# \q
postgres@148e271cdd0d89:/$ cd
postgres@148e271cdd0d89:~$ pwd
/var/lib/postgresql
postgres@148e271cdd0d89:~$ pg_dump > mydb.sql
postgres@148e271cdd0d89:~$ cat mydb.sql
--
-- PostgreSQL database dump
--
-- Dumped from database version 14.1 (Debian 14.1-1.pgdg110+1)
-- Dumped by pg_dump version 14.1 (Debian 14.1-1.pgdg110+1)
SET statement_timeout = 0;
SET lock_timeout = 0;
Awesome, now lets exit the container competely, and download it:
fly sftp get /var/lib/postgresql/mydb.sql --app mydb
5379 bytes written to mydb.sql
Using the example https://hub.docker.com/_/postgres/ as an example, I created the following docker-compose.yml :
# Use postgres/example user/password credentials
version: '3.1'
services:
db:
image: postgres
restart: always
environment:
POSTGRES_PASSWORD: example
volumes:
- ./backups:/mnt
adminer:
image: adminer
restart: always
ports:
- 8080:8080
I moved my database backup file to the new directory backups/mydb.sql
I then connected to the database and then attempted to run the import:
postgres@f64d15e57bc5:/$ pg_restore -U postgres -d postgres < /mnt/mydb.sql
pg_restore: error: input file appears to be a text format dump. Please use psql.
Oh. I would have thought the export defaults would have been ok. Well, after searching a bit, the import command is pretty simple:
postgres@f64d15e57bc5:/$ psql -d postgres -f /mnt/mydb.sql
SET
SET
set_config
------------
(1 row)
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
...
I checked the Adminer on http://localhost:8080/, logged in with postgres
and example
...
and I could see my data, great success!
My original, embarrassing, hacky solution #
My first attempt involved:
- ssh'ing to the fly postgres Machine
- creating a backup.tar of the
/data/postgres
folder - sftp'ing this backup.tar to a locally running postgres container
- writing over the local postgres data folder with the contents from the tar
- praying that it worked
I don't know why I didn't think of looking for postgres tools to help with data dumps, but at least I learned a lot about other stuff along the way.
What I did #
Lets try and take a backup of the postgres data folder:
$ fly ssh console --app mydb
Connecting to fdaa:0:2072:a7b:23c8:2948:4c51:2... complete
root@148e271cdd0d89:~# cd
root@148e271cdd0d89:~#
root@148e271cdd0d89:/# tar cvf mydb_backup.tar /data/postgres
tar: Removing leading `/' from member names
/data/postgres/
/data/postgres/pg_serial/
/data/postgres/pg_replslot/
/data/postgres/postgresql.conf
/data/postgres/pg_stat_tmp/
/data/postgres/pg_stat_tmp/db_0.stat
...
/data/postgres/postmaster.opts
/data/postgres/pg_hba.conf
$ fly sftp get /root/mydb_backup.tar --app mydb
44103680 bytes written to mydb_backup.tar
Using the example https://hub.docker.com/_/postgres/ as an example, I created a folder with the following docker-compose.yml inside:
# Use postgres/example user/password credentials
version: '3.1'
services:
db:
image: postgres
restart: always
environment:
POSTGRES_PASSWORD: example
volumes:
- ./database-backups/data/postgres:/var/lib/postgresql/data
adminer:
image: adminer
restart: always
ports:
- 8080:8080
Inside that folder I then created a folder called database-backups, and moved mydb-backup.tar into it.
Next I ran this command to start the containers, and checked the logs to make sure the startup was ok:
$ docker compose up -d
$ docker compose logs
adminer-1 | [Sat Jan 27 13:11:05 2024] PHP 7.4.33 Development Server (http://[::]:8080) started
db-1 | The files belonging to this database system will be owned by user "postgres".
db-1 | This user must also own the server process.
db-1 |
db-1 | The database cluster will be initialized with locale "en_US.utf8".
...
db-1 | 2024-01-27 13:11:08.526 UTC [1] LOG: database system is ready to accept connections
I also checked that Adminer was up at http://localhost:8080/, which it was. I was able to login with the default username postgres
and password example
. I didn't give a value for the database. I logged in fine and saw the default databases default, template0, template1
Happy with all that, I entered the postgres container:
$ docker exec -it --user postgres db bash
And checked the permissioning of the postgres data folder inside the container:
postgres@048cbcd015f6:/$ cd /var/lib/postgresql
postgres@048cbcd015f6:~$ ls -al
total 12
drwxr-xr-x 1 postgres postgres 4096 Jan 12 00:23 .
drwxr-xr-x 1 root root 4096 Jan 12 00:23 ..
drwx------ 19 postgres postgres 4096 Jan 27 13:53 data
postgres@0a3201ad1f64:/$ id
uid=999(postgres) gid=999(postgres) groups=999(postgres),101(ssl-cert)
I exited the container, and then then extracted my backup:
$ tar xvf mydb_backup.tar
x data/postgres/
x data/postgres/pg_serial/
x data/postgres/pg_replslot/
x data/postgres/postgresql.conf
...
I then chowned and chmodded backup dir to match what's inside the dir inside the container:
$ sudo chown -R 999:999 database-backups/data
$ sudo chmod -R 700 database-backups/data
I then brought all the services up with docker compose up -d
, and checked that the logs were ok with docker compose logs
I went back to Adminer, however, when trying to login, it gave me the message:
Unable to connect to PostgreSQL server: FATAL: password authentication failed for user "postgres"
I needed a way to connect without a password. So, I to the container as root, installed an editor, and and edited this file:
$ docker compose exec -it db bash
$ apt update && apt install vim
$ vim /var/lib/postgresql/data/pg_hba.conf
It had this contents:
local all postgres trust
host all all 0.0.0.0/0 md5
host all all ::0/0 md5
I replaced all the md5
with trust
:
local all postgres trust
host all all 0.0.0.0/0 md5
host all all ::0/0 md5
Restarted the database:
pg_util restart
And I was able to login with:
server: db
username: postgres
password: example
When I first attempted this method months ago, it was a success - I saw all my old data in the postgres
db.
However, repeating these steps recently was not successful.
So, this solution is bad and very hacky.