Stefano Marinelli /
npub14ca…0nj8
2024-10-29 12:08:36

Monitoring shouts at me: "This server is DOWN!"
I immediately check - it doesn’t respond to ping requests. I try to reboot it remotely - no luck.
I request a remote console; after more than 45 minutes, there’s still no reply.
I check the logs: the last ZFS send/receive-based backup completed just 23 minutes before the outage (backups run hourly).

I call the client to explain the situation: we can either wait or restore from a backup. They would prefer to be back at work after lunch (13:30).

I set up a VPS, install FreeBSD and some packages, then connect to the backup server:
zfs send -RLvw [mybckdataset]/bastille@lastSnap | pigz - | mbuffer -m512M | ssh destserver "pigz -d - | zfs receive -x canmount -x readonly zroot/bastille"
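
For readers who want the pipeline above taken apart, here is an annotated dry-run sketch. It only prints the command rather than running it. The function name build_restore_cmd and the sample dataset name backup/bastille are hypothetical placeholders, not names from the actual setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the restore pipeline instead of executing it.
# build_restore_cmd and its sample arguments are hypothetical placeholders.

build_restore_cmd() {
    src_snap=$1    # snapshot on the backup server, e.g. backup/bastille@lastSnap
    dest_host=$2   # SSH name of the freshly installed VPS
    dest_ds=$3     # target dataset on the VPS, e.g. zroot/bastille

    # zfs send flags:
    #   -R  replication stream: descendant datasets, snapshots and properties
    #   -L  allow blocks larger than 128 KB in the stream
    #   -v  verbose progress output
    #   -w  raw send: encrypted blocks travel exactly as stored on disk
    # pigz compresses on multiple cores; mbuffer -m512M adds a 512M memory
    # buffer so short stalls on either side do not stop the stream.
    # zfs receive -x canmount -x readonly: ignore the values of these two
    # properties carried in the stream.
    printf '%s\n' "zfs send -RLvw $src_snap | pigz - | mbuffer -m512M | ssh $dest_host \"pigz -d - | zfs receive -x canmount -x readonly $dest_ds\""
}

build_restore_cmd "backup/bastille@lastSnap" "destserver" "zroot/bastille"
```

Because the stream is sent raw (-w), the data stays encrypted in transit and on arrival, which is why the key has to be loaded on the new machine before mounting.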

After a few minutes (50 GB later):
zfs load-key -r zroot/bastille   # the datasets are encrypted
zfs mount -a
service bastille start
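
The three steps above could be wrapped in a small script. This is a minimal sketch, not the procedure actually used: the run and bring_up helpers and the DRY_RUN and DATASET variables are assumptions introduced for illustration. With DRY_RUN left at its default of 1, the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the post-receive steps. DRY_RUN, DATASET, run and bring_up are
# hypothetical names; only the zfs and service commands come from the post.
DRY_RUN=${DRY_RUN:-1}
DATASET=${DATASET:-zroot/bastille}

# Print the command in dry-run mode, otherwise execute it and stop on failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

bring_up() {
    # Load encryption keys for the dataset and all its descendants.
    run zfs load-key -r "$DATASET"
    # Mount every dataset that is now mountable.
    run zfs mount -a
    # Start the Bastille service, which brings up the jails.
    run service bastille start
}

bring_up
```

Setting DRY_RUN=0 would execute the commands for real; zfs load-key may then prompt for a passphrase, depending on how the key location is configured.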

Everything's up and running. DNS record changed - disaster recovered. Time: 12:48.

I call the client and say, "Hey, you’re back up. Now we’ll wait for the original server to come back, and then we’ll resync the datasets."

The customer, with a witty remark that cleverly shows gratitude without being direct, replies, "Oh come on, and I was hoping to extend my lunch break! 😆"

FreeBSD, jails, and ZFS have, once again, done an excellent job.

Now, I can have my lunch.

#FreeBSD #ZFS #jails #RunBSD #IT #SysAdmin #DisasterRecovery
Author Public Key
npub14calwd6xg349ahf3nnhhyqem2w2e3gs66p7zctz2sna74u3tsddq7j0nj8