
Need Guidance on Backing Up Running Database #65

Open
rebaz94 opened this issue Nov 23, 2023 · 6 comments

Comments

@rebaz94

rebaz94 commented Nov 23, 2023

Hey there,

I wanted to start by saying a big thank you for your library—it's been a real game-changer for us! The speed it provides is just incredible.

I'd love to know the best way to back up the database while it's running. Can you share some guidance or tips on how we can ensure a proper backup process without disrupting ongoing operations? We're wondering if it's possible to copy the database folder directly and expect everything to work seamlessly if we restore that folder onto another machine.

Thank you

@EmmanuelOga

I was wondering the same thing. It seems to me that just copying the data folder would be a bad idea if a Pogreb process was still writing to it. Any writer process would have to halt operations until the copy is done... so one would call db.Sync(), stop writing to the DB until the copy finishes, then resume operations.

The main caveat is that, depending on the data size, this could take a few seconds, even on a system with a fast SSD.

Thoughts?

@rebaz94
Author

rebaz94 commented Nov 28, 2023

Indeed, the library should offer a simple method for backing up data. It's possible to copy the database while it's running, but during restoration it might need to rebuild the index, which could be time-consuming.

Since I use it as a caching database, before backing up I create a new database in a separate directory, switch over to the new one, and then upload the original database's files. In my case I upload to cloud storage, and each database file of nearly 1 GB takes about 15 seconds to upload. After the upload, I refill the database and revert back to the original one. This method works seamlessly, but it required nearly 600 lines of code to manage backup and restoration.

@EmmanuelOga

Let me see if I get it right...

You create a TMP db and switch writes over from ORIGINAL while you back up ORIGINAL? Do you still read from ORIGINAL while doing the backup? It sounds like during the backup you'd need to look up first in TMP and then in ORIGINAL to handle reads...

And finally, you need to dump anything in TMP to ORIGINAL and get back to normal. Is that right?

@rebaz94
Author

rebaz94 commented Nov 30, 2023

I set up a TMP db to take care of both reading and writing while backing up the ORIGINAL one. While the backup is happening, since TMP is empty, any reads have to hit Redis or the MySQL database, because the local data won't be there. Whatever we grab from Redis or MySQL then gets cached in TMP.

Once the backup is done, I transfer all the data from TMP back into the original database. That way we're back to our regular setup.
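The swap-to-TMP scheme described here could be reduced to a small read-through wrapper. Everything below is a sketch, not the ~600-line implementation mentioned above: `KV` is a stand-in interface (with the real library both sides would be `*pogreb.DB`, and `Get` returning `(nil, nil)` on a miss is assumed), `fallback` represents the Redis/MySQL lookup, and `mapKV` is only an in-memory stand-in for illustration:

```go
package main

import "sync"

// KV is the minimal store interface this sketch needs.
type KV interface {
	Get(key []byte) ([]byte, error)
	Put(key, val []byte) error
}

// Cache routes reads/writes to the active store and falls back to an
// upstream source (e.g. Redis/MySQL) on a miss, caching the result.
type Cache struct {
	mu       sync.RWMutex
	db       KV
	fallback func(key []byte) ([]byte, error)
}

func (c *Cache) active() KV {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.db
}

func (c *Cache) Get(key []byte) ([]byte, error) {
	db := c.active()
	if val, err := db.Get(key); err != nil || val != nil {
		return val, err
	}
	// Miss (e.g. TMP is empty during a backup): read through and cache.
	val, err := c.fallback(key)
	if err != nil {
		return nil, err
	}
	return val, db.Put(key, val)
}

// Swap atomically replaces the active store (ORIGINAL <-> TMP) and
// returns the previous one so its directory can be backed up.
func (c *Cache) Swap(next KV) KV {
	c.mu.Lock()
	prev := c.db
	c.db = next
	c.mu.Unlock()
	return prev
}

// mapKV is an in-memory stand-in used only to illustrate the pattern.
type mapKV struct{ m map[string][]byte }

func newMapKV() *mapKV                        { return &mapKV{m: map[string][]byte{}} }
func (s *mapKV) Get(k []byte) ([]byte, error) { return s.m[string(k)], nil }
func (s *mapKV) Put(k, v []byte) error        { s.m[string(k)] = v; return nil }
```

Merging TMP back after the backup would then just iterate TMP and `Put` each pair into ORIGINAL before swapping again.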

@EmmanuelOga

EmmanuelOga commented Dec 1, 2023

I think I get it.

One question is whether the iterator would be able to continue normally in the face of inserts, deletes and updates. If it can, then maybe it would be OK to back up like this, while CRUD ops are still going on:

backupDB, err := pogreb.Open("new/db/for/backup", nil)
if err != nil {
    log.Fatal(err)
}

if err := existingDB.Sync(); err != nil {
    log.Fatal(err)
}
it := existingDB.Items()
for {
    key, val, err := it.Next()
    if err == pogreb.ErrIterationDone {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    if err := backupDB.Put(key, val); err != nil {
        log.Fatal(err)
    }
}

I also wonder if compaction would affect this somehow. Perhaps compaction needs to be paused while iterating? ... If the above snippet could work without needing to stop writes, perhaps it's a better way to back up, since the backup would end up fully compacted right away.

@akrylysov
Owner

hi!

I wanted to start by saying a big thank you for your library—it's been a real game-changer for us! The speed it provides is just incredible.

Thank you!

copying the data folder would be a bad idea if there was still some Pogreb process writing to them. It sounds like any writer process would have to halt operations until the copy is done

You are correct: you can't just copy the database files and expect the copied database to work, unless nothing is writing to the database while the files are copied.

If the database size is not large, forcing a recovery and rebuilding the index when the backup is opened for the first time doesn't sound that terrible.

A question is if iterator would be able to continue normally in the face of inserts, deletes and updates

It's safe to insert, delete or run a compaction during iteration. The only drawback is that this backup method is going to take longer compared to just copying files.

Adding a proper backup mechanism that preserves the index might be tricky; let me think about this more. I may start with adding a backup method that requires rebuilding the index on the first start from a backup, which is cheaper than iterating the entire database every time a backup is made. I assume creating a backup is a more frequent operation than restoring from one.
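In the meantime, the iterate-and-copy idea from earlier in the thread could be wrapped in a small helper. The sketch below abstracts the iterator and the destination as plain functions so it isn't tied to Pogreb's concrete types; with the real library you would pass something like `existingDB.Items().Next` and `backupDB.Put` and compare against `pogreb.ErrIterationDone` (the sentinel error here is only a local stand-in):

```go
package main

import "errors"

// errIterationDone stands in for pogreb.ErrIterationDone in this sketch.
var errIterationDone = errors.New("iteration done")

// backupAll drains next() into put(), returning how many pairs were copied.
// next yields (key, value, err); errIterationDone ends the loop cleanly.
func backupAll(next func() ([]byte, []byte, error), put func(key, val []byte) error) (int, error) {
	copied := 0
	for {
		key, val, err := next()
		if errors.Is(err, errIterationDone) {
			return copied, nil
		}
		if err != nil {
			return copied, err
		}
		if err := put(key, val); err != nil {
			return copied, err
		}
		copied++
	}
}
```

Since inserts, deletes and compaction are confirmed safe during iteration, this could run without stopping writes, at the cost of being slower than a raw file copy.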
