
[Feature] Store batch proposal to disk on shutdown #3171

Open
raychu86 opened this issue Mar 15, 2024 · 8 comments
Labels
feature New feature or request

Comments

@raychu86
Contributor

🚀 Feature

Store the current proposal to DB on shutdown.

Currently, when a node shuts down or reboots while a batch proposal is still in progress, the proposal is dropped. Once the node restarts, it needs to generate a new proposal for the same round. Other nodes that see this new proposal will consider the node malicious, because a validator can't have more than one proposal per round.

This is really only meaningful if the other nodes have not progressed past this round between the shutdown and the reboot.

The solution is to store the pending batch proposal to disk on shutdown and reload it on bootup.
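
A minimal sketch of that flow (the PendingProposal type, field names, and file handling here are illustrative stand-ins, not snarkOS code):

```rust
// Illustrative only: persist the in-flight proposal on shutdown and reload it
// on startup, so the node can reuse the same round's proposal after a restart.
use serde::{Deserialize, Serialize};
use std::{fs, io, path::Path};

// Stand-in for the real Proposal type; the fields are assumptions.
#[derive(Serialize, Deserialize)]
struct PendingProposal {
    round: u64,
    batch: Vec<u8>,
    signatures: Vec<Vec<u8>>,
}

fn save_on_shutdown(path: &Path, proposal: &PendingProposal) -> io::Result<()> {
    let bytes = serde_json::to_vec(proposal)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    // Write to a temp file and rename it, so a crash mid-write never leaves a torn file.
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, &bytes)?;
    fs::rename(&tmp, path)
}

fn load_on_startup(path: &Path) -> io::Result<Option<PendingProposal>> {
    match fs::read(path) {
        Ok(bytes) => Ok(serde_json::from_slice(&bytes).ok()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```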

@raychu86 raychu86 added the feature New feature or request label Mar 15, 2024
@vvp
Contributor

vvp commented Mar 15, 2024

Another, more resilient but slower solution is to persist the pending batch proposal to disk before it is broadcast (we cannot guarantee a clean shutdown, e.g. when the process is SIGKILLed from outside for various reasons).
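
A minimal sketch of that ordering, with `persist` and `broadcast` as hypothetical placeholders rather than actual snarkOS functions:

```rust
// Illustrative ordering only: make the proposal durable *before* broadcasting it,
// so even an abrupt kill right after the broadcast cannot lose it.
fn propose(
    proposal: &[u8],
    persist: impl Fn(&[u8]) -> std::io::Result<()>,
    broadcast: impl Fn(&[u8]),
) -> std::io::Result<()> {
    persist(proposal)?;  // durable first (fsync inside `persist` if needed)
    broadcast(proposal); // only runs once the write has succeeded
    Ok(())
}
```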

@raychu86
Contributor Author

You'd also want to store the signatures you've received for that proposal, because peers won't re-sign a proposal they've already seen. So just storing the proposal before it is broadcast is not enough; you'd likely have to persist it every time you receive a new signature as well.
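
A rough sketch of what that could look like, with placeholder types and a naive encoding (none of this is the actual snarkOS code):

```rust
// Illustrative only: re-persist the proposal plus all signatures collected so far
// whenever a new signature arrives, so a restart can resume signature collection
// instead of re-proposing for the same round.
use std::collections::BTreeMap;
use std::{fs, io, path::Path};

// Placeholder types; the real node would use its own address/signature types.
type Address = [u8; 32];
type Signature = Vec<u8>;

struct ProposalState {
    round: u64,
    batch: Vec<u8>,
    signatures: BTreeMap<Address, Signature>,
}

impl ProposalState {
    // Called whenever a peer's signature arrives: record it and re-persist the state.
    fn on_signature(&mut self, from: Address, sig: Signature, path: &Path) -> io::Result<()> {
        self.signatures.insert(from, sig);
        self.persist(path)
    }

    fn persist(&self, path: &Path) -> io::Result<()> {
        // Naive length-prefixed encoding, purely illustrative.
        let mut out = Vec::new();
        out.extend_from_slice(&self.round.to_le_bytes());
        out.extend_from_slice(&(self.batch.len() as u64).to_le_bytes());
        out.extend_from_slice(&self.batch);
        for (addr, sig) in &self.signatures {
            out.extend_from_slice(addr);
            out.extend_from_slice(&(sig.len() as u64).to_le_bytes());
            out.extend_from_slice(sig);
        }
        // Temp file + rename to avoid a torn write on an unclean shutdown.
        let tmp = path.with_extension("tmp");
        fs::write(&tmp, &out)?;
        fs::rename(&tmp, path)
    }
}
```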

@ljedrz
Collaborator

ljedrz commented Mar 20, 2024

Happy to tackle this one, just a few questions:

  • could we persist the proposal to a file instead? it's probably better suited for this use case
  • what should we do in case of a hard/unclean shutdown, then?

@raychu86
Contributor Author

@ljedrz

  1. Preferably the proposal goes to the same DB, in case the validator needs to migrate machines or directories. So a file is not ideal. We could store it as a blob in RocksDB (rough sketch below).

  2. I believe hard/unclean shutdowns could cause DB corruption, so I believe it's sufficient for this to be best effort.
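
For illustration only, the "blob in RocksDB" idea sketched with the rocksdb crate used directly and a made-up key name (snarkOS's own storage layer would wrap this differently):

```rust
// Illustrative only: store/load the serialized proposal as a single blob under a
// fixed key, using the `rocksdb` crate directly rather than snarkOS's storage layer.
use rocksdb::DB;

const PENDING_PROPOSAL_KEY: &[u8] = b"pending_batch_proposal"; // made-up key name

fn store_proposal(db: &DB, proposal_bytes: &[u8]) -> Result<(), rocksdb::Error> {
    db.put(PENDING_PROPOSAL_KEY, proposal_bytes)
}

fn load_proposal(db: &DB) -> Result<Option<Vec<u8>>, rocksdb::Error> {
    db.get(PENDING_PROPOSAL_KEY)
}

fn main() -> Result<(), rocksdb::Error> {
    let db = DB::open_default("/tmp/example-proposal-db")?;
    store_proposal(&db, b"serialized proposal bytes")?;
    assert!(load_proposal(&db)?.is_some());
    Ok(())
}
```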

@ljedrz
Collaborator

ljedrz commented Apr 1, 2024

  1. Preferably the proposal goes to the same DB, in case the validator needs to migrate machines or directories. So a file is not ideal. We could store it as a blob in RocksDB.

I've taken an initial stab at it, and I have some issues with this approach:

  • a single-entry blob is somewhat odd in our DataMap-first setup; while not impossible to implement that way, I'd consider it "clunky"
  • trying to use a Proposal in the storage-service crate introduces a cyclic package dependency
  • if a validator migrates machines or directories, it would also migrate their ~/.aleo folder, so any file stored there wouldn't get lost; the storage itself is kept there after all

  2. I believe hard/unclean shutdowns could cause DB corruption, so I believe it's sufficient for this to be best effort.

That shouldn't happen by design, because we're using atomic writes; we would have to deviate from that in order to cause storage corruption.

Summing up: while it is possible to work around the cyclic dependency, it would require either a refactoring effort across a few crates or a conversion between two different Proposal objects. I would still recommend using a single file instead.

Your call @raychu86.

@raychu86
Contributor Author

raychu86 commented Apr 1, 2024

The DataMap approach is still preferred for unified storage.

We used to use the Blob (Vec<u8>) approach when storing rejected transactions - AleoHQ/snarkVM@fa723a6#diff-fd03badfe6a0b2934f728aec159249acfe3d2f4218f062f88edbb4191d28942cL87.

@ljedrz
Collaborator

ljedrz commented Apr 1, 2024

Ah yes, the value is not an issue; it's more that the key is not really meaningful for a single blob. While it can be worked around, my only ideas are things that would look weird at first glance (e.g. using the unit type () as the key, or introducing a dedicated placeholder object). The most structured approach would be to introduce a single-entry counterpart to the DataMap (e.g. a DataEntry) to handle cases like this.
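
A toy illustration of that single-entry shape, keyed by the unit type (this is not the real DataMap trait, just the idea):

```rust
// Toy illustration of a "single-entry map": keying by the unit type () means the
// structure can hold at most one value, which fits a one-off blob like a proposal.
use std::collections::HashMap;

struct DataEntry<V> {
    inner: HashMap<(), V>,
}

impl<V> DataEntry<V> {
    fn new() -> Self {
        Self { inner: HashMap::new() }
    }
    fn set(&mut self, value: V) {
        self.inner.insert((), value);
    }
    fn get(&self) -> Option<&V> {
        self.inner.get(&())
    }
}

fn main() {
    let mut entry = DataEntry::new();
    entry.set(b"pending proposal blob".to_vec());
    assert!(entry.get().is_some());
}
```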

@howardwu
Contributor

howardwu commented Apr 2, 2024

To share additional context on this topic:

During the ZKSecurity audit of the BFT, the priority was safety over liveness. When a node is turned off, its mempool is cleared, and the batch proposal (in memory) is also cleared. However, the other validators remain online and retain a copy of your validator's batch proposal. When your validator comes back online, its new batch proposal will no longer match the earlier one (for the same round), and the other validators will not sign it. Again, this is to prioritize safety over liveness.

Now, there are 2 practical ways to resolve this:

  1. Set a timer expiration on other validators' batch proposals (tradeoff: liveness); a rough sketch follows below.
  2. Have your validator store your batch when shutting down, and rebroadcast your saved batch proposal when you come back online. (tradeoff: staleness)
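
A rough sketch of option 1's cache-and-expire logic (all types and names here are made up for illustration, not snarkOS code):

```rust
// Illustrative only: a per-validator cache of the last proposal seen, with an
// expiry window after which a different proposal for the same round is accepted.
use std::collections::HashMap;
use std::time::{Duration, Instant};

type ValidatorId = u64; // placeholder for the real validator identity type

struct ProposalCache {
    expiry: Duration,
    seen: HashMap<ValidatorId, (u64, Instant)>, // validator -> (round, first seen)
}

impl ProposalCache {
    fn new(expiry: Duration) -> Self {
        Self { expiry, seen: HashMap::new() }
    }

    /// Returns true if a proposal from `validator` for `round` may be signed.
    fn may_sign(&mut self, validator: ValidatorId, round: u64) -> bool {
        let now = Instant::now();
        match self.seen.get(&validator) {
            // A proposal for this round was already seen and has not expired: refuse (safety).
            Some(&(seen_round, at))
                if seen_round == round && now.duration_since(at) < self.expiry => false,
            // No entry, a different round, or the old entry expired: accept (liveness).
            _ => {
                self.seen.insert(validator, (round, now));
                true
            }
        }
    }
}
```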
