New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZOOKEEPER-4643, ZOOKEEPER-4646 & ZOOKEEPER-4394: Committed txns lost / NullPointerException in Learner.syncWithLeader() #2152
base: master
Are you sure you want to change the base?
Conversation
dad7545
to
ca51c7c
Compare
When the follower processes the NEWLEADER message, it only persists and processes the txns that the leader asks it to COMMIT. This guarantees no loss of committed txns because the follower syncs the leader's committed history before replying NEWLEADER ack (ZOOKEEPER-4646). Besides, this fix removes the redundant acks for the PROPOSALs of committed txns in SYNCHRONIZATION phase. In fact, there is no need to reply PROPOSAL acks for these committed txns. Because these txns have been committed by the leader, their PROPOSAL acks are redundant for the leader (See Leader#processAck). |
#2111 can avoid ZOOKEEPER-4643, but it cannot fix other issues like ZOOKEEPER-4394: the NullPointerException problem when the follower receives COMMIT after replying NEWLEADER ack in syncWithLeader(). In #1848, when receiving NEWLEADER, the follower only persists and processes the packets according to "packetsCommitted". It still keeps the outstanding proposals in "packetsNotCommitted" to avoid NullPointerException when receiving COMMIT packet(s) right after replying NEWLEADER ack (ZOOKEEPER-4394). Take #2111 and #1848 into consideration, this fix avoids a group of issues in SYNCHRONIZATION, including ZOOKEEPER-4643, ZOOKEEPER-4646, ZOOKEEPER-4685, ZOOKEEPER-4394 and the problem of flaky tests ZOOKEEPER-3023. |
Add a test case: |
One curiosity:
Who is 'we' ? |
c600496
to
acb4405
Compare
ZOOKEEPER-4394: NullPointerException when follower receives COMMIT after replying NEWLEADER ack ZOOKEEPER-4643: Committed txns lost when follower crashes after updating currentEpoch ZOOKEEPER-4646: Committed txns lost when follower crashes after replying NEWLEADER ack ZOOKEEPER-4685: Leader shutdown due to follower replies PROPOSAL ack before NEWLEADER ack in Synchronization phase ZOOKEEPER-3023: Flaky tests: Zab1_0Test#testNormalFollowerRunWithDiff
acb4405
to
cdd13ca
Compare
We are a research team dedicated to the verification of distributed systems using TLA+. Based on ZooKeeper's existing specification, we refine the actions according to the code implementation to explore the possible interleaving of multi-node and multi-threading events with failures. Then we check the specification to see whether it violates some invariants (derived from Zab paper and code) with the model checker. For example, when we specify Learner.syncWithLeader(..) in version 3.9.1, the model checking found the issue traces When specifying Learner.syncWithLeader(..) in version 3.9.2 (fixed by #2111), the issue traces of ZOOKEEPER-4394 and ZOOKEEPER-3023 can be detected. More details of the specifications can be found in this repo.
We verify this fix with the TLA+ specification zk_pr_2152. Some details in this specification: according to the fix in this PR, the logic when a follower receives NEWLEADER is specified into three actions:
The verification statistics are provided here. No violation is found during the checking with various configurations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I have left some comments though.
Sorry for late. It has been a while since #1848(2 years) and thought on #1925(nearly 1 year) for me to bring back.
@li4wang @yisong-yue @jonmv @hanm Sorry for ping you, but would you mind take a look for this ?
/* | ||
* @see https://github.com/apache/zookeeper/pull/1848 | ||
* Persist and process the committed txns in "packetsNotCommitted" | ||
* according to "packetsCommitted", which have been committed by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is somewhat misunderstanding. The key is to log these committed ones, they are considered committed before election by the paper. All the reason we touch packetsNotCommitted
here is to make sure it is not logRequest
again in broadcast
phase. I think it might be better to rename packetsNotCommitted
to packetsNotLogged
as @jeffrey-xiao did in #1930. "log" is a disk operation, "commit" is an agreement. What we want here should be "log committed txns agreed in election".
Coming into the implementation, new proposals could still be committed before NEWLEADER
since LearnerHandler
does not issue NEWLEADER
right after these committed txns. But it does not harm us here as we are potentially to persist more but not less and new leader expect no ack
for committed ones.
* | ||
* @see https://issues.apache.org/jira/browse/ZOOKEEPER-4394 | ||
* Keep the outstanding proposals in "packetsNotCommitted" to avoid | ||
* NullPointerException when the follower receives COMMIT packet(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given my comments above, then we should not clear packetsNotCommitted
apparently. All txns not in packetsCommitted
are proposals from new election.
long start = System.currentTimeMillis(); | ||
while (createSessionZxid != f.fzk.getLastProcessedZxid() | ||
&& (System.currentTimeMillis() - start) < 50) { | ||
Thread.sleep(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When specifying Learner.syncWithLeader(..) in version 3.9.2 (fixed by #2111), the issue traces of ZOOKEEPER-4394 and ZOOKEEPER-3023 can be detected.
I did not see ZOOKEEPER-3023 after #2111. But if you are verifying this using TLA, this is doomed to failure. I am +1 to revert this to pre ZOOKEEPER-2678.
|
||
qp.setType(Leader.NEWLEADER); | ||
qp.setZxid(ZxidUtils.makeZxid(1, 0)); | ||
oa.writeRecord(qp, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qp.data
should be cleared.
// Leader sends an outstanding proposal | ||
long proposalZxid = ZxidUtils.makeZxid(1, 1001); | ||
proposeSetData(qp, proposalZxid, "data2", 2); | ||
oa.writeRecord(qp, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is what ZOOKEEPER-4394 tried to report. New proposals are issued before NEWLEADER
. This is the gap between paper and implementation.
// The outstanding proposal has not been persisted yet | ||
ZKDatabase zkDb2 = new ZKDatabase(new FileTxnSnapLog(logDir, snapDir)); | ||
long lastZxid = zkDb2.loadDataBase(); | ||
assertEquals("data1", new String(zkDb2.getData("/foo", stat, null))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems an extra enforce to the paper. But given that we are driving the test step by step and we are testing implementation, I am +1 on this. By the paper, we should not have this problem and assertions should still hold anyway as there are no new proposals from new leader before NEWLEADER
.
…hLeader(..) to make it more understandable
Hi, this is great! As luck would have it, I had to rebase my fork onto 3.9.2 a few weeks ago, and due to conflicts with #2111 I already reviewed that commit. My conclusion then was that the NPE this PR fixes still remained an issue, and I think there are additional fixes in #1925 related to shutdown of the ZK database/server, that would also lead to (mostly harmless) duplicate series in the transaction log. I've dedicated some time for reviewing this PR, and will also make sure to mint a fresh PR with any outstanding fixes from #1925, in the hope that I may finally close that fork. |
Hi, @kezhuw. thanks a lot for your careful review! I've updated the code fix in this pr. Could u please take some time to check the new commit and see if there is anything else that needs improvement? Thanks! |
Hi @eolivelli. Sorry for bother you, but would you mind take a look at this pr and consider merging it? This pr fixes a bunch of issues in follower's syncWithLeader() that stay unresolved for a long time. Besides, it can drive the following fixes like #2154. |
This fix aims to fix a bunch of issues in follower's syncWithLeader().
e.g.,
ZOOKEEPER-4394: NullPointerException when follower receives COMMIT after replying NEWLEADER ack in syncWithLeader().
Explanation: In syncWithLeader(), a follower may receive the COMMIT message after replying NEWLEADER ack. e.g., a follower receives the following messages in order:
DIFF -> PROPOSAL -> COMMIT -> PROPOSAL -> NEWLEADER -> COMMIT -> UPTODATE
NullPointerException will occur if the corresponding PROPOSAL has been processed and removed from the "packetsNotCommitted" during the follower process NEWLEADER.
To fix this, when the follower receives NEWLEADER, it does the following things in order:
After the follower replies UPTODATE ack, it persist the txns in "packetsNotCommitted" and commit the txns in "packetsCommitted" asynchronously.
Besides, this fixes some other issues in Learner.syncWithLeader():
ZOOKEEPER-3023: Flaky test:
Zab1_0Test#testNormalFollowerRunWithDiff
ZOOKEEPER-4643: Committed txns lost when follower crashes after updating currentEpoch
ZOOKEEPER-4646: Committed txns lost when follower crashes after replying NEWLEADER ack
ZOOKEEPER-4685: Leader shutdown due to follower replies PROPOSAL ack before NEWLEADER ack in Synchronization phase
We have leveraged the TLA+ specifications of ZooKeeper and verified the correctness of this fix.