fix(go-client): update config once replica server failed and forward to primary meta server if it was changed #1916

Open
wants to merge 8 commits into
base: master

lengyuexuexuan
Collaborator

What problem does this PR solve?

#1880
#1856

What is changed and how does it work?

As for #1856.
When the go client is writing to a partition and the replica node core dumps, the request fails after the timeout, but the client does not update its configuration. In that case, restarting the go client is the only way to recover.

In this PR, the client automatically updates the table configuration when a replica core dumps.
Testing showed that the replica error is "context.DeadlineExceeded" when the replica core dumps.

case context.DeadlineExceeded:
    confUpdate = true

Therefore, when the client meets this error, it updates the configuration automatically.
However, the failed request is not retried. The configuration is refreshed only after the timeout occurs, so a retry issued before the refresh completes would still fail, and there is also a risk of infinite retries.
It is therefore better to return the request error directly to the user and let the user retry.
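
The behavior described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `tableClient`, `triggerConfUpdate`, `handleReplicaError`, `writeToDeadReplica`, and `simulateTimeout` are hypothetical names, and the dead replica is simulated by waiting for the context to expire.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// tableClient is a hypothetical stand-in for the go-client table handle.
type tableClient struct {
	refreshCount int // number of configuration refreshes requested so far
}

// triggerConfUpdate is a hypothetical hook; a real client would query the
// meta server for the latest partition configuration here.
func (t *tableClient) triggerConfUpdate() {
	t.refreshCount++
}

// handleReplicaError decides whether a replica error should trigger a
// configuration refresh. The error is always returned unchanged, so the
// request is never retried internally.
func (t *tableClient) handleReplicaError(err error) error {
	if err == nil {
		return nil
	}
	confUpdate := false
	if errors.Is(err, context.DeadlineExceeded) {
		// The replica may have crashed; refresh the configuration.
		confUpdate = true
	}
	if confUpdate {
		t.triggerConfUpdate()
	}
	return err
}

// writeToDeadReplica simulates a write to a replica that never answers.
func writeToDeadReplica(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// simulateTimeout issues one write against the dead replica and reports how
// many refreshes were triggered and which error reached the caller.
func simulateTimeout() (int, error) {
	client := &tableClient{}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	err := client.handleReplicaError(writeToDeadReplica(ctx))
	return client.refreshCount, err
}

func main() {
	refreshes, err := simulateTimeout()
	fmt.Println("refreshes:", refreshes, "error:", err)
}
```

In this sketch the caller sees the timeout error once and can decide to retry after the refresh has had a chance to complete.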

As for #1880
When the client sends the RPC "RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX" to a meta server that is not the primary, the meta server returns a response that tells the client to forward the request to the primary meta server.

Given this behavior, even if the primary meta server is not in the client's configuration, the client can still connect to it by following the forward address.

This PR implements this in the following steps.

  1. First, parse the response and check whether its errno is ERR_FORWARD_TO_OTHERS. If so, extract the primary meta server address from it.
    func (c *metaCall) getMetaServiceForwardAddress(resp metaResponse) *base.RPCAddress {
        rep, ok := resp.(*replication.QueryCfgResponse)
        if !ok || rep.GetErr().Errno != base.ERR_FORWARD_TO_OTHERS.String() {
            return nil
        } else if rep.GetPartitions() == nil || len(rep.GetPartitions()) == 0 {
            return nil
        } else {
            return rep.Partitions[0].Primary
        }
    }
  2. Second, check whether the address is already in the client configuration. If it is, skip it. Otherwise, establish a connection and pull the configuration directly from the primary meta server.
    if forwardAddr == nil {
        return false
    }
    addr := forwardAddr.GetAddress()
    found := false
    for i := range c.metaIPAddrs {
        if addr == c.metaIPAddrs[i] {
            found = true
            break
        }
    }
    if !found {
        c.metaIPAddrs = append(c.metaIPAddrs, addr)
        c.metas = append(c.metas, &metaSession{
            NodeSession: newNodeSession(addr, NodeTypeMeta),
            logger:      pegalog.GetLogger(),
        })
        curLeader = len(c.metas) - 1
        c.metas[curLeader].logger.Printf("add forward address %s as meta server", addr)
    }
    resp, err = c.callFunc(ctx, c.metas[curLeader])
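
The two steps above can be tried out in isolation with the simplified sketch below. Everything in it is a hypothetical stand-in for the real go-client types: `queryCfgResponse`, `metaClient`, `forwardAddress`, and `follow` are illustrative names, a session is represented only by its address string, and no network calls are made.

```go
package main

import "fmt"

// errForwardToOthers mirrors the errno string checked in step 1.
const errForwardToOthers = "ERR_FORWARD_TO_OTHERS"

// queryCfgResponse is a simplified, hypothetical response type.
type queryCfgResponse struct {
	errno   string
	primary string // address carried by the first partition when forwarding
}

// metaClient keeps the configured addresses and the sessions opened so far.
type metaClient struct {
	metaAddrs []string
	sessions  []string // each session is represented by its address
}

// forwardAddress returns the primary meta address if the response asks the
// client to forward, and an empty string otherwise.
func forwardAddress(resp queryCfgResponse) string {
	if resp.errno != errForwardToOthers || resp.primary == "" {
		return ""
	}
	return resp.primary
}

// follow registers the forward address as a new meta server when it is not
// yet known. It returns the index of the new session and true when a new
// server was added, or -1 and false otherwise.
func (c *metaClient) follow(resp queryCfgResponse) (int, bool) {
	addr := forwardAddress(resp)
	if addr == "" {
		return -1, false
	}
	for _, known := range c.metaAddrs {
		if known == addr {
			return -1, false // already configured, nothing to add
		}
	}
	c.metaAddrs = append(c.metaAddrs, addr)
	c.sessions = append(c.sessions, addr)
	return len(c.sessions) - 1, true
}

func main() {
	c := &metaClient{
		metaAddrs: []string{"0.0.0.0:34601"},
		sessions:  []string{"0.0.0.0:34601"},
	}
	resp := queryCfgResponse{errno: errForwardToOthers, primary: "0.0.0.0:34602"}
	idx, added := c.follow(resp)
	fmt.Println(idx, added, c.metaAddrs)
}
```

Unlike the PR code, this sketch does not issue the follow-up query; it only shows the bookkeeping that decides whether a new meta session is needed.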

Note that IP addresses and sessions do not correspond one-to-one, because an IP address may be unavailable.
This is why, even when the primary meta server is present in the client configuration, curLeader cannot be used as an index into the metaIPAddrs array, and the address is looked up by value instead:

for i := range c.metaIPAddrs {
    if addr == c.metaIPAddrs[i] {
        found = true
        break
    }
}
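
To illustrate why the lookup goes by address value, here is a small sketch under an explicit assumption: sessions are created only for addresses that are reachable, so the session slice can be shorter than the address slice. `buildSessions`, `containsAddr`, and `demoIndexMismatch` are hypothetical helpers, not part of the go-client.

```go
package main

import "fmt"

// buildSessions is hypothetical: it opens a session only for reachable
// addresses, so the result may be shorter than addrs and its indexes may
// not line up with those of addrs.
func buildSessions(addrs []string, reachable map[string]bool) []string {
	var sessions []string
	for _, a := range addrs {
		if reachable[a] {
			sessions = append(sessions, a)
		}
	}
	return sessions
}

// containsAddr looks an address up by value instead of relying on an index.
func containsAddr(addrs []string, addr string) bool {
	for _, a := range addrs {
		if a == addr {
			return true
		}
	}
	return false
}

// demoIndexMismatch shows that the same index can point at different
// servers in the two slices, while a lookup by value still works.
func demoIndexMismatch() (string, string, bool) {
	addrs := []string{"0.0.0.0:34601", "0.0.0.0:34602", "0.0.0.0:34603"}
	reachable := map[string]bool{"0.0.0.0:34602": true, "0.0.0.0:34603": true}
	sessions := buildSessions(addrs, reachable)
	curLeader := 1 // index into sessions, not into addrs
	return addrs[curLeader], sessions[curLeader], containsAddr(addrs, sessions[curLeader])
}

func main() {
	byAddrIndex, bySessionIndex, known := demoIndexMismatch()
	fmt.Println(byAddrIndex, bySessionIndex, known)
}
```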

Tests
  • Unit test
  • Manual test (add detailed scripts or steps below)
  1. Start onebox without adding the primary meta server to the go client configuration.
  2. The go client writes data to a certain partition and then kills the replica process.

…coredump.

Add a feature: the client forwards to the primary meta server when the client's meta list doesn't contain the primary.
Member

@acelyc111 acelyc111 left a comment

Thanks for the contribution!

go-client/session/meta_session_test.go (outdated)
// This is a trick for testing. If metaCall issues requests to other metas, not only to the leader, this nil channel will cause a panic.
call.backupCh = nil
metaResp, err := call.Run(context.Background())
assert.Nil(t, err)
assert.Equal(t, metaResp.GetErr().Errno, base.ERR_OK.String())
}
}

// This case mocks the case that the server primary meta is not in the client metalist.
Member

The meta servers in the test are 0.0.0.0:3460{1..3}. Which one is "not in the client metalist"?

Collaborator Author

When onebox starts, the primary meta server is chosen at random. Therefore, the test uses a loop and passes only one meta server to the go client in each iteration. This guarantees that redirection is required in two of the iterations.

go-client/session/meta_call.go (outdated)