
Support smooth switchover in standalone mode by reusing MOVED #319

Open
soloestoy opened this issue Apr 15, 2024 · 13 comments · May be fixed by #325
Labels
major-decision-pending Needs decision by core team

Comments

@soloestoy
Member

soloestoy commented Apr 15, 2024

Issue: redis/redis#12097
PR: redis/redis#12192

In many scenarios, we need to perform a switchover. Take a rolling version upgrade as an example: suppose there are two nodes A and B, where A is the master and B is a replica of A. Under normal circumstances, users only access master node A. During a version upgrade, users first upgrade replica node B, then perform a switchover to make B the master and A a replica of B. At the same time, user traffic must also switch from node A to node B, after which node A is upgraded to complete the whole version upgrade process:

[Diagram: rolling upgrade of nodes A and B with role switchover]

However, during this process, the switch of user access is not entirely smooth. The behavior differs between cluster mode and standalone mode:

  1. In cluster mode, after the switch, when the replica is accessed, Redis will return -MOVED slot ip:port. In cluster mode, this is a routine result. The client will not treat it as an error returned to the user. Instead, the client will automatically redirect to the new master node B, as indicated by the ip:port in the result, thus achieving a smooth switch.
  2. In standalone mode, after the switch, accessing the replica will return the error -READONLY You can't write against a read only replica. The client will pass this error to the user, and a smooth switch cannot be achieved.

As we can see, the switching experience in cluster mode is clearly superior to that of standalone mode. Therefore, to improve the switching process in standalone mode, we can utilize the MOVED redirect mechanism. The first step would require the valkey server to use MOVED to replace READONLY in standalone mode, and then support for handling the MOVED return value needs to be implemented in the client ecosystem.

The latest discussion is that we want to introduce a new reply, REDIRECT, so it would not break anything in cluster mode.

In more detail, the slot returned by the replica in the MOVED reply can be -1, and this can also be applied in cluster mode. It clearly informs the client that it has reached a replica and should be redirected to the master, rather than having reached the wrong shard and requiring redirection.
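A client could distinguish the two cases along these lines (a hypothetical sketch of the proposed slot -1 convention, not an implemented protocol):

```python
def parse_moved(err: str):
    """Parse a MOVED reply such as "MOVED -1 10.0.0.2:6379".

    Under the proposal, slot -1 would mean "you reached a replica;
    go to the master", while a real slot number (0-16383) keeps its
    current meaning of "wrong shard".
    """
    _, slot, addr = err.split(" ", 2)
    host, _, port = addr.rpartition(":")
    slot = int(slot)
    reason = "replica-to-master" if slot == -1 else "wrong-shard"
    return reason, slot, host, int(port)
```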

@zuiderkwast
Contributor

I think we need to improve this scenario. I think you @soloestoy can be our Chief of Standalone Mode. It seems like many others on the core team are more focused on Cluster.

The main concern is clients. Cluster clients already handle MOVED, but the slot is in the range 0-16383. What happens if they see -1 here? Worst case, they crash. Can we do a small investigation to make sure the most popular clients don't crash?

Then I think we can add it, but the default configuration can be off. We need to update the docs for clients.

@zuiderkwast added the major-decision-pending (Needs decision by core team) label on Apr 16, 2024
@daniel-house
Member

I like the general idea here. This could be a big help for applications that don't require clusters.

Are there any dumb clients still in use? (Dumb meaning that they won't even support MOVED.) Hmmm, I guess valkey-cli and hiredis both qualify as dumb clients. Am I right to assume that an application using a dumb client must provide its own method for deciding how to redirect to the new master, B? How does such an application handle a hard crash of node A?

In more detail, the slot returned by the replica in the MOVED reply can be -1, and this can also be applied in cluster mode. It clearly informs the client that it has reached a replica and should be redirected to the master, rather than having reached the wrong shard and requiring redirection.

Do we need to do this part? Does the client need to know that this is "redirect from replica to master" and not "redirect to another shard"?

I can see that if we have a smart enough client, it can detect the -1 and redirect all future requests to the new master. That would be a good optimization and would apply to either standalone or cluster mode. But is it anything more than a very good optimization for a rare event?

Initially, none of the clients will understand -1. Even if they don't crash, they won't work. It would be surprising if this crashed valkey-cli or hiredis. It might easily cause trouble for memtier_benchmark even if it didn't crash. (It might even be worse if it didn't crash, causing totally misleading results. I really love fail-fast.)

This feels like a breaking change. Perhaps the protocol could be extended to allow a client that understands the -1 to turn this on just for the client's connection.

All of this makes me think it would be nice for cluster V2 to be designed so that standalone was just a special case. A cluster with one node. Then standalone could be easily scaled up, and a cluster could be scaled down to standalone.

@zuiderkwast
Contributor

@daniel-house Only cluster clients recognize -MOVED redirects. Far from all clients are cluster clients.

@PingXie
Member

PingXie commented Apr 17, 2024

This is a very fine idea but the effectiveness of this feature apparently depends on clients being collaborative. Given the tricky client ecosystem situation, IMO, until we have a mature valkey client ecosystem, which is a much bigger fish to fry, few users will be able to take advantage of this feature. Btw, I agree with @daniel-house that this will be a breaking change so it shouldn't be enabled by default.

@soloestoy @CharlesChen888 what are your thoughts on the client support?

Alternatively, why wouldn't simply dropping the connection from the replica side work? It is not clear to me how the network is set up in this scenario. For instance, is there an ILB in front of the two nodes, or are the two nodes exposed directly to the client? If the two nodes are exposed to the client directly, how does the client tell who the primary is in the first place? I would assume somewhere in your system there is a way to help the client figure out who the primary is. If so, and if the client has been set up to retry requests on errors, dropping the connections would be more compatible with common error-handling logic.

PS, cluster V2 also depends on us having a mature valkey-native client ecosystem.

@soloestoy
Member Author

First, let's shift our focus away from -1, as it's not the main point. We can use -1 solely in standalone mode, while cluster mode continues to use the real slot : )

Indeed, this is an improvement that requires joint effort from both the client and server. In the typical pathway for applying new features, it is first implemented on the server side, followed by client-side support. Things are indeed moving in this direction, and in fact, it has already been accepted by the Redis core team. However, it is regrettable that Redis modified the protocol before the merge could happen.

On the client side, I believe we need to have our own Valkey client to push the development of the ecosystem. Major languages should have one or more clients. I think we can start with Java, either by forking Jedis or writing a new one, we can name it Jackey (Java client for Valkey), @yangbodong22011 is very interested in being the maintainer for Jackey. We can initiate a vote to decide.

Alternatively, why wouldn't simply dropping the connection from the replica side work?

As far as I know, dropping the connection is not a smooth approach. Imagine that after sending a command, the connection breaks before receiving the response; that is an error to clients. And if the replica drops the connection upon receiving a request, the client will attempt to reconnect (it cannot redirect, since no MOVED was received), which would be an infinite loop.
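The contrast can be sketched with a toy decision function (everything here is hypothetical; it only illustrates why a dropped connection gives the client nothing to act on, while a redirect reply carries the new address):

```python
def next_address(reply, current_addr):
    """Decide which address to contact after a failed write attempt.

    `reply` is either None (connection dropped before any response)
    or an error string such as "MOVED -1 10.0.0.2:6379" or
    "REDIRECT 10.0.0.2:6379".
    """
    if reply is None:
        # No information: all the client can do is reconnect to the
        # same address -- a potential infinite loop if the node
        # remains a replica.
        return current_addr
    if reply.startswith("MOVED ") or reply.startswith("REDIRECT "):
        # The reply carries the new master's address, so the client
        # can break out of the loop immediately.
        return reply.rsplit(" ", 1)[1]
    return current_addr
```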

For instance, is there an ILB in front of the two nodes, or are the two nodes exposed directly to the client? If the two nodes are exposed to the client directly, how does the client tell who the primary is in the first place? I would assume somewhere in your system there is a way to help the client figure out who the primary is. If so, and if the client has been set up to retry requests on errors, dropping the connections would be more compatible with common error-handling logic.

I can somewhat grasp what you are saying, but I don't fully understand it. Let me explain based on my understanding:

  1. nodes are exposed to clients directly

    In such a scenario, when a switch occurs and the client attempts to access the replica, it will encounter an error. This is the situation I am looking to improve by helping the client to redirect to the master node.

  2. nodes are exposed to clients directly with config server like sentinel

    I'm not very familiar with Sentinel, but from what I know, there are two methods by which Sentinel can notify clients to access the new master.

    • The first method is where clients subscribe to switch events and, upon receiving a switch message, redirect traffic to the new master. However, there can be a delay in the message, which means traffic may still go to the replica and result in errors during that period.
    • The second method is when the client detects that it has accessed a replica, it proactively queries Sentinel for the new master's address. This requires an additional interaction with Sentinel. Additionally, the updates to Sentinel's configuration and the role switch of the nodes are not atomic operations, which means that errors may still possibly occur.

    Converging redirection within the node itself through MOVED can also optimize these scenarios.

  3. nodes behind LB

    I think you are referring to the scenario where a LB (Load Balancer) is bound to a VIP (Virtual IP) that ensures it always points to the master, thus clients wouldn't need to worry about the switch-over issue. However, there is a problem concerning how to handle the existing connections on the original node after the switch-over. If connections are immediately terminated, then requests that have not yet received a reply will result in errors. If the connections are kept alive, then a MOVED response (even if it's for the same VIP) would still be necessary to inform the client to redirect.

@zuiderkwast
Contributor

zuiderkwast commented Apr 17, 2024

Although it is a breaking change, I believe it is mostly safe: the client previously got an error, and now it would get a redirect, which is also a kind of error reply. Most clients just return this to the user code.

The only risk I can see is if there are any clients that behave like this:

  • A client connects to a node and assumes it is a standalone node, but it automatically switches to cluster mode when it sees a MOVED redirect. That is, it uses MOVED to auto-detect cluster mode.
  • When it sees a MOVED redirect, it assumes that the node is in cluster mode and does one or both of the following:
    • It tries to fetch slot mapping using CLUSTER SLOTS or SHARDS or NODES.
    • It assumes the slot number (-1) is a valid slot number and tries to use it like an array index or similar, and crashes the client.

I don't know if any clients behave like this, but it is a possible behaviour and therefore a risk.

We can completely avoid this by using a new, different redirect reply, such as -REDIRECT host:port, which has not been used so far. For a client that doesn't handle this, it is just treated as an error reply.
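A client that understands the proposed reply could handle it roughly like this (the -REDIRECT host:port format is the one proposed above; the helper name and return convention are hypothetical):

```python
def handle_error_reply(err: str):
    """Follow a -REDIRECT reply when recognized.

    Returns ("follow", host, port) for a redirect, or
    ("raise", err, None) to surface the error to the caller --
    which is exactly what a client with no REDIRECT support does
    with any unknown error reply, so old clients are not broken.
    """
    if err.startswith("REDIRECT "):
        addr = err.split(" ", 1)[1]
        host, _, port = addr.rpartition(":")
        return ("follow", host, int(port))
    # Unknown error replies are passed through unchanged.
    return ("raise", err, None)
```

Because REDIRECT carries no slot number, there is nothing for a cluster client to misinterpret as an array index, which is the crash scenario described above for MOVED -1.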

@soloestoy I don't remember exactly, but I think this was already discussed before. But maybe we didn't discuss why it is more safe?

@daniel-house
Member

@daniel-house Only cluster clients recognize -MOVED redirects. Far from all clients are cluster clients.

I have heard of smart clients but never cluster clients. I am assuming that cluster=smart and smart clients are ones that understand ASK/ASKING and re-issue the query in the appropriate way to the shard indicated by the MOVED response. If that is not so, please correct me.

Please forgive me, but I am incredibly picky when it comes to jargon and definitions and stuff like that. I could have just assumed that cluster=smart, but doing so is against my fundamental nature.

@zuiderkwast
Contributor

I have never heard the term "smart" clients.

@daniel-house
Member

Hmmm, I find lots of references to "smart client" using Google, but none of them are on any Redis page. I also see terms like "cluster aware" and "caching". There is definitely a need for better jargon.

@daniel-house
Member

I like using a totally new message such as REDIRECT.

In the switchover there will be a period during which there is no server handling queries on the original master's port - what is a client supposed to do? Is there a risk of a network storm in this scenario? Would MOVED/REDIRECT increase or decrease this risk?

Should the scenario be extended to include one last step: switching back to the original host:port?

@PingXie
Member

PingXie commented Apr 18, 2024

@yangbodong22011 is very interested in being the maintainer for Jackey. We can initiate a vote to decide.

Yeah, I am on board with forking the clients now. With the amount of new features that we are planning, I see this becoming a necessary evil. I think I was the only one holding out on this forking idea among the core team members, so we should be able to close on this here async.

As I know, dropping connection is not a smooth way. Imagine that after sending a command, the connection breaks before receiving the response, it's an error to clients.

Technically speaking, the new message is also an error for clients that don't recognize the proposed new error string. But yes, there is a chance to make the experience better with an "enlightened" client.

then the client will attempt to reconnect (cannot redirect since no MOVED received), this would be an infinite loop.

Yeah this is what I was trying to get at in my questions. I think your explanation answered them. Thanks

When it sees a MOVED redirect, it assumes that the node is in cluster mode and does one or both of the following

@zuiderkwast, curious, are you aware of a client actually doing this or this is just a hypothesis? In any case, agreed that a new message is cleaner.

In the switchover there will be a period during which there is no server handling queries on the original master's port - what is a client supposed to do? Is there a risk of a network storm in this scenario? Would MOVED/REDIRECT increase or decrease this risk?

I don't think so. This is essentially the cluster behavior today.

Should the scenario be extended to include one last step: switching back to the original host:port?

This is the same scenario too. Whenever the primary gets demoted, the client will get the notification with this proposal.

I am aligned with this proposal and the new message.

@soloestoy
Member Author

soloestoy commented Apr 18, 2024

I'm not sure whether there really are clients using MOVED to detect which mode the server is in. I don't think it is a good approach, since a cluster would not return MOVED when clients access the right master node. But there are some clients using the SELECT or CLUSTER commands to detect the mode, so we cannot know all clients' behavior.

I'm happy with -REDIRECT host:port. In fact, I suggested using this reply at the beginning, but the former core team thought we would develop an unshared cluster in the future and that reusing MOVED was better. I personally prefer to keep things more separated, because there are significant differences in usage between cluster and standalone.

@zuiderkwast
Contributor

@zuiderkwast, curious, are you aware of a client actually doing this or this is just a hypothesis? In any case, agreed that a new message is cleaner.

I'm not aware of any. As I mentioned earlier, "I don't know if any clients behave like this, but it is a possible behaviour and therefore a risk".

Example: valkey-cli -c can follow redirects but it doesn't remember slot mappings. It parses the slot using atoi() just to print a message. It would not crash on MOVED -1 127.0.0.1:7002 but it would print a message like

-> Redirected to slot [-1] located at 127.0.0.1:7002
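In Python terms, the cli's handling is roughly the following (a mimic of the described behavior, not the actual valkey-cli or hiredis code):

```python
def format_redirect_message(err: str) -> str:
    # valkey-cli -c parses the slot only to print it; a slot of -1
    # is formatted without complaint, much like atoi("-1") in C,
    # so the redirect is still followed.
    _, slot, addr = err.split(" ", 2)
    return f"-> Redirected to slot [{int(slot)}] located at {addr}"
```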
