
Support smooth switchover in standalone mode by reusing MOVED #319

Open
soloestoy opened this issue Apr 15, 2024 · 13 comments · May be fixed by #325
Labels
major-decision-pending Needs decision by core team

Comments

@soloestoy
Member

soloestoy commented Apr 15, 2024

Issue: redis/redis#12097
PR: redis/redis#12192

In many scenarios, we need to perform a switchover. Take a rolling version upgrade as an example: suppose there are two nodes A and B, where A is the master and B is a replica of A. Under normal circumstances, users only access master node A. During a version upgrade, users first upgrade replica node B, then perform a switchover to make B the master and A a replica of B. At the same time, user traffic must also switch from node A to node B, after which node A is upgraded to complete the whole version upgrade process:

[Diagram: rolling upgrade of nodes A and B with role switchover]

However, during this process, the switch of user access is not entirely smooth. The behavior differs between cluster mode and standalone mode:

  1. In cluster mode, after the switch, when the replica is accessed, Redis will return -MOVED slot ip:port. In cluster mode, this is a routine result. The client will not treat it as an error returned to the user. Instead, the client will automatically redirect to the new master node B, as indicated by the ip:port in the result, thus achieving a smooth switch.
  2. In standalone mode, after the switch, accessing the replica will return the error -READONLY You can't write against a read only replica. The client will pass this error to the user, and a smooth switch cannot be achieved.

As we can see, the switching experience in cluster mode is clearly superior to that of standalone mode. Therefore, to improve the switching process in standalone mode, we can utilize the MOVED redirect mechanism. The first step would require the valkey server to use MOVED to replace READONLY in standalone mode, and then support for handling the MOVED return value needs to be implemented in the client ecosystem.

The latest discussion is that we want to introduce a new reply, REDIRECT, so it would not break anything in cluster mode.

In more detail, the slot returned by the replica in the MOVED reply can be -1, and this can also be applied in cluster mode. It clearly informs the client that it has reached a replica and should be redirected to the master, rather than having reached the wrong shard and requiring redirection.
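A client could distinguish the two cases along these lines (a hypothetical sketch of the proposed slot -1 convention, not an implemented protocol):

```python
def parse_moved(err: str):
    """Parse a MOVED reply such as "MOVED -1 10.0.0.2:6379".

    Under the proposal, slot -1 would mean "you reached a replica;
    go to the master", while a real slot number (0-16383) keeps its
    current meaning of "wrong shard".
    """
    _, slot, addr = err.split(" ", 2)
    host, _, port = addr.rpartition(":")
    slot = int(slot)
    reason = "replica-to-master" if slot == -1 else "wrong-shard"
    return reason, slot, host, int(port)
```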

@zuiderkwast
Contributor

I think we need to improve this scenario. I think you @soloestoy can be our Chief of Standalone Mode. It seems like many others on the core team are more focused on Cluster.

The main concern is clients. Cluster clients already handle MOVED, but the slot is in the range 0-16383. What happens if they see -1 here? Worst case, they crash. Can we do a small investigation to make sure the most popular clients don't crash?

Then I think we can add it, but the default configuration can be off. We need to update the docs for clients.

@zuiderkwast added the major-decision-pending (Needs decision by core team) label on Apr 16, 2024
@daniel-house
Member

I like the general idea here. This could be a big help for applications that don't require clusters.

Are there any dumb clients still in use? (Dumb meaning that they won't even support MOVED.) Hmmm, I guess valkey-cli and hiredis both qualify as dumb clients. Am I right to assume that an application using a dumb client must provide its own method for deciding how to redirect to the new master, B? How does such an application handle a hard crash of node A?

In more detail, the slot returned by the replica in the MOVED reply can be -1, and this can also be applied in cluster mode. It clearly informs the client that it has reached a replica and should be redirected to the master, rather than having reached the wrong shard and requiring redirection.

Do we need to do this part? Does the client need to know that this is "redirect from replica to master" and not "redirect to another shard"?

I can see that if we have a smart enough client, it can detect the -1 and redirect all future requests to the new master. That would be a good optimization and would apply to either standalone or cluster mode. But is it anything more than a very good optimization for a rare event?

Initially, none of the clients will understand -1. Even if they don't crash, they won't work. It would be surprising if this crashed valkey-cli or hiredis. It might easily cause trouble for memtier_benchmark even if it didn't crash. (It might even be worse if it didn't crash, causing totally misleading results. I really love fail-fast.)

This feels like a breaking change. Perhaps the protocol could be extended to allow a client that understands the -1 to turn this on just for the client's connection.

All of this makes me think it would be nice for cluster V2 to be designed so that standalone was just a special case. A cluster with one node. Then standalone could be easily scaled up, and a cluster could be scaled down to standalone.

@zuiderkwast
Contributor

@daniel-house Only cluster clients recognize -MOVED redirects. Far from all clients are cluster clients.

@PingXie
Member

PingXie commented Apr 17, 2024

This is a very fine idea but the effectiveness of this feature apparently depends on clients being collaborative. Given the tricky client ecosystem situation, IMO, until we have a mature valkey client ecosystem, which is a much bigger fish to fry, few users will be able to take advantage of this feature. Btw, I agree with @daniel-house that this will be a breaking change so it shouldn't be enabled by default.

@soloestoy @CharlesChen888 what are your thoughts on the client support?

Alternatively, why wouldn't simply dropping the connection from the replica side work? It is not clear to me how the network is set up in this scenario. For instance, is there an ILB in front of the two nodes, or are the two nodes exposed directly to the client? If the two nodes are exposed to the client directly, how does the client tell who the primary is in the first place? I would assume somewhere in your system there is a way to help the client figure out who the primary is. If so, and if the client has been set up to retry requests on errors, dropping the connections would be more compatible with common error-handling logic.

PS, cluster V2 also depends on us having a mature valkey-native client ecosystem.

@soloestoy
Member Author

First, let's shift our focus away from -1, as it's not the main point. We can use -1 solely in standalone mode, while cluster mode continues to use the real slot : )

Indeed, this is an improvement that requires joint effort from both the client and server. In the typical pathway for applying new features, it is first implemented on the server side, followed by client-side support. Things are indeed moving in this direction, and in fact, it has already been accepted by the Redis core team. However, it is regrettable that Redis modified the protocol before the merge could happen.

On the client side, I believe we need to have our own Valkey client to push the development of the ecosystem. Major languages should have one or more clients. I think we can start with Java, either by forking Jedis or writing a new one, we can name it Jackey (Java client for Valkey), @yangbodong22011 is very interested in being the maintainer for Jackey. We can initiate a vote to decide.

Alternatively, why wouldn't simply dropping the connection from the replica side work?

As far as I know, dropping the connection is not a smooth approach. Imagine that after sending a command, the connection breaks before receiving the response; that is an error to clients. And if the replica drops the connection upon receiving a request, the client will attempt to reconnect (it cannot redirect, since no MOVED was received), which would be an infinite loop.
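The contrast can be sketched with a toy decision function (everything here is hypothetical; it only illustrates why a dropped connection gives the client nothing to act on, while a redirect reply carries the new address):

```python
def next_address(reply, current_addr):
    """Decide which address to contact after a failed write attempt.

    `reply` is either None (connection dropped before any response)
    or an error string such as "MOVED -1 10.0.0.2:6379" or
    "REDIRECT 10.0.0.2:6379".
    """
    if reply is None:
        # No information: all the client can do is reconnect to the
        # same address -- a potential infinite loop if the node
        # remains a replica.
        return current_addr
    if reply.startswith("MOVED ") or reply.startswith("REDIRECT "):
        # The reply carries the new master's address, so the client
        # can break out of the loop immediately.
        return reply.rsplit(" ", 1)[1]
    return current_addr
```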

For instance, is there an ILB in front of the two nodes, or are the two nodes exposed directly to the client? If the two nodes are exposed to the client directly, how does the client tell who the primary is in the first place? I would assume somewhere in your system there is a way to help the client figure out who the primary is. If so, and if the client has been set up to retry requests on errors, dropping the connections would be more compatible with common error-handling logic.

I can somewhat grasp what you are saying, but I don't fully understand it. Let me explain based on my understanding:

  1. nodes are exposed to clients directly

    In such a scenario, when a switch occurs and the client attempts to access the replica, it will encounter an error. This is the situation I am looking to improve by helping the client to redirect to the master node.

  2. nodes are exposed to clients directly with config server like sentinel

    I'm not very familiar with Sentinel, but from what I know, there are two methods by which Sentinel can notify clients to access the new master.

    • The first method is where clients subscribe to switch events and, upon receiving a switch message, redirect traffic to the new master. However, there can be a delay in the message, which means traffic may still go to the replica and result in errors during that period.
    • The second method is when the client detects that it has accessed a replica, it proactively queries Sentinel for the new master's address. This requires an additional interaction with Sentinel. Additionally, the updates to Sentinel's configuration and the role switch of the nodes are not atomic operations, which means that errors may still possibly occur.

    Converging redirection within the node itself through MOVED can also optimize these scenarios.

  3. nodes behind LB

    I think you are referring to the scenario where a LB (Load Balancer) is bound to a VIP (Virtual IP) that ensures it always points to the master, thus clients wouldn't need to worry about the switch-over issue. However, there is a problem concerning how to handle the existing connections on the original node after the switch-over. If connections are immediately terminated, then requests that have not yet received a reply will result in errors. If the connections are kept alive, then a MOVED response (even if it's for the same VIP) would still be necessary to inform the client to redirect.

@zuiderkwast
Contributor

zuiderkwast commented Apr 17, 2024

Although it is a breaking change, I believe it is mostly safe: the client previously got an error, and now it would get a redirect, which is also a kind of error reply. Most clients just return this to the user code.

The only risk I can see is if there are any clients that behave like this:

  • A client connects to a node and assumes it is a standalone node, but it automatically switches to cluster mode when it sees a MOVED redirect. That is, it uses MOVED to auto-detect cluster mode.
  • When it sees a MOVED redirect, it assumes that the node is in cluster mode and does one or both of the following:
    • It tries to fetch slot mapping using CLUSTER SLOTS or SHARDS or NODES.
    • It assumes the slot number (-1) is a valid slot number and tries to use it like an array index or similar, and crashes the client.

I don't know if any clients behave like this, but it is a possible behaviour and therefore a risk.

We can completely avoid this by using a new, different redirect reply, such as -REDIRECT host:port, which has not been used so far. For a client that doesn't handle this, it is just treated as an error reply.
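A client that understands the proposed reply could handle it roughly like this (the -REDIRECT host:port format is the one proposed above; the helper name and return convention are hypothetical):

```python
def handle_error_reply(err: str):
    """Follow a -REDIRECT reply when recognized.

    Returns ("follow", host, port) for a redirect, or
    ("raise", err, None) to surface the error to the caller --
    which is exactly what a client with no REDIRECT support does
    with any unknown error reply, so old clients are not broken.
    """
    if err.startswith("REDIRECT "):
        addr = err.split(" ", 1)[1]
        host, _, port = addr.rpartition(":")
        return ("follow", host, int(port))
    # Unknown error replies are passed through unchanged.
    return ("raise", err, None)
```

Because REDIRECT carries no slot number, there is nothing for a cluster client to misinterpret as an array index, which is the crash scenario described above for MOVED -1.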

@soloestoy I don't remember exactly, but I think this was already discussed before. But maybe we didn't discuss why it is more safe?

@daniel-house
Member

@daniel-house Only cluster clients recognize -MOVED redirects. Far from all clients are cluster clients.

I have heard of smart clients but never cluster clients. I am assuming that cluster=smart and smart clients are ones that understand ASK/ASKING and re-issue the query in the appropriate way to the shard indicated by the MOVED response. If that is not so, please correct me.

Please forgive me, but I am incredibly picky when it comes to jargon and definitions and stuff like that. I could have just assumed that cluster=smart, but doing so is against my fundamental nature.

@zuiderkwast
Contributor

I have never heard the term "smart" clients.

@daniel-house
Member

Hmmm, I find lots of references to "smart client" using Google, but none of them are on any Redis page. I also see terms like "cluster aware" and "caching". There is definitely a need for better jargon.

@daniel-house
Member

I like using a totally new message such as REDIRECT.

In the switchover there will be a period during which there is no server handling queries on the original master's port - what is a client supposed to do? Is there a risk of a network storm in this scenario? Would MOVED/REDIRECT increase or decrease this risk?

Should the scenario be extended to include one last step: switching back to the original host:port?

@PingXie
Member

PingXie commented Apr 18, 2024

@yangbodong22011 is very interested in being the maintainer for Jackey. We can initiate a vote to decide.

Yeah, I am on board with forking the clients now. With the amount of new features that we are planning, I see this becoming a necessary evil. I think I was the only one holding out on this forking idea among the core team members, so we should be able to close on this here async.

As I know, dropping connection is not a smooth way. Imagine that after sending a command, the connection breaks before receiving the response, it's an error to clients.

Technically speaking, the new message is also an error for clients that don't recognize the proposed new error string. But yes, there is a chance to make the experience better with an "enlightened" client.

then the client will attempt to reconnect (cannot redirect since no MOVED received), this would be an infinite loop.

Yeah this is what I was trying to get at in my questions. I think your explanation answered them. Thanks

When it sees a MOVED redirect, it assumes that the node is in cluster mode and does one or both of the following

@zuiderkwast, curious, are you aware of a client actually doing this or this is just a hypothesis? In any case, agreed that a new message is cleaner.

In the switchover there will be a period during which there is no server handling queries on the original master's port - what is a client supposed to do? Is there a risk of a network storm in this scenario? Would MOVED/REDIRECT increase or decrease this risk?

I don't think so. This is essentially the cluster behavior today.

Should the scenario be extended to include one last step: switching back to the original host:port?

This is the same scenario too. Whenever the primary gets demoted, the client will get the notification with this proposal.

I am aligned with this proposal and the new message.

@soloestoy
Member Author

soloestoy commented Apr 18, 2024

I'm not sure whether there really are clients using MOVED to detect which mode the server is in. I don't think it is a good approach, since a cluster would not return MOVED when clients access the right master node. But there are some clients using the SELECT or CLUSTER commands to detect the mode, so we cannot know all clients' behavior.

I'm happy with -REDIRECT host:port. In fact, I suggested using this reply at the beginning, but the former core team thought we would develop an unshared cluster in the future and that reusing MOVED was better. I personally prefer to keep things more separated, because there are significant differences in usage between cluster and standalone.

@zuiderkwast
Contributor

@zuiderkwast, curious, are you aware of a client actually doing this or this is just a hypothesis? In any case, agreed that a new message is cleaner.

I'm not aware of any. As I mentioned earlier, "I don't know if any clients behave like this, but it is a possible behaviour and therefore a risk".

Example: valkey-cli -c can follow redirects but it doesn't remember slot mappings. It parses the slot using atoi() just to print a message. It would not crash on MOVED -1 127.0.0.1:7002 but it would print a message like

-> Redirected to slot [-1] located at 127.0.0.1:7002
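In Python terms, the cli's handling is roughly the following (a mimic of the described behavior, not the actual valkey-cli or hiredis code):

```python
def format_redirect_message(err: str) -> str:
    # valkey-cli -c parses the slot only to print it; a slot of -1
    # is formatted without complaint, much like atoi("-1") in C,
    # so the redirect is still followed.
    _, slot, addr = err.split(" ", 2)
    return f"-> Redirected to slot [{int(slot)}] located at {addr}"
```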
