-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xds-failover: adding main logic #34215
Conversation
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Harvey PTAL. LMK if you want more context (I've got an internal integration test the exercises this code). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach LGTM modulo a few issues commented on.
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@nezdolik will you have some bandwidth to review this PR as a !Google reviewer? |
ENVOY_LOG_MISC(trace, "Attempting to reconnect to the failover gRPC source, as a connection " | ||
"to it was previously established."); | ||
if (connected_to_ != ConnectedState::Failover) { | ||
ASSERT(connected_to_ == ConnectedState::None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to explicitly set connecting_to_primary_ = false;
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an assert, as connecting_to_primary_
should always be false after being connected to failover. Added a similar assert for the primary part above.
} | ||
|
||
virtual ~GrpcMuxFailover() = default; | ||
|
||
// Attempts to establish a new stream to the either the primary or failover source. | ||
void establishNewStream() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method has certain degree of code duplication, wonder if it would be possible to refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added functions that abstract the calls. I think it is still more readable/understandable if the "ever-connected" parts are done before the rest of the code.
// determined similar to the primary variants. | ||
// Note that while Envoy can only be connected to a single source (mutually | ||
// exclusive), it can attempt connecting to more than one source at a time. | ||
bool connecting_to_primary_ : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the benefit of using bit fields here and not simply initilialise like:
bool connecting_to_primary_ {false};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force of habit (to keep bools small, which is good when there are many instances), but I agree that it makes little sense here.
Modified as you suggested.
ConnectedState connected_to_; | ||
|
||
// A flag that keeps track of whether Envoy successfully connected to either the | ||
// primary or failover source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also clarify that successful connection here means "receiving discovery response from xds server"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note.
return; | ||
} | ||
// No prior connection was established, prefer the primary over the failover. | ||
if (connecting_to_failover_ || (connected_to_ == ConnectedState::Failover)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confused how (connected_to_ == ConnectedState::Failover)
could evaluate to true here, e.g. the code execution would get here if ever_connected_to_
is None and there is no established stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, it cannot. Removed that part.
failover_grpc_stream_->sendMessage(request); | ||
return; | ||
} | ||
// Either connecting/connected to the primary, or no connection was attempted. | ||
primary_grpc_stream_->sendMessage(request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to check here if we are connected to primary prior to sending? Or would it just produce stream error logs when not connected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep it the same as GrpcStream. We may want to ensure that in the future, but for now I think we can keep parity.
// connection was closed. | ||
ASSERT((parent_.connecting_to_primary_ || parent_.connected_to_ == ConnectedState::Primary) && | ||
!parent_.connecting_to_failover_ && | ||
(parent_.connected_to_ != ConnectedState::Failover)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to repeat the parent_.connected_to_ != ConnectedState::Failover
assertion here? since it is also being evaluated in the line above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct. Removed the prior ASSERT (same for the failover part).
// TODO(adisuissa): At the moment this is a pass-through method. Once the | ||
// implementation matures, this call will be updated. | ||
ASSERT( | ||
(parent_.connecting_to_primary_ || (parent_.connected_to_ == ConnectedState::Primary)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this bool expression (and maybe similar ones) appear frequently in the code:
(parent_.connecting_to_primary_ || (parent_.connected_to_ == ConnectedState::Primary)
, it could benefit from extracting it into small utility method like isConnectingToOrConnectedToPrimary()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
I've refactored these and the similar primary counterparts into a function.
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great comments @nezdolik!
} | ||
|
||
virtual ~GrpcMuxFailover() = default; | ||
|
||
// Attempts to establish a new stream to the either the primary or failover source. | ||
void establishNewStream() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added functions that abstract the calls. I think it is still more readable/understandable if the "ever-connected" parts are done before the rest of the code.
ENVOY_LOG_MISC(trace, "Attempting to reconnect to the failover gRPC source, as a connection " | ||
"to it was previously established."); | ||
if (connected_to_ != ConnectedState::Failover) { | ||
ASSERT(connected_to_ == ConnectedState::None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an assert, as connecting_to_primary_
should always be false after being connected to failover. Added a similar assert for the primary part above.
return; | ||
} | ||
// No prior connection was established, prefer the primary over the failover. | ||
if (connecting_to_failover_ || (connected_to_ == ConnectedState::Failover)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, it cannot. Removed that part.
failover_grpc_stream_->sendMessage(request); | ||
return; | ||
} | ||
// Either connecting/connected to the primary, or no connection was attempted. | ||
primary_grpc_stream_->sendMessage(request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep it the same as GrpcStream. We may want to ensure that in the future, but for now I think we can keep parity.
// connection was closed. | ||
ASSERT((parent_.connecting_to_primary_ || parent_.connected_to_ == ConnectedState::Primary) && | ||
!parent_.connecting_to_failover_ && | ||
(parent_.connected_to_ != ConnectedState::Failover)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct. Removed the prior ASSERT (same for the failover part).
// TODO(adisuissa): At the moment this is a pass-through method. Once the | ||
// implementation matures, this call will be updated. | ||
ASSERT( | ||
(parent_.connecting_to_primary_ || (parent_.connected_to_ == ConnectedState::Primary)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
I've refactored these and the similar primary counterparts into a function.
// determined similar to the primary variants. | ||
// Note that while Envoy can only be connected to a single source (mutually | ||
// exclusive), it can attempt connecting to more than one source at a time. | ||
bool connecting_to_primary_ : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force of habit (to keep bools small, which is good when there are many instances), but I agree that it makes little sense here.
Modified as you suggested.
ConnectedState connected_to_; | ||
|
||
// A flag that keeps track of whether Envoy successfully connected to either the | ||
// primary or failover source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, exciting feature! There is also certain degree of code duplication across FailoverGrpcStreamCallbacks
and PrimaryGrpcStreamCallbacks
, but is also fine to leave it as it is if refactoring is too much effort.
Thanks! |
cc @htuch for final approval and merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Commit Message: xds-failover: adding main logic
Additional Description:
Adding the main logic for the xDS-failover support.
It currently supports a primary and failover sources. Initially, the manager attempts to connect to the primary twice, and if both fail, alternates between the failover and the primary until one is successfully connected to.
Once a connection is made (a response arrives) from either source, the failover manager will stick to that source.
Risk Level: low - not plumbed into any GrpcMux object yet.
Testing: Added unit test.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A