Add graceful-restart support #2386
base: main
Conversation
Note to discuss:
it seems it fails, and the speaker/frr-reloader retries to apply it.
The above might be related to FRRouting/frr#8403

In my test I achieved zero downtime with graceful restart enabled either globally or per neighbor (test: at runtime I delete the speaker and check whether the client loses connectivity). The second command, the one that enables the F-bit, is only available in global mode, not per neighbor. Therefore a solution where we add graceful-restart at the neighbor level and the F-bit globally (because it is not safe to have it on by default) does not seem optimal. @fedepaol how much do you object to having both global (one setting to enable graceful restart, one to turn the F-bit on/off), plus the option to opt out per neighbor? Any other ideas?

I created FRRouting/frr#15880 upstream, even though we are not really blocked on it (we will get an error, but it does not look like it has an impact).
Won't the global F-bit affect only peers with GR enabled? If so, it doesn't sound too terrible to always enable it.
My preference would be for this to be configurable, to be on the safe side. I cannot find an argument against always having it (or having it on by default), but I have no idea for which case this option exists. Should we go with always having it in the config? On the same topic, should we allow tuning the timer? The default is 120 sec, which seems long (ideally we need it to be < holdtime).
I'd try to keep the UX as good as we can, which means not having to set an extra parameter. We can either set the global option when at least one peer needs GR, or just always enable it. I am leaning towards the second, as we'll have to deal with peers without GR and the global option anyway, so if that doesn't work we'll have a problem.
You mean the graceful restart timer? That should be meaningful only for the helper side, I guess (but we may need it in frr-k8s, where we might be helpers).
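For reference, a minimal sketch of the FRR knobs under discussion (ASN and peer address are made up; `preserve-fw-state` and `restart-time` are global-only, while plain `graceful-restart` can also be set per neighbor):

```
router bgp 64512
 bgp graceful-restart
 bgp graceful-restart preserve-fw-state
 bgp graceful-restart restart-time 120
 neighbor 172.30.0.3 remote-as 64512
 neighbor 172.30.0.3 graceful-restart
```

With `preserve-fw-state`, the restarting speaker signals via the F-bit that its forwarding state survived, so the peer keeps using the stale routes for the duration of the restart.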
cc @oribon |
force-pushed from 14ff603 to 0b43ca8
api/v1beta2/bgppeer_types.go
Outdated
// known routes while the routing protocol information is being restored.
// Needed for FRR mode only.
// +optional
GracefulRestart bool `json:"gracefulRestart,omitempty"`
nit: EnableGracefulRestart
I can change that, sure
done
nit on the commit subject: this commit does more than adding the field to the API; we are adding the support end to end (which is totally fine, but let's change the subject to "add graceful restart support").
I have three commits, and the e2e is added in its own commit, no?
done
err := ConfigUpdater.Update(resources)
Expect(err).NotTo(HaveOccurred())

for _, c := range FRRContainers {
I think it'd make sense to have a real e2e test where we shut down a speaker and check that the traffic is still flowing.
First thing that comes to my mind: focus on a node, exclude it from the daemonset selector, have a pod with etp=local running only there, and ensure the traffic flows when the speaker is down.
It's probably easier if we raise the graceful restart timers of the peered neighbors, so we'll have control
Not sure, because in the end we test whether FRR supports the functionality. Nevertheless, we can implement that; is there any similar test?
Why do you think we should raise the timers (the default is 120 sec)?
Let's start with the default and see if it's enough then
force-pushed from 72e3c27 to d65bb88
api/v1beta2/bgppeer_types.go
Outdated
// Needed for FRR mode only.
// +optional
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="EnableGracefulRestart cannot be changed after creation, because GR requires restarting the BGP session"
nit: "Needed" should be more about "Supported" here?
also I'd put in the description the point about "this field is immutable because it requires restart of the BGP session" and keep the validation message "EnableGracefulRestart cannot be changed after creation"
sounds good, done
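Putting the naming, description, and immutability nits together, the field could end up roughly like this (a sketch; the exact wording in the PR may differ):

```go
// EnableGracefulRestart, when set, lets the BGP peer continue to forward
// data packets along known routes while the routing protocol information
// is being restored (RFC 4724). This field is immutable because changing
// it requires a restart of the BGP session.
// Supported for FRR mode only.
// +optional
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="EnableGracefulRestart cannot be changed after creation"
EnableGracefulRestart bool `json:"enableGracefulRestart,omitempty"`
```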
internal/config/validation.go
Outdated
@@ -41,6 +41,9 @@ func DiscardFRROnly(c ClusterResources) error {
	if p.Spec.ConnectTime != nil {
		return fmt.Errorf("peer %s has connect time set on native bgp mode", p.Spec.Address)
	}
	if p.Spec.EnableGracefulRestart {
		return fmt.Errorf("peer %s has enabled GR (GracefulRestart) flag set on native bgp mode", p.Spec.Address)
Suggested change:
-	return fmt.Errorf("peer %s has enabled GR (GracefulRestart) flag set on native bgp mode", p.Spec.Address)
+	return fmt.Errorf("peer %s has GracefulRestart flag set on native bgp mode", p.Spec.Address)
done
force-pushed from 50f6988 to 363fa20
BGP graceful restart, as defined in RFC 4724, provides the mechanisms that allow a BGP speaker to continue to forward data packets along known routes while the routing protocol information is being restored. This allows the speaker DaemonSet to be updated without routes being retracted on the peer side. When GR is enabled, we enable the F-bit (preserve-fw-state) by default. We make gracefulRestart immutable following https://kubernetes.io/blog/2022/09/29/enforce-immutability-using-cel/#immutablility-after-first-modification

Signed-off-by: karampok <karampok@gmail.com>
force-pushed from ef87f52 to 847394f
Signed-off-by: karampok <karampok@gmail.com>
force-pushed from 63944e7 to a7cbd33
- Restarts the speakers and waits for them to be healthy
- Once healthy, cancels the context to finish faster in the happy path

Signed-off-by: karampok <karampok@gmail.com>
Signed-off-by: karampok <karampok@gmail.com>
eg, ctx := errgroup.WithContext(ctx)
for _, c := range FRRContainers {
	validateService(svc, allNodes.Items, c)
	cc := c // https://go.dev/doc/faq#closures_and_goroutines
can you use cc above (or just c := c) and use c? Seems a bit less confusing.
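i.e. roughly this shape (a sketch; it assumes the validation is what runs in the errgroup goroutine):

```go
for _, c := range FRRContainers {
	c := c // shadow the loop variable for the goroutine (not needed from Go 1.22 on)
	eg.Go(func() error {
		validateService(svc, allNodes.Items, c)
		return nil
	})
}
```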
ret := true
for _, p := range pods {
	ret = ret && k8s.PodIsReady(p)
can you just exit early if the pod is not ready?
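i.e. something like this sketch, assuming the loop sits inside the poll condition returning (bool, error):

```go
for _, p := range pods {
	if !k8s.PodIsReady(p) {
		return false, nil // short-circuit on the first pod that is not ready
	}
}
return true, nil
```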
}

f := func(context.Context) (bool, error) {
	pods, err := SpeakerPods(cs)
this may be quick enough to pick up the pods being deleted (and see them as ready). We should check that the names changed, or add a non-existing node selector to the daemonset, in order to guarantee a full restart.
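A possible shape for the names-changed check (a sketch; assumes the pods are k8s.io/api/core/v1 `*Pod`s and `oldPods` was captured before the delete):

```go
// Remember the names of the pods we deleted...
oldNames := map[string]bool{}
for _, p := range oldPods {
	oldNames[p.Name] = true
}

// ...and only consider the restart complete once every current speaker
// pod has a fresh name, i.e. was actually recreated.
allRecreated := func(pods []*corev1.Pod) bool {
	for _, p := range pods {
		if oldNames[p.Name] {
			return false
		}
	}
	return true
}
```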
@@ -32,6 +32,38 @@ func init() {
	}
}

func RestartSpeakerPods(cancel context.CancelFunc, cs clientset.Interface) error {
This will return nil the moment we see the pods restarted (bugs aside).
I'd keep this func context-unaware and let it do what it promises (restart the pods), and handle the async part from the calling site.
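Roughly what that suggestion amounts to (a sketch; `SpeakerPods` is the assumed existing helper, and the caller owns any waiting/cancellation):

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
)

// RestartSpeakerPods deletes the speaker pods and returns; whether and how
// to wait for the replacements is left to the calling site.
func RestartSpeakerPods(cs clientset.Interface) error {
	pods, err := SpeakerPods(cs) // assumed helper returning the speaker pods
	if err != nil {
		return err
	}
	for _, p := range pods {
		err := cs.CoreV1().Pods(p.Namespace).Delete(context.TODO(), p.Name, metav1.DeleteOptions{})
		if err != nil {
			return err
		}
	}
	return nil
}
```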
	}
}

f := func(context.Context) (bool, error) {
this can be embedded as an anonymous function in the call below. No need to assign it to a variable.
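e.g., assuming the condition is consumed by the standard apimachinery poll helper:

```go
// sketch: inline the condition instead of binding it to f
err := wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, false,
	func(ctx context.Context) (bool, error) {
		pods, err := SpeakerPods(cs)
		if err != nil {
			return false, err
		}
		for _, p := range pods {
			if !k8s.PodIsReady(p) {
				return false, nil
			}
		}
		return true, nil
	})
```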
})
}

err = metallb.RestartSpeakerPods(cancel, cs)
I'd try to simplify this, because it adds complexity and is a bit hard to follow.
How about restarting the pods, running an eventually assertion that checks all the pods are restarted, and inside the eventually body we check that the service keeps being accessible?
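Rough shape of the proposed simplification (a sketch; `speakersFullyRestarted` is a hypothetical helper):

```go
err := metallb.RestartSpeakerPods(cs)
Expect(err).NotTo(HaveOccurred())

Eventually(func() error {
	// the service must keep answering while the new speakers come up,
	// thanks to the peers holding the stale routes via graceful restart
	for _, c := range FRRContainers {
		validateService(svc, allNodes.Items, c)
	}
	// non-nil until every speaker pod has been recreated and is ready
	return speakersFullyRestarted(cs)
}, 2*time.Minute, time.Second).Should(Succeed())
```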
@@ -61,6 +62,21 @@ var _ = ginkgo.Describe("BGP metrics", func() {
})

ginkgo.BeforeEach(func() {
	clientconfig := k8sclient.RestConfig()
why was this moved?
@@ -98,9 +101,25 @@ var _ = ginkgo.Describe("BGP", func() {
})

ginkgo.BeforeEach(func() {
	clientconfig := k8sclient.RestConfig()
	var err error
why was this moved?
/kind feature
What this PR does / why we need it:
Covers issue #2368
Special notes for your reviewer:
Release note: