Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add in-place workload vertical scaling proposal #1336

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

LavenderQAQ
Copy link

@LavenderQAQ LavenderQAQ commented Jul 18, 2023

Describe what this PR does

Add a proposal for In-place Workload Vertical Scaling.

Fixes #1212

@sonatype-lift
Copy link

sonatype-lift bot commented Jul 18, 2023

Sonatype Lift is retiring

Sonatype Lift will be retiring on Sep 12, 2023, with its analysis stopping on Aug 12, 2023. We understand that this news may come as a disappointment, and Sonatype is committed to helping you transition off it seamlessly. If you’d like to retain your data, please export your issues from the web console.
We are extremely grateful and thank you for your support over the years.

📖 Read about the impacts and timeline

@kruise-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fillzpp for approval by writing /assign @fillzpp in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kruise-bot
Copy link

Welcome @LavenderQAQ! It looks like this is your first PR to openkruise/kruise 🎉

@kruise-bot kruise-bot added the size/XL size/XL: 500-999 label Jul 18, 2023
@LavenderQAQ
Copy link
Author

For the case of vscale failure, my current idea is to start the rollback mechanism, is it necessary to provide users with some configurable interfaces? Because for vscale, rolling back is more like starting another vscale.

@codecov-commenter
Copy link

codecov-commenter commented Jul 18, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5421ee7) 48.55% compared to head (f9116e7) 48.62%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1336      +/-   ##
==========================================
+ Coverage   48.55%   48.62%   +0.06%     
==========================================
  Files         157      157              
  Lines       22480    22480              
==========================================
+ Hits        10916    10930      +14     
+ Misses      10360    10349      -11     
+ Partials     1204     1201       -3     
Flag Coverage Δ
unittests 48.62% <ø> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@LavenderQAQ
Copy link
Author

Do I need to add more details to expectationDiffs? For example, which container needs vertical scaling. But this can make the structure too bloated.

@LavenderQAQ LavenderQAQ force-pushed the docs/inplace-workload-vertical-scaling branch from 1dd1eaf to d556c31 Compare July 18, 2023 07:36
@cubxxw
Copy link

cubxxw commented Jul 18, 2023

/lgtm

@cubxxw
Copy link

cubxxw commented Jul 18, 2023

/intive

@kruise-bot
Copy link

@cubxxw: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cubxxw
Copy link

cubxxw commented Jul 18, 2023

/chat


##### Enhance Expectation

Create a new VScaleExpectations for use by different controllers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not reuse ResourceVersionExpectations ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be some judgment here about the vscale status, many Pods might be in the Proposed or InProgress state, and there seems to be no way to judge only resource_version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be some judgment here about the vscale status, many Pods might be in the Proposed or InProgress state, and there seems to be no way to judge only resource_version.

ok, I got it :)

type expectationDiffs struct {
...
// vscale identifies whether resources have changed
vscale bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numVScale int32

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should change the quantity here. This design is not perfect for the time being, and we need to consider the vertical scaling of some Pods (not all) in the future. Then I'll update it with a commit.


#### Design Objectives

- Use vscale as a standalone operation rather than integrated in update. For example, for cloneset, three operations will be improved to scale, vscale, and update.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! But we should consider how to avoid to break maxUnavailable limitation in some scene, for example, users update both resources and images fields, and settings RestartRequired of vscale.


- Users only need to modify template in workload to achieve vertical scaling for all managed Pods
- Simple and clear configuration strategy, support grayscale capabilities
- Provides the rollback function, and attempts to return to the previous state when the capacity expansion and shrinkage fails. If the rollback still fails, no further action is required
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, we should provide more policies. For example. if a vscale proposal is rejected by kubelet (Infeasible), we should allow it to be demote to rebuild upgrade.


```go
// CloneSetVScaleStrategy defines strategies for pods vscale.
type CloneSetVScaleStrategy struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, it's better to reuse UpdateStrategy filed for the following considers:

  1. vscale also should consider maxUnavailable;
  2. vscale also need grayscale capabilities (partition) ;

@LavenderQAQ LavenderQAQ force-pushed the docs/inplace-workload-vertical-scaling branch from d556c31 to 888d446 Compare July 26, 2023 11:52
@LavenderQAQ LavenderQAQ marked this pull request as ready for review July 26, 2023 11:54
@LavenderQAQ
Copy link
Author

LavenderQAQ commented Jul 26, 2023

Sorry, there may be some significant design updates, and after trying to implement a demo, I believe that treating workload vertical scaling as part of the update operation may be a better choice. We can discuss this new design in the community meeting. I have a wonderful demo here.

@LavenderQAQ
Copy link
Author

/hold

@LavenderQAQ LavenderQAQ force-pushed the docs/inplace-workload-vertical-scaling branch 2 times, most recently from 7ed0cf9 to 7bbca45 Compare July 27, 2023 07:56
@LavenderQAQ
Copy link
Author

/unhold

@LavenderQAQ LavenderQAQ changed the title docs: Add In-place Workload Vertical Scaling Proposal docs: add in-place workload vertical scaling proposal Jul 27, 2023
@LavenderQAQ
Copy link
Author

/hold

Copy link

stale bot commented Nov 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Nov 1, 2023
@LavenderQAQ LavenderQAQ force-pushed the docs/inplace-workload-vertical-scaling branch 2 times, most recently from 60e4bd7 to 9b46a52 Compare November 1, 2023 13:20
@stale stale bot removed the wontfix This will not be worked on label Nov 1, 2023
@LavenderQAQ
Copy link
Author

/unhold

@LavenderQAQ LavenderQAQ force-pushed the docs/inplace-workload-vertical-scaling branch from 9b46a52 to 0c6aa76 Compare November 1, 2023 13:24
Copy link

stale bot commented Jan 30, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jan 30, 2024
Signed-off-by: LavenderQAQ <1254297317@qq.com>
@LavenderQAQ LavenderQAQ force-pushed the docs/inplace-workload-vertical-scaling branch from 0c6aa76 to f9116e7 Compare January 31, 2024 02:23
@stale stale bot removed the wontfix This will not be worked on label Jan 31, 2024
@zmberg zmberg modified the milestones: 1.8, 1.9 Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] In-place udpate support resources
7 participants