
osd: skip upgrade if migration is pending #14196

Merged
merged 1 commit into rook:master from sp98:skip-osd-uprade-during-migration
May 29, 2024

Conversation

@sp98 sp98 (Contributor) commented May 14, 2024

Currently we allow other OSDs to be upgraded while the migration of an OSD to a new backing store is pending. This does not work if the updated Ceph version does not support the currently applied backing store. So Rook will now skip the upgrade of any OSD whose migration is pending.
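
For illustration, a minimal sketch of the intended skip behavior (the function and parameter names here are hypothetical, not the exact Rook code):

// filterUpgradableOSDs returns the OSD IDs that are safe to upgrade now,
// deferring any OSD whose migration to the new backing store is pending.
// Upgrading a pending OSD first could leave it on a backing store that the
// new Ceph version no longer supports.
func filterUpgradableOSDs(queue []int, pendingMigration map[int]bool) []int {
    upgradable := make([]int, 0, len(queue))
    for _, osdID := range queue {
        if pendingMigration[osdID] {
            continue // skip: migrate this OSD before upgrading it
        }
        upgradable = append(upgradable, osdID)
    }
    return upgradable
}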

Testing:

  • Tested changing the backing store from bluestore to bluestore-rdr and vice versa multiple times. OSDs were migrated without any issues.
  • Also verified that OSD pods pending migration are not upgraded when the Ceph image and the store type are updated in the CephCluster CR at the same time.

operator-logs.txt

CephCluster CR status before upgrade:

status:
  ceph:
    capacity:
      bytesAvailable: 32146395136
      bytesTotal: 32212254720
      bytesUsed: 65859584
      lastUpdated: "2024-05-16T09:05:49Z"
    fsid: 07ce7de3-e339-4c1e-9092-2d8cab527c13
    health: HEALTH_OK
    lastChanged: "2024-05-16T09:04:45Z"
    lastChecked: "2024-05-16T09:05:49Z"
    previousHealth: HEALTH_WARN
    versions:
      mgr:
        ceph version 17.2.6-373-g5406f4b1 (5406f4b1729bf4f476338ad344cc86b126e0c5b3) quincy (stable): 2
      mon:
        ceph version 17.2.6-373-g5406f4b1 (5406f4b1729bf4f476338ad344cc86b126e0c5b3) quincy (stable): 3
      osd:
        ceph version 17.2.6-373-g5406f4b1 (5406f4b1729bf4f476338ad344cc86b126e0c5b3) quincy (stable): 3
      overall:
        ceph version 17.2.6-373-g5406f4b1 (5406f4b1729bf4f476338ad344cc86b126e0c5b3) quincy (stable): 8
  conditions:
  - lastHeartbeatTime: "2024-05-16T09:05:49Z"
    lastTransitionTime: "2024-05-16T08:54:40Z"
    message: Cluster created successfully
    reason: ClusterCreated
    status: "True"
    type: Ready
  message: Cluster created successfully
  observedGeneration: 3
  phase: Ready
  state: Created
  storage:
    deviceClasses:
    - name: hdd
    osd:
      storeType:
        bluestore-rdr: 3
  version:
    image: quay.io/guits/ceph-volume:bs-rdr
    version: 17.2.6-373

CephCluster CR status after upgrade:

status:
    ceph:
      capacity:
        bytesAvailable: 32126930944
        bytesTotal: 32212254720
        bytesUsed: 85323776
        lastUpdated: "2024-05-16T09:23:40Z"
      details:
        OSD_UPGRADE_FINISHED:
          message: all OSDs are running squid or later but require_osd_release < squid
          severity: HEALTH_WARN
        PG_DEGRADED:
          message: 'Degraded data redundancy: 2/6 objects degraded (33.333%), 1 pg
            degraded'
          severity: HEALTH_WARN
      fsid: 07ce7de3-e339-4c1e-9092-2d8cab527c13
      health: HEALTH_WARN
      lastChanged: "2024-05-16T09:23:40Z"
      lastChecked: "2024-05-16T09:23:40Z"
      previousHealth: HEALTH_OK
      versions:
        mgr:
          ceph version 19.0.0-1778-g5062c03f (5062c03f723d4318b100d525eeb7e0a85d579243) squid (dev): 2
        mon:
          ceph version 19.0.0-1778-g5062c03f (5062c03f723d4318b100d525eeb7e0a85d579243) squid (dev): 3
        osd:
          ceph version 19.0.0-1778-g5062c03f (5062c03f723d4318b100d525eeb7e0a85d579243) squid (dev): 3
        overall:
          ceph version 19.0.0-1778-g5062c03f (5062c03f723d4318b100d525eeb7e0a85d579243) squid (dev): 8
    conditions:
    - lastHeartbeatTime: "2024-05-16T09:24:39Z"
      lastTransitionTime: "2024-05-16T08:54:40Z"
      message: Cluster created successfully
      reason: ClusterCreated
      status: "True"
      type: Ready
    - lastHeartbeatTime: "2024-05-16T09:24:44Z"
      lastTransitionTime: "2024-05-16T09:24:44Z"
      message: Configuring Ceph Mons
      reason: ClusterProgressing
      status: "True"
      type: Progressing
    message: Configuring Ceph Mons
    observedGeneration: 4
    phase: Progressing
    state: Creating
    storage:
      deviceClasses:
      - name: hdd
      osd:
        storeType:
          bluestore: 3
    version:
      image: quay.ceph.io/ceph-ci/ceph:wip-aclamk-bs-compression-recompression-test
      version: 19.0.0-1778

Ceph status

sh-4.4$ ceph status  
  cluster:
    id:     07ce7de3-e339-4c1e-9092-2d8cab527c13
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 14m)
    mgr: a(active, since 6m), standbys: b
    osd: 3 osds: 3 up (since 9m), 3 in (since 38m)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   83 MiB used, 30 GiB / 30 GiB avail
    pgs:     1 active+clean
 
sh-4.4$ 

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@sp98 sp98 changed the title osd: skip upgrade if migration is pending [WIP] osd: skip upgrade if migration is pending May 14, 2024
// skip update of all the OSDs if the migration of OSD is pending.
if c.replaceOSD != nil {
    logger.Info("skipping update of all the OSDs since OSD migration is pending")
    updateQueue.Clear()
Member
We only want to skip the reconcile of OSDs that are still on the old/unsupported format. For example, what if all the OSD migrations have completed except for one OSD that gets stuck and fails to migrate? We don't want to block the update of all the OSDs that have already migrated successfully.
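
A sketch of that per-OSD skip, instead of clearing the whole queue (the Exists and Remove helpers and the osdsToBeReplaced shape are assumptions for illustration, not the exact Rook API):

// drop only the OSDs that still need migration from the update queue,
// so already-migrated OSDs can continue to be upgraded
for _, osd := range osdsToBeReplaced {
    if updateQueue.Exists(osd.ID) {
        updateQueue.Remove([]int{osd.ID})
    }
}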

Contributor Author

This might need some more changes. I'll take a look.

Contributor Author

Made the changes to skip the reconcile of only the OSDs that are still on the old/unsupported format and need migration. Testing looks good, and I've added details in the description.

@sp98 sp98 force-pushed the skip-osd-uprade-during-migration branch 2 times, most recently from 37a4749 to a152bb4 Compare May 16, 2024 09:49
@sp98 sp98 requested a review from travisn May 16, 2024 09:50
@sp98 sp98 force-pushed the skip-osd-uprade-during-migration branch from a152bb4 to 5c9fca9 Compare May 16, 2024 09:52
@sp98 sp98 changed the title [WIP] osd: skip upgrade if migration is pending osd: skip upgrade if migration is pending May 16, 2024
@BlaineEXE BlaineEXE (Member) left a comment

Overall, I think things here look good. I have some questions to make sure my understanding is correct, and one suggestion to avoid having to do nil pointer checks for any future changes.

Comment on lines 219 to 223
-  // identify OSDs that need migration to a new backend store
-  err := c.replaceOSDForNewStore()
+  osdsToSkipReconcile, err := controller.GetDaemonsToSkipReconcile(c.clusterInfo.Context, c.context, c.clusterInfo.Namespace, OsdIdLabelKey, AppName)
   if err != nil {
-    return errors.Wrapf(err, "failed to replace an OSD that needs migration to new backend store in namespace %q", namespace)
+    logger.Warningf("failed to get osds to skip reconcile. %v", err)
   }

-  osdsToSkipReconcile, err := controller.GetDaemonsToSkipReconcile(c.clusterInfo.Context, c.context, c.clusterInfo.Namespace, OsdIdLabelKey, AppName)
+  // replace OSDs for a new backing store
+  osdsToBeReplaced, err := c.replaceOSDForNewStore()
   if err != nil {
-    logger.Warningf("failed to get osds to skip reconcile. %v", err)
+    return errors.Wrapf(err, "failed to replace OSD for new backing store %q in namespace %q", c.spec.Storage.Store.Type, namespace)
   }
Member

It looks like these 2 blocks are changing positions. What is the behavior that results from switching the order here? I'm having trouble understanding what the purpose might be, so I imagine it must be related to side-effects of execution.

Contributor Author

No side-effects. I'll move it back to the original order.

-// getOSDReplaceInfo returns an existing OSD that needs to be replaced for a new backend store
-func (c *Cluster) getOSDReplaceInfo() (*OSDReplaceInfo, error) {
-	osdReplaceInfo, err := GetOSDReplaceConfigMap(c.context, c.clusterInfo)
+func (c *Cluster) replaceOSDForNewStore() (*OSDReplaceInfoList, error) {
Member

I think it would be best to have this return a non-pointer list. Looking at usage of this function's output, I see a lot of checks for nil followed immediately by some other check. I believe that future code maintenance will be slightly less likely to introduce bugs if we don't have to worry about non-nil return values in the caller.
 

Suggested change
-func (c *Cluster) replaceOSDForNewStore() (*OSDReplaceInfoList, error) {
+func (c *Cluster) replaceOSDForNewStore() (OSDReplaceInfoList, error) {
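
Returning the value type lets callers replace nil checks with a plain length check, e.g. (caller shape assumed for illustration):

osdsToBeReplaced, err := c.replaceOSDForNewStore()
if err != nil {
    return errors.Wrapf(err, "failed to replace OSD for new backing store %q", c.spec.Storage.Store.Type)
}
if len(osdsToBeReplaced) > 0 {
    // defer the upgrade of these OSDs until their migration completes
}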

Comment on lines 827 to 830
// replaceOSDForNewStore deletes an existing OSD deployment and saves the information in the configmap
func (c *Cluster) deleteOSDDeployment(osdID int) error {
Member
Function name in doc comment and actual function name don't match. It looks like the rest of the text is still correct based on my interpretation, right?

Contributor Author

fixed.
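
The fix is presumably just aligning the doc comment with the function name:

// deleteOSDDeployment deletes an existing OSD deployment and saves the information in the configmap
func (c *Cluster) deleteOSDDeployment(osdID int) error {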

@travisn travisn (Member) left a comment

Here is a manual test to consider for validating these changes... Restart the operator in the middle of a migration, then confirm that the non-migrated OSDs will still be migrated, the migrated OSDs won't be migrated again, and the in-progress migration finishes.

@sp98 sp98 force-pushed the skip-osd-uprade-during-migration branch from 5c9fca9 to d4efb2b Compare May 20, 2024 08:15
@sp98 sp98 requested a review from BlaineEXE May 20, 2024 09:06
@sp98 sp98 (Contributor Author) commented May 20, 2024

Here is a manual test to consider for validating these changes... Restart the operator in the middle of a migration, then confirm that the non-migrated OSDs will still be migrated, the migrated OSDs won't be migrated again, and the in-progress migration finishes.

Tested with the following scenarios:

  1. The operator pod is restarted after the OSD pod is migrated but while data is still rebalancing.
  2. The operator pod is restarted after the OSD pod is deleted and the OSD prepare pod is running (cleaning the OSD disk).
  3. The operator pod is restarted at any other point.

Scenarios 1 and 3 work fine.

For scenario 2, the configmap is not deleted, so in the next reconcile the operator will migrate the same OSD again. This is not desirable (although all the OSDs do get migrated without issues in the end). One way to fix this is to always delete the configmap at the end of the OSD reconcile, without checking whether c.replaceOSD is nil. I'm making that change now and testing it.
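
A sketch of that fix with plain client-go (the configmap name and helper function are assumptions for illustration; the actual Rook call site differs):

import (
    "context"

    "github.com/pkg/errors"
    kerrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// deleteReplaceConfigMap removes the OSD replace configmap unconditionally at
// the end of every reconcile, so a restarted operator does not re-migrate an
// OSD whose migration already completed. "Not found" is tolerated so repeated
// reconciles stay idempotent. The configmap name here is illustrative.
func deleteReplaceConfigMap(ctx context.Context, clientset kubernetes.Interface, namespace string) error {
    err := clientset.CoreV1().ConfigMaps(namespace).Delete(ctx, "osd-replace-config", metav1.DeleteOptions{})
    if err != nil && !kerrors.IsNotFound(err) {
        return errors.Wrap(err, "failed to delete osd replace configmap")
    }
    return nil
}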

@sp98 sp98 force-pushed the skip-osd-uprade-during-migration branch from d4efb2b to 3d76974 Compare May 21, 2024 04:48
@sp98 sp98 requested a review from travisn May 21, 2024 04:49
@sp98 sp98 (Contributor Author) commented May 21, 2024

For scenario 2, the configmap is not deleted, so in the next reconcile the operator will migrate the same OSD again. This is not desirable (although all the OSDs do get migrated without issues in the end). One way to fix this is to always delete the configmap at the end of the OSD reconcile, without checking whether c.replaceOSD is nil. I'm making that change now and testing it.

This fixes the re-migration of an OSD that was already migrated. Pushed the changes to the PR.

@travisn travisn (Member) left a comment

LGTM, thanks for working through those manual test scenarios. Just a couple nits to consider.

pkg/operator/ceph/cluster/osd/replace.go (outdated review thread, resolved)
pkg/operator/ceph/cluster/osd/update.go (outdated review thread, resolved)
@sp98 sp98 force-pushed the skip-osd-uprade-during-migration branch from 3d76974 to 30a0532 Compare May 27, 2024 06:03
Currently we allow other OSDs to be upgraded while the
migration of an OSD to a new backing store is pending.
This does not work if the updated Ceph version does not
support the currently applied backing store. So Rook will
skip the upgrade of any OSD whose migration is pending.

Signed-off-by: sp98 <sapillai@redhat.com>
@sp98 sp98 force-pushed the skip-osd-uprade-during-migration branch from 30a0532 to 322549c Compare May 28, 2024 05:57
@BlaineEXE BlaineEXE (Member) left a comment

These changes look good.

While I have larger-scale thoughts that the OSD code is quite complicated and could be improved with some code tidying, that isn't really an issue that this PR can solve.

@sp98 sp98 merged commit f5ca09e into rook:master May 29, 2024
52 of 53 checks passed