Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crimson/osd: bring support for EC, including recovery #57395

Open
wants to merge 71 commits into
base: main
Choose a base branch
from

Conversation

rzarzynski
Copy link
Contributor

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@rzarzynski rzarzynski requested review from a team as code owners May 10, 2024 14:43
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@rzarzynski rzarzynski force-pushed the wip-wip-crimson-ec-with-recovery branch from 6159ace to 2e14e06 Compare May 13, 2024 14:37
@rzarzynski rzarzynski force-pushed the wip-wip-crimson-ec-with-recovery branch 10 times, most recently from 7916fbe to 88a0e05 Compare May 16, 2024 12:05
The classical OSD never uses the the `encode_no_oid()`
and `decode_no_oid()` variants of object info's encoding
machinery. crimson departed from this to such an extent
on recovery it sends `OI_ATTR` with meaningless `soid`-
related bytes. The net result is the following crash in
an heterogenous cluster:

```
    -7> 2023-08-22T14:36:57.621+0200 7f25113c8700  0 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] get_object_context:11843: soid.pool=1
    -6> 2023-08-22T14:36:57.621+0200 7f25113c8700 10 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] get_object_context: obc NOT found in cache: 1:30306672:devicehealth::main.db.0000000000000000:head
    -5> 2023-08-22T14:36:57.621+0200 7f25113c8700  0 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] get_object_context:11865 bv came from OI_ATTR
    -4> 2023-08-22T14:36:57.621+0200 7f25113c8700  0 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] get_object_context:11897 just after bv decode oi.soid.pool=-9223372036854775808
    -3> 2023-08-22T14:36:57.621+0200 7f25113c8700  0 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] soid.pool=-9223372036854775808 info.pgid.pool()=1
    -2> 2023-08-22T14:36:57.641+0200 7f25113c8700 -1 ../src/osd/PrimaryLogPG.cc: In function 'ObjectContextRef PrimaryLogPG::get_object_context(const hobject_t&, bool, const std::map<std::__cxx11::basic_string<char>, ceph::buffer::v15_2_0::list, std::less<void> >*)' thread 7f25113c8700 time 2023-08-22T14:36:57.625282+0200
../src/osd/PrimaryLogPG.cc: 11906: FAILED ceph_assert(oi.soid.pool == (int64_t)info.pgid.pool())

 ceph version 18.0.0-4293-g6cd888f2315 (6cd888f) reef (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x127) [0x5602381712da]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x560238171505]
 3: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > const*)+0xc9e) [0x560237b5aec6]
 4: (non-virtual thunk to PrimaryLogPG::get_obc(hobject_t const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > const&)+0x30) [0x560237c2f9d5]
 5: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::__cxx11::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ceph::os::Transaction*)+0x5dc) [0x560237e9d116]
 6: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x477) [0x560237e9e6d5]
 7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x1e1) [0x560237e9f2ef]
 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x59) [0x560237c3f3bb]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x93f) [0x560237bb7bdd]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x531) [0x5602379dcb41]
 11: (ceph::osd::scheduler::PGRecoveryMsg::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x22c) [0x560237d5b8f8]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x3386) [0x5602379ff9f4]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x979) [0x56023815f9c3]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x17) [0x560238163279]
 15: (Thread::entry_wrapper()+0x43) [0x56023814c0af]
 16: (Thread::_entry_func(void*)+0xd) [0x56023814c0cb]
 17: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f2531844609]
 18: clone()
```

Fixes: https://tracker.ceph.com/issues/62526
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
It's worth a note that `ReplicatedBackend` (derivate of `PGBackend`)
already holds `pg_shard_t`, so information is duplicated.
As the `ECBackend` starts needs `pg_shard_t` as well, let's rework.

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
The reason behind this change is to avoid mixing responsibilities.
Although in the classical OSD `ECBackend` handles virually everything
related to handling e.g.

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This bypasses the crimson-msgr for local ops.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
```
/home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:595:9:   required from here
/home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:596:10: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
  596 |     auto [pool, name, ec_profile] = std::move(std::get<0>(ret).get0());
      |          ^~~~~~~~~~~~~~~~~~~~~~~~
/home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:596:10: note: remove ‘std::move’ call
/home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:597:10: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
  597 |     auto coll = std::move(std::get<1>(ret).get0());
      |          ^~~~
/home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:597:10: note: remove ‘std::move’ call
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
My understanding at the time of writing is this change is a pure
refactoring.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Before this commit we were doing something like:

1. initialize `at_version` with PG::projected_last_update`
   **incremented by one**.
2. produce a log entry at such version.
3. increment `at_version` for the sake of a further production
   that may never come.

The problem is `osd_op_params::at_version` is higher by one
than the last log entry which hurts at later stages of
`osd_op_params` processing (I was hit in the shared EC code
by the assertion in `PG::op_applied`).

This patch changes the algorithm to:

A. initialize `at_version` with `PG::projected_last_update`
B. increment `at_version` for the sake of the very next production.
C. produce a log entry at this version.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
It's for the sake of integrating ECRecoveryBackend with
crimson.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
The EC recovery backend needs `inc_osd_stat_repaired()`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This loosens the coupling between loading and decoding making
the latter reusable with metadata coming from other source than
local object store.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
…_obc()

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
It's just code movement; there is no changes apart that.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
@rzarzynski rzarzynski force-pushed the wip-wip-crimson-ec-with-recovery branch from 88a0e05 to 9b14128 Compare May 17, 2024 14:27
@Matan-B
Copy link
Contributor

Matan-B commented May 20, 2024

Can we enable some basic (librados) EC unit tests for sanity?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants