New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crimson/osd: bring support for EC, including recovery #57395
Open
rzarzynski
wants to merge
71
commits into
ceph:main
Choose a base branch
from
rzarzynski:wip-wip-crimson-ec-with-recovery
base: main
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
crimson/osd: bring support for EC, including recovery #57395
rzarzynski
wants to merge
71
commits into
ceph:main
from
rzarzynski:wip-wip-crimson-ec-with-recovery
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
rzarzynski
force-pushed
the
wip-wip-crimson-ec-with-recovery
branch
from
May 13, 2024 14:37
6159ace
to
2e14e06
Compare
rzarzynski
force-pushed
the
wip-wip-crimson-ec-with-recovery
branch
10 times, most recently
from
May 16, 2024 12:05
7916fbe
to
88a0e05
Compare
The classical OSD never uses the the `encode_no_oid()` and `decode_no_oid()` variants of object info's encoding machinery. crimson departed from this to such an extent on recovery it sends `OI_ATTR` with meaningless `soid`- related bytes. The net result is the following crash in an heterogenous cluster: ``` -7> 2023-08-22T14:36:57.621+0200 7f25113c8700 0 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] get_object_context:11843: soid.pool=1 -6> 2023-08-22T14:36:57.621+0200 7f25113c8700 10 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] get_object_context: obc NOT found in cache: 1:30306672:devicehealth::main.db.0000000000000000:head -5> 2023-08-22T14:36:57.621+0200 7f25113c8700 0 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] get_object_context:11865 bv came from OI_ATTR -4> 2023-08-22T14:36:57.621+0200 7f25113c8700 0 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] get_object_context:11897 just after bv decode oi.soid.pool=-9223372036854775808 -3> 2023-08-22T14:36:57.621+0200 7f25113c8700 0 osd.1 pg_epoch: 62 pg[1.0( v 42'43 lc 39'37 (0'0,42'43] local-lis/les=45/58 n=4 ec=6/6 lis/c=45/6 les/c/f=58/8/0 sis=45) [1] r=0 lpr=53 pi=[6,45)/1 crt=42'43 lcod 0'0 mlcod 0'0 active+recovering rops=2 m=2 mbc={255={(0+1)=2}}] soid.pool=-9223372036854775808 info.pgid.pool()=1 -2> 2023-08-22T14:36:57.641+0200 7f25113c8700 -1 ../src/osd/PrimaryLogPG.cc: In function 'ObjectContextRef PrimaryLogPG::get_object_context(const hobject_t&, bool, const std::map<std::__cxx11::basic_string<char>, ceph::buffer::v15_2_0::list, std::less<void> >*)' thread 7f25113c8700 time 2023-08-22T14:36:57.625282+0200 ../src/osd/PrimaryLogPG.cc: 11906: FAILED ceph_assert(oi.soid.pool == (int64_t)info.pgid.pool()) ceph version 18.0.0-4293-g6cd888f2315 (6cd888f) reef (dev) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x127) [0x5602381712da] 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x560238171505] 3: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > const*)+0xc9e) [0x560237b5aec6] 4: (non-virtual thunk to PrimaryLogPG::get_obc(hobject_t const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > const&)+0x30) [0x560237c2f9d5] 5: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::__cxx11::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ceph::os::Transaction*)+0x5dc) [0x560237e9d116] 6: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x477) [0x560237e9e6d5] 7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x1e1) [0x560237e9f2ef] 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x59) [0x560237c3f3bb] 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x93f) [0x560237bb7bdd] 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x531) [0x5602379dcb41] 11: (ceph::osd::scheduler::PGRecoveryMsg::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x22c) [0x560237d5b8f8] 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x3386) [0x5602379ff9f4] 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x979) [0x56023815f9c3] 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x17) [0x560238163279] 15: (Thread::entry_wrapper()+0x43) [0x56023814c0af] 16: (Thread::_entry_func(void*)+0xd) [0x56023814c0cb] 17: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f2531844609] 18: clone() ``` Fixes: https://tracker.ceph.com/issues/62526 Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
It's worth a note that `ReplicatedBackend` (derivate of `PGBackend`) already holds `pg_shard_t`, so information is duplicated. As the `ECBackend` starts needs `pg_shard_t` as well, let's rework. Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
The reason behind this change is to avoid mixing responsibilities. Although in the classical OSD `ECBackend` handles virually everything related to handling e.g. Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This bypasses the crimson-msgr for local ops. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
``` /home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:595:9: required from here /home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:596:10: warning: moving a temporary object prevents copy elision [-Wpessimizing-move] 596 | auto [pool, name, ec_profile] = std::move(std::get<0>(ret).get0()); | ^~~~~~~~~~~~~~~~~~~~~~~~ /home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:596:10: note: remove ‘std::move’ call /home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:597:10: warning: moving a temporary object prevents copy elision [-Wpessimizing-move] 597 | auto coll = std::move(std::get<1>(ret).get0()); | ^~~~ /home/rzarzynski/ceph2/src/crimson/osd/shard_services.cc:597:10: note: remove ‘std::move’ call ``` Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
My understanding at the time of writing is this change is a pure refactoring. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Before this commit we were doing something like: 1. initialize `at_version` with PG::projected_last_update` **incremented by one**. 2. produce a log entry at such version. 3. increment `at_version` for the sake of a further production that may never come. The problem is `osd_op_params::at_version` is higher by one than the last log entry which hurts at later stages of `osd_op_params` processing (I was hit in the shared EC code by the assertion in `PG::op_applied`). This patch changes the algorithm to: A. initialize `at_version` with `PG::projected_last_update` B. increment `at_version` for the sake of the very next production. C. produce a log entry at this version. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
It's for the sake of integrating ECRecoveryBackend with crimson. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
The EC recovery backend needs `inc_osd_stat_repaired()`. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This loosens the coupling between loading and decoding making the latter reusable with metadata coming from other source than local object store. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
…_obc() Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
It's just code movement; there is no changes apart that. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
rzarzynski
force-pushed
the
wip-wip-crimson-ec-with-recovery
branch
from
May 17, 2024 14:27
88a0e05
to
9b14128
Compare
Can we enable some basic (librados) EC unit tests for sanity? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
x
between the brackets:[x]
. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e