Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scylla-nodetool: add tablet support for ring command #18608

Merged

Conversation

denesb
Copy link
Contributor

@denesb denesb commented May 10, 2024

Currently, invoking nodetool ring on a tablet keyspace fails with an error, because it doesn't pass the required table parameter to /storage_service/ownership/{keyspace}. Further to this, the command will currently always output the vnode ring, regardless of the keyspace and table parameter. This series fixes this, adding tablet support to /storage_service/tokens_endpoint, which will now return the tablet ring (tablet token -> tablet primary replica mapping) if the new keyspace and table parameters are provided.
nodetool status also gets a touch-up, to provide the tablet ring's token count (the tablet count) when invoked with a tablet keyspace and table.

Fixes: #17889
Fixes: #18474

  • ** native-nodetool is new functionality, no backport is needed **

@denesb denesb requested a review from tchaikov May 10, 2024 07:22
@denesb denesb self-assigned this May 10, 2024
@denesb denesb requested a review from tgrabiec as a code owner May 10, 2024 07:22
@github-actions github-actions bot added tests/dtest status/release blocker Preventing from a release to be promoted labels May 10, 2024
Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apart from a typo lgtm

test/nodetool/test_status.py Outdated Show resolved Hide resolved
@denesb denesb force-pushed the scylla-nodetool-ring-tablets branch from 04b652a to 6d159bf Compare May 10, 2024 09:00
@denesb
Copy link
Contributor Author

denesb commented May 10, 2024

Update:

  • fixed "uknown" typo.

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 nodetool/test_ring
🔹 nodetool/test_status
🔹 nodetool/test_tablehistograms
🔹 nodetool/test_toppartitions
✅ - Container Test
✅ - dtest with topology changes
✅ - Unit Tests
✅ - dtest

Build Details:

  • Duration: 2 hr 59 min
  • Builder: spider1.cloudius-systems.com

The primary replica is an arbitrary replica of the tablet's, which is
considered to tbe the "main" owner of the tablet, similar to how
replicas own tokens in the vnode world.
To avoid aliasing the primary replicas with a certain DC or rack,
primary replicas are rotated among the tablet's replicas, selecting
tablet_id % replica_count as the primary replica.
The tablet variant of the existing get_token_to_endpoint_map(), which
returns a list of tablet tokens and the primary replica for each.
…ndpoint

Add a keyspace and cf parameter. When specified, the endpoint will
return token -> primary replica mapping for the table's tablet tokens,
not the vnodes.
Add a table parameter. Pass both keyspace and table (when provided) to
the /storage_service/tokens_endpoint API endpoint, so that the returned
(and printed) token ring is that of the table's tablets, not the vnode
ring.
Also pass the table param to the ownership API, which will complain if
this param is missing for a tablet keyspace.
Currently, the token count column is always based on the vnodes, which
makes no sense for tablet keyspaces. If a tablet keyspace is provided as
the keyspace argument, don't print the vnode token count. If the user
provided a table argument as well, print the tablet count, otherwise
print "?".
After the recent fixes 4 tests started failing with the java nodetool
implementation. We are about to ditch the java implementation, but until
we actually do, it is valuable to keep the tests passing with both the
native and java implementation.
So in this patch, these tests are fixed to pass with the java
implementation too.
There is one test, test_help.py, which fails only if run together with
all the tests. I couldn't confirm this 100%, but it seems like this is
due to JMX sending a rouge request on some timer, which happens to hit
this test. I don't think this is worth trying to fix.
@denesb
Copy link
Contributor Author

denesb commented May 13, 2024

Update:

  • Added yield to tablet loop
  • Expanded documentation of the changed API

@denesb denesb force-pushed the scylla-nodetool-ring-tablets branch from 6d159bf to a77796f Compare May 13, 2024 11:24
@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 nodetool/test_ring
🔹 nodetool/test_status
🔹 nodetool/test_tablehistograms
🔹 nodetool/test_toppartitions
✅ - Container Test
✅ - dtest with topology changes
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 4 hr 10 min
  • Builder: i-04018ab96c5ec4251 (m5d.8xlarge)

@denesb
Copy link
Contributor Author

denesb commented May 15, 2024

@scylladb/scylla-maint ping.

@@ -357,6 +357,9 @@ public:
/// \throws std::logic_error If the given id does not belong to this instance.
dht::token_range get_token_range(tablet_id id) const;

/// Returns the primary replica for the tablet
host_id get_primary_replica(tablet_id id) const;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asias @bhalevy we should use this for repair. Does it need to be integrated into effective_replication_map?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asias @bhalevy we should use this for repair. Does it need to be integrated into effective_replication_map?

you mean to have such a function for vnodes?
i don't think it's necesaary since tablet repair has it's own path that can diverge from vnodes.
Also, for repair, we also need a variant of this function (or an optional parameter to it, to select a primary replica within a given dc)

@scylladb-promoter scylladb-promoter merged commit a5fea84 into scylladb:master May 15, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants