Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cassandra.partition-size-for-batch-select work expectedly #21940

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

abicky
Copy link

@abicky abicky commented May 12, 2024

Description

This PR fixes cassandra.partition-size-for-batch-select behavior that uses one more partition than expected when executing CQLs.

Additional context and related issues

Steps to reproduce

Firstly, prepare a Cassandra table and records:

$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
cqlsh> CREATE TABLE test_cassandra_split_manager_keyspace.single_partition_key_column_table (partition_key int, clustering_key text, PRIMARY KEY(partition_key, clustering_key));
cqlsh> INSERT INTO test_cassandra_split_manager_keyspace.single_partition_key_column_table (partition_key, clustering_key) VALUES (0, '0');
cqlsh> INSERT INTO test_cassandra_split_manager_keyspace.single_partition_key_column_table (partition_key, clustering_key) VALUES (1, '1');
cqlsh> INSERT INTO test_cassandra_split_manager_keyspace.single_partition_key_column_table (partition_key, clustering_key) VALUES (2, '2');

Then, launch trino-esrver-dev with the following changes:

diff --git a/testing/trino-server-dev/etc/catalog/cassandra.properties b/testing/trino-server-dev/etc/catalog/cassandra.properties
index d62b8a86cd4..1fb8208b60f 100644
--- a/testing/trino-server-dev/etc/catalog/cassandra.properties
+++ b/testing/trino-server-dev/etc/catalog/cassandra.properties
@@ -3,3 +3,4 @@ connector.name=cassandra
 cassandra.contact-points=localhost
 cassandra.allow-drop-table=true
 cassandra.load-policy.dc-aware.local-dc=datacenter1
+cassandra.partition-size-for-batch-select=2
diff --git a/testing/trino-server-dev/etc/log.properties b/testing/trino-server-dev/etc/log.properties
index b615d661c74..8bec119a802 100644
--- a/testing/trino-server-dev/etc/log.properties
+++ b/testing/trino-server-dev/etc/log.properties
@@ -6,6 +6,7 @@
 #
 
 io.trino=INFO
+io.trino.plugin.cassandra.CassandraSession=DEBUG
 
 # show classpath for plugins
 io.trino.server.PluginManager=DEBUG

After the server launches, you can reproduce the bug by executing the following Trino query:

select * from cassandra.test_cassandra_split_manager_keyspace.single_partition_key_column_table where partition_key in (0, 1, 2);

Here is an output from the server, and, as you can see in the log, the CQL Trino executed includes three partitions even though we set cassandra.partition-size-for-batch-select to 2.

2024-05-12T21:33:20.427+0900	DEBUG	Query-20240512_123318_00000_vupj4-174	io.trino.plugin.cassandra.CassandraSession	Execute cql for partition keys with IN clauses: SELECT DISTINCT partition_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE partition_key IN (0,1,2)
2024-05-12T21:33:21.003+0900	DEBUG	SplitRunner-20240512_123318_00000_vupj4.0.0.0-1-201	io.trino.plugin.cassandra.CassandraSession	Execute cql: SELECT partition_key,clustering_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE "partition_key" in (0,1,2)

The expected output should be as follows (I got the log when I checked the behavior of this PR):

2024-05-12T21:45:20.869+0900	DEBUG	Query-20240512_124519_00000_gkm5v-172	io.trino.plugin.cassandra.CassandraSession	Execute cql for partition keys with IN clauses: SELECT DISTINCT partition_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE partition_key IN (0,1,2)
2024-05-12T21:45:21.441+0900	DEBUG	SplitRunner-20240512_124519_00000_gkm5v.0.0.0-1-201	io.trino.plugin.cassandra.CassandraSession	Execute cql: SELECT partition_key,clustering_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE "partition_key" in (0,1)
2024-05-12T21:45:21.441+0900	DEBUG	SplitRunner-20240512_124519_00000_gkm5v.0.0.0-2-202	io.trino.plugin.cassandra.CassandraSession	Execute cql: SELECT partition_key,clustering_key FROM test_cassandra_split_manager_keyspace.single_partition_key_column_table WHERE "partition_key" in (2)

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Cassandra connector
* Fix `cassandra.partition-size-for-batch-select` behavior that uses one more partition than expected when executing CQLs (#21940)

This commit fixes the bug where a single select for a single
partition key column table can include one more partition than
expected.
Copy link

cla-bot bot commented May 12, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@abicky
Copy link
Author

abicky commented May 12, 2024

I submitted the signed CLA at 10:23 UTC on May 12.

Copy link

cla-bot bot commented May 13, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@abicky
Copy link
Author

abicky commented May 17, 2024

@ebyhr @mayankvadariya Thank you for reviewing this pull request! Is there anything else I have to do? I'm not sure what I can do to make verification/cla-signed pass because I've already submitted the CLA.

@ebyhr
Copy link
Member

ebyhr commented May 17, 2024

Registering CLA is a manual process and usually happens per 2 weeks.

@martint Can you handle @abicky's CLA registration?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants