[DST] Data Inconsistency: Keys missing and counts varying when queried over time #22348

Open
1 task done
shamanthchandra-yb opened this issue May 10, 2024 · 0 comments
shamanthchandra-yb commented May 10, 2024

Jira Link: DB-11255

Description

UPD: The data appears to be inconsistent even when queried directly at the YugabyteDB source.

Here's what's happening:

  • We run the SqlDataLoad workload (inserts only) on different tables for a few minutes at a time, in iterations.
  • We verify CDC streaming by comparing count() in YugabyteDB against count() in the CDC sink.
  • Nemeses run in parallel with the workload.
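
The verification step above can be sketched roughly as follows. This is a minimal illustration, not the actual test framework code: the function name `wait_for_counts_to_converge`, its parameters, and the injected count callables are all hypothetical, and the 30-minute default stall timeout is an assumption taken from the failure message reported in this issue.

```python
import time
from typing import Callable


def wait_for_counts_to_converge(
    get_source_count: Callable[[], int],
    get_sink_count: Callable[[], int],
    stall_timeout_s: float = 1800.0,  # assumption: 30 minutes, as in the reported failure
    poll_interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll both counts until they match; fail if the pair stops changing.

    The two callables are hypothetical stand-ins for running count()
    against the YugabyteDB source and against the CDC sink.
    """
    last_pair = None
    last_change = clock()
    while True:
        pair = (get_source_count(), get_sink_count())
        if pair[0] == pair[1]:
            return pair[0]  # source and sink agree
        now = clock()
        if pair != last_pair:
            # At least one side moved, so replication is still progressing.
            last_pair, last_change = pair, now
        elif now - last_change >= stall_timeout_s:
            raise AssertionError(
                f"No change for {stall_timeout_s:.0f}s: "
                f"yb={pair[0]} sink={pair[1]}"
            )
        sleep(poll_interval_s)
```

Note that a check of this shape assumes the source count is stable once the workload stops. If count() on the source itself fluctuates, as described in this issue, a mismatch no longer points unambiguously at the CDC pipeline.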

Observation:

  • Even after the workload stops, the count() results keep fluctuating, which suggests data loss.
  • The test logs show inconsistent count() outputs.
  • We queried a live universe at two different times and observed that some keys present the first time were missing the second time, and vice versa.
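
One way to make the last observation concrete is to capture the primary keys from each query and diff the two sets. The sketch below is illustrative only: `diff_key_snapshots` is a hypothetical helper, and it assumes the keys have already been fetched into Python collections and are sortable.

```python
from typing import Hashable, Iterable


def diff_key_snapshots(
    first: Iterable[Hashable], second: Iterable[Hashable]
) -> dict:
    """Compare the keys returned by two reads of the same table.

    With no writes between the reads, both difference lists should be empty.
    """
    first_keys, second_keys = set(first), set(second)
    return {
        # Keys seen in the first read that disappeared in the second.
        "only_in_first": sorted(first_keys - second_keys),
        # Keys that only showed up in the second read.
        "only_in_second": sorted(second_keys - first_keys),
        "count_first": len(first_keys),
        "count_second": len(second_keys),
    }
```

For example, `diff_key_snapshots([1, 2, 3, 4], [2, 3, 4, 5])` reports key 1 as missing from the second read and key 5 as missing from the first, even though both reads return the same count.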

Profile (15)

AssertionError: Change is the same for 30 minutes test_cdc_44a24e yb=80213 sink=80220. Replication stopped at 2024-05-09T07:43:46.562820

Source connector version

fourpointfour/ybdb-debezium:0.6

Connector configuration

add connector connector_name='ybconnector_cdc_9112e0_test_cdc_a3adee' stream_id='rs_cdc_9112e0_757a' db_name='cdc_9112e0' connector_host='172.151.24.246' table_list=['test_cdc_a3adee']
{
  'name': 'ybconnector_cdc_9112e0_test_cdc_a3adee',
  'config': {
    'database.master.addresses': '172.151.17.57:7100,172.151.19.232:7100,172.151.22.169:7100',
    'database.hostname': '172.151.17.57:5433,172.151.19.232:5433,172.151.22.169:5433',
    'database.port': 5433,
    'database.masterhost': '172.151.17.57',
    'database.masterport': '7100',
    'database.user': 'yugabyte',
    'database.password': 'yugabyte',
    'database.dbname': 'cdc_9112e0',
    'snapshot.mode': 'initial',
    'admin.operation.timeout.ms': 600000,
    'socket.read.timeout.ms': 300000,
    'max.connector.retries': '10',
    'operation.timeout.ms': 600000,
    'topic.creation.default.compression.type': 'lz4',
    'topic.creation.default.cleanup.policy': 'delete',
    'topic.creation.default.partitions': 2,
    'topic.creation.default.replication.factor': '1',
    'tasks.max': '5',
    'connector.class': 'io.debezium.connector.postgresql.PostgresConnector',
    'topic.prefix': 'ybconnector_cdc_9112e0_test_cdc_a3adee',
    'plugin.name': 'pgoutput',
    'slot.name': 'rs_cdc_9112e0_757a_from_con',
    'publication.autocreate.mode': 'filtered',
    'publication.name': 'pn_ybconnector_cdc_9112e0_test_cdc_a3adee',
    'table.include.list': 'public.test_cdc_a3adee'
  }
}

YugabyteDB version

2024.1.0.0-b123

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
yugabyte-ci added the kind/bug and priority/medium labels on May 10, 2024
yugabyte-ci added the priority/highest and 2024.1_blocker labels and removed the priority/medium label on May 18, 2024
shamanthchandra-yb changed the title from "[CDCSDK] [PG Parity] One of the run without tablet splitting and nemesis, saw more rows in PG sink than YB source" to "[DST] Data Inconsistency: Keys missing and counts varying when queried over time" on May 18, 2024
yugabyte-ci assigned es1024 and unassigned asrinivasanyb on May 20, 2024
yugabyte-ci added the 2024.1.1_blocker label and removed the status/awaiting-triage label on May 20, 2024