CI Failure (leader count on shard (2, 0) (3) is < 4) in AutomaticLeadershipBalancingTest.test_automatic_rebalance #17150

Open
vbotbuildovich opened this issue Mar 16, 2024 · 33 comments · May be fixed by #18497
Labels: area/replication, auto-triaged, ci-failure, ci-rca/redpanda

Comments

vbotbuildovich commented Mar 16, 2024

https://buildkite.com/redpanda/vtools/builds/12307

Module: rptest.tests.leadership_transfer_test
Class: AutomaticLeadershipBalancingTest
Method: test_automatic_rebalance
test_id:    AutomaticLeadershipBalancingTest.test_automatic_rebalance
status:     FAIL
run time:   91.740 seconds

AssertionError('leader count on shard (2, 0) (3) is < 4')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 104, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/leadership_transfer_test.py", line 324, in test_automatic_rebalance
    assert count >= expected_min, \
AssertionError: leader count on shard (2, 0) (3) is < 4

JIRA Link: CORE-1883

vbotbuildovich added the auto-triaged and ci-failure labels Mar 16, 2024
michael-redpanda changed the title from "CI Failure (key symptom) in AutomaticLeadershipBalancingTest.test_automatic_rebalance" to "CI Failure (leader count on shard (2, 0) (3) is < 4) in AutomaticLeadershipBalancingTest.test_automatic_rebalance" Mar 19, 2024
ztlpn added the ci-rca/redpanda label Apr 23, 2024
ztlpn commented Apr 23, 2024

The test is failing because a newly restarted node is sending an incomplete health report (some partitions haven't started yet). I guess we can give newly restarted nodes a grace period and mute them for a bit before transferring leadership there (a good idea for other reasons as well).
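
A minimal sketch of the grace-period idea, for illustration only. This is not Redpanda's balancer code: the class `RestartGraceMuter`, its methods, and the `grace_period_s` parameter are all hypothetical names, and the sketch assumes the balancer can learn when a node (re)started and can filter its candidate transfer targets before picking one.

```python
import time
from typing import Callable, Dict, List


class RestartGraceMuter:
    """Hypothetical helper: hides recently restarted nodes from leadership transfers."""

    def __init__(self, grace_period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_period_s = grace_period_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._started_at: Dict[int, float] = {}

    def on_node_started(self, node_id: int) -> None:
        # Record the moment the node (re)joined; its health report may still be incomplete.
        self._started_at[node_id] = self._clock()

    def is_muted(self, node_id: int) -> bool:
        # Assumption: a node with no recorded restart is treated as fully started.
        started = self._started_at.get(node_id)
        if started is None:
            return False
        return self._clock() - started < self.grace_period_s

    def eligible_targets(self, node_ids: List[int]) -> List[int]:
        # Keep only the nodes whose grace period has elapsed.
        return [n for n in node_ids if not self.is_muted(n)]
```

With a 30 second grace period, a node restarted at time 0 is excluded from `eligible_targets` until the clock passes 30, after which it becomes a valid transfer target again. How long the real grace period should be, and whether it should also wait for a complete health report, is left open here.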

ztlpn added a commit to ztlpn/redpanda that referenced this issue May 15, 2024
ztlpn linked a pull request May 15, 2024 that will close this issue
7 tasks
ztlpn added a commit to ztlpn/redpanda that referenced this issue May 16, 2024

3 participants