[CI] DockerTests test500Readiness failing #108523

Closed
joegallo opened this issue May 10, 2024 · 7 comments · Fixed by #108681
Assignees
Labels
  • :Core/Infra/Node Lifecycle: Node startup, bootstrapping, and shutdown
  • :Delivery/Packaging: RPM and deb packaging, tar and zip archives, shell and batch scripts
  • medium-risk: An open issue or test failure that is a medium risk to future releases
  • Team:Core/Infra: Meta label for core/infra team
  • Team:Delivery: Meta label for Delivery team
  • >test-failure: Triaged test failures from CI

Comments

@joegallo
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/jfhv4jdyalmjm/tests/:qa:packaging:destructiveDistroTest.default-docker/org.elasticsearch.packaging.test.DockerTests/test500Readiness

Reproduction line:

null

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
Failure dashboard for org.elasticsearch.packaging.test.DockerTests#test500Readiness

Failure excerpt:

java.lang.AssertionError: (No message provided)

  at __randomizedtesting.SeedInfo.seed([C08005FCAB509171:10D51055CE402A98]:0)
  at org.junit.Assert.fail(Assert.java:87)
  at org.junit.Assert.assertTrue(Assert.java:42)
  at org.junit.Assert.assertTrue(Assert.java:53)
  at org.elasticsearch.packaging.test.DockerTests.test500Readiness(DockerTests.java:1222)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

@joegallo joegallo added :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts >test-failure Triaged test failures from CI labels May 10, 2024
@elasticsearchmachine elasticsearchmachine added Team:Delivery Meta label for Delivery team needs:risk Requires assignment of a risk label (low, medium, blocker) labels May 10, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@joegallo
Contributor Author

Some other failures

@joegallo
Contributor Author

joegallo commented May 10, 2024

It got me on #108518 and @benwtrent on #108522 and #107047. And @ldematte on #108521.

mark-vieira added a commit that referenced this issue May 11, 2024
@mark-vieira mark-vieira added the :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown label May 11, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label May 11, 2024
@mark-vieira
Contributor

This test is related to the readiness probe, so I'm adding the core/infra label. Not sure if any changes have been made in that area lately. Also muting the test for now.

@mark-vieira
Contributor

@rjernst As far as I can tell the test waits for ES to report green, then attempts to connect on the readiness port and fails. Is there any scenario in which the node would report as green but the readiness port would not be listening? Is there some kind of race condition here?
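
The check described above can be sketched roughly as follows. This is a minimal illustration, assuming the readiness port is a plain TCP listener that only accepts connections once the node considers itself ready. The class and method names are hypothetical and are not taken from DockerTests, and a local stand-in socket is used in place of a real node.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch; not the code used by DockerTests.
public class ReadinessProbeSketch {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: treat the port as "not ready".
            return false;
        }
    }

    /** Probes a local stand-in listener (assumption: stands in for a ready node). */
    static boolean probeLocalStandIn() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            return isListening(loopback.getHostAddress(), server.getLocalPort(), 1000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(probeLocalStandIn()); // prints "true"
    }
}
```

A single probe like this fails if it runs before the listener is opened, which is the kind of race the question is asking about.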

@mark-vieira mark-vieira added medium-risk An open issue or test failure that is a medium risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels May 13, 2024
@rjernst
Member

rjernst commented May 13, 2024

Readiness is orthogonal to health. It waits for 2 conditions to be met:

  • A master node is elected
  • File settings have been applied

If the cluster is green, a master node should be there. So that leaves file settings, but this test doesn't use file settings, so there shouldn't be any waiting.

I'm attempting to gather more information by dumping the ES log file before we fail the test:
#108587
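
As a rough illustration of the two conditions listed in this comment, the sketch below models readiness as the conjunction of "master elected" and "file settings applied". It is not the Elasticsearch implementation; the class, method, and parameter names are assumptions made for the example.

```java
// Hypothetical model of the readiness rule described in the comment above.
public class ReadinessConditionsSketch {

    /** Ready only when both conditions hold (names are illustrative). */
    static boolean isReady(boolean masterElected, boolean fileSettingsApplied) {
        return masterElected && fileSettingsApplied;
    }

    public static void main(String[] args) {
        // A green cluster implies a master exists, but says nothing about file settings.
        System.out.println(isReady(true, false)); // prints "false"
        System.out.println(isReady(true, true));  // prints "true"
    }
}
```

Under this model, a node can report green health while readiness is still false, which is why the two signals are described as orthogonal.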

@rjernst rjernst self-assigned this May 14, 2024
elasticsearchmachine pushed a commit that referenced this issue May 15, 2024
rjernst added a commit to rjernst/elasticsearch that referenced this issue May 15, 2024
Previously, readiness waited only on a master node being elected.
Recently it was also made to wait on file settings being applied. Yet
the node may be fully started before those file settings are applied.
The test expected readiness to be ok as soon as the node finished starting.

This commit retries the readiness check until it succeeds, since the
readiness state is updated asynchronously to the node finishing startup.

closes elastic#108523
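
The retry approach in this commit message can be sketched as a generic poll-until-true helper. This is only an illustration under assumed names and timings; it is not the code from the actual fix, and the simulated check stands in for a real readiness probe.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of retrying a check until it succeeds or a timeout elapses.
public class RetryUntilReadySketch {

    /** Polls check every pollMillis until it returns true or timeoutMillis has passed. */
    static boolean waitUntil(BooleanSupplier check, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the check ever succeeding
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated readiness that only becomes true on the third attempt.
        AtomicInteger attempts = new AtomicInteger();
        boolean ready = waitUntil(() -> attempts.incrementAndGet() >= 3, 2000, 10);
        System.out.println(ready); // prints "true"
    }
}
```

Polling tolerates the window in which the node has finished starting but the readiness state has not yet been updated, at the cost of a bounded wait when the check never succeeds.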
rjernst added a commit that referenced this issue May 15, 2024
parkertimmins pushed a commit to parkertimmins/elasticsearch that referenced this issue May 17, 2024

4 participants