
Scaling System Test Logging Improvements. #6435

Open
co-jo opened this issue Nov 19, 2021 · 0 comments · May be fixed by #6445

Comments


co-jo commented Nov 19, 2021

Is your feature request related to a problem? Please describe.

When debugging an issue using the logs produced by the various system tests that scale deployments, it can be hard to keep track of which pods are active during any given scaling operation. Logging this information would make it easier to map each period of a test to its relevant set of logs.

Furthermore, the scaling operations requested by components such as the PravegaSegmentStoreK8sService depend on a waitUntilPodIsRunning call provided by K8sClient. One downside of the current implementation is that it only waits for a particular number of pods to reach the running state; it does not wait for terminated pods to be removed. This can leave us unaware of cases where resources fail to be cleaned up, which can have downstream effects on later system tests.
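
To make the stronger wait condition concrete, here is a minimal sketch. The podPhases input and the PodWaitCondition class are hypothetical stand-ins for whatever K8sClient actually exposes; the point is only that the predicate checks both that exactly the expected number of pods are running and that no other pods remain listed. Note that a real implementation would likely also need to exclude pods that carry a deletion timestamp, since a terminating pod can still report a Running phase.

```java
import java.util.List;

final class PodWaitCondition {
    // podPhases: the phase of every pod matching the deployment's label
    // selector (hypothetical input; K8sClient's real API will differ).
    // Returns true only when the deployment has settled: exactly the
    // expected number of pods are Running AND no non-running pods
    // (Pending, Failed, ...) are still listed.
    static boolean isSteadyState(List<String> podPhases, int expectedRunning) {
        long running = podPhases.stream()
                .filter("Running"::equals)
                .count();
        // The existing check effectively stops at the line above; the
        // second condition is what catches pods that were never removed.
        return running == expectedRunning && running == podPhases.size();
    }
}
```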

Another issue is that there is no bound on the number of attempts the system test makes while waiting for the resource; it instead relies on the testing framework's timeout. This can greatly increase the total time a single system test deployment takes.
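
A bounded retry loop, sketched below, would fail fast at the test level rather than hanging until the framework's outer timeout fires. The class name, constants, and the way the condition is supplied are illustrative, not Pravega's actual API.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class BoundedRetry {
    // Polls `condition` until it holds, up to `maxAttempts` times,
    // sleeping `interval` between attempts. Throws instead of waiting
    // indefinitely for the testing framework's own timeout.
    static void await(BooleanSupplier condition, int maxAttempts,
                      Duration interval) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return;
            }
            Thread.sleep(interval.toMillis());
        }
        throw new IllegalStateException(
                "Condition not met after " + maxAttempts + " attempts");
    }
}
```

A caller could then combine the two sketches, e.g. `BoundedRetry.await(() -> PodWaitCondition.isSteadyState(fetchPhases(), 3), 60, Duration.ofSeconds(5))`, where fetchPhases() is a hypothetical helper listing the deployment's pod phases.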

Describe the solution you'd like

  • Add logs listing the active set of pods for the particular resource after a scale event (see the logging sketch after this list).
  • Make the waitUntilPodIsRunning call wait both for the expected number of pods to be running and for all non-running pods to be removed.
  • Implement a bound on the number of retries.
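
For the first item, the log line itself can be as simple as the sketch below; the method and the source of `podNames` are hypothetical stand-ins for the relevant K8sClient call:

```java
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class ScaleLogging {
    private static final Logger log = LoggerFactory.getLogger(ScaleLogging.class);

    // Called after a scale event completes; `podNames` would come from
    // listing the pods matching the deployment's label selector.
    static void logActivePods(String deployment, int replicas, List<String> podNames) {
        log.info("Scaled {} to {} replicas; active pods: {}",
                deployment, replicas, podNames);
    }
}
```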
co-jo linked a pull request Nov 23, 2021 that will close this issue