fix: gracefully close unused workers #30512

Closed

NoamGaash
Contributor

This patch proposes a fix for issue #30504 by gracefully terminating the worker before exiting the process.

Contributor

Test results for "tests 1"

27476 passed, 672 skipped
✔️✔️✔️

Merge workflow run.

Contributor

@dgozman dgozman left a comment

Thank you for the PR! Unfortunately, we cannot remove the process.exit() call that runs after a timeout, because graceful termination may misbehave and leave zombie processes.

There should be a different fix for this, most likely the one that waits for graceful termination in the regular worker shutdown from the dispatcher. However, I'll have to experiment with it first to figure out the right fix. We'll look into this for the v1.45 release.
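To illustrate the concern in the review above, here is a minimal sketch of a "graceful stop with a forced-exit fallback" pattern. It is not Playwright's code: `stopWithDeadline`, `gracefulStop`, `forceExit`, and `timeoutMs` are hypothetical names chosen for this example, and treating a failed teardown like a timeout is an assumption.

```typescript
// Hypothetical helper (not part of Playwright): race a graceful teardown
// against a deadline, and fall back to a forced exit if the deadline wins.
async function stopWithDeadline(
  gracefulStop: () => Promise<void>,
  forceExit: () => void,
  timeoutMs: number,
): Promise<'graceful' | 'forced'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<'forced'>(resolve => {
    timer = setTimeout(() => resolve('forced'), timeoutMs);
  });
  // Assumption: a teardown that rejects is handled the same way as one that hangs.
  const graceful = gracefulStop().then(
    () => 'graceful' as const,
    () => 'forced' as const,
  );
  try {
    const outcome = await Promise.race([graceful, deadline]);
    if (outcome === 'forced')
      forceExit();
    return outcome;
  } finally {
    // Clear the pending timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

In a real worker process, `forceExit` could be something like `() => process.exit(0)`. Dropping that fallback entirely is what the review objects to: if `gracefulStop` never settles, nothing would end the process.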

@NoamGaash
Contributor Author

NoamGaash commented May 5, 2024

Investigation notes:

  • The dispatcher calls worker.stop() because _isWorkerRedundant(worker) evaluates to true.
  • The worker is considered redundant because _isWorkerRedundant sees that _queuedOrRunningHashCount is zero: there are no queued or running jobs, as the worker teardown doesn't count as a job.
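The redundancy check described in these notes can be modeled roughly as follows. This is a simplified sketch, not the dispatcher's actual implementation: the `WorkerLike` and `Slot` types are hypothetical, and the comparison rule (more live workers with a given hash than queued or running jobs for that hash) is an assumption based on the names mentioned in this thread.

```typescript
// Simplified model of the check discussed above. All types are hypothetical.
interface WorkerLike {
  hash(): string;
  didSendStop(): boolean;
}

interface Slot {
  worker?: WorkerLike;
}

function isWorkerRedundant(
  worker: WorkerLike,
  slots: Slot[],
  queuedOrRunningHashCount: Map<string, number>,
): boolean {
  // Count workers with the same hash that have not been asked to stop yet.
  let liveWorkersWithSameHash = 0;
  for (const slot of slots) {
    if (slot.worker && !slot.worker.didSendStop() && slot.worker.hash() === worker.hash())
      liveWorkersWithSameHash++;
  }
  // Worker teardown is not counted as a job, so this can be zero
  // even while worker-scoped fixtures still need to be torn down.
  const pendingJobs = queuedOrRunningHashCount.get(worker.hash()) ?? 0;
  return liveWorkersWithSameHash > pendingJobs;
}
```

Under this model, a single idle worker with zero pending jobs for its hash is reported as redundant, which matches the behavior observed in the notes.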

I'm a little stuck with this investigation, as I can't really figure out how exactly the worker-scoped fixtures are registered.

@dgozman
Contributor

dgozman commented May 15, 2024

@NoamGaash I spent some time on the issue, and the fix is not that straightforward. Therefore, I went ahead and prepared a PR myself - #30769. Thank you for the PR and investigation!

@NoamGaash
Contributor Author

@dgozman Thank you so much! This was a real blocker, and I'm glad for your help here. It's also a great learning opportunity. Do you mind if I ask a few questions to get a better understanding? I'm sure I'll have future opportunities to contribute, and it's inspiring to see the clean code and architecture.

As you said yourself, graceful termination may leave zombie processes, so I thought the right approach would be to investigate why it's called in the first place. The root cause of the worker termination was the _isWorkerRedundant method, which iterates over all worker slots and checks whether any of them is occupied with the task assigned to the current worker.

It seems that slot.worker.didSendStop() in the _isWorkerRedundant method evaluates to true, so I thought the real problem lies somewhere in the test runner architecture, and that a stop command is being sent when the last test is executed. That's why I'm surprised to see that your solution makes the force exit conditional. Is it a temporary solution, or is that the "right thing to do"?

Thanks again, both for responding and solving this issue so quickly, and for Playwright as a whole 😺

@NoamGaash NoamGaash closed this May 16, 2024
@dgozman
Contributor

dgozman commented May 16, 2024

@NoamGaash There is a difference between a worker stop during normal operation and the case where something went wrong. My change assumes that during a normal worker stop, triggered by _isWorkerRedundant or dispatcher.stop(), worker teardown behaves. However, if something went wrong, upon Ctrl+C we'll disconnect and the worker will force exit.
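The distinction drawn in this reply can be written down as a small decision table. This is only an illustration of the reasoning, not code from the linked PR: `StopTrigger`, `ShutdownPlan`, and `planShutdown` are hypothetical names.

```typescript
// Hypothetical model of the two shutdown paths described above.
type StopTrigger = 'worker-redundant' | 'dispatcher-stop' | 'runner-disconnected';

interface ShutdownPlan {
  trustTeardown: boolean; // assume worker teardown completes on its own
  forceExit: boolean;     // exit the worker process regardless of teardown
}

function planShutdown(trigger: StopTrigger): ShutdownPlan {
  switch (trigger) {
    // Normal operation: the stop was requested deliberately,
    // so teardown is expected to behave and no forced exit is planned.
    case 'worker-redundant':
    case 'dispatcher-stop':
      return { trustTeardown: true, forceExit: false };
    // Something went wrong (for example, Ctrl+C led to a disconnect):
    // the worker force exits rather than risk lingering as a zombie.
    case 'runner-disconnected':
      return { trustTeardown: false, forceExit: true };
  }
}
```

The point of the split is that the forced exit stays available for the failure path, while the normal path is no longer cut short before worker teardown finishes.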

@NoamGaash
Contributor Author

I see. Thanks for this clarification!

@NoamGaash NoamGaash deleted the fix/gracefully-close-unused-workers branch May 18, 2024 08:41