Fix test runner --fail-fast test flake #51783
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I am working on a couple of flakes that have been bugging me for a while.
This one is testing that
--fail-fast
forces an early return and doesn't run all the tests when running tests in parallel.Example of a failed test run
I've ran the test in a loop to test the variability.
Running the current version 50 times, I got these results:
{"3"=>1, "5"=>36, "4"=>8, "6"=>5}
The keys are the number of tests that ran and the value is the number of times we got that result in the outer loop (I ran the test 50 times). I've inserted the script I've used to get these numbers at the end.
A short sleep eliminates all variability:
{"4"=>50}
Sleeping in a test is not ideal. It is slow and it is not a true fix, but in this particular case I believe it could make sense:
1- The test is already not perfect. It seems prone to race condition / flake
2- Benchmark show negligible impact on test performance
3- A true fix would be quite complex
4- It's not a new pattern in the code base (Rails)
Since we eliminated variability, we can lower down the number of tests from 10 to 4. Making it, in some instances, even faster than the original test!
I ran a benchmark with 4 versions:
1- "Current test" (10 tests with no sleep)
2- "With sleep" (10 tests with sleep)
3- "Short" (4 tests)
4- "Short with sleep" (4 tests with sleep)
Here are the results:
Here are the results on variability:
Benchmark Script
Here's the (rough) benchmark script I used. It's a test in the same file as the original test.
Checklist
Before submitting the PR make sure the following are checked:
[Fix #issue-number]