New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/logictest: TestParallel failed #123965
Comments
@andrewbaptist do we expect this form of ambiguous error? |
I don't think we do. Let me take a closer look at where this "failure to combine" error is coming from. |
I understand how this can happen now, and I was worried about this exact scenario when I was fixing #123146. Specifically the scenario of the "client", "proxy node" and "meta2" having 3 different views of the world can cause this problem to happen. I thought about a reasonably easy fix to this but didn't implement it as part of that PR as it was a little larger than I would have liked. The correct solution is to have 2 guarantees:
The way the code is structured, we almost have both these guarantees, but they are not airtight right now. This should be a very rare case where we are hitting these scenarios, but based on the fact that this test is failing, it needs to be stronger. Let me take a look at how easy it will be to tighten this up. |
@arulajmani - We should chat about this bug. As I’m looking through this issue, I’m realizing I might have made a mistake with the way proxying is done on the proxy node. Specifically there are 3 parts of the proxy code, and it is the third one where there is an issue:
For the So there are a couple options:
I prefer the first, but I need to be careful to ensure I handle things like Context cancellation and other errors that can happen. I don’t think the refactor will be very big and it would help reduce some of the complexity of the already complex |
Previously a proxy request was handled using the same logic within sendPartialBatch and sendToReplicas. There were short circuits to handle the different cases of retries, but this was still incorrect in some places. Instead this change intercepts proxy request at the start of Send and creates and sends to the transport directly. This greatly simplifies the code for proxy requests and additionally fixes complex cases where the routing information changes. Epic: none Fixes: cockroachdb#123965 Informs: cockroachdb#123146 Release note: None
Previously a proxy request was handled using the same logic within sendPartialBatch and sendToReplicas. There were short circuits to handle the different cases of retries, but this was still incorrect in some places. Instead this change intercepts proxy request at the start of Send and creates and sends to the transport directly. This greatly simplifies the code for proxy requests and additionally fixes complex cases where the routing information changes. Epic: none Fixes: cockroachdb#123965 Informs: cockroachdb#123146 Release note: None
Previously in DistSender, a proxy request was handled using the same logic within sendPartialBatch and sendToReplicas. There were short circuits to handle the different cases of retries, but in cases with changing range boundaries, it could return the wrong error. This change intercepts proxy request at the start of DistSender.Send and sends to the transport bypassing the rest of the DistSender retry logic. This simplifies the code for proxy requests. Epic: none Fixes: cockroachdb#123965 Informs: cockroachdb#123146 Release note: None
Previously in DistSender, a proxy request was handled using the same logic within sendPartialBatch and sendToReplicas. There were short circuits to handle the different cases of retries, but in cases with changing range boundaries, it could return the wrong error. This change intercepts proxy request at the start of DistSender.Send and sends to the transport bypassing the rest of the DistSender retry logic. This simplifies the code for proxy requests. Epic: none Fixes: cockroachdb#123965 Informs: cockroachdb#123146 Release note: None
sql/logictest.TestParallel failed with artifacts on master @ fd581e5f2458cf3585fefbf74a90c77d6710ea44:
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-38630
The text was updated successfully, but these errors were encountered: