
Return the actual error from unary RPCs when SendMsg() returns io.EOF #7230

Open
ash2k opened this issue May 13, 2024 · 8 comments

Comments

ash2k (Contributor) commented May 13, 2024

Use case(s) - what problem will this feature solve?

When a unary RPC is called, this code gets executed:

grpc-go/call.go

Lines 65 to 74 in 59954c8

func invoke(ctx context.Context, method string, req, reply any, cc *ClientConn, opts ...CallOption) error {
	cs, err := newClientStream(ctx, unaryStreamDesc, cc, method, opts...)
	if err != nil {
		return err
	}
	if err := cs.SendMsg(req); err != nil {
		return err
	}
	return cs.RecvMsg(reply)
}

SendMsg() docs say this:

grpc-go/stream.go

Lines 112 to 115 in 59954c8

// SendMsg is generally called by generated code. On error, SendMsg aborts
// the stream. If the error was generated by the client, the status is
// returned directly; otherwise, io.EOF is returned and the status of
// the stream may be discovered using RecvMsg.

I have a test for something unrelated where the server immediately returns an error. Very rarely, depending on timing, my test gets io.EOF instead of the error that the server returned, because the server is occasionally faster than the client. It occurred to me that this affects all unary RPCs: there is no way to recover the actual error, since the underlying stream is not exposed to the user.

Proposed Solution

Check for io.EOF from SendMsg(), then call RecvMsg() to obtain the actual error and return it.

Alternatives Considered

Leave it as is. This might be a requirement to be consistent with other implementations (I have not checked). Perhaps gRPC spec needs to be adjusted/clarified to change the behavior in this case.

Additional Context

gRPC v1.63.2.
Go 1.22.3

@ash2k ash2k added the Type: Feature New features or improvements in behavior label May 13, 2024
@dfawley dfawley added Type: Bug Status: Help Wanted and removed Type: Feature New features or improvements in behavior labels May 13, 2024
dfawley (Member) commented May 13, 2024

Yes, I think this is just an oversight in our code here. If SendMsg returns io.EOF, we should call RecvMsg and use the error from there. I believe it's possible for the RPC to end successfully even if the message can't be sent, so if RecvMsg returns nil, the RPC should end with success. There should definitely be a test for this: the server handler needs to be a streaming RPC handler that immediately ends the RPC (both success and failure should be tested). I'm not sure if we can inject something into the connection to make sends from the client to the server slower in order to trigger the race more reliably, but that's one possible option.

ash2k (Contributor, Author) commented May 17, 2024

To be clear, I don't have bandwidth to work on the fix myself.

dfawley (Member) commented May 17, 2024

No problem! This is rare but probably important enough that we should prioritize this internally. Also the fix is probably quite easy.

@dfawley dfawley added the P1 label May 17, 2024
Kailun2047 (Contributor) commented

@dfawley Hi, is it possible for me to pick this one up? It looks like we need to get the following done (please correct me if I'm wrong):

  1. Create a test that reliably reproduces the scenario where SendMsg returns io.EOF (the server closes the stream before the client finishes sending?) and the client doesn't get the expected actual error.

  2. Implement the proposed solution and make sure it makes the test pass.

dfawley (Member) commented May 17, 2024

@Kailun2047 Sure, that sounds good to me, as long as you think you can work on it pretty soon.

For the repro, I wonder if we can reproduce it by making a large request and configuring the server with a very small initial window size. That way the client won't be able to send the full request, as long as the server handler treats the method as streaming. That should make it 100% reproducible and be fairly simple to do.

Kailun2047 (Contributor) commented

@dfawley Got it. Will try to delve in and make an update here within a day or two.

Kailun2047 (Contributor) commented May 20, 2024

@dfawley Hi, I tried creating a test per the earlier discussion to reproduce this, but had no luck 😅 On StubServer, SendMsg always succeeds, even when the request payload size is increased to the maximum allowed (4194304 bytes).

But upon some initial inspection, it looks like the reported EOF might not originate from SendMsg. Instead, it appears that the source of the EOF might be a different issue in RecvMsg. Concretely, for the invocation of a unary RPC:

  • the lower-level recvMsg might return io.EOF to indicate end of stream;

  • the upper-level RecvMsg receives that io.EOF and calls finish with it. In finish we convert io.EOF to nil, but the conversion isn't likely to be reflected in the caller because the original error is passed by value instead of by reference. Therefore it's possible for RecvMsg to return io.EOF.

I'm definitely willing to put more investigation into this, but let me know if you are concerned about the timing and need to prioritize internally 😄

dfawley (Member) commented May 21, 2024

Oh interesting. I thought it was something simpler, but you're right - some of the streaming code actually checks the type of the RPC and adjusts its behavior accordingly. If you're able to get to the bottom of this, we would definitely appreciate it. No worries on the timing at all - I don't think it's as urgent if it's a pretty rare occurrence.

@dfawley dfawley added P2 and removed P1 P2 labels May 21, 2024