Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] handle auto-retries for requests failed before sending the message to the receiver - reqwest #4366

Open
2 tasks done
kashif-m opened this issue Apr 16, 2024 · 0 comments
Labels
C-feature Category: Feature request or enhancement S-awaiting-triage Status: New issues that have not been assessed yet

Comments

@kashif-m
Copy link
Contributor

kashif-m commented Apr 16, 2024

Feature Description

We intermittently face Connection reset by peer (os error 104) in sandbox while calling the connector APIs. This probably happens due to reqwest's hyper using stale connections - @Narayanbhat166

The APIs fail intermittently and does not follow a pattern. This makes it an inconvenience for the HyperSwitch consumers as all they can see is 500 errors with the message "Something went wrong". This is not ideal and steps can be taken to mitigate this scenario.

First steps would be probably to let the consumer know that this failed at the client's end (HyperSwitch being the client, and underlying connector being the server), and such errors are retryable. Once these kinds of errors are propogated to the consumer, they will be able to take a decision on whether or not to retry the failed operation.

Second step would be providing a config for controlling maximum number of retries that can be made on failed requests. As requests can be failed due to a lot of different reasons, it's important to consider retries only for the requests which had failed before completion of sending the full request. Such scenarios are listed below -

  • ChannelClosed - sender's channel was closed
  • BodyWriteAborted - sending message body was aborted
  • Connect - encompasses all the errors which were caused due to client's connection, i.e. any errors occurred while connecting

Possible Implementation

Propagate retryable errors to the consumer

  • This is straightforward - make sure the correct error is returned from handle_response here

Add retry functionality for connector API requests

  • This can be achieved by using agains retry_if. It lets us retry any Future based on the conditions provided
  • Preferred number of retries per connector API can be configured by consumers, which can be read from merchant_account and used while calling connector's API

Sample code

let retry_policy = RetryPolicy::fixed(Duration::from_millis(100))
    .with_max_retries(10)
    .with_jitter(true);

retry_policy.retry_if(
    || async {
        metrics_request::record_operation_time(
            async {
                request
                    .send()
                    .await
                    .attach_printable("Unable to send request to connector")
            },
            &metrics::EXTERNAL_REQUEST_TIME,
            &[metrics_tag.clone()],
        )
        .await
    },
    |err: &reqwest::Error| err.is_closed() || err.is_body_write_aborted() || err.is_connect(),
);

Have you spent some time checking if this feature request has been raised before?

  • I checked and didn't find a similar issue

Have you read the Contributing Guidelines?

Are you willing to submit a PR?

None

@kashif-m kashif-m added C-feature Category: Feature request or enhancement S-awaiting-triage Status: New issues that have not been assessed yet labels Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature Category: Feature request or enhancement S-awaiting-triage Status: New issues that have not been assessed yet
Projects
None yet
Development

No branches or pull requests

1 participant