[FEATURE] handle auto-retries for requests failed before sending the message to the receiver - reqwest #4366

kashif-m · 2024-04-16T06:02:59Z

Feature Description

We intermittently face Connection reset by peer (os error 104) in sandbox while calling the connector APIs. This probably happens due to reqwest's hyper using stale connections - @Narayanbhat166

The APIs fail intermittently and does not follow a pattern. This makes it an inconvenience for the HyperSwitch consumers as all they can see is 500 errors with the message "Something went wrong". This is not ideal and steps can be taken to mitigate this scenario.

First steps would be probably to let the consumer know that this failed at the client's end (HyperSwitch being the client, and underlying connector being the server), and such errors are retryable. Once these kinds of errors are propogated to the consumer, they will be able to take a decision on whether or not to retry the failed operation.

Second step would be providing a config for controlling maximum number of retries that can be made on failed requests. As requests can be failed due to a lot of different reasons, it's important to consider retries only for the requests which had failed before completion of sending the full request. Such scenarios are listed below -

ChannelClosed - sender's channel was closed
BodyWriteAborted - sending message body was aborted
Connect - encompasses all the errors which were caused due to client's connection, i.e. any errors occurred while connecting

Possible Implementation

Propagate retryable errors to the consumer

This is straightforward - make sure the correct error is returned from handle_response here

Add retry functionality for connector API requests

This can be achieved by using agains retry_if. It lets us retry any Future based on the conditions provided
Preferred number of retries per connector API can be configured by consumers, which can be read from merchant_account and used while calling connector's API

Sample code

let retry_policy = RetryPolicy::fixed(Duration::from_millis(100))
    .with_max_retries(10)
    .with_jitter(true);

retry_policy.retry_if(
    || async {
        metrics_request::record_operation_time(
            async {
                request
                    .send()
                    .await
                    .attach_printable("Unable to send request to connector")
            },
            &metrics::EXTERNAL_REQUEST_TIME,
            &[metrics_tag.clone()],
        )
        .await
    },
    |err: &reqwest::Error| err.is_closed() || err.is_body_write_aborted() || err.is_connect(),
);

Have you spent some time checking if this feature request has been raised before?

I checked and didn't find a similar issue

Have you read the Contributing Guidelines?

I have read the Contributing Guidelines

Are you willing to submit a PR?

None

The text was updated successfully, but these errors were encountered:

kashif-m added C-feature Category: Feature request or enhancement S-awaiting-triage Status: New issues that have not been assessed yet labels Apr 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] handle auto-retries for requests failed before sending the message to the receiver - reqwest #4366

[FEATURE] handle auto-retries for requests failed before sending the message to the receiver - reqwest #4366

kashif-m commented Apr 16, 2024 •

edited

[FEATURE] handle auto-retries for requests failed before sending the message to the receiver - reqwest #4366

[FEATURE] handle auto-retries for requests failed before sending the message to the receiver - reqwest #4366

Comments

kashif-m commented Apr 16, 2024 • edited

Feature Description

Possible Implementation

Have you spent some time checking if this feature request has been raised before?

Have you read the Contributing Guidelines?

Are you willing to submit a PR?

kashif-m commented Apr 16, 2024 •

edited