Kuma mesh v2.4.2 timeout policy does not get applied #10104

Open
shafiquegavandi opened this issue Apr 28, 2024 · 4 comments
Labels
kind/bug A bug triage/needs-information Reviewed and some extra information was asked to the reporter

Comments

@shafiquegavandi

What happened?

We recently integrated Kuma mesh v2.4.2. Since the integration, our web app fails with a "stream timeout" error after 5s. We expect the page to take around 40s to process the request. We tried the Timeout policy with a higher connectionTimeout value, but it does not seem to take effect: requests always time out after 5s. There are no errors in the control plane logs or the gateway logs.

I also tried the MeshTimeout kind. It does fix the app timeout once connectionTimeout is changed to 45s, but the pods that are part of the mesh then get restarted. The control plane log shows the following error:
xds-server.dataplane-sync-watchdog onTick failed for the mesh.. error "invalid memory address or nil pointer dereference. "
The documentation states that MeshTimeout is Beta in v2.4.2, but I am not sure how to get just the default Timeout policy working.

We plan to integrate the latest version next quarter, but we want to get the existing setup working until then. Is there anything we can do to fix or work around the issue, and what might be causing it? And why is the default Timeout policy not taking effect?

@shafiquegavandi shafiquegavandi added kind/bug A bug triage/pending This issue will be looked at on the next triage meeting labels Apr 28, 2024
@jakubdyszkiewicz
Contributor

Triage: can you post more logs for the nil pointer dereference?
Can you post the MeshTimeout policy?
Did you check whether the policy is applied by inspecting the config dump?
I don't think "connectionTimeout" is the setting you are looking for. The client probably connects really fast, but processing takes time, so you need to adjust a different value. Take a look at https://github.com/chemicL/envoy-timeouts for more info and examples on timeouts.
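
For the config dump check, a minimal sketch that may help: it walks a dump saved as JSON (for example from the Envoy admin `/config_dump` endpoint) and lists every field whose name contains "timeout", so the connection-level and request-level values can be compared side by side. The function names `find_timeouts` and `timeouts_in_dump` are hypothetical helpers, not part of Kuma or Envoy, and the field names used in the usage note are assumptions about what a dump might contain.

```python
import json
from typing import Any, Iterator, Tuple


def find_timeouts(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every key whose name contains 'timeout'."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if "timeout" in key.lower():
                # Report the field and do not descend into it any further.
                yield child, value
            else:
                yield from find_timeouts(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_timeouts(item, f"{path}[{index}]")


def timeouts_in_dump(dump_text: str) -> dict:
    """Parse a JSON config dump and map each timeout field path to its value."""
    return dict(find_timeouts(json.loads(dump_text)))
```

Running `timeouts_in_dump(open("dump.json").read())` on a saved dump (the file name is an assumption) returns a plain dict, so it is easy to check whether a 5s value still appears on any route or listener after the policy has been applied.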

@jakubdyszkiewicz jakubdyszkiewicz added triage/needs-information Reviewed and some extra information was asked to the reporter and removed triage/pending This issue will be looked at on the next triage meeting labels Apr 29, 2024
@shafiquegavandi
Author

Hi @jakubdyszkiewicz, the logs are not revealing much. Here is a sample log line. It is triggered every second and floods the control plane logs, for each of the applications that are part of the mesh.

2024-04-30T07:12:29.927Z ERROR xds-server.dataplane-sync-watchdog OnTick() failed {"dataplaneKey" : "my-mesh", "Name": "myapp-559bb222-2ddfs.ns"}, "error": "runtime error : invalid memory address or nil pointer dereference"}
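
To see which dataplanes are affected, a rough sketch like the one below could count the `OnTick() failed` lines per dataplane name. It assumes the log lines look like the sample above, with a `"Name": "..."` field on each line; the helper name `count_ontick_failures` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "Name": "<dataplane>" field as it appears in the sample log line.
NAME_PATTERN = re.compile(r'"Name"\s*:\s*"([^"]+)"')


def count_ontick_failures(lines: Iterable[str]) -> Counter:
    """Count watchdog OnTick() failures per dataplane name."""
    counts: Counter = Counter()
    for line in lines:
        # Skip anything that is not a watchdog OnTick() failure.
        if "dataplane-sync-watchdog" not in line or "OnTick() failed" not in line:
            continue
        match = NAME_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file to `count_ontick_failures` gives a `Counter`, and `most_common()` on the result shows which dataplanes fail most often.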

Here is the policy :

apiVersion: kuma.io/v1alpha1
kind: MeshTimeout
metadata:
  name: my-mesh-timeout
  namespace: kong-mesh-system
  labels:
    kuma.io/mesh: my-mesh
spec:
  targetRef:
    kind: MeshGateway
    name: my-mesh-gateway
  to:
    - targetRef:
        kind: Mesh
      default:
        idleTimeout: 60s
        connectionTimeout: 40s
        http:
          requestTimeout: 40s
          streamIdleTimeout: 1h

I do see the timeout settings being set when I dump the configs.
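
When matching the policy against the dump, note that the two may spell the same duration differently (for example `1h` in the policy and a value in seconds such as `3600s` in the dump). A small sketch for normalizing both to seconds before comparing, assuming durations use only the `h`, `m`, `s` and `ms` units; `to_seconds` is a hypothetical helper.

```python
import re

# Seconds per supported unit; "ms" is listed first in the pattern so that
# it is not mistaken for "m" followed by a stray "s".
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def to_seconds(duration: str) -> float:
    """Convert a duration such as '1h', '1m30s' or '3600s' to seconds."""
    position = 0
    total = 0.0
    for match in _PART.finditer(duration):
        if match.start() != position:
            # There is unparsed text between two number/unit pairs.
            raise ValueError(f"cannot parse duration: {duration!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(duration):
        raise ValueError(f"cannot parse duration: {duration!r}")
    return total
```

With this, `to_seconds("1h") == to_seconds("3600s")` holds, so the `streamIdleTimeout` from the policy can be compared directly with the value found in the dump.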

I will have a look at the link you sent and try a few of those configs.

Thanks for looking.

@jakubdyszkiewicz jakubdyszkiewicz added triage/needs-reproducing Someone else should try to reproduce this and removed triage/needs-information Reviewed and some extra information was asked to the reporter labels May 6, 2024
@jakubdyszkiewicz
Contributor

Triage: thanks for the example. We'll try to reproduce it on 2.4.2

@jakubdyszkiewicz jakubdyszkiewicz assigned slonka and Automaat and unassigned slonka May 6, 2024
@Automaat
Contributor

I was able to reproduce this on Kuma 2.4.2. I think this was fixed in Kuma 2.5.x, since the policy works on that version and on newer versions.

@shafiquegavandi could you try updating Kuma to a newer version to verify whether this fixes the issue?

@Automaat Automaat added triage/needs-information Reviewed and some extra information was asked to the reporter and removed triage/needs-reproducing Someone else should try to reproduce this labels May 14, 2024