
HTTPRoute status field not populated at all, or takes a very long time #12310

Open
aminafshar opened this issue Mar 21, 2024 · 9 comments

@aminafshar

aminafshar commented Mar 21, 2024

What is the issue?

HTTPRoutes are not being picked up by Linkerd, so their status field is either never populated or takes a very long time (sometimes tens of minutes) to be populated.
The policy controller container in the destination pod keeps logging "Failed to patch HTTPRoute" errors with reason NotFound.
Memory usage of the policy container is quite high (several GiB) compared to the other components.

When we create a large number of HTTPRoutes (say 1000), memory usage climbs steeply until the container hits its memory limit (16Gi in our case) and is OOMKilled.

It returns to a normal working state after a restart:
kubectl rollout restart -n linkerd deployment linkerd-destination

How can it be reproduced?

Continuously create new HTTPRoutes, or update/delete existing ones.
An existing HTTPRoute for which we nonetheless see NotFound errors (logs copied further below):

kind: HTTPRoute
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"policy.linkerd.io/v1beta2","kind":"HTTPRoute","metadata":{"annotations":{},"labels":{"app.kubernetes.io/managed-by":"kustomize","app.kubernetes.io/name":"my-controller","app.kubernetes.io/part-of":"my-app"},"name":"controller-route-default","namespace":"my-sandbox"},"spec":{"parentRefs":[{"group":"core","kind":"Service","name":"my-controller","port":5051}]}}
  creationTimestamp: '2024-03-21T08:01:56Z'
  generation: 1
  labels:
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: my-controller
    app.kubernetes.io/part-of: my-app
  managedFields:
    - apiVersion: policy.linkerd.io/v1beta2
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:labels:
            .: {}
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/part-of: {}
        f:spec:
          .: {}
          f:parentRefs: {}
          f:rules: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: '2024-03-21T08:01:56Z'
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: policy.linkerd.io
      operation: Update
      subresource: status
      time: '2024-03-21T10:32:10Z'
  name: controller-route-default
  namespace: my-sandbox
  resourceVersion: '647402861'
  uid: 2b3b0205-3f98-4f9f-a183-5c637e8f057b
  selfLink: >-
    /apis/policy.linkerd.io/v1beta3/namespaces/my-sandbox/httproutes/controller-route-default
status:
  parents:
    - conditions:
        - lastTransitionTime: '2024-03-21T10:21:58Z'
          message: ''
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2024-03-21T10:21:58Z'
          message: ''
          reason: BackendNotFound
          status: 'False'
          type: ResolvedRefs
      controllerName: linkerd.io/policy-controller
      parentRef:
        group: core
        kind: Service
        name: my-controller
        namespace: my-sandbox
spec:
  parentRefs:
    - group: core
      kind: Service
      name: my-controller
      port: 5051
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /

A new HTTPRoute whose status was not populated until we restarted the linkerd-destination pod:

kind: HTTPRoute
metadata:
  creationTimestamp: '2024-03-21T08:22:06Z'
  generation: 1
  managedFields:
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:parentRefs: {}
          f:rules: {}
      manager: fabric8
      operation: Apply
      time: '2024-03-21T08:22:06Z'
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: policy.linkerd.io
      operation: Update
      subresource: status
      time: '2024-03-21T10:32:04Z'
  name: controller-route-user-476722
  namespace: my-sandbox
  resourceVersion: '647402570'
  uid: 5b09c67e-2cab-4e07-8dee-75049e6f1812
  selfLink: >-
    /apis/policy.linkerd.io/v1beta3/namespaces/my-sandbox/httproutes/controller-route-user-476722
status:
  parents:
    - conditions:
        - lastTransitionTime: '2024-03-21T10:21:52Z'
          message: ''
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2024-03-21T10:21:52Z'
          message: ''
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
      controllerName: linkerd.io/policy-controller
      parentRef:
        group: core
        kind: Service
        name: my-controller
        namespace: my-sandbox
spec:
  parentRefs:
    - group: core
      kind: Service
      name: my-controller
      port: 5051
  rules:
    - backendRefs:
        - group: core
          kind: Service
          name: my-app-0
          port: 3004
          weight: 1
      matches:
        - headers:
            - name: x-user-id
              type: Exact
              value: '476722'
          path:
            type: PathPrefix
            value: /
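Since the problem shows up when many routes are created or churned (the report mentions ~1000), a minimal Python sketch like the following could template per-user routes shaped like the one above for a reproduction. The function names, the default count, and wrapping the routes in a v1 List are our own assumptions, not anything from Linkerd; the route fields are copied from the YAML above.

```python
import json
import sys

def make_user_route(user_id: str, namespace: str = "my-sandbox") -> dict:
    """Build one per-user HTTPRoute shaped like controller-route-user-476722 above."""
    return {
        "apiVersion": "policy.linkerd.io/v1beta3",
        "kind": "HTTPRoute",
        "metadata": {"name": f"controller-route-user-{user_id}", "namespace": namespace},
        "spec": {
            "parentRefs": [
                {"group": "core", "kind": "Service", "name": "my-controller", "port": 5051}
            ],
            "rules": [
                {
                    "backendRefs": [
                        {"group": "core", "kind": "Service", "name": "my-app-0",
                         "port": 3004, "weight": 1}
                    ],
                    "matches": [
                        {
                            # One route per user, selected by an exact x-user-id header match.
                            "headers": [{"name": "x-user-id", "type": "Exact", "value": user_id}],
                            "path": {"type": "PathPrefix", "value": "/"},
                        }
                    ],
                }
            ],
        },
    }

def make_route_list(count: int) -> dict:
    """Wrap `count` routes in a v1 List so a single kubectl apply creates them all."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [make_user_route(str(i)) for i in range(count)],
    }

if __name__ == "__main__":
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    # JSON is valid YAML, so the output can be piped directly into kubectl.
    print(json.dumps(make_route_list(count), indent=2))
```

Assuming the script is saved under a hypothetical name such as gen_routes.py, something like `python3 gen_routes.py 1000 | kubectl apply -f -` would create the routes; deleting and re-applying them in a loop would approximate the churn described above.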

Logs, error output, etc

{"timestamp":"2024-03-20T18:43:03.987665Z","level":"INFO","fields":{"message":"Lease already exists, no need to create it"},"target":"linkerd_policy_controller"}
{"timestamp":"2024-03-20T18:43:04.019044Z","level":"INFO","fields":{"message":"policy gRPC server listening","addr":"0.0.0.0:8090"},"target":"linkerd_policy_controller","spans":[{"port":"8090","name":"grpc"}]}
{"timestamp":"2024-03-21T08:01:18.459737Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:20.918075Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:23.317981Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:25.408224Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:28.438961Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:29.134401Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:31.952588Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:34.950543Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:35.602828Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:36.520973Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:38.480231Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:41.174658Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:42.805331Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:44.907200Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:46.396935Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:48.372032Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:50.534977Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:53.104461Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:56.102044Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}

Output of linkerd check -o short

linkerd check -o short
linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2024-03-23T09:34:37Z
    see https://linkerd.io/2/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints

linkerd-version
---------------
‼ cli is up-to-date
    is running version 24.3.2 but the latest edge version is 24.3.3
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 24.3.2 but the latest edge version is 24.3.3
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-56f85576c7-tpx4h (edge-24.3.2)
	* linkerd-identity-575f48d794-9hmxb (edge-24.3.2)
	* linkerd-proxy-injector-678f5b6b99-kbzkk (edge-24.3.2)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints

linkerd-viz
-----------
‼ linkerd-viz pods are injected
    could not find proxy container for linkerd-cni-bv45t pod
    see https://linkerd.io/2/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
    container "linkerd-proxy" in pod "metrics-api-544b76757-7zk8v" is not ready
    see https://linkerd.io/2/checks/#l5d-viz-pods-running for hints
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-56f85576c7-tpx4h (edge-24.3.2)
	* linkerd-identity-575f48d794-9hmxb (edge-24.3.2)
	* linkerd-proxy-injector-678f5b6b99-kbzkk (edge-24.3.2)
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cp-version for hints

Status check results are √

Environment

  • Kubernetes version: v1.26.4
  • Cluster Environment: on-prem, kubeadm vanilla kubernetes
  • Host OS: RHEL 8.9
  • Linkerd version: edge-24.3.2

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

@aminafshar aminafshar added the bug label Mar 21, 2024
@aminafshar
Author

Screenshot 2024-03-21 at 15 45 37

@aminafshar
Author

aminafshar commented Mar 21, 2024

Another HTTPRoute, controller-route-user-5186, was created and the one above, controller-route-user-476722, was deleted,
and the policy controller keeps logging hundreds of the same errors for both:

{"timestamp":"2024-03-21T12:00:12.113301Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-476722\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-user-476722\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-user-476722\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T12:00:12.430386Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-5186\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-user-5186\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-user-5186\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}

It took about 40 minutes for the status field to be updated. Note the creationTimestamp (13:09:58Z) and the status update time recorded in managedFields (13:49:16Z):

apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  creationTimestamp: '2024-03-21T13:09:58Z'
  generation: 1
  managedFields:
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:parentRefs: {}
          f:rules: {}
      manager: fabric8
      operation: Apply
      time: '2024-03-21T13:09:58Z'
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: policy.linkerd.io
      operation: Update
      subresource: status
      time: '2024-03-21T13:49:16Z'
  name: controller-route-user-5186
  namespace: my-sandbox
  resourceVersion: '648189883'
  uid: d350c128-3751-4b20-8f85-d0959ffa6c21
  selfLink: >-
    /apis/policy.linkerd.io/v1beta3/namespaces/my-sandbox/httproutes/controller-route-user-5186
status:
  parents:
    - conditions:
        - lastTransitionTime: '2024-03-21T13:14:16Z'
          message: ''
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2024-03-21T13:14:16Z'
          message: ''
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
      controllerName: linkerd.io/policy-controller
      parentRef:
        group: core
        kind: Service
        name: my-controller
        namespace: my-sandbox
spec:
  parentRefs:
    - group: core
      kind: Service
      name: my-controller
      port: 5051
  rules:
    - backendRefs:
        - group: core
          kind: Service
          name: my-app-0
          port: 3004
          weight: 1
      matches:
        - headers:
            - name: x-user-id
              type: Exact
              value: '5186'
          path:
            type: PathPrefix
            value: /
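To measure this delay across many routes rather than by eye, a small Python sketch could read it from the object's metadata. It assumes the input is the route as a dict (for example parsed from `kubectl get httproute/NAME -o json --show-managed-fields`, since recent kubectl versions hide managedFields by default). Note that a managedFields entry's `time` is the most recent write by that manager, so this is an upper bound on when the status first appeared. The helper names are hypothetical.

```python
from datetime import datetime
from typing import Optional

def parse_k8s_time(ts: str) -> datetime:
    # These metadata timestamps are RFC 3339 in UTC with a trailing "Z" and no fractional seconds.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def status_delay_seconds(route: dict) -> Optional[float]:
    """Seconds from creation to the policy controller's most recent status write,
    as recorded in managedFields; None if no such write is recorded."""
    meta = route["metadata"]
    created = parse_k8s_time(meta["creationTimestamp"])
    status_writes = [
        parse_k8s_time(entry["time"])
        for entry in meta.get("managedFields", [])
        if entry.get("manager") == "policy.linkerd.io" and entry.get("subresource") == "status"
    ]
    if not status_writes:
        return None  # the "status not populated at all" case
    return (max(status_writes) - created).total_seconds()

# Values copied from controller-route-user-5186 above.
route = {
    "metadata": {
        "creationTimestamp": "2024-03-21T13:09:58Z",
        "managedFields": [
            {"manager": "fabric8", "operation": "Apply", "time": "2024-03-21T13:09:58Z"},
            {"manager": "policy.linkerd.io", "operation": "Update",
             "subresource": "status", "time": "2024-03-21T13:49:16Z"},
        ],
    }
}
print(status_delay_seconds(route))  # 2358.0 seconds, i.e. about 39 minutes
```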

linkerd-destination-56f85576c7-tpx4h_policy.log

@adleong
Member

adleong commented Mar 21, 2024

@aminafshar this is likely the same issue as #12104, which is fixed in #12215.

@olix0r
Member

olix0r commented Apr 3, 2024

This was fixed in https://github.com/linkerd/linkerd2/releases/tag/edge-24.3.4. Please let us know if issues persist.

@olix0r olix0r closed this as completed Apr 3, 2024
@aminafshar
Author

aminafshar commented Apr 15, 2024

@adleong, @olix0r
We're now running edge-24.4.1 (Kubernetes v1.28.8).
Resource-wise, the policy controller seems to be running normally and the memory leak appears to be resolved,
but we still see a delay of several minutes between HTTPRoute creation and the status field update,
and we see lots of errors like the following:

{"timestamp":"2024-04-15T08:18:19.231788Z","level":"ERROR","fields":{"message":"Failed to send HTTPRoute patch","id.namespace":"pangolin","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-4288\" }","error":"no available capacity"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Index"}]}
Screenshot 2024-04-15 at 11 17 09

@aminafshar
Author

As you can see, memory usage has been a flat line for the last 7 hours, and the policy controller seems to be stuck in that state, repeatedly logging the same error:

2024-04-15T13:53:09+03:00 {"timestamp":"2024-04-15T10:53:09.703054Z","level":"ERROR","fields":{"message":"Failed to send HTTPRoute patch","id.namespace":"pangolin","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-0123\" }","error":"no available capacity"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"httproutes.policy.linkerd.io"}]}

@olix0r olix0r reopened this Apr 15, 2024
@adleong
Member

adleong commented Apr 17, 2024

Hi @aminafshar, sorry to hear you're still experiencing this.

Those error messages indicate that the policy controller is generating HTTPRoute status patches more quickly than the Kubernetes API can keep up with. The policy controller only generates a patch for an HTTPRoute if the HTTPRoute's status is out of date and needs to be updated. I've attempted to reproduce this with 1000 HTTPRoutes, but I only see patches generated when the HTTPRoutes are first created, not continuously as you seem to be experiencing. Are HTTPRoutes being created or updated rapidly by some controller or automated process?
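The behavior described here (patch only when the status is out of date, and fail when patches outpace the API) can be illustrated with a toy Python model. This is not Linkerd's implementation, which is written in Rust; the class and method names are hypothetical. We assume the patches pass through a bounded queue drained by a writer, because "no available capacity" matches the error text that tokio's bounded mpsc channel reports when a non-blocking send finds it full.

```python
import queue
from typing import Dict, List

class StatusPatcher:
    """Toy model (hypothetical, not Linkerd's code): generate a status patch only
    when the desired status differs from the last observed one, and hand patches
    to a writer through a bounded queue."""

    def __init__(self, capacity: int) -> None:
        self.pending: queue.Queue = queue.Queue(maxsize=capacity)  # patches awaiting the API
        self.observed: Dict[str, dict] = {}  # route name -> status last seen from the API
        self.errors: List[str] = []

    def reconcile(self, name: str, desired: dict) -> None:
        if self.observed.get(name) == desired:
            return  # already up to date, so no patch is generated
        try:
            # Non-blocking send: fails immediately instead of waiting when the queue is full.
            self.pending.put_nowait((name, desired))
        except queue.Full:
            self.errors.append(f"Failed to send HTTPRoute patch {name}: no available capacity")

    def on_status_observed(self, name: str, status: dict) -> None:
        # Called when a watch event shows the status actually stored by the API server.
        self.observed[name] = status

patcher = StatusPatcher(capacity=2)
for route in ["route-a", "route-b", "route-c"]:
    patcher.reconcile(route, {"Accepted": True})
print(patcher.errors)  # ['Failed to send HTTPRoute patch route-c: no available capacity']

patcher.on_status_observed("route-a", {"Accepted": True})
patcher.reconcile("route-a", {"Accepted": True})  # no new patch, no new error
print(len(patcher.errors))  # 1
```

In this model the queue stays saturated only if reconcile keeps finding routes out of date, which is why a source of continuous updates is the first thing to rule out.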

If you can provide the output of linkerd diagnostics controller-metrics, it will help us confirm what we're seeing. If you can also share the YAML output of one of these HTTPRoutes (e.g. kubectl get httproute/X -o yaml), we can check whether anything looks unexpected about the resource itself or its status.

@aminafshar
Copy link
Author

aminafshar commented Apr 17, 2024

Hi @adleong, I've asked our developers for details on how they create and manage HTTPRoutes.

At the time of writing, there are ~60 HTTPRoutes on the cluster, and only a few have been deleted/created recently.
The linkerd-destination pods were restarted and have been running for the last ~2 hours. Logs, diagnostics output, and YAML output for some recent HTTPRoutes are attached:
linkerd-diagnostics-controller-metrics.txt
policy_linkerd-destination-887769595-492pk.log
policy_linkerd-destination-887769595-hdmzp.log
policy_linkerd-destination-887769595-gttn5.log
httproutes.yml.txt

@adleong
Member

adleong commented Apr 18, 2024

Thank you for this very helpful data. Using it, I was able to reproduce the issue and found the root cause: a missing field in the HTTPRoute CRD schema. I've added the missing field in #12454 and confirmed that this resolves the issue.
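The thread does not spell out how a missing schema field leads to endless patching, so the following is our assumption, sketched in Python. For CRDs with structural schemas, the API server prunes fields the schema does not declare. If the controller's desired status contains such a field, the stored status never equals the desired one, so a "patch whenever stored != desired" loop never converges. The stored status of controller-route-user-5186 above lists parentRef without the port 5051 from spec.parentRefs, so the sketch uses port as the pruned field purely for illustration; the thread does not say which field #12454 added. The simplified schema format and function names are hypothetical.

```python
import copy
from typing import Any, Dict

def prune(obj: Any, schema: Any) -> Any:
    """Drop keys not declared in a simplified schema, loosely mimicking how the API
    server prunes unknown fields from custom resources with structural schemas.
    Hypothetical schema shape: dict = object with those keys, [s] = list of s, None = leaf."""
    if isinstance(schema, dict) and isinstance(obj, dict):
        return {key: prune(value, schema[key]) for key, value in obj.items() if key in schema}
    if isinstance(schema, list) and isinstance(obj, list):
        return [prune(item, schema[0]) for item in obj]
    return copy.deepcopy(obj)

def patches_issued(desired: Dict, schema: Dict, max_rounds: int = 5) -> int:
    """Count the patches a 'patch whenever stored != desired' loop sends."""
    stored: Dict = {}
    patches = 0
    for _ in range(max_rounds):
        if stored == desired:
            break  # converged: nothing left to patch
        patches += 1
        stored = prune(desired, schema)  # what actually gets persisted
    return patches

# Desired status parentRef, assuming the controller echoes spec.parentRefs (port included).
desired = {"parents": [{"parentRef": {"group": "core", "kind": "Service",
                                      "name": "my-controller", "namespace": "my-sandbox",
                                      "port": 5051}}]}
ref_fields = {"group": None, "kind": None, "name": None, "namespace": None}
schema_missing_port = {"parents": [{"parentRef": ref_fields}]}
schema_with_port = {"parents": [{"parentRef": {**ref_fields, "port": None}}]}

print(patches_issued(desired, schema_with_port))     # 1: converges after one write
print(patches_issued(desired, schema_missing_port))  # 5: never converges, capped by max_rounds
```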
