Linkerd service profiles doesn't show any data in Linkerd route dashboards in grafana. #12480

Open
dewstyh opened this issue Apr 22, 2024 · 2 comments

dewstyh commented Apr 22, 2024

What is the issue?

I deployed a ServiceProfile for each service as a custom resource in my EKS cluster, which also runs kube-prometheus-stack (including Grafana). When I check my service's live calls in the linkerd-viz dashboard, I see requests hitting the paths defined in the service profiles, yet no data appears for those routes under Linkerd routes.

How can it be reproduced?

Linkerd service profile:

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  creationTimestamp: null
  name: assets-service.dev-ibwave.svc.cluster.local
  namespace: dev-ibwave
spec:
  routes:
  - condition:
      method: POST
      pathRegex: /internal/v1/assets
    name: POST /internal/v1/assets
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: DELETE
      pathRegex: /internal/v1/assets/[^/]*
    name: DELETE /internal/v1/assets/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: GET
      pathRegex: /internal/v1/assets/[^/]*
    name: GET /internal/v1/assets/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: PUT
      pathRegex: /internal/v1/assets/[^/]*
    name: PUT /internal/v1/assets/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: GET
      pathRegex: /readyz
    name: GET /readyz
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: GET
      pathRegex: /assets-service/v1/assets
    name: GET /assets-service/v1/assets
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: POST
      pathRegex: /assets-service/v1/assets
    name: POST /assets-service/v1/assets
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: POST
      pathRegex: /assets-service/v1/assets/duplicate
    name: POST /assets-service/v1/assets/duplicate
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: DELETE
      pathRegex: /assets-service/v1/assets/[^/]*
    name: DELETE /assets-service/v1/assets/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: GET
      pathRegex: /assets-service/v1/assets/[^/]*
    name: GET /assets-service/v1/assets/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: PATCH
      pathRegex: /assets-service/v1/assets/[^/]*
    name: PATCH /assets-service/v1/assets/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: PUT
      pathRegex: /assets-service/v1/assets/[^/]*
    name: PUT /assets-service/v1/assets/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: GET
      pathRegex: /assets-service/v1/assets/[^/]*/children
    name: GET /assets-service/v1/assets/{id}/children
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: GET
      pathRegex: /assets-service/v1/assets/[^/]*/s3location
    name: GET /assets-service/v1/assets/{id}/s3location
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: DELETE
      pathRegex: /assets-service/v1/assets/[^/]*/s3location/[^/]*
    name: DELETE /assets-service/v1/assets/{id}/s3location/{imageId}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: PUT
      pathRegex: /assets-service/v1/assets/[^/]*/s3location/[^/]*
    name: PUT /assets-service/v1/assets/{id}/s3location/{imageId}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: POST
      pathRegex: /assets-service/v1/layout-plan
    name: POST /assets-service/v1/layout-plan
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: GET
      pathRegex: /assets-service/v1/layout-plan/asset/[^/]*
    name: GET /assets-service/v1/layout-plan/asset/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: GET
      pathRegex: /assets-service/v1/layout-plan/[^/]*
    name: GET /assets-service/v1/layout-plan/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: PATCH
      pathRegex: /assets-service/v1/layout-plan/[^/]*
    name: PATCH /assets-service/v1/layout-plan/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: PUT
      pathRegex: /assets-service/v1/layout-plan/[^/]*
    name: PUT /assets-service/v1/layout-plan/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: POST
      pathRegex: /assets-service/v1/layout-plan/[^/]*/elements
    name: POST /assets-service/v1/layout-plan/{layoutPlanId}/elements
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: DELETE
      pathRegex: /assets-service/v1/layout-plan/[^/]*/elements/[^/]*
    name: DELETE /assets-service/v1/layout-plan/{layoutPlanId}/elements/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
  - condition:
      method: PUT
      pathRegex: /assets-service/v1/layout-plan/[^/]*/elements/[^/]*
    name: PUT /assets-service/v1/layout-plan/{layoutPlanId}/elements/{id}
    responseClasses:
    - condition:
        status:
          max: 550
          min: 200
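
As a quick sanity check that the observed paths do fall under the routes above, here is a minimal Python sketch that matches a method and path against a few of the profile's conditions. It assumes each pathRegex has to match the entire request path (hence `re.fullmatch`); `ROUTES` and `first_matching_route` are illustrative names, not part of Linkerd, and only a subset of the routes is copied.

```python
import re

# A subset of the routes from the ServiceProfile above: (method, pathRegex, name).
ROUTES = [
    ("GET", r"/readyz", "GET /readyz"),
    ("GET", r"/assets-service/v1/assets", "GET /assets-service/v1/assets"),
    ("GET", r"/assets-service/v1/assets/[^/]*", "GET /assets-service/v1/assets/{id}"),
    ("GET", r"/assets-service/v1/assets/[^/]*/children",
     "GET /assets-service/v1/assets/{id}/children"),
    ("GET", r"/assets-service/v1/assets/[^/]*/s3location",
     "GET /assets-service/v1/assets/{id}/s3location"),
]


def first_matching_route(method, path):
    """Return the name of the first route whose method and pathRegex match, else None."""
    for route_method, path_regex, name in ROUTES:
        # Assumption: the regex has to cover the whole path, not just a prefix.
        if method == route_method and re.fullmatch(path_regex, path):
            return name
    return None  # no route matched; such traffic would fall under [DEFAULT]


if __name__ == "__main__":
    for path in (
        "/assets-service/v1/assets/96589a3c-e3ab-44eb-bb7b-32ddd5ac6ecf/children",
        "/readyz",
        "/livez",
    ):
        print(path, "->", first_matching_route("GET", path))
```

Under that assumption the paths seen in the live calls do correspond to defined routes, so path matching alone would not explain an empty routes table.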

Screenshot 2024-04-22 095945

Logs, error output, etc

Live calls in the linkerd-viz dashboard, equivalent to the command: linkerd viz top deployment/assets-service --namespace dev-ibwave


| Direction | Name | Method | Path | Count | Best | Worst | Last | Success Rate |
|---|---|---|---|---|---|---|---|---|
| FROM | 10.55.2.8 | GET | /livez | 5 | 2 ms | 5 ms | 2 ms | 100.00% |
| TO | [deploy/permissions-service](http://localhost:50750/namespaces/dev-ibwave/deployments/permissions-service) | POST | /internal/v1/policies/authorize | 3 | 30 ms | 401 ms | 60 ms | 100.00% |
| FROM | 10.55.3.170 | GET | /livez | 3 | 2 ms | 5 ms | 3 ms | 100.00% |
| FROM | 10.55.1.48 | GET | /livez | 3 | 3 ms | 12 ms | 12 ms | 100.00% |
| TO | [deploy/storage-service](http://localhost:50750/namespaces/dev-ibwave/deployments/storage-service) | GET | /internal/v1/storage | 2 | 2.76 s | 2.82 s | 2.82 s | 100.00% |
| FROM | 10.55.1.48 | GET | /assets-service/v1/assets/96589a3c-e3ab-44eb-bb7b-32ddd5ac6ecf/children | 1 | 516 ms | 516 ms | 516 ms | 100.00% |
| FROM | 10.55.2.8 | GET | /readyz | 1 | 4 ms | 4 ms | 4 ms | 100.00% |
| FROM | 10.55.3.170 | GET | /assets-service/v1/assets/f5f711e0-4a7f-3030-f58f-8ccf5c4d9e61/s3location | 1 | 2.81 s | 2.81 s | 2.81 s | 100.00% |
| FROM | 10.55.3.170 | GET | /assets-service/v1/assets/609087c5-c21a-8056-867c-755a8d061728/s3location | 1 | 2.91 s | 2.91 s | 2.91 s | 100.00% |

The equivalent command for routes shows no data: linkerd viz routes deployment/assets-service --namespace dev-ibwave

| Route | Service | Success Rate | RPS | P50 Latency | P95 Latency | P99 Latency |
|---|---|---|---|---|---|---|
| DELETE /assets-service/v1/assets/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| DELETE /assets-service/v1/assets/{id}/s3location/{imageId} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| DELETE /assets-service/v1/layout-plan/{layoutPlanId}/elements/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| DELETE /internal/v1/assets/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| GET /assets-service/v1/assets | assets-service | --- | --- | 0 s | 0 s | 0 s |
| GET /assets-service/v1/assets/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| GET /assets-service/v1/assets/{id}/children | assets-service | --- | --- | 0 s | 0 s | 0 s |
| GET /assets-service/v1/assets/{id}/s3location | assets-service | --- | --- | 0 s | 0 s | 0 s |
| GET /assets-service/v1/layout-plan/asset/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| GET /assets-service/v1/layout-plan/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| GET /internal/v1/assets/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| GET /readyz | assets-service | --- | --- | 0 s | 0 s | 0 s |
| PATCH /assets-service/v1/assets/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| PATCH /assets-service/v1/layout-plan/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| POST /assets-service/v1/assets | assets-service | --- | --- | 0 s | 0 s | 0 s |
| POST /assets-service/v1/assets/duplicate | assets-service | --- | --- | 0 s | 0 s | 0 s |
| POST /assets-service/v1/layout-plan | assets-service | --- | --- | 0 s | 0 s | 0 s |
| POST /assets-service/v1/layout-plan/{layoutPlanId}/elements | assets-service | --- | --- | 0 s | 0 s | 0 s |
| POST /internal/v1/assets | assets-service | --- | --- | 0 s | 0 s | 0 s |
| PUT /assets-service/v1/assets/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| PUT /assets-service/v1/assets/{id}/s3location/{imageId} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| PUT /assets-service/v1/layout-plan/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| PUT /assets-service/v1/layout-plan/{layoutPlanId}/elements/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| PUT /internal/v1/assets/{id} | assets-service | --- | --- | 0 s | 0 s | 0 s |
| [DEFAULT] | assets-service | --- | --- | 0 s | 0 s | 0 s |

output of linkerd check -o short

$ linkerd viz check
linkerd-viz

√ linkerd-viz Namespace exists
√ can initialize the client
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ viz extension proxies are healthy
‼ viz extension proxies are up-to-date
some proxies are not running the current version:
* metrics-api-6846899c48-jdx9j (stable-2.14.9)
* tap-75d5f946b9-mt7p7 (stable-2.14.9)
* tap-injector-6997bbcb5d-whq8v (stable-2.14.9)
* web-86bc8d5ffc-md59z (stable-2.14.9)
see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints
√ viz extension proxies and cli versions match
√ viz extension self-check

Status check results are √

rithagoni@iB1033 MINGW64 ~/Desktop/repositories/BaseInfrastructure/ServiceProfiles (RII/DEVOPS-238/LinkerdServiceProfiles)
$ linkerd jaeger check
linkerd-jaeger

√ linkerd-jaeger extension Namespace exists
√ jaeger extension pods are injected
√ jaeger injector pods are running
√ jaeger extension proxies are healthy
‼ jaeger extension proxies are up-to-date
some proxies are not running the current version:
* collector-7f5977685-x24dn (stable-2.14.9)
* jaeger-f79786c67-cgnns (stable-2.14.9)
* jaeger-injector-64f95564c8-8588h (stable-2.14.9)
see https://linkerd.io/2.14/checks/#l5d-jaeger-proxy-cp-version for hints
√ jaeger extension proxies and cli versions match

Status check results are √

rithagoni@iB1033 MINGW64 ~/Desktop/repositories/BaseInfrastructure/ServiceProfiles (RII/DEVOPS-238/LinkerdServiceProfiles)
$ linkerd check
kubernetes-api

√ can initialize the client
√ can query the Kubernetes API

kubernetes-version

√ is running the minimum Kubernetes API version

linkerd-existence

√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks contains all pods
√ cluster networks contains all services

linkerd-config

√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used

linkerd-cni-plugin

√ cni plugin ConfigMap exists
√ cni plugin ClusterRole exists
√ cni plugin ClusterRoleBinding exists
√ cni plugin ServiceAccount exists
√ cni plugin DaemonSet exists
√ cni plugin pod is running on all nodes

linkerd-identity

√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2024-06-19T10:58:39Z
see https://linkerd.io/2.14/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls

√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version

√ can determine the latest version
‼ cli is up-to-date
is running version 2.14.9 but the latest stable version is 2.14.10
see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version

√ can retrieve the control plane version
‼ control plane is up-to-date
is running version 2.14.9 but the latest stable version is 2.14.10
see https://linkerd.io/2.14/checks/#l5d-version-control for hints
√ control plane and cli versions match

linkerd-control-plane-proxy

√ control plane proxies are healthy
‼ control plane proxies are up-to-date
some proxies are not running the current version:
* linkerd-destination-88c44495f-72sb5 (stable-2.14.9)
* linkerd-identity-5b84b6544c-4v95q (stable-2.14.9)
* linkerd-proxy-injector-68dcb64d7f-gk7tf (stable-2.14.9)
see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints
√ control plane proxies and cli versions match

Environment

AWS EKS 1.29
Linkerd stable 2.14.9

Possible solution

Linkerd viz and the Linkerd route dashboard should show all live calls that hit the paths defined in the service profiles, but only some of the routes are visible in Linkerd routes and in the Linkerd route dashboards in Grafana.

Additional context

No response

Would you like to work on fixing this bug?

None

@dewstyh dewstyh added the bug label Apr 22, 2024

dewstyh commented Apr 22, 2024

I have now observed that the "authority" value on some of my services' requests differs from what I specified in service-profile.yaml:

end id=3:25 proxy=in  src=10.55.3.170:51188 dst=10.55.3.54:5000 tls=no_tls_from_remote duration=295µs response-length=5B
req id=3:27 proxy=in  src=10.55.1.48:11603 dst=10.55.3.54:5000 tls=no_tls_from_remote :method=GET :authority=api.nextgen.devcloud.ibwave.com :path=/assets-service/v1/assets/96589a3c-e3ab-44eb-bb7b-32ddd5ac6ecf/children
req id=3:28 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true :method=POST :authority=permissions-service:5000 :path=/internal/v1/policies/authorize
rsp id=3:28 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true :status=200 latency=164113µs
end id=3:28 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true duration=187µs response-length=20B
rsp id=3:27 proxy=in  src=10.55.1.48:11603 dst=10.55.3.54:5000 tls=no_tls_from_remote :status=200 latency=269367µs
end id=3:27 proxy=in  src=10.55.1.48:11603 dst=10.55.3.54:5000 tls=no_tls_from_remote duration=264µs response-length=1246B
req id=3:29 proxy=in  src=10.55.3.170:51697 dst=10.55.3.54:5000 tls=no_tls_from_remote :method=GET :authority=api.nextgen.devcloud.ibwave.com :path=/assets-service/v1/assets/171cf51b-facd-b095-8e11-739d0be60a0f/s3location
req id=3:30 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true :method=POST :authority=permissions-service:5000 :path=/internal/v1/policies/authorize
rsp id=3:30 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true :status=200 latency=37367µs
end id=3:30 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true duration=146µs response-length=20B
req id=3:31 proxy=out src=10.55.3.54:51238 dst=10.55.3.39:5000 tls=true :method=GET :authority=storage-service:5000 :path=/internal/v1/storage
rsp id=3:31 proxy=out src=10.55.3.54:51238 dst=10.55.3.39:5000 tls=true :status=200 latency=2102704µs
end id=3:31 proxy=out src=10.55.3.54:51238 dst=10.55.3.39:5000 tls=true duration=169µs response-length=1742B
rsp id=3:29 proxy=in  src=10.55.3.170:51697 dst=10.55.3.54:5000 tls=no_tls_from_remote :status=200 latency=2158371µs
end id=3:29 proxy=in  src=10.55.3.170:51697 dst=10.55.3.54:5000 tls=no_tls_from_remote duration=208µs response-length=1696B
req id=3:32 proxy=in  src=10.55.1.48:55429 dst=10.55.3.54:5000 tls=no_tls_from_remote :method=GET :authority=api.nextgen.devcloud.ibwave.com :path=/assets-service/v1/assets/d88fbc00-2bf6-9cf5-eef2-a57362f74f73/children
req id=3:33 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true :method=POST :authority=permissions-service:5000 :path=/internal/v1/policies/authorize
req id=3:34 proxy=in  src=10.55.3.170:42485 dst=10.55.3.54:5000 tls=no_tls_from_remote :method=GET :authority=api.nextgen.devcloud.ibwave.com :path=/assets-service/v1/assets/38490470-5938-4285-81f9-cd1ff27d08d7/children
req id=3:35 proxy=in  src=10.55.2.8:23258 dst=10.55.3.54:5000 tls=no_tls_from_remote :method=GET :authority=api.nextgen.devcloud.ibwave.com :path=/assets-service/v1/assets/38490470-5938-4285-81f9-cd1ff27d08d7/children
req id=3:36 proxy=out src=10.55.3.54:34666 dst=10.55.2.45:5000 tls=true :method=POST :authority=permissions-service:5000 :path=/internal/v1/policies/authorize
req id=3:37 proxy=in  src=10.55.1.48:57852 dst=10.55.3.54:5000 tls=no_tls_from_remote :method=GET :authority=api.nextgen.devcloud.ibwave.com :path=/assets-service/v1/assets/d88fbc00-2bf6-9cf5-eef2-a57362f74f73/children
req id=3:38 proxy=in  src=10.55.2.8:25961 dst=10.55.3.54:5000 tls=no_tls_from_remote :method=GET :authority=api.nextgen.devcloud.ibwave.com :path=/assets-service/v1/assets/a9b52b00-582a-502b-35f0-be52924b7331
req id=3:39 proxy=out src=10.55.3.54:45306 dst=10.55.2.45:5000 tls=true :method=POST :authority=permissions-service:5000 :path=/internal/v1/policies/authorize

So how do I modify my service profile to capture the routes whose requests carry a different "authority" value?
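
To see at a glance which authorities the proxy is observing, the tap output above can be tallied with a small script. This is only a sketch that assumes the `key=value` layout of the pasted lines; `count_authorities` is a hypothetical helper, and `SAMPLE` is abbreviated from the log above.

```python
import re
from collections import Counter

# A few lines abbreviated from the tap output above.
SAMPLE = """\
req id=3:27 proxy=in  src=10.55.1.48:11603 dst=10.55.3.54:5000 tls=no_tls_from_remote :method=GET :authority=api.nextgen.devcloud.ibwave.com :path=/assets-service/v1/assets/96589a3c-e3ab-44eb-bb7b-32ddd5ac6ecf/children
req id=3:28 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true :method=POST :authority=permissions-service:5000 :path=/internal/v1/policies/authorize
rsp id=3:28 proxy=out src=10.55.3.54:34682 dst=10.55.2.45:5000 tls=true :status=200 latency=164113µs
req id=3:31 proxy=out src=10.55.3.54:51238 dst=10.55.3.39:5000 tls=true :method=GET :authority=storage-service:5000 :path=/internal/v1/storage
"""

# Splits whitespace-separated "key=value" tokens; keys may start with ':'.
FIELD = re.compile(r"(\S+?)=(\S+)")


def count_authorities(tap_text):
    """Count (proxy direction, :authority) pairs over the 'req' lines of tap output."""
    counts = Counter()
    for line in tap_text.splitlines():
        if not line.startswith("req "):
            continue  # in the pasted output, only request events carry :authority
        fields = dict(FIELD.findall(line))
        counts[(fields.get("proxy"), fields.get(":authority"))] += 1
    return counts


if __name__ == "__main__":
    for (direction, authority), n in count_authorities(SAMPLE).items():
        print(direction, authority, n)
```

Grouping by direction separates the inbound requests, which carry the external hostname, from the outbound calls to permissions-service and storage-service.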

alpeb commented Apr 25, 2024

The hostname in the authority must match the first segment of the ServiceProfile name. So you need to create a separate ServiceProfile for each authority, like "permissions-service.<namespace>.svc.cluster.local", "storage-service.<namespace>.svc.cluster.local", etc. When using linkerd viz tap, make sure to add the -o wide flag to see whether requests are getting associated with a route (this should be visible in the rt_route annotation).
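
The naming rule in the reply above can be sketched in a few lines of Python. This is a simplified illustration of the rule as stated, not Linkerd's actual lookup logic; `hostname_of`, `profile_name_for`, and `authority_matches_profile` are hypothetical helpers, and the namespace is the one used in this issue.

```python
def hostname_of(authority):
    """Strip an optional :port suffix from an authority such as 'storage-service:5000'."""
    host, sep, port = authority.rpartition(":")
    return host if sep and port.isdigit() else authority


def profile_name_for(authority, namespace):
    """Build a ServiceProfile name of the form suggested in the reply."""
    return f"{hostname_of(authority)}.{namespace}.svc.cluster.local"


def authority_matches_profile(authority, profile_name):
    """Simplified check: the hostname equals the profile name or its first segment."""
    host = hostname_of(authority)
    return host == profile_name or host == profile_name.split(".", 1)[0]


if __name__ == "__main__":
    assets_profile = "assets-service.dev-ibwave.svc.cluster.local"
    print(profile_name_for("permissions-service:5000", "dev-ibwave"))
    print(authority_matches_profile("api.nextgen.devcloud.ibwave.com", assets_profile))
```

Under this simplified rule, the inbound requests carrying api.nextgen.devcloud.ibwave.com would not line up with the assets-service profile, which would be consistent with the empty routes table reported above.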
