What is the issue?
When using something like Mimir for long-term metric retention, this podMonitor's metrics are scraped by Prometheus and remote-written to Mimir. Mimir rejects large numbers of the samples with the following error (see the Mimir runbook: https://grafana.com/docs/mimir/latest/manage/mimir-runbooks/#err-mimir-sample-duplicate-timestamp).
failed pushing to ingester mimir-distributed-ingester-zone-b-0: user=anonymous: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested
Prometheus relabelling has been configured and it causes series to clash after the relabelling. Check the error message for information about which series has received a duplicate sample.
Disabling this podMonitor stops the errors.
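For context on the error mechanism: this class of rejection typically happens when relabeling removes the only label that distinguished two series, so two scrapes produce identical label sets at the same timestamp. A minimal hypothetical illustration (the monitor name and dropped label below are assumptions for the sketch, not taken from the actual linkerd podMonitor):

```yaml
# Hypothetical example: dropping a distinguishing label can merge two
# distinct series into one, producing duplicate-timestamp rejections.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-proxy   # hypothetical name
spec:
  podMetricsEndpoints:
    - port: metrics
      metricRelabelings:
        # If two pods emit tcp_close_total and differ only in the
        # "pod" label, removing it leaves identical label sets;
        # samples with the same timestamp then clash in Mimir.
        - action: labeldrop
          regex: pod
```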
How can it be reproduced?
Install Prometheus
Install Mimir
Configure Prometheus to remote-write to Mimir
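The remote-write step above can be sketched as follows, using the Mimir endpoint that appears in the error logs below (adjust for your installation; this goes in the Prometheus server config, or in `spec.remoteWrite` of a Prometheus CR when using prometheus-operator):

```yaml
# Minimal remote_write sketch; the URL matches the one in the
# error output from this cluster.
remote_write:
  - url: http://mimir-distributed-nginx.mimir.svc:80/api/v1/push
```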
Logs, error output, etc
{
"caller": "dedupe.go:112",
"component": "remote",
"count": 2000,
"err": "server returned HTTP status 400 Bad Request: failed pushing to ingester mimir-distributed-ingester-zone-b-0: user=anonymous: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2024-05-14T22:16:17.61Z and is from series tcp_close_total{app_kubernetes_io_instance=\"kube-prometheus-stack-prometheus\", app_kubernetes_io_managed_by=\"prometheus-operator\", app_kubernetes_io_name=\"prometheus\", app_kubernetes_io_version=\"2.51.2\", apps_kubernetes_io_pod_index=\"0\", container=\"linkerd-proxy\", control_plane_ns=\"linkerd\", controller_revision_hash=\"prometheus-kube-prometheus-stack-prometheus-647889d8c\", direction=\"outbound\", dst_control_plane_ns=\"linkerd\", dst_daemonset=\"promtail\", dst_namespace=\"promtail\", dst_pod=\"promtail-f4kms\", dst_serviceaccount=\"promtail\", instance=\"10.2.25.220:4191\", job=\"linkerd/linkerd-proxy\", namespace=\"monitoring\", operator_prometheus_io_name=\"kube-prometheus-stack-prometheus\", operator_promethe",
"exemplarCount": 0,
"level": "error",
"msg": "non-recoverable error",
"remote_name": "2cbc3b",
"ts": "2024-05-14T22:16:19.070Z",
"url": "http://mimir-distributed-nginx.mimir.svc:80/api/v1/push"
}
output of linkerd check -o short
❯ linkerd check -o short
linkerd-config
--------------
× control plane CustomResourceDefinitions exist
missing grpcroutes.gateway.networking.k8s.io
see https://linkerd.io/2/checks/#l5d-existence-crd for hints
linkerd-jaeger
--------------
‼ jaeger extension proxies are up-to-date
some proxies are not running the current version:
* jaeger-injector-7566699689-44tfd (stable-2.14.10)
see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cp-version for hints
‼ jaeger extension proxies and cli versions match
jaeger-injector-7566699689-44tfd running stable-2.14.10 but cli running edge-24.5.2
see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cli-version for hints
linkerd-viz
-----------
‼ viz extension proxies are up-to-date
some proxies are not running the current version:
* metrics-api-7fd4bb899-5wczd (edge-24.5.1)
* metrics-api-7fd4bb899-srcxk (edge-24.5.1)
* tap-988849cc4-5drh4 (edge-24.5.1)
* tap-988849cc4-htdg5 (edge-24.5.1)
* tap-injector-84f85cb756-gglv7 (edge-24.5.1)
* tap-injector-84f85cb756-zhs2n (edge-24.5.1)
* web-5d484bb4f-xvzfs (edge-24.5.1)
* web-5d484bb4f-zmfbh (edge-24.5.1)
see https://linkerd.io/2/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
metrics-api-7fd4bb899-5wczd running edge-24.5.1 but cli running edge-24.5.2
see https://linkerd.io/2/checks/#l5d-viz-proxy-cli-version for hints
‼ prometheus is installed and configured correctly
missing ClusterRoles: linkerd-linkerd-viz-prometheus
see https://linkerd.io/2/checks/#l5d-viz-prometheus for hints
Status check results are ×
Environment
EKS 1.28
Possible solution
I honestly do not know enough about Prometheus metric relabeling, but I can say that of the 40+ ServiceMonitors we have, only this specific podMonitor causes the errors.
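One possible (untested) direction, assuming the collision is caused by a lost distinguishing label: re-add a unique label via target relabeling before the samples are remote-written. The target label name `scraped_pod` below is an assumption chosen for illustration:

```yaml
# Hypothetical workaround sketch for the podMonitor endpoint:
# target relabelings (unlike metricRelabelings) can read the
# __meta_kubernetes_* labels, so the pod name can be preserved
# as an ordinary label to keep the series distinct.
relabelings:
  - sourceLabels: [__meta_kubernetes_pod_name]
    targetLabel: scraped_pod
    action: replace
```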
Additional context
No response
Would you like to work on fixing this bug?
no