
vmalert pod restart when promrules refresh #6201

Open · ALEX-yinhao opened this issue Apr 28, 2024 · 6 comments
Labels: question

@ALEX-yinhao commented Apr 28, 2024

Is your question request related to a specific component?

vmalert

Describe the question in detail

I find that vmalert restarts whenever the PrometheusRules are refreshed, but sometimes I don't want the vmalert pod to restart. How can I fix this?
[screenshots]

My vmalert version is v1.89.1.


ALEX-yinhao added the question label on Apr 28, 2024

@Haleygo (Collaborator) commented Apr 29, 2024

Hello,
vmalert supports hot config reload by calling the /-/reload endpoint or using the -configCheckInterval flag.
I'd recommend adding a config reloader sidecar to your vmalert pod, which watches the rule files and calls /-/reload when there is a config update.
You can also use vm-operator to manage vmalert, which includes a config reloader by default.
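For illustration, here is a minimal sidecar sketch along the lines described above, reusing the jimmidyson/configmap-reload image that appears later in this thread. The volume name, mount path, ConfigMap name, and ports are assumptions, not values from this issue:

# Hypothetical excerpt from a vmalert Deployment pod spec. Assumes the rule
# ConfigMap is mounted at /etc/vmalert/rules and vmalert listens on :8080
# (the service port used elsewhere in this thread).
containers:
  - name: vmalert
    image: victoriametrics/vmalert:v1.89.1
    args:
      - -rule=/etc/vmalert/rules/*.yaml
      - -httpListenAddr=:8080
      # -datasource.url / -notifier.config flags omitted for brevity
    volumeMounts:
      - name: rules
        mountPath: /etc/vmalert/rules
  - name: config-reloader
    # Watches the mounted ConfigMap directory and calls vmalert's
    # hot-reload endpoint whenever a file changes.
    image: jimmidyson/configmap-reload:v0.3.0
    args:
      - --volume-dir=/etc/vmalert/rules
      - --webhook-url=http://localhost:8080/-/reload
    volumeMounts:
      - name: rules
        mountPath: /etc/vmalert/rules
        readOnly: true
volumes:
  - name: rules
    configMap:
      name: my-rule-files  # hypothetical ConfigMap name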

Haleygo self-assigned this on Apr 29, 2024

@ALEX-yinhao (Author) replied:

Thanks for your reply.

Unfortunately, it doesn't help. My vmalert is already managed by vm-operator, and the -configCheckInterval flag is set, but the pod still restarts.

My Helm chart config looks like this:

  # bare k8s deployment for vmalert
  vmalert:
    enable: true
    serviceAccount:
      # Specifies whether a service account should be created
      create: true
      # Annotations to add to the service account
      annotations: {}
      # The name of the service account to use.
      # If not set and create is true, a name is generated using the fullname template
      name: ""
    autoscaling:
      enabled: false
      minReplicas: 1
      maxReplicas: 100
      targetCPUUtilizationPercentage: 80
      # targetMemoryUtilizationPercentage: 80
    spec:
      replicaCount: 2
      image:
        repository:
        pullPolicy: Always
        tag: "v1.89.1"
      imagePullSecrets: []
      podAnnotations: {}
      podSecurityContext: {}
      #  fsGroup: 2000
      securityContext: {}
      #  capabilities:
      #    drop:
      #    - ALL
      #  readOnlyRootFilesystem: true
      #  runAsNonRoot: true
      #  runAsUser: 1000
      resources:
        limits:
          cpu: 2
          memory: 2Gi
        requests:
          cpu: 100m
          memory: 128Mi
      # Allowed values: `soft` or `hard`
      podAntiAffinityPreset: hard
      # configMap name of the prometheusRules
      promRules:
      - prometheus-app-telemetry-middleware-prometheus-rulefiles-.+
      extraArgs: {}
        # Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
        # datasource.lookback: 5m
        # How far a value can fallback to when evaluating queries. For example, if -datasource.queryStep=15s then param "step" with value "15s" will be added to every query. If set to 0, rule's evaluation interval will be used instead. (default 5m0s)
        # datasource.queryStep: 5m
      # Interval for checking for changes in '-rule' or '-notifier.config' files. 
      # By default the checking is disabled. Send SIGHUP signal in order to force config check for changes.
      configCheckInterval: 60s
      # How often to evaluate the rules (default 1m0s)
      evaluationInterval: 30s
      # External label to be applied for each rule
      externalLabels: []
      # - "prometheus=plat-diamond-metric/diamond-monitor-prometheus"
      # - "prometheus_replica=prometheus-cluster-monitor-diamond-mo-prometheus-0"
    service:
      type: ClusterIP
      port: 8080
    notifierConfig:
      dns_sd_configs:
      - names:
          - alertmanager-operated
        type: 'A'
        port: 9093

@Haleygo (Collaborator) commented May 9, 2024

@ALEX-yinhao, that's not expected.

[screenshot]

From this picture, I don't see the vmalert pods getting restarted when rules are modified; rather, the prometheus-rulefiles- pods (I assume they are not vmalert) restarted. What do the prometheus-rulefiles- pods do here?
Do you see any logs when the vmalert pod gets terminated?

@ALEX-yinhao (Author) commented May 9, 2024

prometheus-app-telemetry-middleware-prometheus-rulefiles- is a ConfigMap created by prometheus-operator. In the past I used Prometheus for alerting; now I use vmalert instead of Prometheus, but I want to keep using the Prometheus rules.
prometheus-operator sometimes regenerates these rule files wholesale, and when that happens vmalert is stopped and a new pod is started, which is why the pod's restart count shows 0.

In the vmalert pod I can't see any error message. I only see a log line like this:

2024-05-09T03:20:48.670Z	info	VictoriaMetrics/app/vmalert/main.go:189	service received signal terminated

@Haleygo (Collaborator) commented May 10, 2024

From the log, something is sending a terminate signal to vmalert. And since there is no config-reloader in the vmalert pod, I'd guess you have some external service doing it.

My Helm chart config looks like this:
...
# configMap name of the prometheusRules
promRules:
- prometheus-app-telemetry-middleware-prometheus-rulefiles-.+

If you already mount all the rule ConfigMaps in the vmalert pod, you can just call the /-/reload endpoint.

And if you're using vm-operator, I'd suggest using VMRule (vm-operator can auto-convert PrometheusRule to VMRule) and enabling ruleSelector in VMAlertSpec, which brings automatic config reload.
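For concreteness, a minimal sketch of that setup; all names, labels, and URLs here are placeholders rather than values from this issue:

# Hypothetical VMAlert custom resource whose ruleSelector picks up VMRule
# objects by label; the operator then hot-reloads vmalert when the selected
# rules change, instead of recreating the pod.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: example-vmalert
spec:
  replicaCount: 2
  evaluationInterval: 30s
  datasource:
    url: http://vmselect.example.svc:8481/select/0/prometheus  # placeholder URL
  notifier:
    url: http://alertmanager-operated:9093
  ruleSelector:
    matchLabels:
      managed-by: vm-operator  # placeholder label
---
# Hypothetical VMRule of the kind vm-operator produces when converting a
# PrometheusRule; it carries the label that ruleSelector above matches.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: example-rules
  labels:
    managed-by: vm-operator
spec:
  groups:
    - name: example
      rules:
        - alert: HighErrorRate
          expr: rate(http_requests_errors_total[5m]) > 0.1  # placeholder expr
          for: 5m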

@ALEX-yinhao (Author) replied:

Yes, about the config-reloader: I set the vmalert config reloader env vars in my vm-operator chart, but in the vmalert pod I can't find the config-reloader. The env vars look like this:

env:
  - name: VM_VMAGENTDEFAULT_CONFIGRELOADIMAGE
    value: registry.sensetime.com/diamond/prometheus-operator/prometheus-config-reloader:v0.48.1
  - name: VM_VMAUTHDEFAULT_CONFIGRELOADIMAGE
    value: registry.sensetime.com/diamond/prometheus-operator/prometheus-config-reloader:v0.48.1
  - name: VM_VMALERTDEFAULT_CONFIGRELOADIMAGE
    value: registry.sensetime.com/diamond/jimmidyson/configmap-reload:v0.3.0
  - name: VM_PODWAITREADYTIMEOUT
    value: "180s"
  - name: VM_PODWAITREADYINTERVALCHECK
    value: "15s"
  - name: VM_PODWAITREADYINITDELAY
    value: "30s"
[screenshot]
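For reference, when the operator does inject the reloader, the pod spec should contain a second container roughly like the one below. This is only a sketch: the image comes from VM_VMALERTDEFAULT_CONFIGRELOADIMAGE above, while the container name, volume dir, and webhook port are assumptions:

# Hypothetical shape of an operator-injected reloader sidecar.
- name: config-reloader
  image: registry.sensetime.com/diamond/jimmidyson/configmap-reload:v0.3.0
  args:
    - --volume-dir=/etc/vmalert/config   # assumed mount path
    - --webhook-url=http://localhost:8080/-/reload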
