Improve KubernetesExecutor Observability #39215
Labels
good first issue
kind:feature
Feature Requests
pending-response
provider:cncf-kubernetes
Kubernetes provider related issues
Description
During our adoption of Airflow, we found that the scheduler can create hundreds of pods in a single pass of the main scheduling loop. I propose adding two kinds of metrics: the response codes returned by the Kubernetes client, and the latency of creating, patching, and deleting pods.
Use case/motivation
The KubernetesExecutor creates one pod for each individual task. During peak time, we saw 800+ tasks scheduled, and the latency of the underlying Kubernetes API increased. Creating the task pods can delay the executor's heartbeat, which may in turn affect the scheduler's heartbeat. It would be good to have metrics that track the response code and latency of the Kubernetes API calls that create, patch, and delete pods.
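To make the request concrete, here is a minimal sketch of how such instrumentation could look. It is not the executor's actual code: `InMemoryStats`, `observe_k8s_call`, `FakeApiError`, and the metric names are all hypothetical and only illustrate the idea. The sketch assumes that a failed client call raises an exception carrying a `status` attribute with the HTTP response code, and it records successful calls under a generic `success` label rather than a specific code.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager


class InMemoryStats:
    """Hypothetical stand-in for a metrics client such as a StatsD wrapper."""

    def __init__(self):
        self.counters = Counter()
        self.timings = defaultdict(list)

    def incr(self, name):
        self.counters[name] += 1

    def timing(self, name, millis):
        self.timings[name].append(millis)


@contextmanager
def observe_k8s_call(stats, operation):
    """Record latency and outcome for one Kubernetes API call.

    `operation` is a free-form label, e.g. "create_pod" or "delete_pod".
    """
    start = time.monotonic()
    status = "success"  # assumption: no exception means the call succeeded
    try:
        yield
    except Exception as exc:
        # Assumption: API errors expose the HTTP code as `.status`.
        status = getattr(exc, "status", None) or "unknown"
        raise
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000.0
        stats.timing(f"kubernetes_executor.{operation}.latency", elapsed_ms)
        stats.incr(f"kubernetes_executor.{operation}.response.{status}")


class FakeApiError(Exception):
    """Hypothetical error type used only to exercise the sketch."""

    def __init__(self, status):
        super().__init__(f"status {status}")
        self.status = status


stats = InMemoryStats()

# A call that succeeds (the body would be the real client call).
with observe_k8s_call(stats, "create_pod"):
    pass

# A call that is rejected, e.g. because the API server is throttling.
try:
    with observe_k8s_call(stats, "create_pod"):
        raise FakeApiError(429)
except FakeApiError:
    pass
```

Wrapping each create, patch, and delete call this way would keep the instrumentation in one place, and the exception is re-raised so existing error handling is unaffected.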
Related issues
N/A
Are you willing to submit a PR?
Code of Conduct