Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gha-runner-scale-set-controller] metrics not exposed for the listener #3510

Closed
4 tasks done
isatfg opened this issue May 9, 2024 · 3 comments
Closed
4 tasks done
Labels
gha-runner-scale-set Related to the gha-runner-scale-set mode question Further information is requested

Comments

@isatfg
Copy link

isatfg commented May 9, 2024

Checks

Controller Version

0.9.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. In the values.yaml enable the metrics
metrics:
  controllerManagerAddr: ":8080"
  listenerAddr: ":8080"
  listenerEndpoint: "/metrics"

2. Now port-forward to the listener pod on the port configured (8080)

3. In a browser got to localhost:8080/metrics

You will get an EOF Error.

Describe the bug

I have enabled metrics in the gha-runner-scale-set-controller
metrics: controllerManagerAddr: ":8080" listenerAddr: ":8080" listenerEndpoint: "/metrics"

I can see that the controller pod is exposing metrics on port 8080/metrics

`
gha_controller_failed_ephemeral_runners
gha_controller_pending_ephemeral_runners
gha_controller_running_ephemeral_runners
gha_controller_running_listeners

According to the documentation the listner is the owner of some metrics E.g.

gha_assigned_jobs gha_running_jobs
However these metrics are not exposed on the controller or the listner. When I port-forward to the listner and go to the metrics endpoint e.g. localhost:8080/metrics I get an error

an error occurred forwarding 8080

Describe the expected behavior

When I port-forward to the listener I should get metrics in the same way I get metrics from the controller.

Additional Context

# Default values for gha-runner-scale-set-controller.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
labels: {}

# leaderElection will be enabled when replicaCount>1,
# So, only one replica will in charge of reconciliation at a given time
# leaderElectionId will be set to {{ define gha-runner-scale-set-controller.fullname }}.
replicaCount: 1

image:
  repository: "ghcr.io/actions/gha-runner-scale-set-controller"
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

env:
## Define environment variables for the controller pod
#  - name: "ENV_VAR_NAME_1"
#    value: "ENV_VAR_VALUE_1"
#  - name: "ENV_VAR_NAME_2"
#    valueFrom:
#      secretKeyRef:
#        key: ENV_VAR_NAME_2
#        name: secret-name
#        optional: true

serviceAccount:
  # Specifies whether a service account should be created for running the controller pod
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  # You can not use the default service account for this.
  name: ""

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"

podLabels: {}

podSecurityContext: {}
# fsGroup: 2000

securityContext: {}
# capabilities:
#   drop:
#   - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000

resources: {}
## We usually recommend not to specify default resources and to leave this as a conscious
## choice for the user. This also increases chances charts run on environments with little
## resources, such as Minikube. If you do want to specify resources, uncomment the following
## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
#   cpu: 100m
#   memory: 128Mi
# requests:
#   cpu: 100m
#   memory: 128Mi

nodeSelector: {}

tolerations: []

affinity: {}

# Mount volumes in the container.
volumes: []
volumeMounts: []

# Leverage a PriorityClass to ensure your pods survive resource shortages
# ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
# PriorityClass: system-cluster-critical
priorityClassName: ""

## If `metrics:` object is not provided, or commented out, the following flags 
## will be applied the controller-manager and listener pods with empty values: 
## `--metrics-addr`, `--listener-metrics-addr`, `--listener-metrics-endpoint`. 
## This will disable metrics.
##
## To enable metrics, uncomment the following lines.
metrics:
  controllerManagerAddr: ":8080"
  listenerAddr: ":8080"
  listenerEndpoint: "/metrics"

flags:
  ## Log level can be set here with one of the following values: "debug", "info", "warn", "error".
  ## Defaults to "debug".
  logLevel: "debug"
  ## Log format can be set with one of the following values: "text", "json"
  ## Defaults to "text"
  logFormat: "text"

  ## Restricts the controller to only watch resources in the desired namespace.
  ## Defaults to watch all namespaces when unset.
  # watchSingleNamespace: ""

  ## Defines how the controller should handle upgrades while having running jobs.
  ##
  ## The strategies available are:
  ## - "immediate": (default) The controller will immediately apply the change causing the
  ##   recreation of the listener and ephemeral runner set. This can lead to an
  ##   overprovisioning of runners, if there are pending / running jobs. This should not
  ##   be a problem at a small scale, but it could lead to a significant increase of
  ##   resources if you have a lot of jobs running concurrently.
  ##
  ## - "eventual": The controller will remove the listener and ephemeral runner set
  ##   immediately, but will not recreate them (to apply changes) until all
  ##   pending / running jobs have completed.
  ##   This can lead to a longer time to apply the change but it will ensure
  ##   that you don't have any overprovisioning of runners.
  updateStrategy: "immediate"

Controller Logs

https://gist.github.com/isatfg/ad4246cdd8f93a2059569885e11f8729

Runner Pod Logs

https://gist.github.com/isatfg/91b656dd58d1ebe2b2316608daf87a33
@isatfg isatfg added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels May 9, 2024
Copy link
Contributor

github-actions bot commented May 9, 2024

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@nikola-jokic
Copy link
Member

Hey @isatfg,

Please correct me if I'm wrong, but the error says that port forwarding is the problem. Is it possible that you tried to forward both the controller and the listener on the same port? I successfully forwarded both the controller and the listener metrics.

@nikola-jokic nikola-jokic added question Further information is requested and removed bug Something isn't working needs triage Requires review from the maintainers labels May 13, 2024
@isatfg
Copy link
Author

isatfg commented May 13, 2024

Hey @nikola-jokic

hmm, so was trying to reproduce the issue again to explain the steps and now I see metrics on the listener as expected. I honestly have no idea what happened.

So now I have the metrics enabled and I can port-forward to the controller and get controller metrics and port-forward to the listener and get listener metrics. Apologies for wasting you time

Thank you

@isatfg isatfg closed this as completed May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gha-runner-scale-set Related to the gha-runner-scale-set mode question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants