@fmg-john fmg-john commented Sep 3, 2025

Context

I have several runner scale sets running in my Kubernetes cluster, where each scale set allocates different resources to the workflow pods it manages. Workflow jobs can then effectively select a runner size (e.g. small, medium, large), and the cluster scales out nodes to accommodate the demand from the runner workloads.

I want to ensure that workflow jobs select appropriate runner sizes, to prevent resource wastage through unused compute and excess workload-driven dynamic scaling. To achieve this, I want to gather metrics on runner resource utilization and be able to tie those metrics back to the repo/workflow/job that initiated the run from a reporting/visualization tool (e.g. Grafana). Currently, describing the pods gives no indication of which repo, workflow, or trigger is related to that pod.

Additions

  • Change the debug log that outputs the job container image to an info log. The official GitHub-hosted runners output the image used for the job without debug logging enabled, so promoting this message to info brings the output closer to the official runners' behavior.
  • Add labels with the prefix arc-context- to job and step pods, to provide additional information about the workflow context.

eg:

Labels:
  arc-context-event-name=pull_request
  arc-context-job=build-services
  arc-context-repository=example-repo
  arc-context-repository-owner=example-org
  arc-context-run-attempt=1
  arc-context-run-id=16253714786
  arc-context-run-number=23204
  arc-context-sha=ae9b1b887bf31a940a5c21d59b789fed9d659f15
  arc-context-workflow=BuildApplication
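Kubernetes label values are restricted (at most 63 characters; alphanumerics plus `-`, `_`, `.`; must start and end with an alphanumeric), so context values such as workflow names need sanitizing before they can be applied as labels. A minimal sketch of that mapping (illustrative only; the hook's actual implementation may differ, and the function names here are hypothetical):

```python
import re

# Kubernetes label values must be <= 63 chars and may only contain
# alphanumerics, '-', '_', and '.', starting/ending with an alphanumeric.
def to_label_value(value: str, max_len: int = 63) -> str:
    # Replace disallowed characters, truncate, then trim non-alphanumeric ends.
    cleaned = re.sub(r"[^A-Za-z0-9\-_.]", "-", value)[:max_len]
    return cleaned.strip("-_.")

# Build the arc-context-* label map from workflow context values.
def build_context_labels(context: dict) -> dict:
    return {f"arc-context-{k}": to_label_value(str(v)) for k, v in context.items()}

labels = build_context_labels({
    "workflow": "BuildApplication",
    "repository": "example-repo",
    "run-id": "16253714786",
})
```

With labels applied, pods can then be selected directly, e.g. `kubectl get pods -l arc-context-repository=example-repo`.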

Benefits

  • Gain additional information when debugging: when describing failing pods, it is now trivial to determine the repo/workflow that triggered the run.
  • Get a better view of the metrics and build better dashboards, e.g. view CPU/memory usage grouped by repo, workflow, and job, and compare usage to allocation.
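The grouping the second bullet describes can be sketched as follows. This assumes per-pod usage samples are already collected (e.g. from metrics-server or Prometheus); the pod data and field names below are fabricated for illustration:

```python
from collections import defaultdict

# Group per-pod CPU usage by one of the arc-context-* labels,
# e.g. to compare total usage per repository.
def usage_by_label(pods, label):
    totals = defaultdict(float)
    for pod in pods:
        key = pod["labels"].get(label, "unknown")
        totals[key] += pod["cpu_cores"]
    return dict(totals)

# Fabricated sample data in the shape a metrics scraper might produce.
pods = [
    {"labels": {"arc-context-repository": "example-repo"}, "cpu_cores": 0.5},
    {"labels": {"arc-context-repository": "example-repo"}, "cpu_cores": 1.2},
    {"labels": {"arc-context-repository": "other-repo"}, "cpu_cores": 0.3},
]
totals = usage_by_label(pods, "arc-context-repository")
```

In practice a tool like Grafana would do this aggregation from the label metadata directly, but the principle is the same: the labels make the grouping key available.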

I have been running a custom build of the runner container hooks internally for several months that includes these changes; they have been instrumental in optimizing the resource usage and cost of our GitHub runners.
