integration-race: bump timeout from 20 to 30 minutes #35669

pohly · 2025-10-09T17:50:41Z

There were still a few job runs where some tests (most recently: test/integration/scheduler_perf/misc) timed out. We could split that up a bit more, but since integration testing with race detection doesn't need to complete quickly, it's simpler to raise the timeout.

/assign @BenTheElder

BenTheElder · 2025-10-09T18:11:45Z

config/jobs/kubernetes/sig-testing/integration.yaml

        env:
        - name: KUBE_TIMEOUT
-          value: "-timeout=20m"
+          value: "-timeout=30m"


TODO: set the job-level timeout in case this hangs; right now it's the 2h default we have config-wide.

Sorry, I'm not following. Are you suggesting to increase the job timeout?

The job passes in ~70min pretty consistently: https://testgrid.k8s.io/sig-testing-canaries#integration-race-master&graph-metrics=test-duration-minutes

The problem is that scheduler_perf/misc and, to a lesser extent, scheduler_perf/affinity are close to the 20min per-directory limit, which leads to rare flakes.

I think I now understand what you meant: based on how long the job takes in practice, I can safely reduce the job limit to e.g. 90min. Then, if individual tests time out after 30min, we abort after 90min instead of 120min.

Doesn't look like a significant change, though?

Hmm, how do I actually set the job-level timeout? https://docs.prow.k8s.io/docs/jobs/ doesn't mention it.

I see

    decorate: true
    decoration_config:
      timeout: 5h

but where is that documented?

https://docs.prow.k8s.io/docs/components/pod-utilities/ mentions "decorate: true" and links to https://docs.prow.k8s.io/docs/components/deprecated/plank/

Sorry, I digress. Let's just use copy-and-paste... done.

Prow could do with revamped docs, amongst other things; SIG Testing is really light on maintainers there at the moment.

But yes, decoration_config timeout is it.

So is the PR okay now? I already reduced the timeout to 90 minutes.

I would suggest placing it much closer to the intended timeout of the workload, but the PR is fine to merge.

Closer would be something like this:

    decoration_config:
      timeout: 90m
    spec:
      containers:
      - image: us-central1-docker.pkg.dev/k8s-staging-test-infra/images/kubekins-e2e:v20250925-95b5a2c7a5-master
        command:
        - runner.sh
        env:
        - name: KUBE_TIMEOUT
          value: "-timeout=30m"
        - name: KUBE_RACE
          value: "-race"

IMHO that's not "close enough" to make the connection. Some comments would have been better.

It's also an unusual place for decoration_config compared to other jobs. Remember that at some point something besides timeout might need to be configured there.

I'd prefer to keep it as is rather than delay this further to add comments.

/hold cancel

There were still a few job runs where some tests (most recently: test/integration/scheduler_perf/misc) timed out. We could split that up a bit more, but since integration testing with race detection doesn't need to complete quickly, it's simpler to raise the timeout. To avoid unexpectedly long job runs if a larger number of packages hit this individual timeout, the job timeout is reduced from 2h (the default) to 90m.
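With the final values from the description above, both limits could sit side by side in one job entry, with comments connecting them as suggested in review. This is a sketch, not the merged config: it shows only the fields discussed in this PR (values copied from the snippet quoted earlier), leaves out everything else a real job needs (such as its name, trigger, and the arguments to runner.sh), and the comment on how KUBE_TIMEOUT reaches `go test` is an assumption based on the per-directory behavior described in this thread.

```yaml
# Sketch of the timeout-related fields of a single job entry only;
# name, trigger and other required fields are intentionally omitted.
decorate: true
decoration_config:
  # Job-level limit. Without it, the config-wide default of 2h applies.
  # A normal run takes ~70m, so 90m leaves headroom while still aborting
  # well before 2h if several packages run into their own timeout.
  timeout: 90m
spec:
  containers:
  - image: us-central1-docker.pkg.dev/k8s-staging-test-infra/images/kubekins-e2e:v20250925-95b5a2c7a5-master
    command:
    - runner.sh  # arguments omitted in this sketch
    env:
    # Assumed to be forwarded to `go test` as its -timeout flag, which
    # limits each test binary, i.e. each integration test directory.
    - name: KUBE_TIMEOUT
      value: "-timeout=30m"
    - name: KUBE_RACE
      value: "-race"
```

Keeping `decoration_config` directly under `decorate: true` follows the placement used by other jobs, so any future settings besides `timeout` have an obvious home.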

BenTheElder · 2025-10-16T19:42:21Z

/lgtm
/approve
/hold

k8s-ci-robot · 2025-10-16T19:42:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~config/OWNERS~~ [BenTheElder]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-10-17T07:56:46Z

@pohly: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

key integration.yaml using file config/jobs/kubernetes/sig-testing/integration.yaml


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot assigned BenTheElder Oct 9, 2025

k8s-ci-robot added the labels cncf-cla: yes (the PR's author has signed the CNCF CLA) and size/XS (the PR changes 0-9 lines, ignoring generated files) Oct 9, 2025

k8s-ci-robot requested review from Priyankasaggu11929 and rjsadow October 9, 2025 17:50

k8s-ci-robot added the labels area/config (issues or PRs related to code in /config), area/jobs and sig/testing (relevant to SIG Testing) Oct 9, 2025

BenTheElder reviewed Oct 9, 2025

pohly force-pushed the integration-race-timeout branch from dbff3b4 to 94dee86 October 10, 2025 06:22

k8s-ci-robot added the labels do-not-merge/hold (someone issued a /hold command) and lgtm ("Looks good to me", the PR is ready to be merged) Oct 16, 2025

k8s-ci-robot added the approved label (approved by an approver from all required OWNERS files) and removed do-not-merge/hold Oct 16, 2025

k8s-ci-robot merged commit e3ec1c4 into kubernetes:master Oct 17, 2025
6 checks passed
