@ForVic ForVic commented Aug 18, 2025

What changes were proposed in this pull request?

  1. Provide a way for users to configure applications that run on Kubernetes to store diagnostics.

Why are the changes needed?

Jobs that run on Kubernetes have no native concept of diagnostics (as YARN does), so users must go to the logs to debug and triage errors. For jobs that run on YARN this is often unnecessary, since the diagnostics contain the root cause of the failure. Additionally, platforms that automate failure insights, or make decisions based on failures, need a custom solution for deciding why the application failed (e.g. log and stack trace parsing).

We use a mechanism similar to the one in #23599 to load custom implementations, which avoids a dependency on the k8s module from SparkSubmit.
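
For readers unfamiliar with this kind of decoupling, here is a minimal sketch of one common JVM pattern: resolving an implementation by class name at runtime so the calling module has no compile-time dependency on it. This is an illustration under stated assumptions, not the code in this PR. The names `DiagnosticsSetter`, `InMemoryDiagnosticsSetter`, and `load` are hypothetical, and the actual mechanism used here and in #23599 may differ (for example, it may use `java.util.ServiceLoader` instead).

```java
import java.util.Optional;

public class DiagnosticsLoaderSketch {

    /** Hypothetical extension point; the name is illustrative, not taken from the PR. */
    public interface DiagnosticsSetter {
        void setDiagnostics(String message);
    }

    /** Stand-in for an implementation that would live in an optional module. */
    public static class InMemoryDiagnosticsSetter implements DiagnosticsSetter {
        private String last = "";

        @Override
        public void setDiagnostics(String message) {
            last = message;
        }

        public String last() {
            return last;
        }
    }

    /**
     * Resolves an implementation by class name. Returns empty if the class is
     * not on the classpath, does not implement the interface, or cannot be
     * instantiated with a no-arg constructor.
     */
    public static Optional<DiagnosticsSetter> load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            if (!DiagnosticsSetter.class.isAssignableFrom(cls)) {
                return Optional.empty();
            }
            Object instance = cls.getDeclaredConstructor().newInstance();
            return Optional.of((DiagnosticsSetter) instance);
        } catch (ReflectiveOperationException e) {
            // Class missing or not constructible: the caller carries on without it.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The caller only knows the class name as a string (e.g. from config).
        String name = InMemoryDiagnosticsSetter.class.getName();
        load(name).ifPresent(s -> s.setDiagnostics("executor lost"));
        System.out.println(load("does.not.Exist").isPresent());
    }
}
```

The point of the pattern is that the caller compiles against the interface only. When the optional module is absent, the lookup yields an empty result instead of a link-time failure.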

Does this PR introduce any user-facing change?

Yes, a config, which defaults to false.

How was this patch tested?

Unit tested and verified in a production k8s cluster.

Was this patch authored or co-authored using generative AI tooling?

No

@ForVic ForVic changed the title from "[WIP][K8S] Optionally capture diagnostics for jobs on Kubernetes" to "[SPARK-53335][K8S] Optionally capture diagnostics for jobs on Kubernetes" Aug 20, 2025
@ForVic ForVic marked this pull request as ready for review August 20, 2025 03:50