docs: adding example to run spark job using kubeflow pipelines v2 #12137
base: master
Conversation
Signed-off-by: Vikas Saxena <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
## vikassaxena02/vikas-kfpv2-python310-kubectl-nokfp-image:0.4 image
This base image is used to run the spark code and generate logs for the same.
The dockerfile to generate the same is in `dockerDir` directory
Suggested change:
- The dockerfile to generate the same is in `dockerDir` directory
+ The containerfile to generate the same is in `containerfile` directory
Let's move this one level up and get rid of `dockerDir`; we also only support OCI containers.
@@ -0,0 +1,15 @@
# Use official Python 3.12 slim image
FROM python:3.10-slim
May we use 3.12?
    rm -rf /var/lib/apt/lists/*

# Install kubectl (v1.28.0)
RUN curl -LO "https://dl.k8s.io/release/v1.28.0/bin/linux/amd64/kubectl" && \
Let us use 1.33.
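Taken together, the two review suggestions (Python 3.12 base image, kubectl 1.33) might look like the following sketch; the exact v1.33 patch release pinned here is an assumption, not part of the PR:

```dockerfile
# Use official Python 3.12 slim image (per review)
FROM python:3.12-slim

# Install curl, then clean apt caches to keep the image small
RUN apt-get update && apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# Install kubectl (v1.33.0, per review; pin whichever patch release you need)
RUN curl -LO "https://dl.k8s.io/release/v1.33.0/bin/linux/amd64/kubectl" && \
    install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl && \
    rm kubectl
```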
Is this not available as `default-editor` per namespace?
Or is `pipeline-runner` only needed if you do not install the Kubeflow platform?
"""

@dsl.component(
    base_image="vikassaxena02/vikas-kfpv2-python310-kubectl-nokfp-image:0.4"
Let's use Python 3.12.
    return spec["metadata"]["name"]

@dsl.component(
    base_image="vikassaxena02/vikas-kfpv2-python310-kubectl-nokfp-image:0.4"
Let's use Python 3.12.
# Introduction
This repo has bare minimum code to run a SparkPipeline on Kubeflow through Kubeflow Pipelines and the spark pipelines run in `default` namespace
Please note that this repo uses an old version of KFP SDK to use `ResourceOp` which is deprecated in the KFP SDK v2 onwards.
The code in the repo has been tested on a local cluster running on `kind` that runs both `Kubeflow Piplines` and `Kubeflow SparkOperator`
Suggested change:
- The code in the repo has been tested on a local cluster running on `kind` that runs both `Kubeflow Piplines` and `Kubeflow SparkOperator`
+ The code in this repository has been tested on a local cluster running on `kind` that runs both `Kubeflow Pipelines` and `Kubeflow SparkOperator`

## vikassaxena02/vikas-kfpv2-python310-kubectl-nokfp-image:0.4 image
This base image is used to run the spark code and generate logs for the same.
The dockerfile to generate the same is in `dockerDir` directory
Suggested change:
- The dockerfile to generate the same is in `dockerDir` directory
+ The containerfile to generate the same is in the same directory.
    with open(spark_driver_logs.path, "w") as f:
        f.write(logs)
Suggested change:
- with open(spark_driver_logs.path, "w") as f:
-     f.write(logs)
+ with open(spark_driver_logs.path, "w") as file:
+     file.write(logs)
    # Wait for SparkApplication to complete
    print("Waiting for SparkApplication to complete...")
    for attempt in range(60):
        try:
            get_status_cmd = [
                "kubectl", "get", "sparkapplication", spark_app_name,
                "-n", "default", "-o", "json"
            ]
            output = subprocess.check_output(get_status_cmd, text=True)
            status_json = json.loads(output)
            app_state = status_json.get("status", {}).get("applicationState", {}).get("state", "")
            print(f"Attempt {attempt+1}: SparkApplication state: {app_state}")
            if app_state in ["COMPLETED", "FAILED"]:
                break
        except Exception as e:
            print("Error checking SparkApplication status:", str(e))
        time.sleep(10)
    else:
        raise RuntimeError("Timed out waiting for SparkApplication to complete.")
Please use the `kfp` Python package instead of `subprocess` if possible.
This way you can also use the latest v2 SDK.
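One way to act on this review comment is to drop the `kubectl` subprocess calls and poll the SparkApplication custom resource through the kubernetes Python client. The sketch below is an assumption, not the PR's code: the polling logic is factored into a function that takes any status-fetching callable, and the commented usage shows how it might be wired to `CustomObjectsApi` (the `sparkoperator.k8s.io/v1beta2` group/version is the one commonly used by the Kubeflow Spark operator).

```python
import time


def wait_for_spark_app(fetch_status, attempts=60, interval=10):
    """Poll until the SparkApplication reaches a terminal state.

    fetch_status: callable returning the SparkApplication object as a dict,
    e.g. from CustomObjectsApi.get_namespaced_custom_object.
    Returns the terminal state, or raises RuntimeError on timeout.
    """
    for attempt in range(attempts):
        try:
            spec = fetch_status()
            state = spec.get("status", {}).get("applicationState", {}).get("state", "")
            print(f"Attempt {attempt + 1}: SparkApplication state: {state}")
            if state in ("COMPLETED", "FAILED"):
                return state
        except Exception as e:  # log transient API errors and keep polling
            print("Error checking SparkApplication status:", e)
        time.sleep(interval)
    raise RuntimeError("Timed out waiting for SparkApplication to complete.")


# Hypothetical in-cluster usage (assumes the kubernetes client is installed
# and the pod's service account can read SparkApplications):
# from kubernetes import client, config
# config.load_incluster_config()
# api = client.CustomObjectsApi()
# wait_for_spark_app(lambda: api.get_namespaced_custom_object(
#     group="sparkoperator.k8s.io", version="v1beta2",
#     namespace="default", plural="sparkapplications", name=spark_app_name))
```

Keeping the fetch callable injectable also makes the loop unit-testable without a cluster.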
Co-authored-by: Julius von Kohout <[email protected]> Signed-off-by: Vikas Saxena <[email protected]>
This PR adds an example to run Spark jobs in Kubeflow Pipelines.
https://github.com/vikas-saxena02/KubeflowPipelines/tree/sparkKFPExample/samples/contrib/sparkjob-kubeflowpipeline
Checklist: