pytest plugin to run tests with pyspark (Apache Spark) support.
This plugin lets you specify the SPARK_HOME directory in pytest.ini,
which makes "pyspark" importable in the tests that pytest executes.
You can also define "spark_options" in pytest.ini to customize pyspark,
including the "spark.jars.packages" option, which loads external
libraries (e.g. "com.databricks:spark-xml").
pytest-spark provides the session-scoped fixtures spark_context and
spark_session, which can be used in your tests.
$ pip install pytest-spark

To run tests with the required spark_home location, define it using one of the following methods:
Specify the command line option "--spark_home":
$ pytest --spark_home=/opt/spark
Add a "spark_home" value to pytest.ini in your project directory:
[pytest]
spark_home = /opt/spark
Set the "SPARK_HOME" environment variable.
pytest-spark will try to import pyspark from the provided location.
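
To illustrate what importing pyspark from a given location can involve, here is a minimal sketch. It assumes the usual layout of a Spark distribution, where the Python sources live under "python" and py4j ships as a versioned zip under "python/lib". The function name add_pyspark_to_path is hypothetical, and this is not necessarily how pytest-spark does it internally.

```python
import glob
import os
import sys
from typing import List


def add_pyspark_to_path(spark_home: str) -> List[str]:
    """Prepend Spark's bundled Python sources to sys.path.

    Hypothetical helper for illustration only; returns the entries it added.
    """
    python_dir = os.path.join(spark_home, "python")
    if not os.path.isdir(python_dir):
        raise OSError(f"no 'python' directory under {spark_home!r}")

    # py4j is shipped as a versioned zip, so match it with a glob
    # instead of hard-coding a version number.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip")))

    added = []
    for entry in [python_dir] + py4j_zips:
        if entry not in sys.path:
            sys.path.insert(0, entry)
            added.append(entry)
    return added
```

After a call such as add_pyspark_to_path("/opt/spark"), "import pyspark" should resolve against that Spark installation, provided the directory really contains a Spark distribution.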
Note
"spark_home" will be read in the order listed above, i.e. you can
override the pytest.ini value with the command line option.
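
The lookup order described in the note can be sketched as follows. This is an illustration of the precedence only (command line option, then pytest.ini, then the environment variable); the function resolve_spark_home and its parameters are hypothetical and not part of the plugin's API.

```python
import os
from typing import Mapping, Optional


def resolve_spark_home(
    cli_value: Optional[str],
    ini_value: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first non-empty value among the three sources.

    Hypothetical helper: checks the command line option first,
    then the pytest.ini value, then the SPARK_HOME environment variable.
    """
    for candidate in (cli_value, ini_value, environ.get("SPARK_HOME")):
        if candidate:
            return candidate
    return None
```

For example, resolve_spark_home("/opt/spark-cli", "/opt/spark", {}) picks the command line value, while passing None for the first argument falls back to the pytest.ini value.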
To customize pyspark, define "spark_options" in your pytest.ini, e.g.:
[pytest]
spark_home = /opt/spark
spark_options =
    spark.app.name: my-pytest-spark-tests
    spark.executor.instances: 1
    spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0
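
As a rough illustration of the "key: value" format used above, the following sketch turns such a multi-line value into a dictionary. It is an assumption about how the lines could be interpreted, not the plugin's actual parser, and parse_spark_options is a hypothetical name. Splitting on the first colon only matters here, because values such as Maven coordinates contain colons themselves.

```python
from typing import Dict


def parse_spark_options(raw: str) -> Dict[str, str]:
    """Parse lines of the form 'key: value' into a dict.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    options = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values like
        # "com.databricks:spark-xml_2.12:0.5.0" stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        options[key.strip()] = value.strip()
    return options
```

Each resulting pair could then be applied to a Spark configuration, for instance with pyspark's SparkConf.set(key, value).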
Use the spark_context fixture in your tests as a regular pyspark fixture.
A SparkContext instance will be created once and reused for the whole test
session.
Example:
def test_my_case(spark_context):
    test_rdd = spark_context.parallelize([1, 2, 3, 4])
    # ...
Use the spark_session fixture in your tests as a regular pyspark fixture.
A SparkSession instance with Hive support enabled will be created once and
reused for the whole test session.
Example:
def test_spark_session_dataframe(spark_session):
    test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
    # ...