I noticed that when I run the following code, which contains only one action, three jobs are launched.
from typing import List

from pyspark.sql import DataFrame
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import avg

# Sample rows: (reference, marque, prix, couleur); prix is stored as a string
data: List = [
    ("Diamant_1A", "TopDiamant", "300", "rouge"),
    ("Diamant_2B", "Diamants pour toujours", "45", "jaune"),
    ("Diamant_3C", "Mes diamants préférés", "78", "rouge"),
    ("Diamant_4D", "Diamants que j'aime", "90", "jaune"),
    ("Diamant_5E", "TopDiamant", "89", "bleu"),
]

# All four columns are nullable strings
schema: StructType = StructType([
    StructField("reference", StringType(), True),
    StructField("marque", StringType(), True),
    StructField("prix", StringType(), True),
    StructField("couleur", StringType(), True),
])

# `spark` is the already-running SparkSession
dataframe: DataFrame = spark.createDataFrame(data=data, schema=schema)

# Transformation only: nothing is executed until an action is called
dataframe_filtree: DataFrame = dataframe.filter("prix > 50")
From my understanding, I should get only one job, since one action corresponds to one job.
I don't know why I have 3 jobs.
Here is the first one:
Here is the second one:
And this is the last one: