Hello
I'm having an issue using a window in PySpark. I created a simple example that reproduces the error. Basically, I want to define a window with a dynamic size (taken from another column, say) instead of a fixed value:
```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Create a Spark session
spark = SparkSession.builder.appName("example").getOrCreate()

data = [(3, 1), (3, 1), (3, 1), (3, 1), (2, 0), (2, 1), (2, 1), (2, 0)]
df = spark.createDataFrame(data, ["col1", "col2"])
df.show()  # show() prints the table itself and returns None, so no print()

# Window whose upper frame bound comes from another column (col1)
# instead of a fixed integer; this is the line that raises the error below
win = Window.orderBy("col1").rowsBetween(0, F.col("col1"))

# Add 'col3': True when the count of col2 values in the frame equals col1
df_with_custom_column = df.withColumn(
    "col3",
    F.when(F.count("col2").over(win) == F.col("col1"), True).otherwise(False),
)
df_with_custom_column.show()
```
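To pin down the intended result, here is a plain-Python sketch (no Spark) of what the window would compute. It assumes `rowsBetween(0, col1)` means "the current row plus the next `col1` rows", and it uses a stable sort on `col1`; Spark itself gives no ordering guarantee for ties, so rows with equal `col1` may be framed differently there. `dynamic_frame_flags` is a hypothetical helper, not a PySpark function.

```python
# Plain-Python reference (no Spark), ASSUMING:
#  - rows are ordered by col1 with a stable sort (Spark does not guarantee
#    the order of ties),
#  - the frame of a row is the current row plus the next col1 rows,
#    i.e. what rowsBetween(0, col1) would mean if it accepted a column.
data = [(3, 1), (3, 1), (3, 1), (3, 1), (2, 0), (2, 1), (2, 1), (2, 0)]


def dynamic_frame_flags(rows):
    ordered = sorted(rows, key=lambda r: r[0])  # stable sort on col1
    result = []
    for i, (col1, col2) in enumerate(ordered):
        end = min(i + col1, len(ordered) - 1)  # clip frame end to last row
        frame = ordered[i:end + 1]
        # count("col2") counts the non-null col2 values in the frame
        count = sum(1 for _, c2 in frame if c2 is not None)
        result.append((col1, col2, count == col1))
    return result


for row in dynamic_frame_flags(data):
    print(row)
# Only the sixth row, (3, 1, True), is flagged True under these assumptions.
```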
I keep getting this error:

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
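As far as I can tell from the PySpark source, `rowsBetween` only accepts integer offsets and checks them against an "unbounded" threshold with plain Python `if` statements before calling into the JVM. Comparing a `Column` with an int produces another `Column`, and `if column:` calls `Column.__bool__`, which raises exactly this ValueError. The toy below reproduces that mechanism without Spark; `FakeColumn`, `rows_between` and `FOLLOWING_THRESHOLD` are hypothetical stand-ins, not PySpark's actual classes or constants.

```python
# Toy stand-in for a Spark column (HYPOTHETICAL, not pyspark.sql.Column)
# mimicking the two behaviours involved in the error:
#  - a comparison returns another column expression, not a bool,
#  - converting an expression to bool raises ValueError.
class FakeColumn:
    def __init__(self, expr):
        self.expr = expr

    def __ge__(self, other):
        return FakeColumn(f"({self.expr} >= {other})")

    def __bool__(self):
        raise ValueError("Cannot convert column into bool")


# Hypothetical cutoff above which a bound is treated as "unbounded"
FOLLOWING_THRESHOLD = 2**63 - 1


def rows_between(start, end):
    # Simplified shape of the bound check: a plain Python `if` on `end`
    if end >= FOLLOWING_THRESHOLD:
        end = "unboundedFollowing"
    return (start, end)


print(rows_between(0, 3))  # an int bound works: prints (0, 3)
try:
    rows_between(0, FakeColumn("col1"))  # the `if` forces bool(FakeColumn)
except ValueError as exc:
    print("ValueError:", exc)
```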
Thank you so much for your replies
Amine