Hi @memo, let's call this function `pivot_udf`. Here's how you can implement it:
```python
from pyspark.sql import functions as F

def pivot_udf(df, *cols):
    # Start from the distinct ids; each pivoted column set is joined back on 'id'
    mydf = df.select('id').drop_duplicates()
    for c in cols:
        mydf = mydf.join(
            # Build a combined label like 'price_1', then pivot on it
            df.withColumn('combcol', F.concat(F.lit(f'{c}_'), df['day']))
              .groupby('id')
              .pivot('combcol')
              .agg(F.first(c)),
            'id'
        )
    return mydf
```
```python
# Example usage (assumes an active SparkSession named `spark`):
d = [
    (100, 1, 23, 10),
    (100, 2, 45, 11),
    # ... other data ...
]
mydf = spark.createDataFrame(d, ['id', 'day', 'price', 'units'])

# Pivot on 'price' and 'units'
result_df = pivot_udf(mydf, 'price', 'units')
result_df.show()
```
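Assuming only the two rows shown above, `result_df.show()` would print something like:

```
+---+-------+-------+-------+-------+
| id|price_1|price_2|units_1|units_2|
+---+-------+-------+-------+-------+
|100|     23|     45|     10|     11|
+---+-------+-------+-------+-------+
```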
The resulting DataFrame will have one column for each column/day combination (e.g., `price_1`, `price_2`, `units_1`, etc.).
This approach saves you from writing out a separate pivot for each column by hand: the helper builds the combined label, pivots, and joins each result back on `id` for you, leading to more concise and reusable code. If you'd rather avoid the per-column joins entirely, see the sketch below.
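One alternative (a minimal sketch, not part of the function above) is to unpivot the value columns with `stack` first and then do a single pivot, so no joins are needed at all. The names `long_df`, `col`, and `val` here are ones I'm introducing for illustration:

```python
from pyspark.sql import functions as F

# Unpivot 'price' and 'units' into (col, val) pairs, one row per pair.
# Assumes the same `mydf` with columns id, day, price, units.
long_df = mydf.selectExpr(
    "id", "day",
    "stack(2, 'price', price, 'units', units) as (col, val)"
)

# Build the combined 'price_1'-style label once and pivot in a single pass.
result_df = (
    long_df
    .withColumn('combcol', F.concat('col', F.lit('_'), F.col('day').cast('string')))
    .groupby('id')
    .pivot('combcol')
    .agg(F.first('val'))
)
result_df.show()
```

One thing to note with this variant: all stacked values end up in a single `val` column, so they must share a compatible type (here both `price` and `units` are integers).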
Feel free to adapt the `pivot_udf` function to your specific use case by passing more columns as needed. Happy pivoting! 🚀