
Pivot on multiple columns

memo
New Contributor II

I want to pass multiple columns as arguments to pivot a DataFrame in PySpark, like this:

mydf.groupBy("id").pivot("day", "city").agg(F.sum("price").alias("price"), F.sum("units").alias("units")).show()

One way I found is to create a separate DataFrame for each pivot and join them, but that results in multiple scans of the data. Is there any other way to do this?
3 REPLIES

Kaniz_Fatma
Community Manager

Hi @memo, certainly! You can achieve a more compact solution for pivoting multiple columns in PySpark. Instead of creating separate DataFrames for each pivot, you can use a function to handle the pivoting process.

memo
New Contributor II

How can I pass multiple values to the pivot function? It only takes one argument. I tried sending an array and a list, but both throw errors.
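
For reference, GroupedData.pivot takes a single pivot column and, optionally, a list of expected values of that column, not several pivot columns, which is why passing an array or list of column names fails. A minimal illustration of the accepted form:

from pyspark.sql import functions as F

# The optional second argument lists values of the pivot column, not extra pivot columns
mydf.groupBy("id").pivot("day", [1, 2]).agg(F.sum("price")).show()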

Kaniz_Fatma
Community Manager

Hi @memo, let's call this function pivot_udf. Here's how you can implement it:

from pyspark.sql import functions as F

def pivot_udf(df, *cols):
    # Start from the distinct ids; each pivoted column group is joined back onto this
    mydf = df.select('id').drop_duplicates()
    for c in cols:
        # Label each value as '<col>_<day>' (e.g. 'price_1'), pivot on that label,
        # and join the pivoted columns back to the running DataFrame on 'id'
        mydf = mydf.join(
            df.withColumn('combcol', F.concat(F.lit(f'{c}_'), df['day']))
            .groupBy('id')
            .pivot('combcol')
            .agg(F.first(c)),
            'id'
        )
    return mydf

# Example usage:
d = [
    (100, 1, 23, 10),
    (100, 2, 45, 11),
    # ... other data ...
]
mydf = spark.createDataFrame(d, ['id', 'day', 'price', 'units'])

# Pivot on 'price' and 'units'
result_df = pivot_udf(mydf, 'price', 'units')
result_df.show()

The resulting DataFrame will have columns for each combination of day and column (e.g., price_1, price_2, units_1, etc.). 

This keeps the per-column pivots and joins inside a single reusable function, giving a more compact solution than building and joining each pivoted DataFrame by hand.
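
For completeness, a single pivot with multiple aggregate expressions can also produce one column per (day, measure) pair in a single pass; Spark names the outputs <day value>_<alias> (e.g. 1_price, 1_units). A minimal sketch, assuming the same mydf as above:

# One pivot on 'day', one aggregate per measure; scans the data once and yields
# columns named like '1_price', '1_units', '2_price', '2_units'
alt_df = (
    mydf.groupBy('id')
    .pivot('day')
    .agg(F.sum('price').alias('price'), F.sum('units').alias('units'))
)
alt_df.show()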

Feel free to adapt the pivot_udf function to your specific use case by adding more columns as needed. Happy pivoting! 🚀
