How can I create a new calculated field in Databricks using PySpark?

kazinahian
New Contributor III

Hello, everyone.

I am new to Databricks and to learning PySpark. How can I create a new column called "sub_total" that holds the sales value grouped by "category", "subcategory", and month?

I would appreciate a beginner-friendly solution.

Accepted solution

Kaniz
Community Manager

Hi @kazinahian, to create a new column called "sub_total" that totals the sales value grouped by "category", "subcategory", and month, you can use `groupBy().applyInPandas()` in PySpark. This method implements the "split-apply-combine" pattern: the data is first split into groups, a function is applied to each group (received as a pandas DataFrame), and the results are combined into a new Spark DataFrame.
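A minimal sketch of that approach, under some assumptions: the input Spark DataFrame has exactly the columns `category`, `subcategory`, `month` (a string such as "2023-01"), and `sales` (a double), and the helper names `add_sub_total` and `with_sub_total` are hypothetical. If your data has a date column instead of `month`, you could derive one first, for example with `F.date_format("order_date", "yyyy-MM")` where `order_date` is a placeholder name. `applyInPandas` also needs `pyarrow` installed on the cluster (it is on Databricks runtimes).

```python
import pandas as pd

# Output schema as a DDL string; column names and types are assumptions
# and must match what add_sub_total returns.
SUB_TOTAL_SCHEMA = (
    "category string, subcategory string, month string, "
    "sales double, sub_total double"
)


def add_sub_total(pdf: pd.DataFrame) -> pd.DataFrame:
    """Called once per (category, subcategory, month) group.

    pdf holds every row of one group; each row gets the group's total sales.
    """
    out = pdf.copy()  # avoid mutating the input batch
    out["sub_total"] = out["sales"].sum()
    return out


def with_sub_total(sdf):
    """sdf is a Spark DataFrame with the assumed four input columns."""
    return sdf.groupBy("category", "subcategory", "month").applyInPandas(
        add_sub_total, schema=SUB_TOTAL_SCHEMA
    )
```

Usage would look like `with_sub_total(sales_df).show()`, where `sales_df` is your own DataFrame. For a plain sum, a built-in window function gives the same column without the pandas round trip: `sales_df.withColumn("sub_total", F.sum("sales").over(Window.partitionBy("category", "subcategory", "month")))`, with `from pyspark.sql import functions as F, Window`. If you only want one row per group rather than a new column on every row, `groupBy(...).agg(F.sum("sales").alias("sub_total"))` is the usual choice.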
