How can I create a new calculated field in Databricks using PySpark?

kazinahian
New Contributor III

Hello, great people:

I am new to Databricks and still learning PySpark. How can I create a new column called "sub_total"? I want to group by "category", "subcategory", and "monthly" sales value.

I'd appreciate your help with a solution.

Miguel_Suarez
Databricks Employee

Hi @kazinahian,

I believe what you're looking for is the .withColumn() DataFrame method in PySpark. It lets you add a new column computed from other columns; for a per-group aggregate such as a subtotal, it is typically combined with a window function. See: https://docs.databricks.com/en/pyspark/basics.html#create-columns

Best

NandiniN
Databricks Employee

To group by "category", "subcategory", and "monthly" and sum the sales value:

from pyspark.sql import functions as F  # F.sum is Spark's aggregate, not Python's built-in sum
sub_total_df = df.groupBy("category", "subcategory", "monthly").agg(F.sum("sales_value").alias("sub_total"))

You can also describe your query in a Databricks notebook cell: click the "generate" link in the cell, and Databricks Assistant will help you write it.