Databricks Community

kazinahian · 09-20-2023

Hello, great people!

I am new to Databricks and to learning PySpark. How can I create a new column called "sub_total" that groups by "category", "subcategory", and "monthly" and totals the sales value?

Thanks in advance for your patient help.

Miguel_Suarez · 01-22-2025

Hi @kazinahian,

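I believe what you're looking for is the .withColumn() DataFrame method in PySpark. It lets you add a new column computed from existing columns, and if that expression is an aggregate evaluated over a window (for example, a sum partitioned by your grouping columns), the new column can hold a per-group subtotal on every row: https://docs.databricks.com/en/pyspark/basics.html#create-columns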

Best

NandiniN · 01-31-2025

I want to group by "category" "subcategory" and "monthly" sales value.

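from pyspark.sql import functions as F

# Use Spark's sum (F.sum), not Python's built-in sum, inside agg()
sub_total_df = df.groupBy("category", "subcategory", "monthly").agg(F.sum("sales_value").alias("sub_total"))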

You can also describe what you want in plain language in a Databricks notebook cell: click the "generate" link in the cell to get help from Databricks Assistant.