How can I create a new calculated field in databricks by using pyspark.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-20-2023 02:21 PM
Hello:
Great people. I am new to Databricks and pyspark learning. How can I create a new column called "sub_total"? Where I want to group by "category" "subcategory" and "monthly" sales value.
Appreciate your empathic solution.
- Labels:
-
Delta Lake
-
Spark
-
Workflows
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2025 08:31 PM
Hi @kazinahian,
I believe what you're looking for is the .withColumn() Dataframe method in PySpark. It will allow you to create a new column with aggregations on other columns: https://docs.databricks.com/en/pyspark/basics.html#create-columns
Best
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-31-2025 11:29 PM
I want to group by "category" "subcategory" and "monthly" sales value.
sub_total_df = df.groupBy("category", "subcategory", "monthly").agg(sum("sales_value").alias("sub_total"))
You could always type in your query in the Databricks notebook, by clicking on the generate link in cell, which will help you with Databricks Assistant.

