How can I create a new calculated field in Databricks using PySpark?
09-20-2023 02:21 PM
Hello, great people. I am new to Databricks and PySpark. How can I create a new column called "sub_total", where I group by "category", "subcategory", and "monthly" and sum the sales value?
I would appreciate your help.
- Labels:
  - Delta Lake
  - Spark
  - Workflows
3 weeks ago
Hi @kazinahian,
I believe what you're looking for is the .withColumn() DataFrame method in PySpark. It lets you create a new column computed from other columns; combined with a window function, it can hold per-group aggregates: https://docs.databricks.com/en/pyspark/basics.html#create-columns
Best
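For example, here is a minimal sketch of that approach. It assumes a DataFrame named df with columns "category", "subcategory", "monthly", and "sales_value" (names taken from the question and the reply below, so adjust them to your actual schema):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Window partitioned by the grouping columns: every row in the same
# (category, subcategory, monthly) group shares one partition.
w = Window.partitionBy("category", "subcategory", "monthly")

# withColumn adds "sub_total" while keeping every original row,
# unlike groupBy().agg(), which collapses each group to a single row.
df_with_sub_total = df.withColumn("sub_total", F.sum("sales_value").over(w))
```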
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
2 weeks ago
> I want to group by "category", "subcategory", and "monthly" sales value.
from pyspark.sql import functions as F
sub_total_df = df.groupBy("category", "subcategory", "monthly").agg(F.sum("sales_value").alias("sub_total"))
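As a quick end-to-end check, here is a minimal sketch with made-up sample data (the column names match the snippet above; the values are purely illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative sample data; replace with your real table.
df = spark.createDataFrame(
    [
        ("Furniture", "Chairs", "2023-01", 100.0),
        ("Furniture", "Chairs", "2023-01", 50.0),
        ("Office", "Paper", "2023-01", 20.0),
    ],
    ["category", "subcategory", "monthly", "sales_value"],
)

sub_total_df = df.groupBy("category", "subcategory", "monthly").agg(
    F.sum("sales_value").alias("sub_total")
)
sub_total_df.show()
# Expected sub_totals: Furniture/Chairs/2023-01 -> 150.0, Office/Paper/2023-01 -> 20.0
```

Note that groupBy().agg() returns one row per group; if you need the subtotal alongside every original row instead, use the window-function approach from the earlier reply.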
You can also type your query directly into a Databricks notebook cell and click the generate link in the cell to get help from Databricks Assistant.