<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Need help in a pyspark code in Databricks to calculate a new measure column. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/need-help-in-a-pyspark-code-in-databricks-to-calculate-a-new/m-p/26583#M18610</link>
    <description>&lt;P&gt;The details of the requirement are below.&lt;/P&gt;&lt;P&gt;I have a table with the structure below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Sample Data"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1339i45E0D4AD85D21E85/image-size/large?v=v2&amp;amp;px=999" role="button" title="Sample Data" alt="Sample Data" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I need to write PySpark code to calculate a new column.&lt;/P&gt;&lt;P&gt;The logic for the new column is the sum of &lt;B&gt;Magnitude&lt;/B&gt; for each &lt;B&gt;Category&lt;/B&gt; divided by the total &lt;B&gt;Magnitude&lt;/B&gt;, multiplied by 100 to show it as a percentage.&lt;/P&gt;&lt;P&gt;For example, for &lt;B&gt;Category OAE&lt;/B&gt; the new column should show (23.98+50.54+84.95)/Sum(Total Magnitude) * 100.&lt;/P&gt;&lt;P&gt;There should be one row for each Date and Category.&lt;/P&gt;&lt;P&gt;Please help me write the code, and let me know if you have any questions.&lt;/P&gt;&lt;P&gt;I have attached sample data in an Excel file.&lt;/P&gt;&lt;P&gt;I am stuck at the code below. Basically, I don't know how to divide the sum for each Category by the total sum of Magnitude.&lt;/P&gt;&lt;P&gt;import pyspark&lt;/P&gt;&lt;P&gt;from pyspark.sql import SparkSession&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import col,sum&lt;/P&gt;&lt;P&gt;from pyspark.sql import Window&lt;/P&gt;&lt;P&gt;from pyspark.sql import functions&lt;/P&gt;&lt;P&gt;df = sqlContext.sql(" select * from Table")&lt;/P&gt;&lt;P&gt;df1=df.withColumn("NewColumn",functions.sum("Magnitude").over(Window.partitionBy("Category")))&lt;/P&gt;&lt;P&gt;display(df1)&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Faizan&lt;/P&gt;</description>
    <pubDate>Wed, 19 Oct 2022 11:33:18 GMT</pubDate>
    <dc:creator>farefin</dc:creator>
    <dc:date>2022-10-19T11:33:18Z</dc:date>
    <item>
      <title>Need help in a pyspark code in Databricks to calculate a new measure column.</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-in-a-pyspark-code-in-databricks-to-calculate-a-new/m-p/26583#M18610</link>
      <description>&lt;P&gt;The details of the requirement are below.&lt;/P&gt;&lt;P&gt;I have a table with the structure below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Sample Data"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1339i45E0D4AD85D21E85/image-size/large?v=v2&amp;amp;px=999" role="button" title="Sample Data" alt="Sample Data" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I need to write PySpark code to calculate a new column.&lt;/P&gt;&lt;P&gt;The logic for the new column is the sum of &lt;B&gt;Magnitude&lt;/B&gt; for each &lt;B&gt;Category&lt;/B&gt; divided by the total &lt;B&gt;Magnitude&lt;/B&gt;, multiplied by 100 to show it as a percentage.&lt;/P&gt;&lt;P&gt;For example, for &lt;B&gt;Category OAE&lt;/B&gt; the new column should show (23.98+50.54+84.95)/Sum(Total Magnitude) * 100.&lt;/P&gt;&lt;P&gt;There should be one row for each Date and Category.&lt;/P&gt;&lt;P&gt;Please help me write the code, and let me know if you have any questions.&lt;/P&gt;&lt;P&gt;I have attached sample data in an Excel file.&lt;/P&gt;&lt;P&gt;I am stuck at the code below. Basically, I don't know how to divide the sum for each Category by the total sum of Magnitude.&lt;/P&gt;&lt;P&gt;import pyspark&lt;/P&gt;&lt;P&gt;from pyspark.sql import SparkSession&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import col,sum&lt;/P&gt;&lt;P&gt;from pyspark.sql import Window&lt;/P&gt;&lt;P&gt;from pyspark.sql import functions&lt;/P&gt;&lt;P&gt;df = sqlContext.sql(" select * from Table")&lt;/P&gt;&lt;P&gt;df1=df.withColumn("NewColumn",functions.sum("Magnitude").over(Window.partitionBy("Category")))&lt;/P&gt;&lt;P&gt;display(df1)&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Faizan&lt;/P&gt;</description>
      <pubDate>Wed, 19 Oct 2022 11:33:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-in-a-pyspark-code-in-databricks-to-calculate-a-new/m-p/26583#M18610</guid>
      <dc:creator>farefin</dc:creator>
      <dc:date>2022-10-19T11:33:18Z</dc:date>
    </item>
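    <!--
    Editorial sketch, not part of the original post. The arithmetic the
    question asks for can be checked in plain Python before moving to Spark.
    Only the three OAE magnitudes come from the question; the "XYZ" row, the
    date values, and all variable names are hypothetical. It also assumes the
    denominator is the total Magnitude over the whole table, which the
    question does not state explicitly.

```python
from collections import defaultdict

# Hypothetical rows: only the three OAE magnitudes come from the question;
# the "XYZ" row and all dates are invented for illustration.
rows = [
    ("2022-10-01", "OAE", 23.98),
    ("2022-10-01", "OAE", 50.54),
    ("2022-10-01", "OAE", 84.95),
    ("2022-10-01", "XYZ", 40.53),
]

# Sum Magnitude per (Date, Category) pair.
magnitude_by_group = defaultdict(float)
for date, category, magnitude in rows:
    magnitude_by_group[(date, category)] += magnitude

# Assumption: the denominator is the total over the whole table.
total_magnitude = sum(magnitude for _, _, magnitude in rows)

# Share of the total, as a percentage, for each (Date, Category) pair.
percent_by_group = {
    group: group_sum / total_magnitude * 100
    for group, group_sum in magnitude_by_group.items()
}

for (date, category), percent in sorted(percent_by_group.items()):
    print(date, category, round(percent, 2))
```

    If the total should instead be taken per Date, the denominator would be
    summed per date rather than once for the whole table.
    -->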
    <item>
      <title>Re: Need help in a pyspark code in Databricks to calculate a new measure column.</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-in-a-pyspark-code-in-databricks-to-calculate-a-new/m-p/26584#M18611</link>
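      <description>&lt;P&gt;# Per-category window sum divided by the sum over the whole table (constant partition key), times 100 for a percentage.&lt;/P&gt;&lt;P&gt;df1 = df.withColumn("NewColumn", functions.sum("Magnitude").over(Window.partitionBy("Category"))&lt;/P&gt;&lt;P&gt;/ functions.sum("Magnitude").over(Window.partitionBy(functions.lit("1"))) * 100)&lt;/P&gt;</description>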
      <pubDate>Tue, 01 Nov 2022 03:25:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-in-a-pyspark-code-in-databricks-to-calculate-a-new/m-p/26584#M18611</guid>
      <dc:creator>Soma</dc:creator>
      <dc:date>2022-11-01T03:25:08Z</dc:date>
    </item>
    <item>
      <title>Re: Need help in a pyspark code in Databricks to calculate a new measure column.</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-in-a-pyspark-code-in-databricks-to-calculate-a-new/m-p/26585#M18612</link>
      <description>&lt;P&gt;Hi @Faizan Arefin&lt;/P&gt;&lt;P&gt;Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or &lt;B&gt;mark an answer as best&lt;/B&gt;? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 27 Nov 2022 13:36:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-in-a-pyspark-code-in-databricks-to-calculate-a-new/m-p/26585#M18612</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-27T13:36:25Z</dc:date>
    </item>
  </channel>
</rss>

