<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Dense rank possible bug in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37987#M26533</link>
    <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/84961"&gt;@Łukasz&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks for reporting.&lt;/P&gt;&lt;P&gt;From what I can see, Spark 3.4.0 introduced an improvement that appears to be the cause of this issue.&lt;/P&gt;&lt;P&gt;Improvement:&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/SPARK-37099" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-37099&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Similar bug:&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/SPARK-44448" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-44448&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This improvement [&lt;A href="https://issues.apache.org/jira/browse/SPARK-37099" target="_blank"&gt;SPARK-37099&lt;/A&gt;] is included in DBR 13.1:&amp;nbsp;&lt;A href="https://docs.databricks.com/release-notes/runtime/13.1.html" target="_blank"&gt;https://docs.databricks.com/release-notes/runtime/13.1.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;That is why you saw this behavior in DBR 13.1.&lt;/P&gt;&lt;P&gt;I have verified internally that this now appears to be fixed in DBR 13.1. Could you test it again and let us know?&lt;/P&gt;</description>
    <pubDate>Wed, 19 Jul 2023 19:37:11 GMT</pubDate>
    <dc:creator>saipujari_spark</dc:creator>
    <dc:date>2023-07-19T19:37:11Z</dc:date>
    <item>
      <title>Dense rank possible bug</title>
      <link>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37861#M26486</link>
      <description>&lt;P&gt;I am deduplicating a data source over a specific business key using the dense_rank function. The data source currently has no duplicates, so the function should return 1 in all cases. The issue is that dense_rank does not seem to return a proper integer, even though the column's data type is integer:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;When filtering on rank equal to 1, I get a "random" number of records. Most rows whose dense_rank value displays as 1 are dropped.&lt;/LI&gt;&lt;LI&gt;When filtering on rank &amp;lt; 1.1, I get the same results as above.&lt;/LI&gt;&lt;LI&gt;When filtering on rank &amp;gt; 0.9, I get the expected number of rows.&lt;/LI&gt;&lt;LI&gt;When casting the rank to double and then filtering on equal to 1, I get the expected number of rows.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This happens on Databricks Runtime 13.1, so I assume Spark 3.4 has this issue. It works without problems on Runtime 12.2.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2023 12:02:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37861#M26486</guid>
      <dc:creator>Łukasz</dc:creator>
      <dc:date>2023-07-18T12:02:06Z</dc:date>
    </item>
    <item>
      <title>Re: Dense rank possible bug</title>
      <link>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37865#M26487</link>
      <description>&lt;P&gt;Could you share a code snippet of how you are applying the rank function?&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2023 12:13:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37865#M26487</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-07-18T12:13:02Z</dc:date>
    </item>
    <item>
      <title>Re: Dense rank possible bug</title>
      <link>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37867#M26489</link>
      <description>&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; SELECT * &lt;/SPAN&gt;&lt;SPAN&gt;except&lt;/SPAN&gt;&lt;SPAN&gt;(d.AssessmentNo, d.UnitClassSup, d.UnitTypeSup, d.UnitCodeSup, d.ProdUnitNo, d.QuestionAnswerId, d.hash_value, d.load_date), dense_rank() OVER (PARTITION BY m.UnitClassSup, m.UnitTypeSup, m.UnitCodeSup, m.AssessmentYear, m.ProdUnitNo ORDER BY m.UpdDtime DESC, m.AnswerUpdDate DESC, m.QuestionAnswerId DESC) AS Rk &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; FROM delta.`/mnt/silver/path_main` m&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; JOIN delta.`/mnt/silver/path_detail` d&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ON (m.AssessmentNo = d.AssessmentNo &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; AND m.UnitClassSup = d.UnitClassSup &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; AND m.UnitTypeSup = d.UnitTypeSup &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; AND m.UnitCodeSup = d.UnitCodeSup &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; AND m.ProdUnitNo = d.ProdUnitNo &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; AND m.QuestionAnswerId = d.QuestionAnswerId )&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;This is saved as a CTE and then queried with the filter Rk = 1.&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 18 Jul 2023 12:45:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37867#M26489</guid>
      <dc:creator>Łukasz</dc:creator>
      <dc:date>2023-07-18T12:45:39Z</dc:date>
    </item>
    <item>
      <title>Re: Dense rank possible bug</title>
      <link>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37978#M26528</link>
      <description>&lt;P&gt;I tried running a dense_rank query on DBR 13.1, but I do not see this issue. Could you try a simple dense_rank query on a table?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2023 18:27:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37978#M26528</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-07-19T18:27:22Z</dc:date>
    </item>
    <item>
      <title>Re: Dense rank possible bug</title>
      <link>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37987#M26533</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/84961"&gt;@Łukasz&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks for reporting.&lt;/P&gt;&lt;P&gt;From what I can see, Spark 3.4.0 introduced an improvement that appears to be the cause of this issue.&lt;/P&gt;&lt;P&gt;Improvement:&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/SPARK-37099" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-37099&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Similar bug:&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/SPARK-44448" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-44448&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This improvement [&lt;A href="https://issues.apache.org/jira/browse/SPARK-37099" target="_blank"&gt;SPARK-37099&lt;/A&gt;] is included in DBR 13.1:&amp;nbsp;&lt;A href="https://docs.databricks.com/release-notes/runtime/13.1.html" target="_blank"&gt;https://docs.databricks.com/release-notes/runtime/13.1.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;That is why you saw this behavior in DBR 13.1.&lt;/P&gt;&lt;P&gt;I have verified internally that this now appears to be fixed in DBR 13.1. Could you test it again and let us know?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2023 19:37:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/37987#M26533</guid>
      <dc:creator>saipujari_spark</dc:creator>
      <dc:date>2023-07-19T19:37:11Z</dc:date>
    </item>
    <item>
      <title>Re: Dense rank possible bug</title>
      <link>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/38011#M26541</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/63249"&gt;@Saniam&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks for the answer. I have just tested it, and it seems to be working fine in both 13.1 and 13.2.&lt;/P&gt;&lt;P&gt;On another note, can you help me understand how Spark releases are done? The fix you mention is listed as released in 3.5, which should arrive with a new Databricks runtime release.&lt;/P&gt;&lt;P&gt;Kind regards,&lt;/P&gt;&lt;P&gt;Łukasz&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jul 2023 07:57:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/38011#M26541</guid>
      <dc:creator>Łukasz</dc:creator>
      <dc:date>2023-07-20T07:57:38Z</dc:date>
    </item>
    <item>
      <title>Re: Dense rank possible bug</title>
      <link>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/38509#M26654</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/84961"&gt;@Łukasz&lt;/a&gt;, this is because important fixes are backported to the older Spark versions shipped in DBR. That is why you see this fixed in DBR 13.1.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2023 15:24:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dense-rank-possible-bug/m-p/38509#M26654</guid>
      <dc:creator>saipujari_spark</dc:creator>
      <dc:date>2023-07-26T15:24:37Z</dc:date>
    </item>
  </channel>
</rss>

