<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic PowerBI performance with Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101430#M40663</link>
    <description>&lt;P&gt;We have integrated Power BI with Databricks to generate reports. However, Power BI generates queries of more than 8,000 lines containing numerous OR clauses, and we cannot modify them at this time. These queries run for more than 4 minutes and are automatically cancelled before a plan is generated. The time spent on query optimization and file pruning adds to the delay and prevents the plan from being generated. As a result, we cannot use the report with Databricks: queries containing numerous OR clauses either take an excessive amount of time to execute or fail altogether.&lt;BR /&gt;Please note that we have already applied optimization techniques within Databricks, and our data is small, for example 1 file in the DIM table and 22 files in the FACT tables. Resizing the serverless SQL warehouse has not resolved the issue.&lt;/P&gt;&lt;P&gt;If anyone has successfully addressed this issue, please share your solution.&lt;/P&gt;</description>
    <pubDate>Mon, 09 Dec 2024 08:23:56 GMT</pubDate>
    <dc:creator>Vetrivel</dc:creator>
    <dc:date>2024-12-09T08:23:56Z</dc:date>
    <item>
      <title>PowerBI performance with Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101430#M40663</link>
      <description>&lt;P&gt;We have integrated Power BI with Databricks to generate reports. However, Power BI generates queries of more than 8,000 lines containing numerous OR clauses, and we cannot modify them at this time. These queries run for more than 4 minutes and are automatically cancelled before a plan is generated. The time spent on query optimization and file pruning adds to the delay and prevents the plan from being generated. As a result, we cannot use the report with Databricks: queries containing numerous OR clauses either take an excessive amount of time to execute or fail altogether.&lt;BR /&gt;Please note that we have already applied optimization techniques within Databricks, and our data is small, for example 1 file in the DIM table and 22 files in the FACT tables. Resizing the serverless SQL warehouse has not resolved the issue.&lt;/P&gt;&lt;P&gt;If anyone has successfully addressed this issue, please share your solution.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2024 08:23:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101430#M40663</guid>
      <dc:creator>Vetrivel</dc:creator>
      <dc:date>2024-12-09T08:23:56Z</dc:date>
    </item>
    <item>
      <title>Re: PowerBI performance with Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101442#M40667</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;DIV class="du-bois-light-typography css-ooisui"&gt;To address this issue, here are some suggestions:&lt;/DIV&gt;
&lt;OL&gt;
&lt;LI&gt;Use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;BROADCAST&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;hint on the join between the DIM and FACT tables. Broadcasting the small DIM table avoids shuffling the FACT table, which can improve the performance of the query.&lt;/LI&gt;
&lt;LI&gt;Where the query can be changed, use an&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;IN&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;list to combine OR clauses that compare the same column into a single predicate. This can reduce the size of the expression the optimizer has to analyze and improve the performance of the query.&lt;/LI&gt;
&lt;LI&gt;Use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;OPTIMIZE&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;command to compact the files of the Delta tables. This can improve the performance of the query by reducing the number of files that need to be read.&lt;/LI&gt;
&lt;LI&gt;Use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;VACUUM&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;command to remove files that the Delta tables no longer reference. This reclaims storage and keeps the table directories clean, although queries already skip those files.&lt;/LI&gt;
&lt;LI&gt;Use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;ZORDER BY&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;clause of the OPTIMIZE command to co-locate related data in the files of the Delta tables. This can improve the performance of the query by letting data skipping reduce the amount of data that needs to be read.&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Mon, 09 Dec 2024 09:13:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101442#M40667</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2024-12-09T09:13:36Z</dc:date>
    </item>
    <item>
      <title>Re: PowerBI performance with Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101447#M40669</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36707"&gt;@Sidhant07&lt;/a&gt;&amp;nbsp;The issue is not related to the volume of data, as it is relatively small. Rather, the challenge lies in the time it takes to generate the plan in Databricks, which results in the process being automatically cancelled. Consequently, we are unable to retrieve the complete query from the query history. Additionally, we cannot modify the query generated by PowerBI at this time. We have already implemented liquid clustering for the FACT and DIM tables.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2024 09:33:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101447#M40669</guid>
      <dc:creator>Vetrivel</dc:creator>
      <dc:date>2024-12-09T09:33:53Z</dc:date>
    </item>
    <item>
      <title>Re: PowerBI performance with Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101451#M40670</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Per our analysis, the joins are not the problem; the problem is the huge WHERE clause with a lot of OR conditions.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2024 10:06:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101451#M40670</guid>
      <dc:creator>Vetrivel</dc:creator>
      <dc:date>2024-12-09T10:06:29Z</dc:date>
    </item>
    <item>
      <title>Re: PowerBI performance with Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101486#M40688</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Attached is the sample query generated by Power BI. Without the OR conditions the query runs within seconds.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2024 15:27:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/powerbi-performance-with-databricks/m-p/101486#M40688</guid>
      <dc:creator>Vetrivel</dc:creator>
      <dc:date>2024-12-09T15:27:38Z</dc:date>
    </item>
  </channel>
</rss>

