<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performance Issue with UC Read from Federated SQL Table vs JDBC Read from SQL Server in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/performance-issue-with-uc-read-from-federated-sql-table-vs-jdbc/m-p/106949#M42658</link>
    <description>&lt;P&gt;I'm facing exactly the same issue, with the same amount of apparently wasted time. When a job runs multiple federated queries, the extra overhead adds up and makes the feature cost-prohibitive.&lt;/P&gt;</description>
    <pubDate>Fri, 24 Jan 2025 19:37:59 GMT</pubDate>
    <dc:creator>CharlesRColas</dc:creator>
    <dc:date>2025-01-24T19:37:59Z</dc:date>
    <item>
      <title>Performance Issue with UC Read from Federated SQL Table vs JDBC Read from SQL Server</title>
      <link>https://community.databricks.com/t5/data-engineering/performance-issue-with-uc-read-from-federated-sql-table-vs-jdbc/m-p/84075#M37125</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm facing a significant performance issue when comparing the execution time of a query sent to SQL Server through JDBC with that of an equivalent query run through Databricks SQL (using Unity Catalog to access a federated SQL Server table).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;JDBC query:&lt;/STRONG&gt;&lt;BR /&gt;jdbc_query = f"""&lt;BR /&gt;SELECT TOP 1 *&lt;BR /&gt;FROM db.schema.table&lt;BR /&gt;WHERE id = (&lt;BR /&gt;SELECT TOP 1 id&lt;BR /&gt;FROM db.schema.table2&lt;BR /&gt;)&lt;BR /&gt;AND model_id = {model_id}"""&lt;BR /&gt;&lt;EM&gt;Execution time: ~2 seconds&lt;BR /&gt;&lt;/EM&gt;&lt;BR /&gt;&lt;STRONG&gt;Databricks SQL query (UC):&lt;/STRONG&gt;&lt;BR /&gt;Since Databricks SQL does not support TOP, I used LIMIT instead:&lt;BR /&gt;uc_query = f"""&lt;BR /&gt;SELECT *&lt;BR /&gt;FROM db.schema.table&lt;BR /&gt;WHERE id = (&lt;BR /&gt;SELECT id&lt;BR /&gt;FROM db.schema.table2&lt;BR /&gt;LIMIT 1&lt;BR /&gt;)&lt;BR /&gt;AND model_id = {model_id}&lt;BR /&gt;LIMIT 1&lt;BR /&gt;"""&lt;BR /&gt;&lt;EM&gt;Execution time: 6-7 minutes&lt;BR /&gt;&lt;/EM&gt;&lt;BR /&gt;&lt;STRONG&gt;Additional observations:&lt;/STRONG&gt;&lt;BR /&gt;When I load and display each table on its own (no filters or subqueries), JDBC and Databricks SQL differ by only 1-2 seconds.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The question:&lt;/STRONG&gt;&lt;BR /&gt;Given how much slower the combined query is through Databricks SQL than through JDBC, I'm trying to understand where those 6-7 minutes are lost.&lt;/P&gt;&lt;P&gt;Is the time spent translating the query from Databricks SQL into SQL Server's T-SQL dialect?&lt;BR /&gt;Or do Databricks SQL and JDBC handle the subquery, or optimize the query as a whole, differently?&lt;BR /&gt;&lt;BR /&gt;Any insights, similar experiences, or suggestions for speeding up the Databricks SQL query would be greatly appreciated!&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
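The direct JDBC path described in this post can be sketched with Spark's built-in JDBC data source. This is a minimal sketch, assuming an active `spark` session; `build_tsql_query`, `read_top_row_via_jdbc`, `jdbc_url`, `user` and `password` are hypothetical names, not taken from the original post. Passing the T-SQL statement through the `query` option means SQL Server itself evaluates TOP, the subquery and the model_id filter, so only the matching row crosses the network.

```python
def build_tsql_query(model_id):
    """Build the T-SQL statement from the original post.

    int() raises ValueError for non-numeric strings, so model_id
    cannot be used to inject extra SQL into the statement.
    """
    model_id = int(model_id)
    return (
        "SELECT TOP 1 * FROM db.schema.table "
        "WHERE id = (SELECT TOP 1 id FROM db.schema.table2) "
        f"AND model_id = {model_id}"
    )


def read_top_row_via_jdbc(spark, jdbc_url, user, password, model_id):
    """Hypothetical helper: run the whole statement on SQL Server.

    With the "query" option, Spark wraps the statement in a subquery
    of its own SELECT and sends it to the database, so the filtering
    and TOP happen on SQL Server rather than in Spark.
    """
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("query", build_tsql_query(model_id))
        .option("user", user)
        .option("password", password)
        .load()
    )
```

In a real notebook the credentials would normally come from a secret scope rather than literals.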
      <pubDate>Fri, 23 Aug 2024 15:13:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/performance-issue-with-uc-read-from-federated-sql-table-vs-jdbc/m-p/84075#M37125</guid>
      <dc:creator>Direo</dc:creator>
      <dc:date>2024-08-23T15:13:50Z</dc:date>
    </item>
    <item>
      <title>Re: Performance Issue with UC Read from Federated SQL Table vs JDBC Read from SQL Server</title>
      <link>https://community.databricks.com/t5/data-engineering/performance-issue-with-uc-read-from-federated-sql-table-vs-jdbc/m-p/106949#M42658</link>
      <description>&lt;P&gt;I'm facing exactly the same issue, with the same amount of apparently wasted time. When a job runs multiple federated queries, the extra overhead adds up and makes the feature cost-prohibitive.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2025 19:37:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/performance-issue-with-uc-read-from-federated-sql-table-vs-jdbc/m-p/106949#M42658</guid>
      <dc:creator>CharlesRColas</dc:creator>
      <dc:date>2025-01-24T19:37:59Z</dc:date>
    </item>
    <item>
      <title>Re: Performance Issue with UC Read from Federated SQL Table vs JDBC Read from SQL Server</title>
      <link>https://community.databricks.com/t5/data-engineering/performance-issue-with-uc-read-from-federated-sql-table-vs-jdbc/m-p/109842#M43408</link>
      <description>&lt;P&gt;In our testing, the JDBC query is faster than the federated query because the federated query does not push the full query down to the source database. Instead, it runs "select * from table", pulls all of the data into Databricks, and only then filters it before displaying it or returning it to the notebook. The direct JDBC method passes the entire query down, so the filtering and the rest of the query run in the source database, and only the data I need is retrieved and sent to Databricks.&lt;/P&gt;&lt;P&gt;We saw this behavior with several different queries against an on-premises SQL Server.&lt;/P&gt;</description>
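One way to check the behavior described in this reply is to inspect the physical plan of the federated query before running it. This is a hedged sketch: it assumes an active `spark` session in a Unity Catalog-enabled workspace, and `build_uc_query` and `explain_plan` are hypothetical helper names. `EXPLAIN FORMATTED` is a standard Spark SQL statement; the exact wording the plan uses for the part sent to SQL Server may differ between runtime versions, so compare whatever pushed query or pushed filters the plan shows against the full statement.

```python
def build_uc_query(model_id):
    """Build the Databricks SQL version of the query from the original post."""
    model_id = int(model_id)  # raises ValueError for non-numeric input
    return (
        "SELECT * FROM db.schema.table "
        "WHERE id = (SELECT id FROM db.schema.table2 LIMIT 1) "
        f"AND model_id = {model_id} "
        "LIMIT 1"
    )


def explain_plan(spark, sql):
    """Return the formatted plan for `sql` as a single string.

    EXPLAIN returns a one-row DataFrame whose only column holds the
    plan text, so the first field of the first row is the whole plan.
    """
    return spark.sql(f"EXPLAIN FORMATTED {sql}").collect()[0][0]
```

Usage: `print(explain_plan(spark, build_uc_query(model_id)))`. If the model_id filter and the LIMIT appear only in Spark operators above the federated scan, rather than in what is sent to SQL Server, the rows are being filtered in Databricks after the transfer.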
      <pubDate>Tue, 11 Feb 2025 15:46:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/performance-issue-with-uc-read-from-federated-sql-table-vs-jdbc/m-p/109842#M43408</guid>
      <dc:creator>pdiamond</dc:creator>
      <dc:date>2025-02-11T15:46:12Z</dc:date>
    </item>
  </channel>
</rss>

