<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databrick JDBC Driver making &amp;quot;List Column SQL&amp;quot; Query Everytime in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databrick-jdbc-driver-making-quot-list-column-sql-quot-query/m-p/100902#M40468</link>
    <description>&lt;P&gt;I am trying to use the Databricks JDBC Spark Driver to run sql queries on the SQL Warehouse&lt;/P&gt;&lt;P&gt;Sample connection String&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;String TOKEN = &lt;SPAN&gt;"&amp;lt;token&amp;gt;"&lt;/SPAN&gt;;&lt;BR /&gt;String HTTP_PATH = &lt;SPAN&gt;"/sql/1.0/warehouses/&amp;lt;sql-warehouse-id&amp;gt;"&lt;/SPAN&gt;;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;final &lt;/SPAN&gt;String connStr = &lt;SPAN&gt;"jdbc:spark://discover.cloud.databricks.com:443/default;" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"TransportMode=http;SSL=1;" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"HTTPPath=" &lt;/SPAN&gt;+ HTTP_PATH + &lt;SPAN&gt;";" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"AuthMech=11;" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"Auth_Flow=0;" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"Auth_AccessToken=" &lt;/SPAN&gt;+ TOKEN + &lt;SPAN&gt;";"&lt;/SPAN&gt;;&lt;BR /&gt;&lt;SPAN&gt;try &lt;/SPAN&gt;{&lt;BR /&gt;    &lt;SPAN&gt;final &lt;/SPAN&gt;Connection connection = DriverManager.&lt;SPAN&gt;getConnection&lt;/SPAN&gt;(connStr);&lt;BR /&gt;    &lt;SPAN&gt;final &lt;/SPAN&gt;Statement statement = connection.createStatement();&lt;BR /&gt;&lt;BR /&gt;    &lt;SPAN&gt;final &lt;/SPAN&gt;ResultSet rset = statement.executeQuery(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;SELECT count(*) FROM table_name limit 10&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;);&lt;BR /&gt;    &lt;SPAN&gt;int &lt;/SPAN&gt;columnCount = rset.getMetaData().getColumnCount();&lt;BR /&gt;&lt;BR /&gt;    &lt;SPAN&gt;while &lt;/SPAN&gt;(rset.next()) {&lt;BR /&gt;        &lt;SPAN&gt;for &lt;/SPAN&gt;(&lt;SPAN&gt;int &lt;/SPAN&gt;i = &lt;SPAN&gt;1&lt;/SPAN&gt;; i &amp;lt;= columnCount; i++) {&lt;BR /&gt;            System.&lt;SPAN&gt;out&lt;/SPAN&gt;.print(rset.getString(i) + &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;\t&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;);&lt;BR /&gt;        }&lt;BR /&gt;        System.&lt;SPAN&gt;out&lt;/SPAN&gt;.println();&lt;BR /&gt;    }&lt;BR /&gt;&lt;BR /&gt;    connection.close();&lt;BR /&gt;} &lt;SPAN&gt;catch &lt;/SPAN&gt;(SQLException e) {&lt;BR /&gt;    System.&lt;SPAN&gt;err&lt;/SPAN&gt;.println(e);&lt;BR /&gt;}&lt;/PRE&gt;&lt;P&gt;Above code is always running two SQL on the Warehouse&lt;BR /&gt;1.&amp;nbsp;Listing columns 'catalog : Spark, schemaPattern : &amp;lt;schema_name&amp;gt;, tablePattern : &amp;lt;table_name&amp;gt;, columnName : null' -- Take 15+ sec everytime&lt;BR /&gt;2. Actual query&lt;BR /&gt;&lt;BR /&gt;I tried reusing the connection object it didn't work. Any idea what going on here?&lt;BR /&gt;&lt;STRONG&gt;Using Driver Version: 2.6.18&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;/DIV&gt;</description>
    <pubDate>Wed, 04 Dec 2024 12:28:22 GMT</pubDate>
    <dc:creator>varunjaincse</dc:creator>
    <dc:date>2024-12-04T12:28:22Z</dc:date>
    <item>
      <title>Databrick JDBC Driver making "List Column SQL" Query Everytime</title>
      <link>https://community.databricks.com/t5/data-engineering/databrick-jdbc-driver-making-quot-list-column-sql-quot-query/m-p/100902#M40468</link>
      <description>&lt;P&gt;I am trying to use the Databricks JDBC Spark Driver to run sql queries on the SQL Warehouse&lt;/P&gt;&lt;P&gt;Sample connection String&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;String TOKEN = &lt;SPAN&gt;"&amp;lt;token&amp;gt;"&lt;/SPAN&gt;;&lt;BR /&gt;String HTTP_PATH = &lt;SPAN&gt;"/sql/1.0/warehouses/&amp;lt;sql-warehouse-id&amp;gt;"&lt;/SPAN&gt;;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;final &lt;/SPAN&gt;String connStr = &lt;SPAN&gt;"jdbc:spark://discover.cloud.databricks.com:443/default;" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"TransportMode=http;SSL=1;" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"HTTPPath=" &lt;/SPAN&gt;+ HTTP_PATH + &lt;SPAN&gt;";" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"AuthMech=11;" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"Auth_Flow=0;" &lt;/SPAN&gt;+&lt;BR /&gt;        &lt;SPAN&gt;"Auth_AccessToken=" &lt;/SPAN&gt;+ TOKEN + &lt;SPAN&gt;";"&lt;/SPAN&gt;;&lt;BR /&gt;&lt;SPAN&gt;try &lt;/SPAN&gt;{&lt;BR /&gt;    &lt;SPAN&gt;final &lt;/SPAN&gt;Connection connection = DriverManager.&lt;SPAN&gt;getConnection&lt;/SPAN&gt;(connStr);&lt;BR /&gt;    &lt;SPAN&gt;final &lt;/SPAN&gt;Statement statement = connection.createStatement();&lt;BR /&gt;&lt;BR /&gt;    &lt;SPAN&gt;final &lt;/SPAN&gt;ResultSet rset = statement.executeQuery(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;SELECT count(*) FROM table_name limit 10&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;);&lt;BR /&gt;    &lt;SPAN&gt;int &lt;/SPAN&gt;columnCount = rset.getMetaData().getColumnCount();&lt;BR /&gt;&lt;BR /&gt;    &lt;SPAN&gt;while &lt;/SPAN&gt;(rset.next()) {&lt;BR /&gt;        &lt;SPAN&gt;for &lt;/SPAN&gt;(&lt;SPAN&gt;int &lt;/SPAN&gt;i = &lt;SPAN&gt;1&lt;/SPAN&gt;; i &amp;lt;= columnCount; i++) {&lt;BR /&gt;            System.&lt;SPAN&gt;out&lt;/SPAN&gt;.print(rset.getString(i) + &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;\t&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;);&lt;BR /&gt;        }&lt;BR /&gt;        System.&lt;SPAN&gt;out&lt;/SPAN&gt;.println();&lt;BR /&gt;    }&lt;BR /&gt;&lt;BR /&gt;    connection.close();&lt;BR /&gt;} &lt;SPAN&gt;catch &lt;/SPAN&gt;(SQLException e) {&lt;BR /&gt;    System.&lt;SPAN&gt;err&lt;/SPAN&gt;.println(e);&lt;BR /&gt;}&lt;/PRE&gt;&lt;P&gt;Above code is always running two SQL on the Warehouse&lt;BR /&gt;1.&amp;nbsp;Listing columns 'catalog : Spark, schemaPattern : &amp;lt;schema_name&amp;gt;, tablePattern : &amp;lt;table_name&amp;gt;, columnName : null' -- Take 15+ sec everytime&lt;BR /&gt;2. Actual query&lt;BR /&gt;&lt;BR /&gt;I tried reusing the connection object it didn't work. Any idea what going on here?&lt;BR /&gt;&lt;STRONG&gt;Using Driver Version: 2.6.18&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 04 Dec 2024 12:28:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrick-jdbc-driver-making-quot-list-column-sql-quot-query/m-p/100902#M40468</guid>
      <dc:creator>varunjaincse</dc:creator>
      <dc:date>2024-12-04T12:28:22Z</dc:date>
    </item>
    <item>
      <title>Re: Databrick JDBC Driver making "List Column SQL" Query Everytime</title>
      <link>https://community.databricks.com/t5/data-engineering/databrick-jdbc-driver-making-quot-list-column-sql-quot-query/m-p/100912#M40472</link>
      <description>&lt;P&gt;Hello, thank you for your question.&lt;/P&gt;
&lt;P&gt;The initial metadata query (the "Listing Columns" query) is tied to the &lt;STRONG&gt;SparkGetColumnsOperation&lt;/STRONG&gt; class, which is part of the Apache Hive ThriftServer and Spark's handling of JDBC metadata operations.&lt;/P&gt;
&lt;P&gt;Can you please confirm the SELECT COUNT LIMIT statement is triggering this first 15+ seconds query, and not the&amp;nbsp;rset.getMetaData().getColumnCount(); ?&lt;/P&gt;
&lt;P&gt;With respect to the SELECT statement, at least initially, it is&amp;nbsp;ambiguous which catalog or schema the table resides in, so the driver needs to validate metadata to ensure the table and columns exist. But, the&amp;nbsp;rset.getMetaData().getColumnCount(),&amp;nbsp;explicitly fetches metadata for the query's result set. Internally, it uses DatabaseMetaData.getColumns() or similar operations which could be invoking the SparkGetColumnsOperation class.&amp;nbsp;The metadata operation retrieves schema information about the columns in the result set (names, types, nullability, etc.), even if the query result is a single column like count(*). I would expect this to be more probable trigger for the metadata query as it ensures the ResultSet is correctly structured.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Please try some of these and compare the results:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Skip the metadata call and directly process the ResultSet:&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN class="hljs-keyword"&gt;final&lt;/SPAN&gt; &lt;SPAN class="hljs-type"&gt;ResultSet&lt;/SPAN&gt; &lt;SPAN class="hljs-variable"&gt;rset&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; statement.executeQuery(&lt;SPAN class="hljs-string"&gt;"SELECT count(*) FROM table_name LIMIT 10"&lt;/SPAN&gt;); &lt;SPAN class="hljs-keyword"&gt;while&lt;/SPAN&gt; (rset.next()) { System.out.println(rset.getInt(&lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt;)); &lt;SPAN class="hljs-comment"&gt;// Directly fetch the first column&lt;/SPAN&gt; }&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use the FQN:&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;final ResultSet rset = statement.executeQuery("SELECT count(*) FROM my_catalog.my_schema.table_name LIMIT 10");&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use PreparedStatements:&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;String query = "SELECT count(*) FROM table_name LIMIT 10";&lt;BR /&gt;PreparedStatement preparedStatement = connection.prepareStatement(query);&lt;BR /&gt;ResultSet rset = preparedStatement.executeQuery();&lt;/P&gt;
&lt;P&gt;while (rset.next()) {&lt;BR /&gt;System.out.println(rset.getInt(1));&lt;BR /&gt;}&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;So, ultimately something like this:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;String TOKEN = "&amp;lt;token&amp;gt;";
String HTTP_PATH = "/sql/1.0/warehouses/&amp;lt;sql-warehouse-id&amp;gt;";

final String connStr = "jdbc:spark://discover.cloud.databricks.com:443/default;" +
        "TransportMode=http;SSL=1;" +
        "HTTPPath=" + HTTP_PATH + ";" +
        "AuthMech=11;" +
        "Auth_Flow=0;" +
        "Auth_AccessToken=" + TOKEN + ";";

try (Connection connection = DriverManager.getConnection(connStr);
     Statement statement = connection.createStatement()) {

    final ResultSet rset = statement.executeQuery("SELECT count(*) FROM my_catalog.my_schema.table_name LIMIT 10");

    while (rset.next()) {
        System.out.println(rset.getInt(1)); // Directly fetch the result
    }
} catch (SQLException e) {
    e.printStackTrace();
}
&lt;/LI-CODE&gt;
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;</description>
      <pubDate>Wed, 04 Dec 2024 13:12:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrick-jdbc-driver-making-quot-list-column-sql-quot-query/m-p/100912#M40472</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-04T13:12:57Z</dc:date>
    </item>
    <item>
      <title>Re: Databrick JDBC Driver making "List Column SQL" Query Everytime</title>
      <link>https://community.databricks.com/t5/data-engineering/databrick-jdbc-driver-making-quot-list-column-sql-quot-query/m-p/101030#M40515</link>
      <description>&lt;P&gt;Thank you for your time. I got this fix by adding Param UseNativeQuery=1,&amp;nbsp;in my jdbc connection string. By default it take value 2. As per the driver documentation&lt;/P&gt;&lt;P&gt;0: The connector transforms the queries emitted by applications and converts them into an equivalent form in HiveQL.&lt;BR /&gt;1: The connector does not transform the queries emitted by applications, so the native query is used.&lt;BR /&gt;2: The connector automatically sets this property to either 0 or 1, depending on the server capabilities.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2024 08:09:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrick-jdbc-driver-making-quot-list-column-sql-quot-query/m-p/101030#M40515</guid>
      <dc:creator>varunjaincse</dc:creator>
      <dc:date>2024-12-05T08:09:21Z</dc:date>
    </item>
  </channel>
</rss>

