cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Databrick JDBC Driver making "List Column SQL" Query Everytime

varunjaincse
New Contributor III

I am trying to use the Databricks JDBC Spark Driver to run sql queries on the SQL Warehouse

Sample connection String

String TOKEN = "<token>";
String HTTP_PATH = "/sql/1.0/warehouses/<sql-warehouse-id>";

final String connStr = "jdbc:spark://discover.cloud.databricks.com:443/default;" +
"TransportMode=http;SSL=1;" +
"HTTPPath=" + HTTP_PATH + ";" +
"AuthMech=11;" +
"Auth_Flow=0;" +
"Auth_AccessToken=" + TOKEN + ";";
try {
final Connection connection = DriverManager.getConnection(connStr);
final Statement statement = connection.createStatement();

final ResultSet rset = statement.executeQuery("SELECT count(*) FROM table_name limit 10");
int columnCount = rset.getMetaData().getColumnCount();

while (rset.next()) {
for (int i = 1; i <= columnCount; i++) {
System.out.print(rset.getString(i) + "\t");
}
System.out.println();
}

connection.close();
} catch (SQLException e) {
System.err.println(e);
}

Above code is always running two SQL on the Warehouse
1. Listing columns 'catalog : Spark, schemaPattern : <schema_name>, tablePattern : <table_name>, columnName : null' -- Take 15+ sec everytime
2. Actual query

I tried reusing the connection object it didn't work. Any idea what going on here?
Using Driver Version: 2.6.18

1 ACCEPTED SOLUTION

Accepted Solutions

varunjaincse
New Contributor III

Thank you for your time. I got this fix by adding Param UseNativeQuery=1, in my jdbc connection string. By default it take value 2. As per the driver documentation

0: The connector transforms the queries emitted by applications and converts them into an equivalent form in HiveQL.
1: The connector does not transform the queries emitted by applications, so the native query is used.
2: The connector automatically sets this property to either 0 or 1, depending on the server capabilities.

 

View solution in original post

2 REPLIES 2

VZLA
Databricks Employee
Databricks Employee

Hello, thank you for your question.

The initial metadata query (the "Listing Columns" query) is tied to the SparkGetColumnsOperation class, which is part of the Apache Hive ThriftServer and Spark's handling of JDBC metadata operations.

Can you please confirm the SELECT COUNT LIMIT statement is triggering this first 15+ seconds query, and not the rset.getMetaData().getColumnCount(); ?

With respect to the SELECT statement, at least initially, it is ambiguous which catalog or schema the table resides in, so the driver needs to validate metadata to ensure the table and columns exist. But, the rset.getMetaData().getColumnCount(), explicitly fetches metadata for the query's result set. Internally, it uses DatabaseMetaData.getColumns() or similar operations which could be invoking the SparkGetColumnsOperation class. The metadata operation retrieves schema information about the columns in the result set (names, types, nullability, etc.), even if the query result is a single column like count(*). I would expect this to be more probable trigger for the metadata query as it ensures the ResultSet is correctly structured.

Please try some of these and compare the results:

  • Skip the metadata call and directly process the ResultSet:
    • final ResultSet rset = statement.executeQuery("SELECT count(*) FROM table_name LIMIT 10"); while (rset.next()) { System.out.println(rset.getInt(1)); // Directly fetch the first column }
  • Use the FQN:
    • final ResultSet rset = statement.executeQuery("SELECT count(*) FROM my_catalog.my_schema.table_name LIMIT 10");
  • Use PreparedStatements:
    • String query = "SELECT count(*) FROM table_name LIMIT 10";
      PreparedStatement preparedStatement = connection.prepareStatement(query);
      ResultSet rset = preparedStatement.executeQuery();

      while (rset.next()) {
      System.out.println(rset.getInt(1));
      }

So, ultimately something like this:

String TOKEN = "<token>";
String HTTP_PATH = "/sql/1.0/warehouses/<sql-warehouse-id>";

final String connStr = "jdbc:spark://discover.cloud.databricks.com:443/default;" +
        "TransportMode=http;SSL=1;" +
        "HTTPPath=" + HTTP_PATH + ";" +
        "AuthMech=11;" +
        "Auth_Flow=0;" +
        "Auth_AccessToken=" + TOKEN + ";";

try (Connection connection = DriverManager.getConnection(connStr);
     Statement statement = connection.createStatement()) {

    final ResultSet rset = statement.executeQuery("SELECT count(*) FROM my_catalog.my_schema.table_name LIMIT 10");

    while (rset.next()) {
        System.out.println(rset.getInt(1)); // Directly fetch the result
    }
} catch (SQLException e) {
    e.printStackTrace();
}

 

varunjaincse
New Contributor III

Thank you for your time. I got this fix by adding Param UseNativeQuery=1, in my jdbc connection string. By default it take value 2. As per the driver documentation

0: The connector transforms the queries emitted by applications and converts them into an equivalent form in HiveQL.
1: The connector does not transform the queries emitted by applications, so the native query is used.
2: The connector automatically sets this property to either 0 or 1, depending on the server capabilities.

 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group