3 weeks ago
Hi,
I am trying to query a table using the JDBC endpoint of an Interactive Cluster. I am connected to the JDBC endpoint using DBeaver. When I export a small subset of the data (2,000-8,000 rows), it works fine and exports the data. However, when I try to export all rows of the table, it fails with a "request timed out" error. My table has almost 1 million rows.
I tried the same thing with a Python application, exporting the data to CSV, and got the same result: it succeeds for a small number of rows but fails for the entire table.
Can someone try this and tell me if they are experiencing the same issue, and help me understand why it happens? Note - I'm using non-UC ADB (Hive_metastore).
3 weeks ago - last edited 3 weeks ago
If this is related to the following topic: Exposing Data for Consumers in non-UC ADB - Databricks Community - 130954
Then:
"Looking at the error I noticed the weird name of the storage account. So now I'm thinking that your jdbc connection is using CloudFetch.
"Cloud Fetch, a capability that fetches query results through the cloud storage that is set up in your Azure Databricks deployment.
Query results are uploaded to an internal DBFS storage location as Arrow-serialized files of up to 20 MB. When the driver sends fetch requests after query completion, Azure Databricks generates and returns shared access signatures to the uploaded files. The JDBC driver then uses the URLs to download the results directly from DBFS."
Can you add following option to your JDBC url? It will disable cloudFetch. Once you set this option try again.
EnableQueryResultDownload=0
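With that option appended, a typical Interactive Cluster endpoint URL would look something like this (the workspace and cluster IDs are placeholders, and your other parameters stay as they are):
jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/{workspace_id}/{cluster_id};EnableQueryResultDownload=0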
3 weeks ago
Hi,
I am using the latest JDBC driver, 2.7.3: https://www.databricks.com/spark/jdbc-drivers-archive
And my JDBC URL comes from the JDBC endpoint of the Interactive Cluster:
jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/{workspace_id}/{cluster_id};AuthMech=11;Auth_Flow=2;TokenCachePassPhrase=U2MToken;EnableTokenCache=0
3 weeks ago
Hi @szymon_dybczak,
Your recommendation worked. I modified the JDBC URL to:
jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/{workspace_id}/{cluster_id};AuthMech=11;Auth_Flow=2;TokenCachePassPhrase=U2MToken;EnableTokenCache=0;EnableQueryResultDownload=0
I also tried another solution, which is essentially the same as what you recommended. The solution is attached.
3 weeks ago
Great that it worked, @fly_high_five. I'm happy that I could help.
3 weeks ago - last edited 3 weeks ago
Hi @fly_high_five,
I found these references about this situation; see if they help you:
- Increase the SocketTimeout in the JDBC URL (Databricks KB "Best practices when using JDBC with Databricks SQL": https://kb.databricks.com/dbsql/job-timeout-when-connecting-to-a-sql-endpoint-over-jdbc)
- Replace fetchall() with fetchmany() while adjusting cursor.arraysize (Databricks SQL Connector for Python docs: https://docs.databricks.com/en/dev-tools/python-sql-connector.html)
A minimal sketch of the second approach follows below.
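For the first option, the fix amounts to appending a SocketTimeout value to the JDBC URL (e.g. ;SocketTimeout=300 - an illustrative value; check the KB article for the exact semantics in your driver version). For the second, here is a minimal sketch assuming the databricks-sql-connector package; the hostname, HTTP path, token, table name, and batch size are all placeholders you would replace:

import csv

from databricks import sql

# All connection details below are placeholders; replace with your workspace values.
connection = sql.connect(
    server_hostname="adb-{workspace_id}.azuredatabricks.net",
    http_path="sql/protocolv1/o/{workspace_id}/{cluster_id}",
    access_token="<personal-access-token>",
)

with connection.cursor() as cursor:
    cursor.execute("SELECT * FROM my_table")  # my_table is a placeholder
    with open("export.csv", "w", newline="") as f:
        writer = csv.writer(f)
        # Write a header row from the cursor metadata.
        writer.writerow([col[0] for col in cursor.description])
        # Stream the result in batches instead of pulling ~1M rows at once
        # with fetchall(). You can also set cursor.arraysize and call
        # fetchmany() with no argument, per PEP 249.
        while True:
            rows = cursor.fetchmany(10000)
            if not rows:
                break
            writer.writerows(rows)

connection.close()

Because each fetchmany() call only holds one batch in memory, the export no longer depends on the whole result set fitting into a single long-running fetch.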