01-08-2026 07:59 PM
Hello Team,
I am facing a consistent issue when streaming a large table (~53 million rows) from a Databricks SQL Warehouse using Python (databricks-sql-connector) with OAuth authentication.
I execute a single long-running query and fetch data in batches (50,000 rows) using cursor.fetchmany(), loading the data into an external database (CockroachDB). The job runs successfully for some time but always fails after ~55–65 minutes.
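For reference, a minimal sketch of the pattern described above, assuming the databricks-sql-connector with browser-based OAuth; the hostname, HTTP path, table name, and the load_into_cockroachdb helper are placeholders, not the actual values from this job:

```python
# from databricks import sql  # databricks-sql-connector (uncomment in real use)

BATCH_SIZE = 50_000

def iter_batches(cursor, batch_size):
    """Yield successive fetchmany() batches until the result set is drained."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield rows

def stream_table(load_into_cockroachdb):
    """Single long-running query, drained batch by batch (the failing pattern)."""
    from databricks import sql
    with sql.connect(
        server_hostname="<workspace-host>",    # placeholder
        http_path="/sql/1.0/warehouses/<id>",  # placeholder
        auth_type="databricks-oauth",          # interactive OAuth (U2M)
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT * FROM my_catalog.my_schema.big_table")
            for batch in iter_batches(cursor, BATCH_SIZE):
                load_into_cockroachdb(batch)   # hypothetical loader
```

Note that the entire result set here hangs off one cursor, so the job's lifetime is bounded by however long the warehouse keeps that result set alive.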
Typical errors:
In some runs, I also see:
I have already tried:
Proactive OAuth token refresh and caching
Smaller batch sizes
Reusing database connections
Multiple retries
However, once the failure occurs, the active cursor/result set becomes invalid and the query cannot continue. Refreshing the token does not help.
This appears to be related to SQL Warehouse session lifetime and long-running result set streaming, possibly exacerbated by CloudFetch download time.
Is streaming very large result sets (10M+ rows) via Databricks SQL Warehouse supported?
Is the recommended approach to use COPY INTO / UNLOAD to external storage instead of Python streaming?
Any clarification or official guidance would be appreciated.
Thank you.
01-08-2026 08:59 PM
Hi @Rose_15, 53M rows via streaming is a big number. What is the use case? If the use case really is streaming, you may need to find a way to extract smaller batches (e.g. incrementally) and load the data. This use case looks more like a batch load.
Please provide more details on the use case so that we can explore optimal solution.
01-08-2026 09:33 PM
Hello Sanjeeb,
Thank you for your response.
This is indeed a batch data movement use case, not real-time streaming. The requirement is to perform a one-time / periodic bulk load of a large table (~53M rows) from Databricks into an external OLTP database (CockroachDB) using Python.
There is no requirement for low-latency or continuous streaming. The Python-based approach was chosen only as an extraction mechanism, not for real-time consumption.
We attempted:
Batch fetching (fetchmany) with small batch sizes
OAuth token refresh
Connection reuse
However, the job consistently fails after ~1 hour, indicating limitations with long-running cursors in Databricks SQL Warehouses.
Could you please confirm:
Whether COPY INTO/UNLOAD to cloud storage is the recommended approach for large batch exports from Databricks?
If Python-based extraction is required, is range/partition-based querying the only supported workaround?
Are there any documented limits or best practices regarding maximum result set sizes or cursor lifetimes for SQL Warehouses?
Your guidance on the officially supported pattern for this use case would be very helpful.
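On the range/partition question above, one common workaround (sketched here as an assumption, not official guidance) is to replace the single long-lived cursor with many short queries, each bounded by a key range, so no individual query or cursor has to survive for an hour. The key_ranges helper and the WHERE clause on an id column are illustrative:

```python
def key_ranges(min_id, max_id, chunk):
    """Split [min_id, max_id] into half-open (lo, hi) ranges of at most `chunk` keys."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk, max_id + 1)
        yield lo, hi
        lo = hi

def extract_by_range(run_query, min_id, max_id, chunk=1_000_000):
    """Run one short, independently retryable query per key range.

    `run_query(sql_text, lo, hi)` is a hypothetical callable that opens a
    fresh connection, executes the query, loads the rows, and closes.
    """
    template = (
        "SELECT * FROM my_catalog.my_schema.big_table "  # placeholder table
        "WHERE id >= ? AND id < ?"
    )
    for lo, hi in key_ranges(min_id, max_id, chunk):
        run_query(template, lo, hi)  # a failed chunk can be retried alone
```

Because each chunk is a self-contained query, a token expiry or session timeout only costs one chunk rather than invalidating the whole extraction.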
Thank you,
01-09-2026 07:54 PM
Hi @Rose_15 - Thanks for the details. It is better to do some planning first (number of tables, sizes, record counts), then extract the data as files to cloud storage and reload it using any bulk-load mechanism. Once your extraction is complete, you no longer depend on the source server and can reload the files at any time. If possible, explore extracting to .parquet files, which will also reduce the size.
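The extract-to-storage approach above can be sketched with Databricks SQL's INSERT OVERWRITE DIRECTORY ... USING PARQUET, issued through a normal short-lived cursor; the catalog, schema, and S3 path below are placeholders, and the downstream load into CockroachDB would use whatever bulk-import mechanism that side supports:

```python
def unload_statement(table, path):
    """Build a Databricks SQL statement that writes a table out as Parquet files."""
    return (
        f"INSERT OVERWRITE DIRECTORY '{path}' "
        f"USING PARQUET "
        f"SELECT * FROM {table}"
    )

stmt = unload_statement(
    "my_catalog.my_schema.big_table",        # placeholder source table
    "s3://my-bucket/exports/big_table/",     # placeholder storage path
)
# Executed via a short-lived connection, e.g.: cursor.execute(stmt)
```

Since the warehouse writes the files server-side, the client never holds a long-running result set, which sidesteps the ~1-hour cursor failures entirely.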