Hello Team,
I am hitting a consistent failure when streaming a large table (~53 million rows) from a Databricks SQL Warehouse using Python (databricks-sql-connector) with OAuth authentication.
I execute a single long-running query and fetch the results in batches of 50,000 rows via cursor.fetchmany(), loading them into an external database (CockroachDB). The job runs fine for a while but always fails after ~55–65 minutes.
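For context, this is a minimal sketch of the fetch loop I am running. The connection parameters and the CockroachDB sink are placeholders, not the real job's values; the core is just a standard DB-API fetchmany loop:

```python
# Hypothetical sketch of the batch-streaming loop described above.
# src_cursor is any DB-API cursor (here, from databricks-sql-connector);
# sink is a callable that writes one batch (e.g. an executemany into CockroachDB).

def stream_in_batches(src_cursor, sink, batch_size=50_000):
    """Fetch rows from a DB-API cursor in fixed-size batches and hand each to a sink."""
    total = 0
    while True:
        rows = src_cursor.fetchmany(batch_size)
        if not rows:          # empty batch means the result set is exhausted
            break
        sink(rows)            # placeholder: load the batch into CockroachDB
        total += len(rows)
    return total
```

In the real job this is driven by a cursor from databricks.sql.connect(...).cursor() after executing the single SELECT; the failure always happens mid-way through this loop.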
Typical errors:
Token exchange failed, using external token: 'access_token'
ThriftBackend.attempt_request: Exception
databricks.sql.exc.RequestError: Error during request to server
In some runs, I also see:
CloudFetch download slower than threshold: 0.08 MB/s (threshold: 0.1 MB/s)
I have already tried several workarounds, but once the failure occurs, the active cursor/result set becomes invalid and the query cannot be resumed. Refreshing the OAuth token does not help.
This appears to be related to SQL Warehouse session lifetime and long-running result set streaming, possibly exacerbated by CloudFetch download time.
Is streaming very large result sets (10M+ rows) via Databricks SQL Warehouse supported?
Is the recommended approach to use COPY INTO / UNLOAD to external storage instead of Python streaming?
Any clarification or official guidance would be appreciated.
Thank you.