12-15-2021 08:22 AM
We're trying to pull a large amount of data using Databricks SQL, and network throughput seems to be the bottleneck when fetching the results.
I see there's a new feature called Cloud Fetch, which looks like the perfect solution for our issue, but I can't find any documentation on how to use it.
Labels: Azure databricks, Cloud, Cloud Fetch
Accepted Solutions
12-15-2021 08:37 AM
Cloud Fetch is an architecture built into the ODBC driver. To use it, you just need the latest ODBC driver: https://databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.htm...
How big is "a big amount" exactly? Partitioning, multi-cluster, and loading the data in chunks might also help.
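To illustrate the "load in chunks" idea, here is a minimal sketch that reads a result set in fixed-size batches through the standard Python DB-API `fetchmany` call. It uses the built-in `sqlite3` module purely as a stand-in so it can run anywhere; the helper names `fetch_in_chunks` and `demo`, the table `t`, and the chunk size are all hypothetical, and this is not Databricks-specific code. The same loop should work with any DB-API cursor, such as one obtained over an ODBC connection.

```python
import sqlite3
from typing import Any, Iterator, List, Tuple


def fetch_in_chunks(cursor: Any, chunk_size: int = 10_000) -> Iterator[List[Tuple]]:
    """Yield lists of rows from an already-executed DB-API cursor."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            # An empty list means the result set is exhausted.
            break
        yield rows


def demo() -> List[int]:
    # sqlite3 is only a stand-in for the real connection (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])
    cur = conn.execute("SELECT id FROM t ORDER BY id")
    # Record the size of each chunk instead of holding all rows in memory.
    sizes = [len(chunk) for chunk in fetch_in_chunks(cur, chunk_size=10)]
    conn.close()
    return sizes
```

Calling `demo()` reads the 25 rows in batches of at most 10. In a real job, each chunk would be written out or processed before the next one is fetched.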
12-15-2021 09:12 AM
Is there any way we can confirm it's using Cloud Fetch?
I'm not sure of the exact size, but it's up to hundreds of GB of data across multiple queries.
Looking at the metrics from the VM that's executing the query, the max throughput is 60 MBps.
That doesn't seem to match the throughput shown in the blog post; it's closer to the single-threaded baseline. I'm using the 2.6.19 ODBC driver.
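For a rough sense of scale, the arithmetic can be sketched as below. It assumes "60MBps" means 60 megabytes per second and uses decimal units (1 GB = 1000 MB); the function name `transfer_minutes` and the 100 GB figure are illustrative only, not measurements.

```python
def transfer_minutes(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate transfer time in minutes, assuming 1 GB = 1000 MB."""
    size_mb = size_gb * 1000
    seconds = size_mb / throughput_mb_per_s
    return seconds / 60


# Illustrative input: 100 GB at a sustained 60 MB/s.
minutes_for_100_gb = transfer_minutes(100, 60)
```

Under those assumptions, 100 GB takes a little under 28 minutes, so several hundred GB would take well over an hour at this rate.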
Here are sample execution details for one query:
12-16-2021 12:13 AM
Trying to get an idea of what you're doing:
Are you querying a 100+ GB database directly, or is it a Parquet/Delta source?
Also, where is the result fetched to? File download, BI tool, ...?