01-30-2022 08:01 AM
I tried to benchmark the Power BI Databricks connector against the Power BI Delta Lake reader on a dataset of 2.15 million rows. The Delta Lake reader took 20 seconds, while importing through the SQL compute endpoint took ~75 seconds.
When I look at the query profile in SQL compute, I see that 50 seconds are spent in the "Columnar To Row" step. This makes me rather suspicious, since I was under the impression that an up-to-date Power BI would take advantage of Cloud Fetch, which creates files containing Apache Arrow batches, i.e. a columnar format. So why the conversion to rows? Maybe it is not actually using Cloud Fetch? Is there any way to verify that I am actually using Cloud Fetch, either in the Power BI logs or in the Databricks SQL compute endpoint web interface?
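For reference, one way to compare the two fetch paths outside of Power BI is a minimal sketch with the databricks-sql-connector Python package, assuming its use_cloud_fetch connection flag (hostname, HTTP path, token and table below are placeholders):

```python
# Hypothetical benchmark: time a full fetch with Cloud Fetch on vs. off
# against the same SQL endpoint. All connection values are placeholders.
import time
from databricks import sql  # pip install databricks-sql-connector

def timed_fetch(use_cloud_fetch: bool) -> float:
    start = time.perf_counter()
    with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
        http_path="/sql/1.0/warehouses/abc123",                        # placeholder
        access_token="dapi...",                                        # placeholder
        use_cloud_fetch=use_cloud_fetch,  # toggles Cloud Fetch in this connector
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT * FROM my_table")  # placeholder table
            cursor.fetchall()
    return time.perf_counter() - start

print("cloud fetch on :", timed_fetch(True))
print("cloud fetch off:", timed_fetch(False))
```

If the two timings differ sharply, that at least tells you whether Cloud Fetch is the variable, even if it does not show what the Power BI connector itself is doing.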
06-23-2022 12:53 AM
You would need to set the EnableQueryResultsDownload flag to 0 (zero), which will disable Cloud Fetch.
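With a DSN-less pyodbc connection this could look like the sketch below; host, HTTP path and token are placeholders, and the flag name is taken from the reply above, so check it against your driver version's documentation:

```python
# Sketch: passing the Cloud-Fetch-disabling flag in a DSN-less connection
# string to the Simba Spark ODBC driver. Connection values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver=Simba Spark ODBC Driver;"
    "Host=adb-1234567890123456.7.azuredatabricks.net;"  # placeholder
    "Port=443;"
    "HTTPPath=/sql/1.0/warehouses/abc123;"              # placeholder
    "SSL=1;ThriftTransport=2;AuthMech=3;"
    "UID=token;PWD=dapi...;"                            # placeholder token
    "EnableQueryResultsDownload=0;",                    # 0 = disable Cloud Fetch
    autocommit=True,
)
```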
06-23-2022 12:54 AM
So why is ColumnarToRow required?
10-26-2022 04:47 AM
Hi everyone, check out my latest blog post to verify whether or not Cloud Fetch is actually used; you may also find some other optimizations there:
02-27-2023 07:24 AM
Guys, is there any way to switch off CloudFetch and fall back to ArrowResultSet by default, irrespective of result size, using the latest version of the Simba Spark ODBC driver?
4 weeks ago
I'm troubleshooting slow speeds (~6 Mbps) from Azure Databricks to the Power BI Service (Fabric) via dataflows.
Does this mean that CloudFetch is not enabled here?
In the neo4j logs I also see CloudStoreBasedResultHandler receiving and responding to getNextCloudStoreBasedSet, which I interpret as Cloud Fetch being enabled?
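As a cross-check outside Power BI, a minimal sketch with the databricks-sql-connector Python package and debug logging can show whether result links are being downloaded; the exact log wording varies by connector version, and all connection values below are placeholders:

```python
# Sketch: enable DEBUG logging and fetch a large result set, then scan the
# output for cloud-fetch-related messages. Connection values are placeholders.
import logging
from databricks import sql  # pip install databricks-sql-connector

logging.basicConfig(level=logging.DEBUG)

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",                        # placeholder
    access_token="dapi...",                                        # placeholder
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM my_table LIMIT 100000")  # placeholder
        cursor.fetchall()

# With DEBUG logging on, a large result fetched via Cloud Fetch should surface
# log lines about downloading presigned result links; their absence suggests
# the inline Arrow result path is being used instead.
```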