01-30-2022 08:01 AM
I tried to benchmark the Power BI Databricks connector against the Power BI Delta Lake reader on a dataset of 2.15 million rows. The Delta Lake reader took 20 seconds, while importing through the SQL compute endpoint took ~75 seconds.
When I look at the query profile in SQL compute, I see that 50 seconds are spent in the "Columnar To Row" step. This makes me rather suspicious, since I was under the impression that with an up-to-date Power BI we would take advantage of CloudFetch, which creates files containing Apache Arrow batches, a columnar format. So why the conversion to rows? Maybe it is not actually using CloudFetch? Is there any way to verify that I am actually using CloudFetch, either in the Power BI logs or in the Databricks SQL compute endpoint web interface?
06-23-2022 12:53 AM
You would need to set the EnableQueryResultsDownload flag to 0 (zero), which will disable CloudFetch.
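If you want to try the flag outside of Power BI first, here is a minimal sketch using pyodbc and the Databricks (Simba Spark) ODBC driver; the host, HTTP path, and token below are placeholders, not real values.

```python
# Minimal sketch: disable CloudFetch via the ODBC connection string.
# Assumes the Databricks (Simba Spark) ODBC driver and pyodbc;
# host, HTTP path, and token are placeholders.
import pyodbc

conn_str = (
    "Driver=Simba Spark ODBC Driver;"
    "Host=<workspace>.azuredatabricks.net;"
    "Port=443;"
    "HTTPPath=/sql/1.0/warehouses/<warehouse-id>;"
    "SSL=1;ThriftTransport=2;AuthMech=3;"
    "UID=token;PWD=<personal-access-token>;"
    "EnableQueryResultsDownload=0;"  # 0 disables CloudFetch; the default enables it
)

conn = pyodbc.connect(conn_str, autocommit=True)
```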
06-23-2022 12:54 AM
So why is ColumnarToRow required?
10-26-2022 04:47 AM
Hi everyone, check out my latest blog post on how to verify whether or not CloudFetch is actually used; maybe you'll also find some other optimizations there:
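The short version (a sketch, not necessarily the exact steps from the post): enable driver trace logging, run a query, and scan the log for CloudFetch activity. This assumes the Simba Spark ODBC driver and pyodbc; connection details, log path, and table are placeholders, and on some platforms LogLevel/LogPath must go in the driver configuration rather than the connection string.

```python
# Sketch: verify CloudFetch usage by enabling Simba ODBC trace logging
# and scanning the log. Connection details, log path, and table are
# placeholders; exact log strings vary by driver version.
from pathlib import Path
import pyodbc

conn_str = (
    "Driver=Simba Spark ODBC Driver;"
    "Host=<workspace>.azuredatabricks.net;"
    "Port=443;"
    "HTTPPath=/sql/1.0/warehouses/<warehouse-id>;"
    "SSL=1;ThriftTransport=2;AuthMech=3;"
    "UID=token;PWD=<personal-access-token>;"
    "LogLevel=6;"                 # 6 = TRACE
    "LogPath=C:/temp/odbc-logs;"
)

with pyodbc.connect(conn_str, autocommit=True) as conn:
    cur = conn.cursor()
    cur.execute("SELECT * FROM <catalog>.<schema>.<table> LIMIT 1000000")
    cur.fetchall()

# Markers like these show up when CloudFetch is active (compare the
# CloudStoreBasedResultHandler entries mentioned later in this thread).
for log_file in Path("C:/temp/odbc-logs").glob("*.log"):
    for line in log_file.read_text(errors="ignore").splitlines():
        if "CloudFetch" in line or "CloudStoreBasedResult" in line:
            print(line)
```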
02-27-2023 07:24 AM
Is there any way to switch off CloudFetch and fall back to ArrowResultSet by default, irrespective of result size, using the latest version of the Simba Spark ODBC driver?
01-23-2025 07:40 AM
I'm troubleshooting slow speeds (~6 Mbps) from Azure Databricks to the Power BI Service (Fabric) via dataflows.
Does this mean that CloudFetch is not enabled here?
In the neo4j logs I also see CloudStoreBasedResultHandler receiving and responding to getNextCloudStoreBasedSet, which I interpret as CloudFetch being enabled?
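One way to narrow this down is to measure raw fetch throughput from the warehouse outside of Fabric and compare it against the ~6 Mbps you see in the dataflow. A minimal sketch, assuming the databricks-sql-connector Python package (connection details and table are placeholders):

```python
# Sketch: measure effective fetch throughput from a Databricks SQL
# warehouse. Assumes `pip install databricks-sql-connector`;
# connection details and table are placeholders.
import time
from databricks import sql

with sql.connect(
    server_hostname="<workspace>.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        t0 = time.monotonic()
        cur.execute("SELECT * FROM <catalog>.<schema>.<table> LIMIT 1000000")
        arrow_table = cur.fetchall_arrow()   # fetch the full result as a PyArrow table
        elapsed = time.monotonic() - t0

nbytes = arrow_table.nbytes                  # decoded in-memory Arrow size
print(f"{nbytes / 1e6:.1f} MB in {elapsed:.1f} s "
      f"~ {nbytes * 8 / elapsed / 1e6:.1f} Mbps")
```

If the standalone fetch is much faster than ~6 Mbps, the bottleneck is more likely on the Fabric/dataflow side than in CloudFetch itself.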