01-30-2022 08:01 AM
I tried to benchmark the Power BI Databricks connector vs the Power BI Delta Lake reader on a dataset of 2.15 million rows. I found that the Delta Lake reader took 20 seconds, while importing through the SQL compute endpoint took ~75 seconds.
When I look at the query profile in SQL compute I see that 50 seconds are spent in the "Columnar To Row" step. This makes me rather suspicious, since I got the impression that with an updated Power BI we would take advantage of "Cloud Fetch", which creates files containing Apache Arrow batches, which is a columnar format. So why the conversion to rows? Maybe it is not actually using Cloud Fetch? Is there any way to verify that I am actually using Cloud Fetch? Either in the Power BI logs or in the Databricks SQL compute endpoint web interface?
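One way to check on the client side (an assumption on my part, not an official verification method) is to turn on ODBC driver logging via the `LogLevel`/`LogPath` DSN settings and then search the log files for cloud-fetch-related entries. A minimal sketch, assuming the logs land in a known directory; the file names and exact message wording are driver-version dependent, so treat the search term as a starting point:

```python
from pathlib import Path


def find_cloudfetch_lines(log_dir: str) -> list[str]:
    """Scan ODBC driver log files for lines mentioning cloud fetch.

    `log_dir` is whatever you configured as LogPath in the DSN.
    Matches both "CloudFetch" and "cloud fetch" spellings.
    """
    hits = []
    for log_file in Path(log_dir).glob("*.log"):
        for line in log_file.read_text(errors="ignore").splitlines():
            # Normalize case and drop spaces so both spellings match.
            if "cloudfetch" in line.lower().replace(" ", ""):
                hits.append(line)
    return hits
```

If the result set was served via Cloud Fetch, you would expect to see matching entries; an empty result on a large query suggests the driver fell back to inline results.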
03-07-2022 03:02 AM
It helps, but it did not solve it. See my reply to him.
03-07-2022 03:09 AM
Thank you for the update @Erik Parmann . We'll try to find a suitable answer for you.
05-18-2022 02:41 AM
Hi @Erik Parmann did you have a chance to look at this document?
https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#arrow-serialization-in-odbc
05-24-2022 11:31 AM
Yes, thanks. In my case we are using Azure Databricks, and I am not able to find an equally detailed description of Cloud Fetch on Azure Databricks, or whether any of our settings might disable it.
06-23-2022 12:53 AM
You would need to set the EnableQueryResultsDownload flag to 0 (zero), which disables Cloud Fetch.
06-23-2022 12:54 AM
So why is ColumnarToRow required?
10-26-2022 04:47 AM
Hi everyone, check out my latest blog post to verify whether or not CloudFetch is actually used; maybe you will also find some other optimizations there:
02-27-2023 07:24 AM
Guys, is there any way to switch off CloudFetch and fall back to ArrowResultSet by default, irrespective of result size, using the latest version of the Simba Spark ODBC driver?
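As mentioned earlier in the thread, setting EnableQueryResultsDownload to 0 should do this regardless of result size. A hedged sketch of a DSN entry on Linux; the driver path, host, and HTTP path below are placeholders, and exact key names may vary by driver version, so check your driver's configuration guide:

```ini
[Databricks]
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so
Host=example.azuredatabricks.net
Port=443
HTTPPath=/sql/1.0/warehouses/abc123
SSL=1
ThriftTransport=2
AuthMech=3
UID=token
; 0 disables Cloud Fetch, so the driver falls back to
; inline Arrow result sets irrespective of size
EnableQueryResultsDownload=0
```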