01-30-2022 08:01 AM
I tried to benchmark the Power BI Databricks connector vs the Power BI Delta Lake reader on a dataset of 2.15 million rows. I found that the Delta Lake reader took 20 seconds, while importing through the SQL compute endpoint took ~75 seconds.
When I look at the query profile in SQL compute I see that 50 seconds are spent in the "Columnar To Row" step. This makes me rather suspicious, since I got the impression that with an updated Power BI we would take advantage of "Cloud Fetch", which creates files containing Apache Arrow batches, which is a columnar format. So why the conversion to rows? Maybe it is not actually using Cloud Fetch? Is there any way to verify that I am actually using Cloud Fetch? Either in the Power BI logs or in the Databricks SQL compute endpoint web interface?
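One way to check on the client side (an assumption on my part, not an official verification method) is to turn on ODBC driver logging via the `LogLevel`/`LogPath` DSN settings and then search the log files for cloud-fetch-related entries. A minimal sketch, assuming the logs land in a known directory; the file names and exact message wording are driver-version dependent, so treat the search term as a starting point:

```python
from pathlib import Path


def find_cloudfetch_lines(log_dir: str) -> list[str]:
    """Scan ODBC driver log files for lines mentioning cloud fetch.

    `log_dir` is whatever you configured as LogPath in the DSN.
    Matches both "CloudFetch" and "cloud fetch" spellings.
    """
    hits = []
    for log_file in Path(log_dir).glob("*.log"):
        for line in log_file.read_text(errors="ignore").splitlines():
            # Normalize case and drop spaces so both spellings match.
            if "cloudfetch" in line.lower().replace(" ", ""):
                hits.append(line)
    return hits
```

If the result set was served via Cloud Fetch, you would expect to see matching entries; an empty result on a large query suggests the driver fell back to inline results.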
03-07-2022 03:02 AM
It helps, but it did not solve it. See my reply to him.
03-07-2022 03:09 AM
Thank you for the update @Erik Parmann . We'll try to find a suitable answer for you.
05-18-2022 02:41 AM
Hi @Erik Parmann did you have a chance to look at this document?
https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#arrow-serialization-in-odbc
05-24-2022 11:31 AM
Yes, thanks. In my case we are using Azure Databricks, and I am not able to find an equally detailed description of Cloud Fetch on Azure Databricks, or whether any of our settings might disable it.
06-23-2022 12:53 AM
You would need to set the EnableQueryResultsDownload flag to 0 (zero), which disables Cloud Fetch.
06-23-2022 12:54 AM
So why is ColumnarToRow required?
10-26-2022 04:47 AM
Hi everyone, check out my latest blog post to verify whether or not CloudFetch is actually used; maybe you will also find some other optimizations there:
02-27-2023 07:24 AM
Guys, is there any way to switch off CloudFetch and fall back to ArrowResultSet by default, irrespective of result size, using the latest version of the Simba Spark ODBC driver?
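As mentioned earlier in the thread, setting EnableQueryResultsDownload to 0 should do this regardless of result size. A hedged sketch of a DSN entry on Linux; the driver path, host, and HTTP path below are placeholders, and exact key names may vary by driver version, so check your driver's configuration guide:

```ini
[Databricks]
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so
Host=example.azuredatabricks.net
Port=443
HTTPPath=/sql/1.0/warehouses/abc123
SSL=1
ThriftTransport=2
AuthMech=3
UID=token
; 0 disables Cloud Fetch, so the driver falls back to
; inline Arrow result sets irrespective of size
EnableQueryResultsDownload=0
```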