Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I tried to benchmark the Power BI Databricks connector against the Power BI Delta Lake reader on a dataset of 2.15 million rows. The Delta Lake reader took 20 seconds, while importing through the SQL compute endpoint took ~75 seconds. When I loo...
I'm troubleshooting slow speeds (~6 Mbps) from Azure Databricks to the Power BI Service (Fabric) via dataflows. Drivers are up to date. Power BI is using Microsoft's Spark ODBC driver version 2.7.6.1014, confirmed via log4j.HybridCloudStoreResultHandler...
When I try to run a CREATE TABLE statement with USING CSV that pulls data from an Azure Blob Storage path in a custom catalog I created, I get an error stating: Unsupported cloud file system schema 'wasbs'. However, when I run this code in the hive_metastore catalog...
At the moment we are running an inference server in Azure Machine Learning. We would like to expose existing metrics to Prometheus as well as create our own custom metrics, all described below. Expose existing metrics: I would like a breakdow...
I'm tired of telling clients or referrals that I don't know Databricks, but it seems like the only option is to have a big AWS account and then use Databricks on that data. Can I download it locally for training and upskilling with Python, or is it only for cl...
Thanks for linking directly to the Docker image, @Hubert Dudek! And thanks for the info, @Prabakar Ammeappin and @Amit Nainawati. @Andrew Schell Let us know if you have more questions! If not, choose a best answer in this thread and let us know how...
About Cloud Fetch, mentioned in this article: https://databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html Are there any public APIs that can be called directly without ODBC or JDBC drivers? Thanks.
We're trying to pull a large amount of data using Databricks SQL and seem to have a bottleneck on network throughput when fetching the data. I see there's a new feature called Cloud Fetch, and this seems to be the perfect solution for our issue. But I do...
Trying to get an idea of what you are doing: are you querying a database of 100+ GB directly, or is it a Parquet/Delta source? Also, where is the result fetched to? File download, BI tool, ...?
Also, unlike other servers, Delta Sharing internally uses pre-signed URLs to S3, GCS, or ADLS, so data transfer from a client happens at the bandwidth of the underlying cloud object store. This way the Delta Sharing server scales extremely well and d...
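To make the pre-signed URL idea a bit more concrete, here is a rough Python sketch of what a client might do once it has been handed a list of such URLs: download each object directly from the object store, in parallel, without routing the bytes through the sharing server. This is not the Delta Sharing client API; `fetch_url` and `fetch_all` are hypothetical helper names, and the worker count and timeout are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one object over HTTP(S).

    A pre-signed URL carries its authorization in the URL itself,
    so no extra auth headers are sent here (hypothetical helper).
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    max_workers: int = 8,  # assumed value; tune to your network
) -> List[bytes]:
    """Fetch every URL in parallel and return payloads in input order."""
    url_list = list(urls)
    if not url_list:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as url_list.
        return list(pool.map(fetch, url_list))
```

Because each download goes straight to the object store, throughput in a setup like this is bounded by the store and the client's network rather than by a single server in the middle. The `fetch` parameter is only there so the download step can be swapped out, for example in tests.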