Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I tried to benchmark the Power BI Databricks connector vs the Power BI Delta Lake reader on a dataset of 2.15 million rows. I found that the Delta Lake reader took ~20 seconds, while importing through the SQL compute endpoint took ~75 seconds. When I loo...
I'm troubleshooting slow speeds (~6 Mbps) from Azure Databricks to the Power BI Service (Fabric) via dataflows. Drivers are up to date. Power BI is using Microsoft's Spark ODBC driver version 2.7.6.1014, confirmed via log4j. HybridCloudStoreResultHandler...
Hi, when reading a Delta Lake table (created by Auto Loader) with this code:

df = (
    spark.readStream
    .format('cloudFiles')
    .option("cloudFiles.format", "delta")
    .option("cloudFiles.schemaLocation", f"{silver_path}/_checkpoint")
    .load(bronz...
@Vladif1 The error occurs because the cloudFiles format in Auto Loader is meant for ingesting raw file formats such as CSV and JSON; see the Auto Loader docs for the full list of supported formats. For Delta tables, you should use the Delta format directly. #Sample Example
bronze...
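A minimal sketch of the suggested approach: stream the bronze Delta table with the delta format instead of cloudFiles. Here bronze_path is a hypothetical placeholder; silver_path is reused from the question.

# bronze_path is a hypothetical placeholder; silver_path comes from the question.
df = (
    spark.readStream
    .format("delta")  # read the Delta table directly; cloudFiles is for raw files
    .load(bronze_path)
)

query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", f"{silver_path}/_checkpoint")
    .start(silver_path)
)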
Hi All, I have a scenario where my existing Delta table looks like below. Now I have incremental data with an additional column, i.e. owner. DataFrame name --> scdDF. Below is the code snippet to merge the incremental DataFrame into targetTable, but the new...
In Databricks Runtime 15.2 and above, you can specify schema evolution in a merge statement using SQL or Delta table APIs:

MERGE WITH SCHEMA EVOLUTION INTO target
USING source
ON source.key = target.key
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *
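On older runtimes, a similar effect can be had by enabling automatic schema merging before the merge. A sketch using the Delta Python API; scdDF comes from the question above, while the table name and join key are hypothetical placeholders:

from delta.tables import DeltaTable

# Allow merge to add new source columns (e.g. owner) to the target schema.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

targetTable = DeltaTable.forName(spark, "target_table")  # hypothetical table name
(
    targetTable.alias("target")
    .merge(scdDF.alias("source"), "source.key = target.key")  # hypothetical key column
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)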
Hi there, it seems there are many different ways to store / manage data in Databricks. This is the Data asset in Databricks. However, data can also be stored (hyperlinks included to relevant pages):
in a Lakehouse
in Delta Lake
on Azure Blob storage
in the D...
Rename and drop columns with Delta Lake column mapping. Hi all, Databricks now supports column rename and drop. Column mapping requires the following Delta protocols: Reader version 2 or above; Writer version 5 or above. Blog URL ##Available in D...
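A short sketch of the DDL involved, using a hypothetical table name my_table; the property values follow the protocol requirements above:

# Upgrade the protocol and switch the table to name-based column mapping.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5',
        'delta.columnMapping.mode' = 'name'
    )
""")

# With column mapping enabled, rename and drop are metadata-only operations.
spark.sql("ALTER TABLE my_table RENAME COLUMN old_name TO new_name")
spark.sql("ALTER TABLE my_table DROP COLUMN obsolete_col")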
We are trying to establish ingestion from Dynamics 365 >> ADLS >> Databricks. While reading the information, we need to use Databricks Runtime 6.4 to read the raw data from ADLS into Databricks; the latest Databricks Runtime couldn't be used. Need your help to...
Hi @nitinsingh1, thank you for bringing up this topic. I'm also currently looking into how to ingest exported Dynamics 365 FO data (CSV files with CDM) from ADLS into Databricks. Could you share how you achieved this? I'd be very curious to see your a...
woahhh #Excel plug-in for #DeltaSharing. Now I can import Delta tables directly into my spreadsheet using Delta Sharing. It puts the power of #DeltaLake into the hands of millions of business users. What does this mean? Imagine a data provider delivering...
Hi! I have a problem. I'm using Auto Loader to ingest data from raw into a Delta Lake, but when my pipeline starts, I want to apply the pipeline only to the new data. Auto Loader ingests data into the Delta Lake, but now, how can I distinguish the...
Hi @Alejandro Piury Pinzón, we haven't heard from you since the last response from @Tyler Retzlaff, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be he...
Very often, we need to know how many files a table path contains and the overall size of the path for various optimizations. In the past, I had to write my own logic to accomplish this. Delta Lake makes life easier. See how simple it is to obtain...
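One simple way to get both numbers is DESCRIBE DETAIL, which returns the file count and total size of a Delta table in a single row. A sketch, with a hypothetical table path:

# DESCRIBE DETAIL returns one row of metadata for the table or path.
detail = spark.sql("DESCRIBE DETAIL delta.`/mnt/tables/events`")  # hypothetical path
detail.select("numFiles", "sizeInBytes").show()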
https://www.databricks.com/notebooks/delta-lake-cdf.html I am trying to understand the above article. Could someone explain the below questions to me? a) From SELECT * FROM table_changes('gold_consensus_eps', 2), why are there consensus_eps values of 2.1 and 2....
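For context on reading that output: the change data feed emits an UPDATE as two rows, an update_preimage (the value before) and an update_postimage (the value after), which is typically why two consensus_eps values show up for a single change. A sketch of inspecting the relevant columns:

# _change_type distinguishes preimage (before) from postimage (after) rows;
# _commit_version ties each row to the commit that produced it.
changes = spark.sql("SELECT * FROM table_changes('gold_consensus_eps', 2)")
changes.select("consensus_eps", "_change_type", "_commit_version").show()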
Hi @THIAM HUAT TAN, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...
Hi, I am working on an ML project and I need to access the data in tables hosted in my Databricks cluster through a notebook that I am running locally. This has been very easy while running the notebooks in Databricks, but I cannot figure out how to do ...
You can use the REST APIs or pyodbc to achieve this. Going through the official Databricks documentation would also be helpful for accessing data from outside the Databricks environment.
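One concrete option besides pyodbc is the databricks-sql-connector package, which runs queries from a local process against a SQL warehouse or cluster. A sketch; the hostname, HTTP path, token, and table name are placeholders you'd replace with your workspace's connection details:

# pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",                        # placeholder
    access_token="dapi...",                                        # placeholder PAT
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM my_schema.my_table LIMIT 10")  # hypothetical table
        for row in cursor.fetchall():
            print(row)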
I'm frankly at a loss here. I have a task that is consistently performing just awfully. I took some time this morning to try to debug it, and the physical plan is showing a 238 TiB shuffle:

== Physical Plan ==
AdaptiveSparkPlan (40)
+- == Current Plan...
Hi @Jordan Yaker, hope all is well! Just wanted to check in if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...
Hi everybody, I have 20 years of data, 600M rows. I have partitioned it on year and month to generate file sizes that seem reasonable (~128 MB). All data is queried using a timestamp, as all queries need to filter on the exact hours. So my requirement...
Hi guys, thanks for your advice. I found a solution: we upgraded the Databricks Runtime to 12.2, and now the pushdown of the partition filter works. The documentation said that 10.4 would be adequate, but obviously it wasn't enough.
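For anyone with the same timestamp-filter requirement: Delta generated columns let a predicate on the timestamp itself prune year/month partitions, so queries don't need to repeat the partition columns. A sketch with a hypothetical events table:

# Partition columns derived from the event timestamp; on a supported runtime,
# a filter on event_ts alone can be pushed down to the year/month partitions.
spark.sql("""
    CREATE TABLE events (
        event_ts TIMESTAMP,
        payload STRING,
        year INT GENERATED ALWAYS AS (YEAR(event_ts)),
        month INT GENERATED ALWAYS AS (MONTH(event_ts))
    )
    USING DELTA
    PARTITIONED BY (year, month)
""")

# Partition pruning kicks in without explicit year/month predicates.
spark.sql("SELECT * FROM events WHERE event_ts >= '2003-06-01 00:00:00'").show()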
How do I remove checkpoints from a Delta Lake table? I see that a few checkpoints exist on my Delta table, and I want to remove the oldest one. It seems their existence is blocking the removal of the oldest _delta_log entries.
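For background: checkpoints and old _delta_log entries aren't meant to be deleted by hand; they are cleaned up automatically when new checkpoints are written, with the window controlled by a table property. A sketch, assuming a hypothetical table name:

# Log entries (and their checkpoints) older than the retention interval become
# eligible for automatic cleanup at the next checkpoint.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 7 days'
    )
""")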
Hi @Pawel Woj, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
Hi, as per the documentation https://docs.delta.io/latest/quick-start.html, we can configure DeltaCatalog using spark.sql.catalog.spark_catalog. Iceberg supports two catalog implementations (https://iceberg.apache.org/docs/latest/spark-configuration/#...
@Arun Sethia: Yes, Delta Lake also supports custom catalogs. Delta Lake uses the Spark Catalog API, which allows for pluggable catalog implementations. You can implement your own custom catalog to use with Delta Lake. To use a custom catalog, you can...
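A minimal sketch of the wiring from the linked quick-start; a custom catalog would replace DeltaCatalog with your own implementation of Spark's TableCatalog API:

from pyspark.sql import SparkSession

# Standard Delta catalog configuration; swap in a custom TableCatalog
# implementation for spark_catalog if you need bespoke behavior.
spark = (
    SparkSession.builder
    .appName("delta-custom-catalog")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)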