I tried to benchmark the Power BI Databricks connector vs. the Power BI Delta Lake reader on a dataset of 2.15 million rows. I found that the Delta Lake reader took 20 seconds, while importing through the SQL compute endpoint took ~75 seconds. When I loo...
Guys, is there any way to switch off CloudFetch and fall back to ArrowResultSet by default, irrespective of result size, when using the latest version of the Simba Spark ODBC driver?
Maybe I'm completely wrong, but from my understanding delta lake only calculates a table at certain points, for instance when you display your data. Before that point, operations are only written to the log file and are not executed (meaning no chang...
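A minimal sketch of that lazy behaviour, assuming a plain PySpark session and a hypothetical Delta path (/tmp/delta/events): the read and filter only build a query plan, and nothing touches the data until an action such as count() or display() runs.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Reading and filtering are transformations: they only build a query plan.
df = spark.read.format("delta").load("/tmp/delta/events")   # placeholder path
filtered = (df.filter(F.col("status") == "active")
              .withColumn("day", F.to_date("event_ts")))    # placeholder columns

# Nothing has been executed yet. Only an action triggers the actual work:
print(filtered.count())   # now the files are scanned and the plan runs
```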
Hi @Lukas Goldschmied​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....
To connect Delta Lake with Microsoft Excel, you can use the Microsoft Power Query for Excel add-in. Power Query is a data connection tool that allows you to connect to various data sources, including Delta Lake. Here's how to do it: Install the Micros...
Our vendor is looking to use Microsoft API Manager to retrieve data from a variety of sources. Is it possible to extract records from the Delta Lake by using an API? What I've tried: I reviewed the available Databricks APIs; it looks like most of them ...
Another possibility is to stand up a cluster and have a notebook running Flask to create an API interface. I'm still looking into options, but it seems like there should be a baked-in solution besides the JDBC connector. I'm not ...
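A rough sketch of that notebook-hosted Flask idea, under the assumption that the cluster can run a long-lived cell; the route, port, and table name (my_schema.events) are hypothetical placeholders, not anything from the original thread.

```python
import json
from flask import Flask, jsonify, request
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
app = Flask(__name__)

@app.route("/records")
def records():
    # Hypothetical table name; 'limit' keeps the response payload small.
    limit = int(request.args.get("limit", 100))
    rows = (spark.table("my_schema.events")
                 .limit(limit)
                 .toJSON()
                 .collect())
    return jsonify([json.loads(r) for r in rows])

# Runs in a long-lived notebook cell and blocks; callers reach the driver node directly.
app.run(host="0.0.0.0", port=8080)
```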
I have a lot of tables with 80% of the columns filled with nulls. I understand SQL Server provides a way to handle this kind of data in the table definition (with the SPARSE keyword). Does the data lake provide anything similar?
The data lake itself does not, but the file format you use to store the data does. E.g. Parquet uses column compression, so sparse data will compress pretty well. CSV, on the other hand: total disaster.
In my findings, a lot of Delta tables in the lakehouse are sparse, so I'm wondering how much space the data lake takes to store null data; any suggestions for handling sparse tables in the lakehouse would also be appreciated. I also want to o...
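A small illustration of that point, assuming a local Spark session: it writes the same mostly-null DataFrame as Parquet and as CSV so the on-disk sizes can be compared. The /tmp paths and the 80%-null payload column are placeholders.

```python
import os
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1M rows where ~80% of the 'payload' column is null.
df = (spark.range(1_000_000)
        .withColumn("payload",
                    F.when(F.rand() < 0.2, F.lit("some-value")).otherwise(F.lit(None))))

df.write.mode("overwrite").parquet("/tmp/sparse_parquet")
df.write.mode("overwrite").option("header", True).csv("/tmp/sparse_csv")

def dir_size(path):
    return sum(os.path.getsize(os.path.join(root, f))
               for root, _, files in os.walk(path) for f in files)

print("parquet bytes:", dir_size("/tmp/sparse_parquet"))
print("csv bytes:    ", dir_size("/tmp/sparse_csv"))
```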
Hi @Akash Ragothu​, We haven’t heard from you since the last response from @Ajay Pandey​, and I was checking back to see if his suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to othe...
Tip: These steps are built out for AWS accounts and workspaces that are using Delta Lake. If you would like to learn more, watch this video and reach out to your Databricks sales representative for more information.
Step 1: Create your own notebook or ...
Our use case is simple - to store our PB-scale data, transform it, and use it for BI, reporting, and analytics. As my title says, I am trying to eliminate expenditure on Redshift, as we are starting greenfield. I know I have designed/used just Delta lak...
Hi @Swetha Marakani​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Th...
Dear Team, while I was doing hands-on practice from the course - Delta Lake Rapid Start with Python (https://customer-academy.databricks.com/learn/course/97/delta-lake-rapid-start-with-python) - I have come across False as the output of dbutils.fs.rm(health_t...
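For reference, a small check of that behaviour in a Databricks notebook (the path below is a placeholder, not the course path): dbutils.fs.rm returns a boolean indicating whether the delete succeeded, and as an assumption worth verifying, a False result typically means the path did not exist or a non-empty directory was removed without recurse=True.

```python
# In a Databricks notebook, dbutils is available without an import.
path = "dbfs:/tmp/health_tracker"          # placeholder path
print(dbutils.fs.rm(path, recurse=True))   # False if the path does not exist
```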
Consider a table that is partitioned on a date field, but I'm filtering on a column that is not a partition column. With this filter condition, are all the files scanned to produce the result set, or does any data skipping happen?
Dears, I was trying to check which Azure Databricks VM type is best suited for executing OPTIMIZE with ZORDER on a single timestamp-valued (but string data type) column for around 5000+ tables in the Delta Lake. I chose Standard_F16s_v2 with 6 workers & 1...
We are using the Delta Lake time travel capability in our current project. We can use a SELECT * ... TIMESTAMP AS OF / VERSION AS OF query. However, there might be some change in our approach, and we might need to recreate the Delta Lake while persisting the tim...
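A small sketch of that scenario, assuming a Delta table partitioned by a date column and a filter on a different column; partition pruning does not apply, but the per-file min/max statistics in the Delta log can still skip files whose stats exclude the predicate. Table path and column names are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical Delta table partitioned on event_date.
df = spark.read.format("delta").load("/tmp/delta/sales")

# Filter on a non-partition column: no partition pruning, but file-level
# min/max stats recorded in the Delta log can still skip whole files.
result = df.filter(F.col("customer_id") == 42)
result.explain()   # inspect the physical plan and the pushed filters
```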
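For context, a sketch of running that command over many tables; the table list and the Z-ORDER column are placeholders, and in practice the list could be pulled from the metastore.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical list of tables to compact and cluster.
tables = ["analytics.events_2021", "analytics.events_2022"]

for t in tables:
    # ZORDER BY clusters the data files on the chosen column so that
    # filters on that column can skip more files.
    spark.sql(f"OPTIMIZE {t} ZORDER BY (event_timestamp)")
```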
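A quick sketch of the time travel syntax being referred to, with the table name, version number, and timestamp as placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Query a Delta table as of a version number or a timestamp (placeholders).
v5 = spark.sql("SELECT * FROM my_schema.orders VERSION AS OF 5")
old = spark.sql("SELECT * FROM my_schema.orders TIMESTAMP AS OF '2023-01-01'")

# The DataFrame reader exposes the same capability:
df = (spark.read.format("delta")
            .option("versionAsOf", 5)
            .load("/mnt/delta/orders"))
```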
Hi @Priyanka Mane​ , We haven't heard from you on the last response from @Debayan Mukherjee​ , and I was checking back to see if his suggestions helped you. Or else, If you have any solution, please share it with the community as it can be helpful to...
Hey guys, we're considering Delta Lake as the storage for our project and have a couple of questions. The first one is: what's the pricing for Delta Lake? I can't seem to find a page that says x amount costs y. The second question is more technical - if we...
Delta Lake itself is free. It is a file format, but you will have to pay for storage and compute, of course. If you want to use Databricks with Delta Lake, it will not be free unless you use the Community Edition. Depending on what you are planning to...
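To illustrate the "it's just a file format" point, a sketch of using open-source Delta Lake with plain Spark outside Databricks, assuming the delta-spark pip package is installed; the app name and table path are placeholders.

```python
# pip install pyspark delta-spark
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (SparkSession.builder
           .appName("delta-oss")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))

spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write and read a Delta table on plain storage (path is a placeholder).
spark.range(10).write.format("delta").mode("overwrite").save("/tmp/delta/demo")
spark.read.format("delta").load("/tmp/delta/demo").show()
```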
I have set up a Spark standalone cluster and use Spark Structured Streaming to write data from Kafka to multiple Delta Lake tables - simply stored in the file system. So there are multiple writes per second. After running the pipeline for a while, I ...
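For reference, a sketch of that kind of pipeline; the broker address, topic, checkpoint location, and table path are placeholders. Each micro-batch produces a new commit (and new data files) in the Delta log, which is why very frequent small writes accumulate many small files over time.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Read from Kafka (broker and topic are placeholders).
stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .select(F.col("value").cast("string").alias("json")))

# Each micro-batch appends a commit to the Delta table at the target path.
query = (stream.writeStream.format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/events")
         .outputMode("append")
         .trigger(processingTime="10 seconds")
         .start("/tmp/delta/events"))
```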
Hey there @Kim Abasch​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....
I have a feature table in BQ that I want to ingest into Delta Lake. This feature table in BQ has 100TB of data. This table can be partitioned by DATE. What best practices and approaches can I take to ingest this 100TB? In particular, what can I do to ...
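One possible shape for that ingestion, sketched under the assumption that the spark-bigquery connector is available and that the project, dataset, table, column names, and date range are all placeholders: read one DATE partition at a time from BigQuery and append it into a Delta table partitioned on the same column, so a failed chunk can be retried without reloading everything.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder date range; in practice this would cover the table's history.
for day in ["2023-01-01", "2023-01-02"]:
    # Read a single date partition from BigQuery (names are assumptions).
    bq_df = (spark.read.format("bigquery")
             .option("table", "my_project.my_dataset.feature_table")
             .option("filter", f"feature_date = '{day}'")
             .load())

    # Append the chunk into a Delta table partitioned by the same column.
    (bq_df.write.format("delta")
          .mode("append")
          .partitionBy("feature_date")
          .save("/mnt/delta/feature_table"))
```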