Data Engineering

Forum Posts

Erik
by Valued Contributor II
  • 10019 Views
  • 22 replies
  • 15 kudos

How to enable/verify cloud fetch from PowerBI

I tried to benchmark the Power BI Databricks connector vs the Power BI Delta Lake reader on a dataset of 2.15 million rows. I found that the Delta Lake reader took 20 seconds, while importing through the SQL compute endpoint took ~75 seconds. When I loo...

Latest Reply
pulkitm
New Contributor III
  • 15 kudos

Is there any way to switch off CloudFetch and fall back to ArrowResultSet by default, irrespective of result size, using the latest version of the Simba Spark ODBC driver?

21 More Replies
zeta_load
by New Contributor II
  • 822 Views
  • 2 replies
  • 0 kudos

Resolved! When does delta lake actually compute a table?

Maybe I'm completely wrong, but from my understanding Delta Lake only computes a table at certain points, for instance when you display your data. Before that point, operations are only written to the log file and are not executed (meaning no chang...
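The behaviour the question describes is Spark's lazy evaluation: transformations are only recorded until an action (display, write, count) forces the work. A plain-Python analogy using a generator illustrates the idea; the names here are illustrative, not Spark APIs:

```python
# Analogy for lazy evaluation: a generator records a pipeline but runs
# nothing until it is consumed, just as Spark transformations run nothing
# until an action is called.
log = []

def double_rows(rows):
    """Stand-in for a transformation: yields doubled rows lazily."""
    for r in rows:
        log.append(r)          # side effect proves *when* work happens
        yield r * 2

pipeline = double_rows(range(3))   # "transformation": recorded, not executed
assert log == []                   # no rows have been processed yet

result = list(pipeline)            # "action": forces the computation
assert result == [0, 2, 4]
assert log == [0, 1, 2]            # work happened only at consumption time
```

In Spark the same pattern holds for DataFrame transformations; note that the Delta transaction log is a separate mechanism from this query-side laziness.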

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Lukas Goldschmied​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

1 More Replies
Rishabh264
by Honored Contributor II
  • 1321 Views
  • 0 replies
  • 5 kudos

To connect Delta Lake with Microsoft Excel, you can use the Microsoft Power Query for Excel add-in. Power Query is a data connection tool that allows ...

To connect Delta Lake with Microsoft Excel, you can use the Microsoft Power Query for Excel add-in. Power Query is a data connection tool that allows you to connect to various data sources, including Delta Lake. Here's how to do it: Install the Micros...

Chris_Shehu
by Valued Contributor III
  • 722 Views
  • 1 replies
  • 2 kudos

What are the options for extracting data from the delta lake for a vendor?

Our vendor is looking to use Microsoft API Manager to retrieve data from a variety of sources. Is it possible to extract records from the Delta Lake by using an API? What I've tried: I reviewed the available Databricks APIs; it looks like most of them ...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 2 kudos

Another possibility is to stand up a cluster and have a notebook running Flask to create an API interface. I'm still looking into options, but it seems like there should be a baked-in solution besides the JDBC connector. I'm not ...
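As a rough sketch of the idea in that reply, here is a minimal JSON endpoint built with the standard library (standing in for Flask). The `FAKE_RESULT` payload is a hypothetical stand-in; in a real notebook the handler would run a query against the Delta table instead:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a Delta table query result
FAKE_RESULT = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # In a real deployment this would run e.g. spark.sql(...) here
        body = json.dumps(FAKE_RESULT).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to an ephemeral port and serve from a background thread
server = HTTPServer(("127.0.0.1", 0), QueryHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/records") as resp:
    rows = json.loads(resp.read())
server.shutdown()
```

This only demonstrates the serving pattern; authentication, pagination, and keeping the cluster alive are the real costs of this approach compared with the JDBC/ODBC connectors.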

DB_developer
by New Contributor III
  • 3017 Views
  • 2 replies
  • 3 kudos

How to optimize storage for sparse data in data lake?

I have a lot of tables with 80% of the columns filled with nulls. I understand SQL Server provides a way to handle this kind of data in the table definition (with the SPARSE keyword). Do data lakes provide anything similar?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

The data lake itself does not, but the file format you use to store the data does. For example, Parquet uses column compression, so sparse data compresses quite well. CSV, on the other hand, is a total disaster.
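The effect the reply describes can be sketched with a standard-library experiment, using zlib as a stand-in for Parquet's column codecs (which behave similarly on long constant runs):

```python
import random
import zlib

random.seed(0)
n = 100_000

# A mostly-null column serializes to long constant runs...
sparse_column = bytes(n)  # n zero bytes, like an all-null column
# ...while a dense, high-entropy column does not.
dense_column = bytes(random.randrange(256) for _ in range(n))

sparse_size = len(zlib.compress(sparse_column))
dense_size = len(zlib.compress(dense_column))

# The sparse column shrinks to a tiny fraction of the dense one
assert sparse_size < dense_size / 10
```

Columnar formats such as Parquet add run-length and dictionary encoding on top of general-purpose compression, so a column that is 80% null typically costs very little on disk.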

1 More Replies
DB_developer
by New Contributor III
  • 2578 Views
  • 4 replies
  • 7 kudos

Resolved! How nulls are stored in delta lake and databricks?

In my findings I have found a lot of Delta tables in the lakehouse to be sparse, so I am wondering how much space the data lake takes to store null data; any suggestions for handling sparse data tables in the lakehouse would also be appreciated. I also want to o...

Latest Reply
Kaniz
Community Manager
  • 7 kudos

Hi @Akash Ragothu​, We haven’t heard from you since the last response from @Ajay Pandey​, and I was checking back to see if his suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to othe...

3 More Replies
User16835756816
by Valued Contributor
  • 2361 Views
  • 4 replies
  • 11 kudos

How can I extract data from different sources and transform it into a fresh, reliable data pipeline?

Tip: These steps are built out for AWS accounts and workspaces that are using Delta Lake. If you would like to learn more, watch this video and reach out to your Databricks sales representative for more information. Step 1: Create your own notebook or ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 11 kudos

Thanks @Nithya Thangaraj​ 

3 More Replies
Sweta
by New Contributor II
  • 1727 Views
  • 10 replies
  • 7 kudos

Can Delta Lake completely host a data warehouse and replace Redshift?

Our use case is simple: to store our PB-scale data and transform and use it for BI, reporting, and analytics. As my title says, I am trying to eliminate expenditure on Redshift as we are starting as a green field. I know I have designed/used just Delta lak...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Swetha Marakani​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Th...

9 More Replies
db-avengers2rul
by Contributor II
  • 1526 Views
  • 8 replies
  • 18 kudos

Code snippet error from course - Databricks Academy - Delta Lake Rapid Start with Python

Dear Team, While I was doing hands-on practice from the course Delta Lake Rapid Start with Python (https://customer-academy.databricks.com/learn/course/97/delta-lake-rapid-start-with-python), I have come across false as the output of dbutils.fs.rm(health_t...

Latest Reply
Anonymous
Not applicable
  • 18 kudos

Could you give more detail about your issue (a screenshot or something)? We hope to help you find the issue.

7 More Replies
NOOR_BASHASHAIK
by Contributor
  • 2194 Views
  • 5 replies
  • 4 kudos

Azure Databricks VM type for OPTIMIZE with ZORDER on a single column

Dears, I was trying to check which Azure Databricks VM type is best suited for executing OPTIMIZE with ZORDER on a single timestamp-valued (but string data type) column for around 5000+ tables in the Delta Lake. I chose Standard_F16s_v2 with 6 workers & 1...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @NOOR BASHA SHAIK​​, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

4 More Replies
Priyanka48
by New Contributor III
  • 856 Views
  • 3 replies
  • 0 kudos

Is there any way we can use usermetadataAsOf option in time travelling query or can we modify the timestamps of delta lake that seems to be immutable?

We are using the Delta Lake time travelling capability in our current project. We can use a SELECT * FROM ... TIMESTAMP/VERSION AS OF query. However, there might be some change in our approach and we might need to recreate the Delta Lake while persisting the tim...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Priyanka Mane​ , We haven't heard from you on the last response from @Debayan Mukherjee​ , and I was checking back to see if his suggestions helped you. Or else, If you have any solution, please share it with the community as it can be helpful to...

2 More Replies
jayallenmn
by New Contributor III
  • 1002 Views
  • 4 replies
  • 3 kudos

Resolved! Couple of Delta Lake questions

Hey guys, We're considering Delta Lake as the storage for our project and have a couple of questions. The first one is: what's the pricing for Delta Lake? Can't seem to find a page that says x amount costs y. The second question is more technical: if we...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

Delta Lake itself is free; it is a file format. But you will have to pay for storage and compute, of course. If you want to use Databricks with Delta Lake, it will not be free unless you use the Community Edition. Depending on what you are planning to...

3 More Replies
abaschkim
by New Contributor II
  • 937 Views
  • 4 replies
  • 0 kudos

Delta Lake table: large volume due to versioning

I have set up a Spark standalone cluster and use Spark Structured Streaming to write data from Kafka to multiple Delta Lake tables - simply stored in the file system. So there are multiple writes per second. After running the pipeline for a while, I ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Kim Abasch​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

3 More Replies
spartakos
by New Contributor
  • 486 Views
  • 0 replies
  • 0 kudos

Big data ingest into Delta Lake

I have a feature table in BQ that I want to ingest into Delta Lake. This feature table in BQ has 100 TB of data. The table can be partitioned by DATE. What best practices and approaches can I take to ingest this 100 TB? In particular, what can I do to ...
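One common approach for an ingest of this size is to split it into per-partition jobs, so each run moves a bounded slice and failed days can be retried or backfilled in parallel. A minimal sketch of generating one filter per DATE partition (the function name and filter string format are illustrative, not a BigQuery or Delta API):

```python
from datetime import date, timedelta

def daily_partitions(start: date, end: date):
    """Yield one (day, filter) pair per DATE partition, inclusive of both ends."""
    d = start
    while d <= end:
        yield d, f"DATE = '{d.isoformat()}'"
        d += timedelta(days=1)

# One bounded ingest job per day of the source table
jobs = list(daily_partitions(date(2023, 1, 1), date(2023, 1, 3)))
```

Each filter would then drive one bounded read from the source and one append into a Delta table partitioned by the same DATE column, keeping any single failure cheap to retry.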
