Data Engineering

Forum Posts

by Erik_L • Contributor II

01-31-2023 5:31:49 PM

6691 Views
4 replies
4 kudos

Resolved! Support for Parquet brotli compression or a work around

Spark 3.3.1 supports the brotli compression codec, but when I use it to read Parquet files from S3, I get:

INVALID_ARGUMENT: Unsupported codec for Parquet page: BROTLI

Example code:

df = (spark.read.format("parquet") .option("compression", "brotli")...

Latest Reply

Erik_L
Contributor II

02-01-2023 1:48:21 PM

4 kudos

Given the new information I appended, I looked into the Delta caching and I can disable it: .option("spark.databricks.io.cache.enabled", False). This works as a workaround while I read these files in to save them locally in DBFS, but does it have perfo...
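
A minimal sketch of the workaround described in this reply, assuming a Databricks cluster where the `spark.databricks.io.cache.enabled` key controls the disk (Delta) cache. The reply passes the key as a reader option; this sketch sets it on the session configuration instead. The function name and the path argument are hypothetical. The `compression` option is left out on the assumption that the reader takes the codec from the Parquet file metadata.

```python
# Configuration key quoted in the reply above.
DISK_CACHE_KEY = "spark.databricks.io.cache.enabled"


def read_parquet_without_disk_cache(spark, path):
    """Disable the disk cache for this session, then read Parquet from `path`.

    `spark` is an existing SparkSession; `path` is a placeholder for the
    location of the brotli-compressed files.
    """
    # Session-level setting: it stays off until it is set back to "true".
    spark.conf.set(DISK_CACHE_KEY, "false")
    # The read is lazy; the files are only scanned when an action runs.
    return spark.read.format("parquet").load(path)
```

Because the read is lazy, the setting should stay disabled until the data has actually been written out to its new location.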

3 More Replies

by Ajay-Pandey • Esteemed Contributor III

02-23-2023 3:30:55 AM

2527 Views
2 replies
7 kudos

docs.databricks.com

Rename and drop columns with Delta Lake column mapping

Hi all,

Databricks now supports column rename and drop. Column mapping requires the following Delta protocols:

Reader version 2 or above.
Writer version 5 or above.

Blog URL##Available in D...
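
As a rough illustration of the post above, the sketch below builds the SQL statements that would enable column mapping and then rename or drop columns. The protocol versions (reader 2, writer 5) come from the post; the `delta.columnMapping.mode` property with value `name` is taken from the Delta Lake documentation as I understand it. The helper name and the table and column names are hypothetical.

```python
def column_mapping_statements(table, renames=None, drops=None):
    """Return SQL statements that enable column mapping on `table`,
    then rename and drop the requested columns.

    `renames` maps old column names to new ones; `drops` lists columns to drop.
    """
    statements = [
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5', "
        "'delta.columnMapping.mode' = 'name')"
    ]
    for old, new in (renames or {}).items():
        statements.append(f"ALTER TABLE {table} RENAME COLUMN {old} TO {new}")
    if drops:
        statements.append(f"ALTER TABLE {table} DROP COLUMNS ({', '.join(drops)})")
    return statements
```

Each statement could then be passed to `spark.sql(...)` in order. Upgrading the table protocol is generally not reversible, so it is worth trying on a copy first.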

Latest Reply

Poovarasan
New Contributor III

03-03-2024 9:51:03 PM

7 kudos

The above-mentioned feature is not working in a DLT pipeline if the script has more than 4 columns.

1 More Replies

by Stokholm • New Contributor III

03-28-2023 1:45:14 AM

17756 Views
9 replies
1 kudos

Pushdown of datetime filter to date partition.

Hi everybody, I have 20 years of data, 600M rows. I have partitioned them on year and month to generate a file size that seems reasonable (128 MB). All data is queried using timestamp, as all queries need to filter on the exact hours. So my requirement...

Latest Reply

Stokholm
New Contributor III

04-24-2023 6:55:54 AM

1 kudos

Hi guys, thanks for your advice. I found a solution: we upgraded the Databricks Runtime to 12.2, and now the pushdown of the partition filter works. The documentation said that 10.4 would be adequate, but evidently it wasn't.
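
The fix in this thread was a runtime upgrade. Where that is not an option, one common alternative (not the author's method) is to add an explicit predicate on the partition columns next to the timestamp filter. The sketch below derives that predicate from a timestamp range, assuming the partition columns are literally named `year` and `month`; those names and the helper name are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def partition_predicate(start: datetime, end: datetime) -> str:
    """Build a SQL predicate on `year`/`month` partition columns that covers
    every month touched by the closed timestamp range [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    # Enumerate every (year, month) pair from start's month to end's month.
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    # Emit one clause per year, listing the months needed in that year.
    clauses = []
    for yr, group in groupby(months, key=lambda ym: ym[0]):
        month_list = ", ".join(str(m) for _, m in group)
        clauses.append(f"(year = {yr} AND month IN ({month_list}))")
    return " OR ".join(clauses)
```

The result could be applied with a separate `df.where(partition_predicate(start, end))` call before the exact timestamp filter, so the engine can prune partitions first and then filter rows by hour.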

8 More Replies

by Dave_Nithio • Contributor II

12-12-2022 3:18:35 PM

2180 Views
0 replies
1 kudos

Natively Query Delta Lake with R

I have a large delta table that I need to analyze in native R. The only option I have currently is to query the delta table then use collect() to bring that spark dataframe into an R dataframe. Is there an alternative method that would allow me to qu...

by db-avengers2rul • Contributor II

10-07-2022 4:13:50 AM

4326 Views
8 replies
18 kudos

Code snippet error from course - Databricks Academy - Delta Lake Rapid Start with Python

Dear Team, while I was doing hands-on practice from the course Delta Lake Rapid Start with Python (https://customer-academy.databricks.com/learn/course/97/delta-lake-rapid-start-with-python), I came across false as the output: dbutils.fs.rm(health_t...

Latest Reply

Anonymous
Not applicable

11-19-2022 2:36:40 AM

18 kudos

Could you describe your issue in more detail (a screenshot or something similar)? I hope to help you find the problem.

7 More Replies

by AJDJ • New Contributor III

09-30-2022 3:35:41 PM

13009 Views
9 replies
4 kudos

Delta Lake Demo - Not working

Hi there, I imported the Delta Lake demo notebook from the Databricks link, and at command 12 it errors out. I tried other ways and paths but couldn't get past the error. Maybe the notebook is outdated? https://www.databricks.com/notebooks/Demo_Hub-Delta_La...

Latest Reply

Anonymous
Not applicable

10-22-2022 11:11:37 PM

4 kudos

Hi @AJ DJ, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

8 More Replies

by jayallenmn • New Contributor III

07-26-2022 8:41:27 AM

3938 Views
4 replies
3 kudos

Resolved! Couple of Delta Lake questions

Hey guys, we're considering Delta Lake as the storage for our project and have a couple of questions. The first one is about pricing for Delta Lake: I can't seem to find a page that says x amount costs y. The second question is more technical - if we...

Latest Reply

-werners-
Esteemed Contributor III

07-27-2022 4:03:02 AM

3 kudos

Delta Lake itself is free. It is a file format. But you will have to pay for storage and compute, of course. If you want to use Databricks with Delta Lake, it will not be free unless you use the Community Edition. Depending on what you are planning to...

3 More Replies

by Bency • New Contributor III

03-18-2022 4:51:39 AM

8419 Views
6 replies
4 kudos

Resolved! Databricks Delta Lake Sink Connector

I am trying to use the Databricks Delta Lake Sink Connector (Confluent Cloud) to write to S3. The connector starts up with the following error. Any help on this would be appreciated: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLExcepti...

Latest Reply

Bency
New Contributor III

03-24-2022 11:53:07 AM

4 kudos

Hi @Kaniz Fatma, yes we did; it looks like it was indeed a whitelisting issue. Thanks @Hubert Dudek and @Kaniz Fatma.

5 More Replies

by Anonymous • Not applicable

06-22-2021 7:05:00 PM

4282 Views
1 replies
0 kudos

How can I set up a Presto to Delta Lake integration and query Delta tables?

Latest Reply

aladda
Databricks Employee

06-22-2021 8:25:57 PM

0 kudos

See this docs article for details on setting up a Delta Presto integration - https://docs.databricks.com/delta/presto-integration.html

by User16790091296 • Databricks Employee

05-21-2021 11:40:49 AM

1876 Views
1 replies
0 kudos

What’s the best instance type to run OPTIMIZE (bin-packing and Z-Ordering) on?

I've been doing some research on optimizing data storage while implementing Delta; however, I'm not sure which instance type would be best for this.

Latest Reply

sajith_appukutt
Databricks Employee

06-17-2021 10:26:45 PM

0 kudos

OPTIMIZE, as you alluded, has two operations: bin-packing and multi-dimensional clustering (Z-Ordering). Bin-packing optimization is idempotent, meaning that if it is run twice on the same dataset, the second run has no effect. Z-Ordering is not idempotent b...
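
For reference, the two operations named in this reply correspond to two forms of the `OPTIMIZE` command: plain `OPTIMIZE` for bin-packing, and `OPTIMIZE ... ZORDER BY (...)` for Z-Ordering. The sketch below only builds the statement text; the helper name and the table and column names are hypothetical.

```python
def optimize_statement(table, zorder_columns=()):
    """Return an OPTIMIZE statement for `table`.

    With no columns this is bin-packing only; with columns it adds
    Z-Ordering on those columns.
    """
    statement = f"OPTIMIZE {table}"
    if zorder_columns:
        statement += f" ZORDER BY ({', '.join(zorder_columns)})"
    return statement
```

The resulting string could be run with `spark.sql(optimize_statement("events", ["event_type"]))` on a cluster sized for the job.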
