Data Engineering

Forum Posts

by brickster_2018 • Databricks Employee

06-23-2021 8:25:02 AM

5449 Views
4 replies
2 kudos

Resolved! Databricks Spark Vs Spark on Yarn

I am moving my Spark workloads from an EMR/on-premises Spark cluster to Databricks. I understand Databricks Spark is different from YARN. How is the Databricks architecture different from YARN?

Latest Reply

de-qrosh
New Contributor III

01-29-2025 8:47:59 AM

2 kudos

What about the disadvantages? How can I cleanly separate multiple jobs running on the same cluster, both in the logs and in the Spark UI?

3 More Replies

by Matt_L • New Contributor III

10-12-2021 10:19:36 AM

6931 Views
3 replies
3 kudos

Resolved! Slow performance loading checkpoint file?

Using OSS Delta, hopefully this is the right forum for this question: Hey all, I could use some help as I feel like I’m doing something wrong here. I’m streaming from Kafka -> Delta on EMR/S3FS, and am seeing increasingly slow batches. When looking...

Latest Reply

Matt_L
New Contributor III

10-13-2021 9:22:28 AM

3 kudos

Found the answer through the Slack user group, courtesy of Adam Binford. I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...
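
The reply names two Delta table properties that were not set together. As a minimal sketch, assuming a hypothetical table called `events` and an illustrative value of `24 HOURS` for both properties (the post only gives the value for `delta.logRetentionDuration`), the helper below builds one `ALTER TABLE ... SET TBLPROPERTIES` statement that sets both. The function name `retention_properties_sql` is made up for this example; the right retention values depend on the workload.

```python
def retention_properties_sql(table_name: str, log_retention: str, deleted_file_retention: str) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties."""
    properties = {
        "delta.logRetentionDuration": log_retention,
        "delta.deletedFileRetentionDuration": deleted_file_retention,
    }
    # Render each property as a quoted key = value pair, in insertion order.
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({assignments})"


# `events` and the second '24 HOURS' are illustrative assumptions, not values from the post.
statement = retention_properties_sql("events", "24 HOURS", "24 HOURS")
print(statement)
```

With an active Spark session (assumed here to be named `spark`), the resulting string could be passed to `spark.sql(statement)`.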

2 More Replies

by MarsSu • New Contributor II

05-11-2023 7:23:35 PM

3352 Views
3 replies
3 kudos

Resolved! Does driver node of job compute have HA?

I would like to confirm and discuss the HA mechanism for the driver node of job compute, since the driver node can be thought of as the master node of a cluster. In AWS EMR, we can set up 2 master nodes so that if one master node fails, another master node can re...

Latest Reply

Anonymous
Not applicable

05-22-2023 12:23:36 AM

3 kudos

Hi @Mars Su, we haven't heard from you since the last response from @Werner Stinckens and @karthik p, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be...

2 More Replies

by gud4eve • New Contributor III

04-10-2023 12:07:10 AM

2941 Views
1 reply
0 kudos

Resolved! Scala app getting NullPointerException while migrating from DBR 7.3 to 9.1 (and above)

We are migrating our Scala jobs from AWS EMR (6.2.1, Spark version 3.0.1) to Lakehouse, and a few of our jobs are failing with a NullPointerException. We tried Databricks Runtime 7.3 LTS and it works fine, because it has the same Spark version 3.0...

Latest Reply

gud4eve
New Contributor III

04-10-2023 11:33:40 PM

0 kudos

In one of my code statements, I changed Scala `Boolean` to `java.lang.Boolean`, and it is working fine now. Maybe null in a Scala `Boolean` isn't supported in newer Spark versions.

by gud4eve • New Contributor III

10-09-2022 11:42:57 PM

5801 Views
5 replies
5 kudos

Resolved! Why is Databricks on AWS cluster start time less than 5 mins and EMR cluster start time is 15 mins?

We are migrating from AWS EMR to Databricks. One thing we have noticed during the POCs is that a Databricks cluster of the same size and instance type takes much less time to start than EMR. My understanding is Databricks also would be request...

Latest Reply

karthik_p
Esteemed Contributor

10-13-2022 7:41:18 AM

5 kudos

@gud4eve what kind of cluster are you using, and have you configured pools? If not, as @Werner Stinckens said, there is a chance that Databricks has worked hard to provision instances faster.

4 More Replies

Databricks Community
