- 5449 Views
- 4 replies
- 2 kudos
I am moving my Spark workloads from an EMR/on-premise Spark cluster to Databricks. I understand Databricks Spark is different from YARN. How is the Databricks architecture different from YARN?
Latest Reply
What about the disadvantages? How can I cleanly separate multiple jobs running on the same cluster, both in the logs and in the Spark UI?
by Matt_L • New Contributor III
- 6931 Views
- 3 replies
- 3 kudos
Using OSS Delta, hopefully this is the right forum for this question: Hey all, I could use some help as I feel like I’m doing something wrong here. I’m streaming from Kafka -> Delta on EMR/S3FS, and am seeing increasingly slow batches. When looking...
Latest Reply
Found the answer through the Slack user group, courtesy of Adam Binford. I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...
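For anyone hitting the same thing, here is a minimal sketch of setting both retention properties in one statement, so that one is not left at its default. The table name `events` and the `24 HOURS` value for `delta.deletedFileRetentionDuration` are assumptions for illustration only; the reply above does not say which value was used. The helper `retention_ddl` is hypothetical and only builds the SQL text, so it runs without a Spark session.

```python
def retention_ddl(table: str, log_retention: str, deleted_file_retention: str) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties together."""
    props = {
        "delta.logRetentionDuration": log_retention,
        "delta.deletedFileRetentionDuration": deleted_file_retention,
    }
    # Render each property as 'key' = 'value', in insertion order.
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


# "events" and the second "24 HOURS" are illustrative assumptions, not values from the thread.
ddl = retention_ddl("events", "24 HOURS", "24 HOURS")
print(ddl)

# With a live SparkSession named `spark` (not created here), this could be applied as:
# spark.sql(ddl)
```

Pick retention values that match how far back you need time travel and how long your slowest readers or streams may lag; short values limit both.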
by MarsSu • New Contributor II
- 3352 Views
- 3 replies
- 3 kudos
I would like to confirm and discuss the HA mechanism for the driver node of job compute, since we can think of the driver node as the master node of the cluster. In AWS EMR, we can set up 2 master nodes so that if one master node fails, another master node can re...
Latest Reply
Hi @Mars Su We haven't heard from you since the last response from @Werner Stinckens and @karthik p, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be...
- 2941 Views
- 1 reply
- 0 kudos
We are migrating our Scala jobs from AWS EMR (6.2.1, Spark version 3.0.1) to Lakehouse, and a few of our jobs are failing due to NullPointerException. We tried Databricks Runtime 7.3 LTS and it works fine, because it has the same Spark version 3.0...
Latest Reply
In one of my code statements, I changed a Scala Boolean to java.lang.Boolean, and this is working fine now. Maybe in newer Spark versions, null in a Scala Boolean isn't supported.
- 5801 Views
- 5 replies
- 5 kudos
We are migrating from AWS EMR to Databricks. One thing that we have noticed during the POCs is that a Databricks cluster of the same size and instance type takes much less time to start compared to EMR. My understanding is Databricks also would be request...
Latest Reply
@gud4eve What kind of cluster are you using, and have you configured pools? If not, as @Werner Stinckens said, there is a chance Databricks has worked hard to provision instances faster.