Data Engineering

Forum Posts

by Nis • New Contributor II

05-17-2023 7:01:35 AM

2474 Views
2 replies
2 kudos

Best sequence for using the VACUUM, OPTIMIZE, FSCK REPAIR and REFRESH commands

I have a Delta table whose size increases gradually; it now has around 1.5 crore (15 million) rows. While running the VACUUM command on that table, I am getting the error below. ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...

Latest Reply

alex307
New Contributor II

2 weeks ago

2 kudos

In my opinion, the best order is: Optimize → Vacuum → FSCK Repair → Refresh. Your error is likely a timeout; try more cluster resources or a longer retention period.
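
The order suggested in this reply can be sketched as a list of Spark SQL statements. This is a minimal sketch, assuming a hypothetical Delta table name (my_db.events) and the default 7-day (168-hour) retention; on Databricks each statement would be passed to spark.sql(...). It only builds the statements and does not diagnose the stage failure itself.

```python
from typing import List

def maintenance_statements(table: str, retain_hours: int = 168) -> List[str]:
    """Return the OPTIMIZE -> VACUUM -> FSCK REPAIR -> REFRESH sequence for one Delta table."""
    return [
        # 1. Compact small files into larger ones; the replaced files become
        #    candidates for VACUUM once they are older than the retention window.
        f"OPTIMIZE {table}",
        # 2. Delete data files no longer referenced by the table and older than the
        #    retention window. 168 hours (7 days) is Delta's default; shorter values
        #    are rejected unless the retention safety check is disabled.
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
        # 3. Remove transaction-log entries for files that are missing from storage.
        f"FSCK REPAIR TABLE {table}",
        # 4. Invalidate cached data and metadata for the table.
        f"REFRESH TABLE {table}",
    ]

# Hypothetical table name; on Databricks: for stmt in ...: spark.sql(stmt)
for stmt in maintenance_statements("my_db.events"):
    print(stmt)
```

Keeping the sequence in one function makes it easy to run the same maintenance for several tables from a scheduled job.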

by alhuelamo • New Contributor II

12-07-2022 8:14:35 AM

10483 Views
5 replies
1 kudos

Getting non-traceable NullPointerExceptions

We're running a job that throws NullPointerExceptions without any trace of our job's code. Does anybody know the best course of action for debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...

Latest Reply

Amora
New Contributor II

3 weeks ago

1 kudos

You could try enabling full stack traces and checking the Spark executor logs for hidden errors. NullPointerExceptions in Scala on DBR often come from lazy evaluation or missing schema fields during I/O. Reviewing your DataFrame transformations a...
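
One way to act on the "missing schema fields" suggestion is to validate input records before the code that dereferences them, so a failure names the record and field instead of surfacing as a bare NullPointerException. The thread's job is Scala; this is only a minimal pure-Python sketch of the idea, assuming records arrive as dicts. REQUIRED_FIELDS and find_missing_fields are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema: fields the downstream code assumes are present and non-null.
REQUIRED_FIELDS = ("id", "event_time", "payload")

def find_missing_fields(records: Iterable[Dict], required=REQUIRED_FIELDS) -> List[Tuple[int, str]]:
    """Return (record_index, field_name) for every required field that is absent or None."""
    problems = []
    for i, record in enumerate(records):
        for field in required:
            # dict.get returns None both for a missing key and for an explicit null.
            if record.get(field) is None:
                problems.append((i, field))
    return problems

rows = [
    {"id": 1, "event_time": "2022-12-07T08:00:00", "payload": "ok"},
    {"id": 2, "payload": None},  # event_time missing, payload null
]
print(find_missing_fields(rows))
```

Running a check like this (or its DataFrame equivalent, filtering on isNull for the required columns) before the failing stage narrows down which input triggers the exception.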

by Anwar_Patel • New Contributor III

04-12-2023 4:41:07 AM

4323 Views
5 replies
0 kudos

Resolved! Not received my certificate after passing Databricks Certified Associate Developer for Apache Spark 3.0 - Python.

I've successfully passed the Databricks Certified Associate Developer for Apache Spark 3.0 - Python exam but still have not received the certificate. E-mail: anwarpatel91@gmail.com

Latest Reply

simha6_reddy
New Contributor II

07-14-2025 5:56:09 AM

0 kudos

I am facing the same issue. I have successfully passed the Databricks Certified Associate Developer for Apache Spark - Python exam but still have not received the certificate. E-mail: simha6.reddy@gmail.com

by RS1 • New Contributor III

07-05-2022 9:34:25 AM

1121 Views
1 replies
1 kudos

I attended the Advanced Machine Learning with Databricks training virtually last week, but I am still unable to get the day 2 session videos of any of the...

I attended the Advanced Machine Learning with Databricks training virtually last week, but I am still unable to get the day 2 session videos of any of the instructor-led paid trainings. They are supposed to be available for replay within 24 hours, but I ...

Latest Reply

murali9
New Contributor II

02-25-2025 2:41:37 PM

1 kudos

I have the same problem.

by FarBo • New Contributor III

01-05-2023 6:57:40 AM

10884 Views
5 replies
5 kudos

Spark issue handling JSON data when a schema DataType mismatch occurs

Hi, I have encountered a problem using Spark when creating a DataFrame from a raw JSON source. I have defined a schema for my data, and the problem is that when there is a mismatch between one of the column values and its defined schema, Spark not onl...

Latest Reply

Anonymous
Not applicable

04-10-2023 6:11:47 AM

5 kudos

@Farzad Bonabi: Thank you for reporting this issue. It seems to be a known bug in Spark when dealing with malformed decimal values. When a decimal value in the input JSON data is not parseable by Spark, it sets not only that column to null but also ...
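
A commonly used workaround for this behaviour is to read the problematic column as a string and convert it in a separate step, so a bad value nulls only that column rather than the whole row. In Spark that would mean declaring the column as StringType in the schema and casting it afterwards (with ANSI mode off, an invalid cast yields null). The sketch below shows the same idea in plain Python with decimal.Decimal, not Spark; the field name amount is hypothetical.

```python
import json
from decimal import Decimal, InvalidOperation
from typing import Optional

def to_decimal(raw) -> Optional[Decimal]:
    """Convert one raw JSON value to Decimal; a malformed value becomes None for this field only."""
    if raw is None:
        return None
    try:
        return Decimal(str(raw))
    except InvalidOperation:
        return None

def parse_line(line: str, decimal_fields=("amount",)) -> dict:
    """Parse one JSON line, converting each decimal field independently of the rest of the row."""
    # parse_float=Decimal avoids float rounding for unquoted JSON numbers.
    record = json.loads(line, parse_float=Decimal)
    for field in decimal_fields:
        record[field] = to_decimal(record.get(field))
    return record

print(parse_line('{"id": 1, "amount": "12.50"}'))
print(parse_line('{"id": 2, "amount": "12,50x"}'))  # only amount becomes None
```

Deferring the conversion this way trades a little extra code for rows that keep their valid columns.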

by brickster_2018 • Databricks Employee

06-23-2021 8:25:02 AM

7214 Views
4 replies
2 kudos

Resolved! Databricks Spark Vs Spark on Yarn

I am moving my Spark workloads from EMR/on-premises Spark clusters to Databricks. I understand Databricks Spark is different from YARN. How is the Databricks architecture different from YARN?

Latest Reply

de-qrosh
New Contributor III

01-29-2025 8:47:59 AM

2 kudos

What about the disadvantages? How can I cleanly separate multiple jobs running on the same cluster, both in the logs and in the Spark UI?
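
For the logs part of this question, one option is to tag every log line with the job it came from, so output from jobs sharing a cluster can be filtered afterwards. Below is a minimal sketch using Python's standard logging module; the job names are hypothetical. For the Spark UI side, PySpark's SparkContext.setJobGroup(groupId, description) labels the Spark jobs submitted from the calling thread, which then appear under that group in the UI; that call is not exercised here because it needs a live SparkContext.

```python
import io
import logging

def job_logger(job_name: str, stream) -> logging.LoggerAdapter:
    """Return a logger whose records carry a job_name attribute, rendered as a prefix."""
    logger = logging.getLogger(f"app.{job_name}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # do not stack handlers if called again
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("[%(job_name)s] %(levelname)s %(message)s"))
        logger.addHandler(handler)
    # The adapter injects job_name into every record it emits.
    return logging.LoggerAdapter(logger, {"job_name": job_name})

# A StringIO stands in for sys.stderr, which is where driver-side output would normally go.
buf = io.StringIO()
ingest = job_logger("ingest", buf)
report = job_logger("report", buf)
ingest.info("started")
report.info("finished")
print(buf.getvalue())
```

With a consistent prefix, lines from each job can be isolated with a simple grep over the driver log.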

by DJey • New Contributor III

06-02-2023 6:52:05 AM

22169 Views
6 replies
2 kudos

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my existing Delta table looks like below: Now I have incremental data with an additional column, i.e. owner: Dataframe Name --> scdDF. Below is the code snippet to merge the incremental DataFrame into targetTable, but the new...

Latest Reply

Amin112
New Contributor II

09-26-2024 8:51:35 PM

2 kudos

In Databricks Runtime 15.2 and above, you can specify schema evolution in a merge statement using SQL or Delta table APIs:
MERGE WITH SCHEMA EVOLUTION INTO target
USING source
ON source.key = target.key
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED THEN
I...
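
To make the semantics of schema evolution in a merge concrete, here is an in-memory model in plain Python, not Delta. It assumes rows are dicts matched on a key column; matched rows are updated with all source columns, unmatched source rows are inserted, and a column that exists only in the source (such as owner in the question) is added to every row, with None for target rows the source does not touch. This is an illustration under those assumptions, not how Delta implements MERGE; the function name is hypothetical.

```python
from typing import Dict, List

def merge_with_schema_evolution(target: List[Dict], source: List[Dict], key: str) -> List[Dict]:
    """Upsert source rows into target by key, adding any source-only columns to every row."""
    # Evolved schema: union of columns, target columns first.
    columns: List[str] = []
    for row in target + source:
        for col in row:
            if col not in columns:
                columns.append(col)
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)  # matched: overwrite with all source columns
        else:
            merged[row[key]] = dict(row)  # not matched: insert the source row
    # Columns a row lacks are filled with None (SQL NULL).
    return [{col: row.get(col) for col in columns} for row in merged.values()]

target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
source = [{"id": 2, "name": "B", "owner": "x"}, {"id": 3, "name": "c", "owner": "y"}]
for row in merge_with_schema_evolution(target, source, "id"):
    print(row)
```

Without schema evolution, a merge like this would typically fail or ignore the owner column, which matches the symptom described in the question.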

by Anonymous • Not applicable

02-18-2022 4:13:05 AM

4060 Views
1 replies
2 kudos

6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11): Connect Timeout

"Notebook detached. Exception when creating execution context: java.net.SocketTimeoutException: Connect Timeout" when trying to attach my cluster to a notebook. Then "Error trying to handle that request We failed to handle that request, please try a...

Latest Reply

Wolverine
New Contributor III

03-27-2024 1:00:49 PM

2 kudos

Hello @Retired_mod, I am facing the same issue. I tried changing the DBR version, but it still gives me an error and the cluster is not starting. Regards, MS

by brickster_2018 • Databricks Employee

06-23-2021 11:37:25 PM

15216 Views
3 replies
6 kudos

Resolved! How to add custom logging in Databricks

I want to add custom logs that are redirected to the Spark driver logs. Can I use the existing logger classes to get my application logs or progress messages into the Spark driver logs?

Latest Reply

Kaizen
Valued Contributor

02-09-2024 10:18:15 AM

6 kudos

1) Is it possible to save all the custom logging to its own file? Currently it is being logged along with all the other cluster logs (see image). 2) Also, Databricks seems to be creating a lot of blank files for this. Is this a bug? This include...
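
For question 1, a common approach with Python's standard logging module is to give the application its own named logger with a dedicated FileHandler and set propagate to False, so its records are not also handled by the root logger. An optional StreamHandler on sys.stderr keeps the same lines visible in the driver's stderr output. This sketch covers only Python logging, not Log4j configuration; the logger name my_app is hypothetical, and the temporary path stands in for whatever log location you choose on the cluster.

```python
import logging
import os
import sys
import tempfile

def app_logger(name: str, log_path: str, also_stderr: bool = True) -> logging.Logger:
    """Named logger that writes to its own file and, optionally, to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    if not logger.handlers:  # avoid stacking handlers when a notebook cell is re-run
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        file_handler = logging.FileHandler(log_path)
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
        if also_stderr:
            stream_handler = logging.StreamHandler(sys.stderr)
            stream_handler.setFormatter(fmt)
            logger.addHandler(stream_handler)
    return logger

# Usage: a temporary directory stands in for your chosen log location.
log_path = os.path.join(tempfile.mkdtemp(), "my_app.log")
log = app_logger("my_app", log_path, also_stderr=False)
log.info("batch 1 written")
```

The handlers guard matters in notebooks: without it, re-running the cell adds another handler and every message is written more than once.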

by Smitha1 • Valued Contributor II

10-26-2022 12:41:47 AM

5690 Views
9 replies
3 kudos

Databricks Certified Associate Developer for Apache Spark 3.0

Latest Reply

Shivam_Patil
New Contributor II

11-22-2023 4:06:30 AM

3 kudos

Hey, I am looking for sample papers for the above exam other than the one provided by Databricks. Does anyone have any idea about it?

by brickster_2018 • Databricks Employee

06-25-2021 11:43:48 AM

4118 Views
2 replies
0 kudos

Resolved! The driver is temporarily unavailable

My job fails with "Driver is temporarily unavailable". Apparently it is permanently unavailable, because the job is not pausing but failing.

Latest Reply

Chalki
New Contributor III

08-14-2023 1:10:17 PM

0 kudos

I am facing the same issue. I am writing in batches using a simple for loop, and I don't have any collect statements inside the loop. I am rewriting the partitions with dynamic partition overwrite mode in a huge, wide Delta table (several TB). The incr...

by Smitha1 • Valued Contributor II

11-30-2022 6:03:05 AM

9445 Views
10 replies
9 kudos

Resolved! Request for reattempt voucher. Databricks Certified Associate Developer for Apache Spark 3.0 exam

Hi, I took the Databricks Certified Associate Developer for Apache Spark 3.0 exam today but narrowly missed passing: I got 68.33% and the pass mark is 70%. I am planning to reattempt the exam; could you kindly give me another opportunity and provide a reattempt voucher...

Latest Reply

shriya
New Contributor II

08-13-2023 11:05:10 PM

9 kudos

Hi, I took the Databricks Certified Associate Developer for Apache Spark 3.0 Python exam yesterday but missed by about three percentage points: I got 66.66% and the pass mark is 70%. I am planning to reattempt the exam; could you kindly give me another opportunity and provide reat...

by Sujitha • Databricks Employee

12-13-2022 10:38:47 AM

2919 Views
3 replies
2 kudos

KB Feedback Discussion In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers t...

KB Feedback Discussion: In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. These...

Latest Reply

martinez
New Contributor III

07-17-2023 12:00:38 AM

2 kudos

Thanks for sharing!

by Vsleg • Contributor

04-18-2023 5:27:44 AM

6774 Views
5 replies
3 kudos

Resolved! Issue with Apache Spark™ Programming with Databricks course

Hello, I found an issue with the Apache Spark™ Programming with Databricks course on Databricks Academy when trying to do the labs. The mount that the course uses for training data is failing with what looks to me like an authentication issue (see sc...

Latest Reply

Vsleg
Contributor

04-18-2023 5:44:42 AM

3 kudos

I found the course Git repo at https://github.com/databricks-academy/apache-spark-programming-with-databricks-english. It works, so I am using it instead of the 'apache-spark-programming-with-databricks.dbc' file available in the learning portal. #DA...

by JKR • Contributor

06-23-2023 4:29:39 PM

1220 Views
0 replies
1 kudos

Missed Associate Developer for Apache Spark 3.0 - Python Exam Due to Power Outage

Dear Databricks Certification Team, Unfortunately, I was unable to take the exam as scheduled due to an unforeseen power outage in my area. The outage occurred just before the exam, leaving me unable to access the necessary resources to com...
