cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Nis
by New Contributor II
  • 2474 Views
  • 2 replies
  • 2 kudos

Best sequence of using Vacuum, optimize, fsck repair and refresh commands.

I have a delta table whose size will increases gradually now we have around 1.5 crores of rows while running vacuum command on that table i am getting the below error.ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...

  • 2474 Views
  • 2 replies
  • 2 kudos
Latest Reply
alex307
New Contributor II
  • 2 kudos

In my opinion Best order: Optimize → Vacuum → FSCK Repair → Refresh.Your error is likely a timeout — try more cluster resources or a longer retention period.

  • 2 kudos
1 More Replies
alhuelamo
by New Contributor II
  • 10483 Views
  • 5 replies
  • 1 kudos

Getting non-traceable NullPointerExceptions

We're running a job that's issuing NullPointerException without traces of our job's code.Does anybody know what would be the best course of action when it comes to debugging these issues?The job is a Scala job running on DBR 11.3 LTS.In case it's rel...

  • 10483 Views
  • 5 replies
  • 1 kudos
Latest Reply
Amora
New Contributor II
  • 1 kudos

You could try enabling full stack traces and checking the Spark executor logs for hidden errors. Null Pointer Exceptions in Scala on DBR often come from lazy evaluations or missing schema fields during I/O. Reviewing your Data Frame transformations a...

  • 1 kudos
4 More Replies
Anwar_Patel
by New Contributor III
  • 4323 Views
  • 5 replies
  • 0 kudos

Resolved! Not received my certificate after passing Databricks Certified Associate Developer for Apache Spark 3.0 - Python.

I've successfully passed Databricks Certified Associate Developer for Apache Spark 3.0 - Python but still have not received the certificate. E-mail : anwarpatel91@gmail.com

  • 4323 Views
  • 5 replies
  • 0 kudos
Latest Reply
simha6_reddy
New Contributor II
  • 0 kudos

Even i am facing the same issue. I have successfully passed Databricks Certified Associate Developer for Apache Spark - Python but still have not received the certificate. E-mail : simha6.reddy@gmail.com

  • 0 kudos
4 More Replies
RS1
by New Contributor III
  • 1121 Views
  • 1 replies
  • 1 kudos

I attended the Advanced Machine Learning with Databricks training last week virtually I am still unable to get the day 2 session videos of any of the...

I attended the Advanced Machine Learning with Databricks training last week virtually I am still unable to get the day 2 session videos of any of the Instructor led Paid Trainings. They are supposed to be available for replay with in 24 hours but I ...

  • 1121 Views
  • 1 replies
  • 1 kudos
Latest Reply
murali9
New Contributor II
  • 1 kudos

I have the same problem.

  • 1 kudos
FarBo
by New Contributor III
  • 10884 Views
  • 5 replies
  • 5 kudos

Spark issue handling data from json when the schema DataType mismatch occurs

Hi,I have encountered a problem using spark, when creating a dataframe from a raw json source.I have defined an schema for my data and the problem is that when there is a mismatch between one of the column values and its defined schema, spark not onl...

  • 10884 Views
  • 5 replies
  • 5 kudos
Latest Reply
Anonymous
Not applicable
  • 5 kudos

@Farzad Bonabi​ :Thank you for reporting this issue. It seems to be a known bug in Spark when dealing with malformed decimal values. When a decimal value in the input JSON data is not parseable by Spark, it sets not only that column to null but also ...

  • 5 kudos
4 More Replies
brickster_2018
by Databricks Employee
  • 7214 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks Spark Vs Spark on Yarn

I am moving my Spark workloads from EMR/on-premise Spark cluster to Databricks. I understand Databricks Spark is different from Yarn. How is the Databricks architecture different from yarn?

  • 7214 Views
  • 4 replies
  • 2 kudos
Latest Reply
de-qrosh
New Contributor III
  • 2 kudos

What about the disadvantages?How can I separate multiple jobs running on the same cluster cleanly in the logs and same in the spark-ui?

  • 2 kudos
3 More Replies
DJey
by New Contributor III
  • 22169 Views
  • 6 replies
  • 2 kudos

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my Exisiting Delta Table looks like below:Now I have an incremental data with an additional column i.e. owner:Dataframe Name --> scdDFBelow is the code snippet to merge Incremental Dataframe to targetTable, but the new...

image image image image
  • 22169 Views
  • 6 replies
  • 2 kudos
Latest Reply
Amin112
New Contributor II
  • 2 kudos

In Databricks Runtime 15.2 and above, you can specify schema evolution in a merge statement using SQL or Delta table APIs:MERGE WITH SCHEMA EVOLUTION INTO targetUSING sourceON source.key = target.keyWHEN MATCHED THENUPDATE SET *WHEN NOT MATCHED THENI...

  • 2 kudos
5 More Replies
Anonymous
by Not applicable
  • 4060 Views
  • 1 replies
  • 2 kudos

6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11 Connect Timeout

"Notebook detached Exception when creating execution context: java.net.SocketTimeout Exception: Connect Timeout" when trying to connect my cluster to a notebook. Then "Error trying to handle that request We failed to handle that request, please try a...

  • 4060 Views
  • 1 replies
  • 2 kudos
Latest Reply
Wolverine
New Contributor III
  • 2 kudos

Hello @Retired_mod  I am facing same issue I tried changing DBR but it is still giving me error and the cluster is not startingRegardsMS

  • 2 kudos
brickster_2018
by Databricks Employee
  • 15216 Views
  • 3 replies
  • 6 kudos

Resolved! How to add I custom logging in Databricks

I want to add custom logs that redirect in the Spark driver logs. Can I use the existing logger classes to have my application logs or progress message in the Spark driver logs.

  • 15216 Views
  • 3 replies
  • 6 kudos
Latest Reply
Kaizen
Valued Contributor
  • 6 kudos

1) Is it possible to save all the custom logging to its own file? Currently it is being logging with all other cluster logs (see image) 2) Also Databricks it seems like a lot of blank files are also being created for this. Is this a bug? this include...

  • 6 kudos
2 More Replies
Smitha1
by Valued Contributor II
  • 5690 Views
  • 9 replies
  • 3 kudos

Databricks Certified Associate Developer for Apache Spark 3.0

Databricks Certified Associate Developer for Apache Spark 3.0

  • 5690 Views
  • 9 replies
  • 3 kudos
Latest Reply
Shivam_Patil
New Contributor II
  • 3 kudos

Hey I am looking for sample papers for the above exam other than the one provided by databricks do any one have any idea about it

  • 3 kudos
8 More Replies
brickster_2018
by Databricks Employee
  • 4118 Views
  • 2 replies
  • 0 kudos

Resolved! The driver is temporarily unavailable

My job fails with Driver is temporarily unavailable. Apparently, it's permanently unavailable, because the job is not pausing but failing.

  • 4118 Views
  • 2 replies
  • 0 kudos
Latest Reply
Chalki
New Contributor III
  • 0 kudos

I am facing the same issues .  I am writing in batches using a simple for loop. I don't have any collect statements inside the loop. I am rewriting the partitions with partition overwrite dynamic mode in a huge wide delta table - several tb. The incr...

  • 0 kudos
1 More Replies
Smitha1
by Valued Contributor II
  • 9445 Views
  • 10 replies
  • 9 kudos

Resolved! Request for reattempt voucher. Databricks Certified Associate Developer for Apache Spark 3.0 exam

Hi,I gave Databricks Certified Associate Developer for Apache Spark 3.0 exam today but missed by one percent. I got 68.33% and pass is 70%.I am planning to reattempt the exam, could you kindly give me another opportunity and provide reattempt voucher...

  • 9445 Views
  • 10 replies
  • 9 kudos
Latest Reply
shriya
New Contributor II
  • 9 kudos

Hi,I gave Databricks Certified Associate Developer for Apache Spark 3.0 Python exam yesterday but missed by three percent. I got 66.66% and pass is 70%.I am planning to reattempt the exam, could you kindly give me another opportunity and provide reat...

  • 9 kudos
9 More Replies
Sujitha
by Databricks Employee
  • 2919 Views
  • 3 replies
  • 2 kudos

KB Feedback Discussion In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers t...

KB Feedback DiscussionIn addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting.These...

  • 2919 Views
  • 3 replies
  • 2 kudos
Latest Reply
martinez
New Contributor III
  • 2 kudos

Thanks for sharing!  

  • 2 kudos
2 More Replies
Vsleg
by Contributor
  • 6774 Views
  • 5 replies
  • 3 kudos

Resolved! Issue with Apache Sparkâ„¢ Programming with Databricks course

Hello,I found an issue with the Apache Sparkâ„¢ Programming with Databricks courses on Databricks Academy when trying to do the labs. The mount that the courses use for training data is failing with what looks to me like an authentication issue (see sc...

image
  • 6774 Views
  • 5 replies
  • 3 kudos
Latest Reply
Vsleg
Contributor
  • 3 kudos

I found the course Git Repo at (https://github.com/databricks-academy/apache-spark-programming-with-databricks-english), this works so using that instead of the 'apache-spark-programming-with-databricks.dbc' file available in the learning portal. #DA...

  • 3 kudos
4 More Replies
Labels