Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nadia
by New Contributor II
  • 22421 Views
  • 4 replies
  • 2 kudos

Resolved! Executor heartbeat timed out

Hello, I'm trying to read a table that is located on PostgreSQL and contains 28 million rows. I get the following result: "SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in sta...

Latest Reply
SparkJun
Databricks Employee
  • 2 kudos

Please also review the Spark UI to see the failed Spark job and Spark stage. Please check on the GC time and data spill to memory and disk. See if there is any error in the failed task in the Spark stage view. This will confirm data skew or GC/memory...

3 More Replies
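
Editor's sketch for the thread above, not taken from any reply: the error mentions "Task 0 in stage 0.0", which may mean the whole table is read through a single JDBC connection by one executor. Assuming that is the cause (the thread excerpt does not confirm it), a partitioned read spreads the rows over several tasks. The option keys below (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `fetchsize`) are standard Spark JDBC data source options; the helper name, the URL placeholders, the table name `public.big_table`, and the numeric column `id` are hypothetical.

```python
from typing import Dict


def jdbc_partitioned_read_options(
    url: str,
    table: str,
    partition_column: str,
    lower_bound: int,
    upper_bound: int,
    num_partitions: int,
    fetch_size: int = 10_000,
) -> Dict[str, str]:
    """Build Spark JDBC options for a read split across several tasks.

    Hypothetical helper: it only assembles the option dictionary and
    does not open a connection or touch Spark.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        # Numeric, date, or timestamp column used to split the read.
        "partitionColumn": partition_column,
        # Bounds only decide the partition stride; they do not filter rows.
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        # Rows fetched per round trip by the JDBC driver.
        "fetchsize": str(fetch_size),
    }


opts = jdbc_partitioned_read_options(
    url="jdbc:postgresql://<host>:5432/<database>",  # placeholder URL
    table="public.big_table",                        # hypothetical table
    partition_column="id",                           # hypothetical column
    lower_bound=1,
    upper_bound=28_000_000,
    num_partitions=16,
)

# On a cluster with the PostgreSQL driver available, the options would be
# passed to the reader like this (not executed here):
#   df = spark.read.format("jdbc").options(**opts).load()
```

Whether 16 partitions and a fetch size of 10,000 suit this table depends on the cluster and the database; both values are illustrative only. Credentials (`user`, `password`) are left out on purpose.
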
Ancil
by Contributor II
  • 5642 Views
  • 8 replies
  • 6 kudos

Job aborted due to stage failure: Task 1863 in stage 10.0 failed 4 times, most recent failure: Lost task 1863.3 in stage 10.0 (TID 2021) (10.0.4.7 executor 2): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed): Fatal Python erro

I sometimes get the error below when I run my Databricks notebook from ADF. With one executor node it works fine; with two or more it sometimes fails on the same data. Cluster detail: Standard_F4s_v2 · Workers: Standard_F4s_v2 · 1-8 wo...

Latest Reply
swethaNandan
Databricks Employee
  • 6 kudos

Hi @Ancil P A, can you paste the complete stack trace from the failed task (from failed stage 10.0) and the code snippet that you are trying to run in the notebook? Also, could you raise a Databricks support ticket for this?

7 More Replies
robert37201
by New Contributor II
  • 1803 Views
  • 3 replies
  • 4 kudos

Job aborted due to stage failure: Input buffer size 0 for bloom filter is not power of 2

Query works great in a notebook, but fails in a Classic SQL Warehouse (Photon enabled) with that error. Tables are relatively small. Just don't know where to begin understanding that error, Google wasn't much help and Query History doesn't give me anything...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Robert McCartney, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if his suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to...

2 More Replies
manasa
by Contributor
  • 6860 Views
  • 4 replies
  • 2 kudos

org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer failed while pushing DataFrame to Azure Cosmos DB.

I am writing data to Azure Cosmos DB using the OLTP connector with the code below: cfg["spark.cosmos.write.strategy"] = "ItemOverwrite"; json_df.write.format("cosmos.oltp").options(**cfg).mode("APPEND").save(). I am getting the error below. Please let me know i...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Manasa Kalluri, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

3 More Replies
Manjusha
by New Contributor II
  • 2127 Views
  • 1 reply
  • 1 kudos

SocketTimeout exception when running a display command on spark dataframe

I am using runtime 9.1 LTS. I have an R notebook that reads a CSV into an R data frame, does some transformations, and finally converts it to a Spark DataFrame using the createDataFrame function. After that, when I call the display function on this spark da...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Manjusha Unnikrishnan, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks.

pjp94
by Contributor
  • 2726 Views
  • 1 reply
  • 0 kudos

ERROR - Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

I get the below error when trying to run multi-threading - fails towards the end of the run. My guess is it's related to memory/worker config. I've seen some solutions involving modifying the number of workers or CPU on the cluster - however that's n...

Latest Reply
pjp94
Contributor
  • 0 kudos

Since I don't have permissions to change cluster configurations, the only solution that ended up working was setting a max thread count to about half of the actual max so I don't overload the containers. However, open to any other optimization ideas!
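
Editor's sketch of the workaround described in the reply above, using only the Python standard library. The post does not say how the threads are created or what "actual max" refers to, so `ThreadPoolExecutor`, the helper names, and the use of `os.cpu_count()` as the "actual max" are all assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def capped_worker_count(actual_max: int, fraction: float = 0.5) -> int:
    """Return roughly `fraction` of `actual_max`, but never less than 1."""
    return max(1, int(actual_max * fraction))


def run_capped(func: Callable[[T], R], items: Iterable[T], actual_max: int) -> List[R]:
    """Run `func` over `items` with a thread pool capped at about half of `actual_max`."""
    workers = capped_worker_count(actual_max)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(func, items))


def process(item: int) -> int:
    """Hypothetical stand-in for the real per-item work."""
    return item * 2


# Assumption: treat the CPU count as the "actual max" thread count.
results = run_capped(process, range(10), actual_max=os.cpu_count() or 2)
```

Capping the pool lowers the number of tasks in flight at once, which is the effect the reply relies on; the right fraction for a given cluster would need to be found by experiment.
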

Tahseen0354
by Valued Contributor
  • 22019 Views
  • 8 replies
  • 5 kudos

Resolved! Getting "Job aborted due to stage failure" SparkException when trying to download full result

I have generated a result using SQL. But whenever I try to download the full result (1 million rows), it throws a SparkException. I can download the preview result but not the full result. Why? What happens under the hood when I try to download ...

Latest Reply
rpshgupta
New Contributor III
  • 5 kudos

I am also having this issue again and again. I really want to understand what we can do to avoid this.

7 More Replies