Data Engineering

Forum Posts

JKR
by New Contributor III
  • 1418 Views
  • 2 replies
  • 0 kudos

The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.

I am getting the error below. Context: I am using a Databricks shared interactive cluster to run multiple scheduled jobs in parallel every 5 minutes. When I check Ganglia, the driver node's memory reaches almost its maximum and then the driver restarts an...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Please check the driver's logs, for example the log4j and GC logs.

1 More Replies
shan_chandra
by Honored Contributor III
  • 956 Views
  • 1 replies
  • 1 kudos

Resolved! Adding spark_conf tag on Jobs API

Using the Jobs API, when we create a new job to run on an interactive cluster, can we add the spark_conf tag and specify Spark config tuning parameters?

Latest Reply
shan_chandra
Honored Contributor III
  • 1 kudos

spark_conf must be set before the cluster starts; on an existing cluster, applying it requires a restart. Hence, the spark_conf tag is available only on the job_cluster. You may have to set the configs manually on the interactive cluster prior to using ...
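As a rough illustration of the point above, here is a minimal sketch of a Jobs API request body in which spark_conf sits inside the new_cluster block of a task. The helper name build_job_payload, the task key, and the spark_version and node_type_id values are placeholders chosen for this example, not values from the thread; adjust them to your workspace.

```python
import json


def build_job_payload(name, notebook_path, spark_conf):
    """Build a create-job request body that runs on a new job cluster.

    spark_conf is placed under new_cluster because Spark configs are
    applied when the cluster starts. A task that points at an interactive
    cluster via existing_cluster_id has no such block.
    """
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "main",  # hypothetical task key
                "notebook_task": {"notebook_path": notebook_path},
                "new_cluster": {
                    # Placeholder values: use ones valid in your workspace.
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "Standard_DS3_v2",
                    "num_workers": 2,
                    "spark_conf": dict(spark_conf),
                },
            }
        ],
    }


if __name__ == "__main__":
    payload = build_job_payload(
        "demo-job",
        "/Shared/demo",
        {"spark.sql.shuffle.partitions": "64"},
    )
    print(json.dumps(payload, indent=2))
```

For an interactive cluster, the same configs would instead be entered in the cluster's own Spark config settings, followed by a restart, as the reply describes.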

FranPérez
by New Contributor III
  • 5419 Views
  • 7 replies
  • 4 kudos

set PYTHONPATH when executing workflows

I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow. { "task_key": "prepare", "spark_python_task": { "python_file": "file...

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Fran Pérez, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

6 More Replies
Praveen2609
by New Contributor
  • 1046 Views
  • 2 replies
  • 0 kudos

DBFS access for job clusters and interactive clusters

Hi all, I am new to Databricks and need some help understanding how to meet my requirement. Our requirement: a: we have a zip file in Azure Blob Storage; we bring that file to DBFS, unzip it, and execute our transformations in multiple steps (3 steps...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @praveen rajak, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

1 More Replies
Dineshkumar_Raj
by New Contributor
  • 1673 Views
  • 2 replies
  • 1 kudos

Why do the job running time and command execution time not match in Databricks?

I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job takes n minutes to complete its tasks. In the job execution results, the job execution time says 15 mins and the individual cells/commands d...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @DineshKumar, does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Cheers!

1 More Replies
Alix
by New Contributor III
  • 7309 Views
  • 9 replies
  • 3 kudos

Resolved! Remote RPC client disassociated error

Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Alix Métivier, just a friendly follow-up. Do you still need help, or did @Shanmugavel Chandrakasu's response help you find the solution? Please let us know.

8 More Replies
User16783852686
by New Contributor II
  • 1615 Views
  • 5 replies
  • 2 kudos

Resolved! Slow first time run, jar based jobs

When running a jar-based job, I've noticed that the first run always takes extra time to complete, while subsequent runs finish faster. This behavior is reproducible on an interactive cluster. What's causing this? Is this e...

Latest Reply
User16783852686
New Contributor II
  • 2 kudos

@Sandeep Katta, this is a fat jar that does read-transform-write. @DD Sharma's response matches @Werner Stinckens's and my intuition that the second job was more efficient because the jar was already loaded. I would not have noticed this had job run...

4 More Replies
User15813097110
by New Contributor III
  • 1127 Views
  • 1 replies
  • 0 kudos
Latest Reply
User15813097110
New Contributor III
  • 0 kudos

Since the SparkContext is already up and running, it requires a restart. Technically, it might be possible to kill the JVM process and restart it, but we do not recommend that approach. In this case, we recommend restarting the cluster so that the Sp...
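For readers who prefer to script the recommended cluster restart instead of using the UI, the following is a minimal sketch that only builds the HTTP request for the Clusters API restart endpoint; it does not send it. The workspace URL, token, and cluster ID are placeholders, and build_restart_request is a hypothetical helper name introduced for this example.

```python
import json
import urllib.request


def build_restart_request(workspace_url, token, cluster_id):
    """Build (but do not send) a POST request that restarts a cluster."""
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/clusters/restart",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own workspace details.
    req = build_restart_request(
        "https://example.cloud.databricks.com",
        "<personal-access-token>",
        "<cluster-id>",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Restarting an interactive cluster interrupts any notebooks and jobs attached to it, so it is worth checking for running workloads before sending the request.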
