Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JKR
by Contributor
  • 2543 Views
  • 2 replies
  • 0 kudos

The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.

I am getting the error below. Context: I am using a Databricks shared interactive cluster to run multiple scheduled jobs in parallel every 5 minutes. When I check Ganglia, the driver node's memory reaches almost its maximum and then the driver restarts an...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Please check the driver's logs, for example the log4j and GC logs.

1 More Replies
shan_chandra
by Databricks Employee
  • 2428 Views
  • 1 reply
  • 1 kudos

Resolved! Adding spark_conf tag on Jobs API

Using the Jobs API, when we create a new job to run on an interactive cluster, can we add the spark_conf tag and specify Spark config tuning parameters?

Latest Reply
shan_chandra
Databricks Employee
  • 1 kudos

spark_conf must be set before the cluster starts; otherwise the existing cluster has to be restarted. Hence, the spark_conf tag is available only on the job_cluster. You may have to set the configs manually on the interactive cluster prior to using ...
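
To illustrate the reply above, here is a minimal sketch of a job definition that carries spark_conf on a job cluster rather than on an interactive one. It only builds the request body and does not call any endpoint. The job name, notebook path, cluster key, runtime version, node type, and the config value are all hypothetical placeholders, not values from this thread.

```python
import json


def build_job_payload(name, notebook_path, spark_conf):
    """Build a job definition whose cluster is created per run (a job cluster).

    Because the cluster is started fresh for the run, the Spark settings in
    spark_conf are applied at cluster start, which is the point made in the reply.
    """
    cluster_key = "tuned_cluster"  # hypothetical key linking the task to its cluster
    return {
        "name": name,
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    # Placeholder values; substitute ones valid in your workspace.
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 2,
                    # Spark config values are passed as strings.
                    "spark_conf": dict(spark_conf),
                },
            }
        ],
        "tasks": [
            {
                "task_key": "main",
                # The task references the job cluster instead of an
                # existing (interactive) cluster.
                "job_cluster_key": cluster_key,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


payload = build_job_payload(
    "example-job",
    "/Shared/example_notebook",
    {"spark.sql.shuffle.partitions": "64"},
)
print(json.dumps(payload, indent=2))
```

A task that points at an interactive cluster would instead reference that cluster directly, and any Spark settings would then have to be configured on the cluster itself before it starts.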

FranPérez
by New Contributor III
  • 10125 Views
  • 7 replies
  • 4 kudos

set PYTHONPATH when executing workflows

I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow. { "task_key": "prepare", "spark_python_task": { "python_file": "file...
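
The replies to this question are not shown in this listing, so the following is not the thread's accepted answer. It is only a sketch of one common workaround: extending the module search path from inside the Python file itself, assuming the code to import lives in a directory the task can read. The helper name and the directory are hypothetical.

```python
import os
import sys


def add_to_python_path(directory):
    """Prepend a directory to sys.path unless it is already present.

    Returns True if the path was added, False if it was already there.
    """
    path = os.path.abspath(directory)
    if path in sys.path:
        return False
    sys.path.insert(0, path)
    return True


# Hypothetical location of the project's importable modules.
added = add_to_python_path("/tmp/hypothetical_project_src")
print("added" if added else "already present")
```

Unlike setting the PYTHONPATH environment variable, this affects only the current Python process, so each task's entry file would need to do it before importing project modules.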

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Fran Pérez, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

6 More Replies
Praveen2609
by New Contributor
  • 2046 Views
  • 2 replies
  • 0 kudos

dbfs access for job clusters and interactive cluster

Hi all, I am new to Databricks and need some help understanding how to meet our requirement. Our requirement: a: we have a zip file in Azure Blob Storage; we bring that file to DBFS, unzip it, and execute our transformations in multiple steps (3 steps...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @praveen rajak, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

1 More Replies
Dineshkumar_Raj
by New Contributor
  • 2579 Views
  • 2 replies
  • 1 kudos

Why do the job running time and command execution time not match in Databricks?

I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job takes n minutes to complete its tasks. In the job execution results, the job execution time says 15 mins and the individual cells/commands d...
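
One way to see where such a gap comes from is to split a run's total time into phases. The sketch below assumes a run record that exposes setup_duration, execution_duration, and cleanup_duration in milliseconds, as the Jobs API run details do for single-task runs. The sample numbers are invented for illustration and are not taken from this thread.

```python
def summarize_run_durations(run):
    """Split a run's total time into setup, execution, and cleanup (seconds).

    Missing fields are treated as 0 so the function also works on partial records.
    """
    setup_ms = run.get("setup_duration", 0)
    execution_ms = run.get("execution_duration", 0)
    cleanup_ms = run.get("cleanup_duration", 0)
    return {
        "setup_s": setup_ms / 1000,
        "execution_s": execution_ms / 1000,
        "cleanup_s": cleanup_ms / 1000,
        "total_s": (setup_ms + execution_ms + cleanup_ms) / 1000,
        # Time not spent running commands, e.g. cluster start-up and teardown.
        "overhead_s": (setup_ms + cleanup_ms) / 1000,
    }


# Invented example: a 15-minute run where only 10 minutes is command execution.
sample_run = {
    "setup_duration": 240_000,
    "execution_duration": 600_000,
    "cleanup_duration": 60_000,
}
summary = summarize_run_durations(sample_run)
for phase, seconds in summary.items():
    print(f"{phase}: {seconds:.0f} s")
```

If the setup share is large, the difference between the reported job time and the sum of the command times is likely dominated by cluster provisioning rather than by the notebook code.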

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @DineshKumar, does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Cheers!

1 More Replies
Alix
by New Contributor III
  • 10467 Views
  • 8 replies
  • 3 kudos

Resolved! Remote RPC client disassociated error

Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...

Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

@Alix Métivier - The error is thrown from the user code (please investigate the jar file attached to the cluster).
at m80.dbruniv_0_1.dbruniv.tFixedFlowInput_1Process(dbruniv.java:941)
at m80.dbruniv_0_1.dbruniv.run(dbruniv.java:1654)
at m80.dbruniv_...

7 More Replies
User16783852686
by Databricks Employee
  • 3295 Views
  • 4 replies
  • 2 kudos

Resolved! Slow first time run, jar based jobs

When running a jar-based job, I've noticed that the first run always takes extra time to complete, while subsequent runs take less time to finish. This behavior is reproducible on an interactive cluster. What's causing this? Is this e...

Latest Reply
User16783852686
Databricks Employee
  • 2 kudos

@Sandeep Katta, this is a fat jar that does read-transform-write. @DD Sharma's response matches the intuition @Werner Stinckens and I had that the second job was more efficient because the jar was already loaded. I would not have noticed this had job run...

3 More Replies
User15813097110
by New Contributor III
  • 1740 Views
  • 1 reply
  • 0 kudos
Latest Reply
User15813097110
New Contributor III
  • 0 kudos

Since the SparkContext is already up and running, it requires a restart. Technically, it might be possible to kill the JVM process and restart it but we do not recommend that approach. In this case, we recommend restarting the cluster so that the Sp...
