- 2772 Views
- 2 replies
- 0 kudos
Getting the below error. Context: using a Databricks shared interactive cluster to run multiple parallel scheduled jobs every 5 minutes. When I check Ganglia, the driver node's memory reaches almost max and then a restart of the driver happens an...
Latest Reply
Please check the driver's logs, for example the log4j and GC logs.
1 More Replies
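If the driver is under memory pressure, the GC logs suggested above are most useful with verbose GC logging switched on through the cluster's Spark config. A minimal sketch, assuming a JDK 8 runtime (newer JVMs use -Xlog:gc* instead); the exact flags are illustrative, not from this thread:

# Minimal sketch: enable verbose GC logging on the driver so the GC logs
# contain allocation/collection detail. Assumes JDK 8 flag syntax; the
# flag selection is illustrative.
cluster_spark_conf = {
    "spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps",
}

Since these options are JVM startup flags, they only take effect after the cluster is (re)started.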
by Soma • Valued Contributor
- 7706 Views
- 7 replies
- 0 kudos
Databricks Workflow cost on running in interactive cluster
- 3204 Views
- 1 replies
- 1 kudos
Using the Jobs API, when we create a new job to run on an interactive cluster, can we add the spark_conf tag and specify Spark config tuning parameters?
Latest Reply
spark_conf needs to be set prior to the start of the cluster, or you have to restart the existing cluster. Hence, the spark_conf tag is available only on the job_cluster. You may have to set the configs manually on the interactive cluster prior to using ...
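To illustrate the reply, a job that needs tuned Spark configs can declare them on a job cluster in the create payload, where they are applied when the cluster starts. A minimal sketch of a Jobs API 2.1 payload; the job name, node type, and config values are illustrative assumptions:

# Sketch of a Jobs API 2.1 create payload: spark_conf sits on the job
# cluster (applied at cluster start), not on an existing interactive
# cluster. Names, node types, and values are illustrative.
job_spec = {
    "name": "tuned-job",
    "job_clusters": [{
        "job_cluster_key": "tuned",
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
            "spark_conf": {"spark.sql.shuffle.partitions": "64"},  # tuning parameter
        },
    }],
    "tasks": [{
        "task_key": "main",
        "job_cluster_key": "tuned",
        "notebook_task": {"notebook_path": "/Repos/demo/etl"},  # hypothetical path
    }],
}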
- 11714 Views
- 7 replies
- 4 kudos
I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow. {
"task_key": "prepare",
"spark_python_task": {
"python_file": "file...
Latest Reply
Hi @Fran Pérez, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
6 More Replies
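For readers following along, a complete two-task payload in the same shape as the truncated fragment above might look like the sketch below, with both spark_python_task steps pinned to an interactive cluster via existing_cluster_id. The cluster ID and file paths are placeholders, not values from the original post:

# Sketch of a two-task workflow (Jobs API 2.1) running on an existing
# interactive cluster. The cluster ID and python_file paths are placeholders.
workflow = {
    "name": "two-task-demo",
    "tasks": [
        {
            "task_key": "prepare",
            "spark_python_task": {"python_file": "dbfs:/scripts/prepare.py"},
            "existing_cluster_id": "0101-000000-abcdefgh",
        },
        {
            "task_key": "process",
            "depends_on": [{"task_key": "prepare"}],
            "spark_python_task": {"python_file": "dbfs:/scripts/process.py"},
            "existing_cluster_id": "0101-000000-abcdefgh",
        },
    ],
}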
- 2380 Views
- 2 replies
- 0 kudos
Hi All, I am new to Databricks and need some understanding for my requirement. Our requirement: a) we have a zip file in Azure Blob Storage; we bring that file to DBFS, unzip it, and execute our transformations in multiple steps (3 steps...
Latest Reply
Hi @praveen rajak, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!
1 More Replies
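As a rough illustration of the copy-and-unzip step described in the question, the sketch below copies an archive from Azure storage to DBFS and extracts it through the /dbfs FUSE mount. All paths are hypothetical, and storage credentials/mounts are assumed to be configured already:

import zipfile

# Sketch: bring a zip from Azure Blob Storage to DBFS, then unzip it.
# All paths are hypothetical; dbutils is available in Databricks notebooks.
src = "abfss://raw@myaccount.dfs.core.windows.net/input/data.zip"  # hypothetical
dst = "dbfs:/tmp/data.zip"
dbutils.fs.cp(src, dst)

# The /dbfs FUSE mount exposes DBFS as local files, so stdlib zipfile works.
with zipfile.ZipFile("/dbfs/tmp/data.zip") as zf:
    zf.extractall("/dbfs/tmp/data_unzipped")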
- 2845 Views
- 2 replies
- 1 kudos
I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job has been taking n minutes to complete its tasks. In the job execution results, the job execution time says 15 mins but the individual cells/commands d...
Latest Reply
Hey there @DineshKumar, does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Cheers!
1 More Replies
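One way to see where the 15 minutes go is to pull the run details from the Jobs API, which reports setup, execution, and cleanup durations separately; a large setup_duration points at cluster/library startup rather than the commands themselves. A minimal sketch; the host, token, and run ID are placeholders:

import requests

# Sketch: fetch run timing from the Jobs API 2.1. Durations are in ms.
# Host, token, and run_id are placeholders.
host = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "<personal-access-token>"
run = requests.get(
    f"{host}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": 123456},
).json()
for task in run.get("tasks", []):
    print(task["task_key"],
          task.get("setup_duration"),      # cluster acquisition / library install
          task.get("execution_duration"),  # time spent running the commands
          task.get("cleanup_duration"))    # teardown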
by Alix • New Contributor III
- 11262 Views
- 8 replies
- 3 kudos
Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...
Latest Reply
@Alix Métivier - The error is thrown from the user code (please investigate the jar file attached to the cluster).
at m80.dbruniv_0_1.dbruniv.tFixedFlowInput_1Process(dbruniv.java:941)
at m80.dbruniv_0_1.dbruniv.run(dbruniv.java:1654)
at m80.dbruniv_...
7 More Replies
- 3580 Views
- 4 replies
- 2 kudos
When running a jar-based job, I've noticed that the first run always takes extra time to complete and consecutive runs take less time to finish. This behavior is reproducible on an interactive cluster. What's causing this? Is this e...
Latest Reply
@Sandeep Katta, this is a fat jar that does read-transform-write. @DD Sharma's response matches @Werner Stinckens' and my intuition that the second run was faster because the jar was already loaded. I would not have noticed this had the job run...
3 More Replies
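A quick way to test the warm-jar explanation is to trigger the same job twice on the same interactive cluster and compare the reported execution durations; a noticeably faster second run is consistent with the classes already being loaded. A rough sketch, with host, token, and job ID as placeholders:

import time
import requests

# Sketch: run the same job twice and compare execution_duration (ms).
# Host, token, and job_id are placeholders; execution_duration is reported
# at the run level for single-task jobs (multi-task runs report it per task).
host, token, job_id = "https://adb-example.azuredatabricks.net", "<token>", 42
headers = {"Authorization": f"Bearer {token}"}

def run_and_wait(job_id):
    run_id = requests.post(f"{host}/api/2.1/jobs/run-now",
                           headers=headers, json={"job_id": job_id}).json()["run_id"]
    while True:
        run = requests.get(f"{host}/api/2.1/jobs/runs/get",
                           headers=headers, params={"run_id": run_id}).json()
        if run["state"]["life_cycle_state"] == "TERMINATED":
            return run.get("execution_duration")
        time.sleep(10)

first, second = run_and_wait(job_id), run_and_wait(job_id)
print(f"first run: {first} ms, second run: {second} ms")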