by JKR • New Contributor III
- 1418 Views
- 2 replies
- 0 kudos
Getting the below error. Context: Using a Databricks shared interactive cluster to run multiple parallel scheduled jobs at the same time, every 5 mins. When I check Ganglia, the driver node's memory reaches almost max and then a restart of the driver happens an...
Latest Reply
Please check the driver's logs, for example the log4j and the GC logs.
1 More Reply
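The GC logs mentioned in the reply are not emitted by default. One way to enable them is through the cluster's Spark config before startup; the snippet below is a sketch only, assuming a JDK 8-based runtime (the flags differ on newer JDKs), and the log path is a placeholder:

```
spark.driver.extraJavaOptions -Xloggc:/tmp/driver_gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```

With this in place, the GC log can be pulled from the driver and inspected alongside log4j output when the driver approaches its memory limit.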
by Soma • Valued Contributor
- 1841 Views
- 7 replies
- 0 kudos
Databricks Workflow cost when running on an interactive cluster
- 956 Views
- 1 reply
- 1 kudos
Using the Jobs API, when we create a new job to run on an interactive cluster, can we add the spark_conf tag and specify Spark config tuning parameters?
Latest Reply
spark_conf needs to be set prior to the start of the cluster; otherwise you have to restart the existing cluster. Hence, the spark_conf tag is available only on the job_cluster. You may have to set the configs manually on the interactive cluster prior to using ...
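To illustrate the reply, here is a sketch of a Jobs API 2.1 create-job payload where spark_conf is attached to the job's new_cluster (a job cluster) rather than to an interactive cluster. The job name, notebook path, node type, and the particular tuning values are all hypothetical:

```python
import json

# Hypothetical Jobs API 2.1 payload: spark_conf is set on the job's
# new_cluster definition, since a running interactive cluster cannot
# pick up new spark_conf without a restart.
payload = {
    "name": "nightly-etl",  # hypothetical job name
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Jobs/etl"},  # hypothetical path
            "job_cluster_key": "tuned_cluster",
        }
    ],
    "job_clusters": [
        {
            "job_cluster_key": "tuned_cluster",
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",  # hypothetical node type
                "num_workers": 2,
                # Tuning parameters applied when the job cluster starts:
                "spark_conf": {
                    "spark.sql.shuffle.partitions": "64",
                    "spark.driver.maxResultSize": "2g",
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

The same spark_conf values on an interactive cluster would have to be entered in the cluster's configuration UI (or via the Clusters API) followed by a restart.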
- 5419 Views
- 7 replies
- 4 kudos
I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow. {
"task_key": "prepare",
"spark_python_task": {
"python_file": "file...
Latest Reply
Hi @Fran Pérez, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
6 More Replies
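The question's JSON is cut off, so as context here is a sketch of what a complete two-task workflow spec of that shape could look like, with the second task depending on the first and both reusing one interactive cluster via existing_cluster_id. The file paths and cluster id are placeholders, not taken from the original post:

```python
import json

# Hypothetical two-task workflow: "process" runs only after "prepare"
# succeeds; both tasks share one existing interactive cluster.
CLUSTER_ID = "0000-000000-abcdefgh"  # placeholder cluster id

job = {
    "name": "demo-workflow",
    "tasks": [
        {
            "task_key": "prepare",
            "spark_python_task": {"python_file": "dbfs:/scripts/prepare.py"},  # placeholder
            "existing_cluster_id": CLUSTER_ID,
        },
        {
            "task_key": "process",
            "depends_on": [{"task_key": "prepare"}],
            "spark_python_task": {"python_file": "dbfs:/scripts/process.py"},  # placeholder
            "existing_cluster_id": CLUSTER_ID,
        },
    ],
}

print(json.dumps(job, indent=2))
```

The depends_on list is what turns the two tasks into an ordered workflow rather than two independent runs.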
- 1046 Views
- 2 replies
- 0 kudos
Hi All, I am new to Databricks and need some understanding for my requirement. Our requirement: (a) we have a zip file in Azure Blob Storage; we bring that file to DBFS, unzip it, and execute our transformations in multiple steps (3 steps...
Latest Reply
Hi @praveen rajak, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!
1 More Reply
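The unzip step of the requirement can be sketched with the standard library; on Databricks the DBFS paths would be the FUSE-mounted /dbfs/... paths, but this self-contained version uses temp directories as stand-ins (and builds its own small zip in place of the file landed from Blob Storage):

```python
import os
import tempfile
import zipfile

# Stand-ins for /dbfs/mnt/<blob-container> and /dbfs/tmp/unzipped.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()

# Create a small zip as a stand-in for the file copied from blob storage.
zip_path = os.path.join(src_dir, "input.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("data.csv", "id,value\n1,10\n")

# Extract it to the destination; transformations would read from here.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(dst_dir)

print(sorted(os.listdir(dst_dir)))  # → ['data.csv']
```

Each of the three transformation steps could then be a separate workflow task reading from the extracted location.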
- 1673 Views
- 2 replies
- 1 kudos
I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job has been taking n minutes to complete the tasks. When I look at the job execution results, the job execution time says 15 mins but the individual cells/commands d...
Latest Reply
Hey there @DineshKumar, does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Cheers!
1 More Reply
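A common source of this gap is that a run's total wall-clock time includes cluster setup and cleanup, not just command execution. The Jobs API reports these separately (setup_duration, execution_duration, cleanup_duration, in milliseconds, on a runs/get response); the run object below is hypothetical data in that shape, just to show the arithmetic:

```python
# Hypothetical run metadata shaped like a Jobs API runs/get response;
# all durations are in milliseconds. The difference between total run
# time and the cells' own time is setup (cluster acquisition) plus cleanup.
run = {
    "setup_duration": 480_000,      # e.g. waiting for the cluster
    "execution_duration": 300_000,  # time actually running commands
    "cleanup_duration": 120_000,
}

total_min = sum(run.values()) / 60_000
print(f"total: {total_min:.0f} min, of which execution: "
      f"{run['execution_duration'] / 60_000:.0f} min")
# → total: 15 min, of which execution: 5 min
```

Comparing these three fields for a slow run shows whether the time went into the commands themselves or into acquiring the cluster.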
by Alix • New Contributor III
- 7309 Views
- 9 replies
- 3 kudos
Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...
Latest Reply
Hi @Alix Métivier, just a friendly follow-up. Do you still need help, or did @Shanmugavel Chandrakasu's response help you find the solution? Please let us know.
8 More Replies
- 1615 Views
- 5 replies
- 2 kudos
When running a JAR-based job, I've noticed that the first run always takes extra time to complete and subsequent runs take less time to finish. This behavior is reproducible on an interactive cluster. What's causing this? Is this e...
Latest Reply
@Sandeep Katta, this is a fat JAR that does read-transform-write. @DD Sharma's response matches @Werner Stinckens's and my intuition that the second job was more efficient because the JAR was already loaded. I would not have noticed this had the job run...
4 More Replies