Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

thib
by New Contributor III
  • 5430 Views
  • 3 replies
  • 2 kudos

Can we use multiple git repos for a job running multiple tasks?

I have a job running multiple tasks: Task 1 runs a machine learning pipeline from git repo 1. Task 2 runs an ETL pipeline from git repo 1. Task 2 is actually a generic pipeline and should not be checked in to repo 1, and will be made available in another re...

Latest Reply
trijit
New Contributor II

The way to go about this would be to create Databricks Repos in the workspace and then use those in the task definitions. This way we can reference multiple repos in different tasks.

  • 2 kudos
2 More Replies
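
To illustrate the suggestion above: a minimal sketch of a Jobs API 2.1 payload where two tasks point at notebooks cloned into two different workspace Repos (the host, token, repo paths, and cluster ID are illustrative placeholders):

    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
    TOKEN = "<personal-access-token>"                        # placeholder

    # Each task references a notebook under a different repo in /Repos,
    # so one job can pull code from several git repositories.
    job_spec = {
        "name": "multi-repo-job",
        "tasks": [
            {
                "task_key": "ml_pipeline",
                "notebook_task": {"notebook_path": "/Repos/team/repo1/train"},
                "existing_cluster_id": "<cluster-id>",
            },
            {
                "task_key": "etl_pipeline",
                "depends_on": [{"task_key": "ml_pipeline"}],
                "notebook_task": {"notebook_path": "/Repos/team/repo2/etl"},
                "existing_cluster_id": "<cluster-id>",
            },
        ],
    }

    resp = requests.post(
        f"{HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=job_spec,
    )
    resp.raise_for_status()
    print(resp.json())  # e.g. {"job_id": 123}
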
Diego_MSFT
by New Contributor II
  • 5364 Views
  • 1 reply
  • 4 kudos

Automating the re-run of a job (with several tasks) // automating the notification of a specific failed task after retrying // error handling on an Azure Data Factory pipeline with a Databricks notebook

Hi Databricks Experts: I'm using Databricks on Azure... I'd like to understand the following: 1) if there is a way of automating the re-run of some specific failed tasks from a job (with several tasks), for example if I have 4 tasks, and task 1 and 2 h...

Latest Reply
Lindberg
New Contributor II

You can use "retries". In Workflows, select your job, then the task, and in the options below, configure retries. You can also see more options at: https://learn.microsoft.com/pt-br/azure/databricks/dev-tools/api/2.0/jobs?source=recommendations

  • 4 kudos
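
For reference, the same retry options can also be set per task in a Jobs API payload; a minimal sketch (all values illustrative):

    # Hedged sketch: retry and failure-notification settings on a single task.
    task = {
        "task_key": "etl_step",
        "notebook_task": {"notebook_path": "/Repos/team/repo/etl"},
        "existing_cluster_id": "<cluster-id>",
        "max_retries": 3,                    # retry the task up to 3 times
        "min_retry_interval_millis": 60000,  # wait 1 minute between attempts
        "retry_on_timeout": True,
        "email_notifications": {"on_failure": ["you@example.com"]},
    }
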
yzhang
by New Contributor III
  • 2893 Views
  • 5 replies
  • 0 kudos

Cannot find info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another '...

Cannot find info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another 'job_b', which also contains a list of tasks. Now I'd like to have a 'job_all' that will run both 'job_a' and 'job_b...

Latest Reply
Anonymous
Not applicable

Hi @Yanan Zhang, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

  • 0 kudos
4 More Replies
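
On the nested-jobs question above: newer versions of the Jobs API support a "Run Job" task type that triggers an existing job, which makes a wrapper job like 'job_all' possible; a hedged sketch (job IDs are placeholders):

    # A wrapper job whose tasks trigger existing jobs (Jobs API 2.1 run_job_task).
    wrapper_spec = {
        "name": "job_all",
        "tasks": [
            {"task_key": "run_job_a", "run_job_task": {"job_id": 111}},
            {
                "task_key": "run_job_b",
                "depends_on": [{"task_key": "run_job_a"}],
                "run_job_task": {"job_id": 222},
            },
        ],
    }
    # POST this to /api/2.1/jobs/create as in the earlier sketch.
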
FranPérez
by New Contributor III
  • 10664 Views
  • 7 replies
  • 4 kudos

set PYTHONPATH when executing workflows

I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow. { "task_key": "prepare", "spark_python_task": { "python_file": "file...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Fran Pérez, just a friendly follow-up: did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

  • 4 kudos
6 More Replies
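
One commonly suggested approach for the PYTHONPATH question is to set it through the job cluster's environment variables; a minimal sketch (the paths, Spark version, and node type are illustrative placeholders):

    # Hedged sketch: a spark_python_task whose job cluster exports PYTHONPATH
    # so the script can import modules from a shared location.
    task = {
        "task_key": "prepare",
        "spark_python_task": {"python_file": "dbfs:/scripts/prepare.py"},
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "<node-type>",
            "num_workers": 1,
            "spark_env_vars": {"PYTHONPATH": "/dbfs/my_project/libs"},
        },
    }
    # Alternatively, append to sys.path at the top of the script itself:
    #   import sys; sys.path.append("/dbfs/my_project/libs")
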
swetha
by New Contributor III
  • 2773 Views
  • 3 replies
  • 4 kudos

Resolved! Retrieving the job-id's of a notebook running inside tasks

I have created a job. Inside the job I have created tasks which are independent. I have used concurrent futures to achieve parallelism, and in each task there are a couple of notebooks running (which are independent). Each notebook running ha...

Latest Reply
Anonymous
Not applicable

Hi @swetha kadiyala, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise please let us know if you need more help. We'd love to hear from you. Th...

  • 4 kudos
2 More Replies
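
For the question above, one widely used (though internal and undocumented, so subject to change) way to read the current job and run IDs from inside a notebook task is the notebook context; a minimal sketch:

    import json

    # dbutils is provided automatically inside Databricks notebooks.
    ctx = json.loads(
        dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
    )
    tags = ctx.get("tags", {})
    print(tags.get("jobId"), tags.get("runId"))
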
shan_chandra
by Databricks Employee
  • 5189 Views
  • 1 reply
  • 1 kudos

Resolved! Insert query fails with error "The query is not executed because it tries to launch ***** tasks in a single stage, while maximum allowed tasks one query can launch is 100000;"

Py4JJavaError: An error occurred while calling o236.sql. : org.apache.spark.SparkException: Job aborted. at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:201) at org.apache.spark.sql.execution.datasources.I...

Latest Reply
shan_chandra
Databricks Employee

Could you please increase the below config (at the cluster level) to a higher value, or set it to zero to remove the limit: spark.databricks.queryWatchdog.maxQueryTasks 0. Setting this Spark config alleviates the issue.

  • 1 kudos
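
A minimal sketch of the suggested change, either per session in a notebook or as a cluster-level Spark config:

    # Raise (or disable with 0) the Query Watchdog task limit for this session.
    spark.conf.set("spark.databricks.queryWatchdog.maxQueryTasks", 0)

    # Or set it in the cluster's Spark config so it applies to every query:
    #   spark.databricks.queryWatchdog.maxQueryTasks 0
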
shreyag
by New Contributor II
  • 2262 Views
  • 2 replies
  • 0 kudos

scheduling tasks through CLI

Is there a way to schedule tasks or jobs through the Databricks CLI instead of the GUI? I want to be able to create a job flow with different notebooks through the CLI.

Latest Reply
Atanu
Databricks Employee

I agree with @Kaniz Fatma. https://docs.databricks.com/dev-tools/cli/jobs-cli.html?_ga=2.101966982.684786035.1646666830-480220406.1638459894 is the Jobs CLI we currently support, @Shreya Gupta.

  • 0 kudos
1 More Replies
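
To illustrate, the Jobs CLI accepts a JSON job spec, e.g. databricks jobs create --json-file create-job.json; a minimal sketch that writes such a spec (Jobs API 2.0 format; all values illustrative):

    import json

    spec = {
        "name": "nightly-etl",
        "notebook_task": {"notebook_path": "/Users/me/etl"},
        "existing_cluster_id": "<cluster-id>",
        "schedule": {
            "quartz_cron_expression": "0 0 2 * * ?",  # run daily at 02:00
            "timezone_id": "UTC",
        },
    }

    # Then create the job with: databricks jobs create --json-file create-job.json
    with open("create-job.json", "w") as f:
        json.dump(spec, f, indent=2)
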
guruv
by New Contributor III
  • 5679 Views
  • 4 replies
  • 1 kudos

Resolved! Spark UI not showing any running tasks

Hi, I am running a notebook job calling JAR code (application code implemented in C#). In the Spark UI page, for almost 2 hrs it's not showing any tasks, and even the CPU usage is below 20%; memory usage is very small. Before this 2 hr window it shows...

Latest Reply
Atanu
Databricks Employee

If I understood the issue correctly.

  • 1 kudos
3 More Replies
Mohit_m
by Valued Contributor II
  • 1230 Views
  • 1 reply
  • 4 kudos

Enabling of the Task Orchestration feature in Jobs via API as well. Databricks supports the ability to orchestrate multiple tasks within a job. You must en...

Enabling of the Task Orchestration feature in Jobs via API as well. Databricks supports the ability to orchestrate multiple tasks within a job. You must enable this feature in the admin console. Once enabled, this feature cannot be disabled. To enable orch...

Latest Reply
Prabakar
Databricks Employee

@Mohit Miglani This will be really helpful for those who prefer the CLI / API more than the UI.

  • 4 kudos
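
A hedged sketch of flipping a workspace setting over the API via the workspace-conf endpoint. Note: the configuration key shown for Task Orchestration is a hypothetical placeholder, so check the admin documentation for the real key name:

    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
    TOKEN = "<admin-personal-access-token>"                 # placeholder

    resp = requests.patch(
        f"{HOST}/api/2.0/workspace-conf",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"enableTaskOrchestration": "true"},  # HYPOTHETICAL key name
    )
    resp.raise_for_status()
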
Erik
by Valued Contributor III
  • 5189 Views
  • 6 replies
  • 2 kudos

Run more concurrent tasks than the number of cores.

We are using the Terraform Databricks provider, which is starting a cluster and checking every mount (since there is no mount REST API!). Each mount takes 20 seconds to check, and 99.9% of that time is idle waiting, and it starts a job per mount. If w...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Erik Parmann, it is possible to do, but you might also need to enable dynamic allocation at the cluster level to make sure your settings are applied at cluster creation. You can find more details here. As a best practice, we do not recom...

  • 2 kudos
5 More Replies
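
A minimal sketch of the cluster-level Spark config the reply refers to, enabling dynamic allocation (values are illustrative):

    # Spark config keys for the cluster (e.g. via the Clusters API spark_conf).
    spark_conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",   # required by dynamic allocation
        "spark.dynamicAllocation.maxExecutors": "8",
    }
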
saipujari_spark
by Databricks Employee
  • 7536 Views
  • 1 reply
  • 3 kudos

Resolved! How to restrict the number of tasks per executor?

In general, one task per core is how Spark executes tasks. If we want to restrict the number of tasks submitted to the executor, to get a higher memory-to-task ratio, how can we achieve that?

Latest Reply
saipujari_spark
Databricks Employee

We can use a config called "spark.task.cpus". This specifies the number of cores to allocate for each task. The default value is 1. If we specify, say, 2, fewer tasks will be assigned to each executor.

  • 3 kudos
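
A minimal sketch: with spark.task.cpus set to 2 in the cluster's Spark config, an executor with 8 cores runs at most 4 tasks concurrently, giving each task twice the default share of executor memory.

    # Cluster-level Spark config (set at cluster creation, not at runtime).
    spark_conf = {"spark.task.cpus": "2"}
    # tasks per executor = executor cores / spark.task.cpus, e.g. 8 / 2 = 4
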