Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a job running multiple tasks: Task 1 runs a machine learning pipeline from git repo 1. Task 2 runs an ETL pipeline from git repo 1. Task 2 is actually a generic pipeline and should not be checked into repo 1, and will be made available in another re...
Had this same problem. The fix was to have two workflows with no triggers, each pointing to the respective git repo. Then set up a third workflow, with the appropriate triggers/schedule, which calls the first two workflows. A workflow can run other workflows.
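For reference, a minimal sketch of that pattern against the Jobs API 2.1, assuming the two untriggered "child" jobs already exist; the workspace URL, token, and job IDs below are placeholders:

    import requests

    HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
    TOKEN = "<personal-access-token>"                       # placeholder token
    ML_JOB_ID = 111    # placeholder: job id of the ML workflow (repo 1)
    ETL_JOB_ID = 222   # placeholder: job id of the ETL workflow (repo 2)

    parent_job = {
        "name": "orchestrator",
        "schedule": {  # only the parent carries the trigger/schedule
            "quartz_cron_expression": "0 0 6 * * ?",
            "timezone_id": "UTC",
        },
        "tasks": [
            {"task_key": "run_ml", "run_job_task": {"job_id": ML_JOB_ID}},
            {
                "task_key": "run_etl",
                "depends_on": [{"task_key": "run_ml"}],  # drop this to run both in parallel
                "run_job_task": {"job_id": ETL_JOB_ID},
            },
        ],
    }

    resp = requests.post(
        f"{HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=parent_job,
    )
    resp.raise_for_status()
    print(resp.json())  # {"job_id": ...}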
Hi Databricks Experts: I'm using Databricks on Azure. I'd like to understand the following: 1) whether there is a way of automating the re-run of some specific failed tasks from a job (with several tasks), for example if I have 4 tasks, and tasks 1 and 2 h...
You can use "retries".In Workflow, select your job, the task, and in the options below, configure retries.If so, you can also see more options at:https://learn.microsoft.com/pt-br/azure/databricks/dev-tools/api/2.0/jobs?source=recommendations
I cannot find any info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another 'job_b', which also contains a list of tasks. Now I'd like to have a 'job_all' that will run both 'job_a' and 'job_b...
Hi @Yanan Zhang, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...
I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow.

    {
        "task_key": "prepare",
        "spark_python_task": {
            "python_file": "file...
Hi @Fran Pérez, just a friendly follow-up. Did any of the responses help you to resolve your question? If one did, please mark it as best. Otherwise, please let us know if you still need help.
I have created a job. Inside the job I have created tasks which are independent; I have used the concept of concurrent futures to exhibit parallelism, and in each task there are a couple of notebooks running (which are independent). Each notebook running ha...
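A rough sketch of the setup described above, assuming dbutils.notebook.run is what each future invokes; the notebook paths are hypothetical:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    # Independent notebooks to run in parallel inside one task (hypothetical paths).
    notebooks = ["/Repos/team/nb_load", "/Repos/team/nb_transform", "/Repos/team/nb_export"]

    def run_notebook(path, timeout_seconds=3600):
        # dbutils is available as a global on Databricks notebook/job clusters.
        return dbutils.notebook.run(path, timeout_seconds)

    with ThreadPoolExecutor(max_workers=len(notebooks)) as pool:
        futures = {pool.submit(run_notebook, nb): nb for nb in notebooks}
        for fut in as_completed(futures):
            print(futures[fut], "->", fut.result())  # .result() re-raises if a notebook failed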
Hi @swetha kadiyala, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...
Py4JJavaError: An error occurred while calling o236.sql.
: org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:201)
    at org.apache.spark.sql.execution.datasources.I...
Could you please increase the config below (at the cluster level) to a higher value, or set it to zero: spark.databricks.queryWatchdog.maxQueryTasks 0. This Spark config change should alleviate the issue.
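A minimal sketch of both ways to apply it; the cluster-level Spark config is what the reply refers to, and whether the runtime spark.conf.set call takes effect on your cluster is an assumption:

    # Option 1: cluster-level Spark config (Compute > cluster > Advanced options > Spark):
    #   spark.databricks.queryWatchdog.maxQueryTasks 0
    #
    # Option 2 (assumed to also work from a notebook on an interactive cluster):
    spark.conf.set("spark.databricks.queryWatchdog.maxQueryTasks", "0")  # 0 removes the task limit
    print(spark.conf.get("spark.databricks.queryWatchdog.maxQueryTasks"))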
Is there a way to schedule tasks or jobs through the Databricks CLI instead of the GUI? I want to be able to create a job flow with different notebooks through the CLI.
I agree with @Kaniz Fatma. This is the Jobs CLI we currently support, @Shreya Gupta: https://docs.databricks.com/dev-tools/cli/jobs-cli.html?_ga=2.101966982.684786035.1646666830-480220406.1638459894
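As a concrete sketch, assuming the legacy Jobs CLI: write the job definition to a JSON file, then pass it to databricks jobs create. The cluster ID, notebook path, and schedule below are placeholders:

    import json

    # Hypothetical job spec (Jobs API 2.0 style); adjust cluster ID, path, and cron to your setup.
    job_spec = {
        "name": "nightly-etl",
        "existing_cluster_id": "1234-567890-abcdefg",
        "notebook_task": {"notebook_path": "/Repos/team/etl/main"},
        "schedule": {
            "quartz_cron_expression": "0 0 2 * * ?",  # every day at 02:00
            "timezone_id": "UTC",
        },
    }

    with open("job.json", "w") as f:
        json.dump(job_spec, f, indent=2)

    # Then, from a shell:
    #   databricks jobs create --json-file job.json
    #   databricks jobs run-now --job-id <returned job_id>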
Hi, I am running a notebook job calling JAR code (application code implemented in C#). In the Spark UI page, for almost 2 hrs it's not showing any tasks, and even the CPU usage is below 20% and memory usage is very small. Before this 2 hr window it shows...
Enabling the Task Orchestration feature in Jobs via the API as well: Databricks supports the ability to orchestrate multiple tasks within a job. You must enable this feature in the admin console. Once enabled, this feature cannot be disabled. To enable orch...
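Once orchestration is enabled, a single job can carry several tasks with dependencies. A minimal sketch of such a Jobs API 2.1 payload; the task keys and notebook paths are hypothetical:

    # Sketch of a multi-task job payload for the Jobs API 2.1 jobs/create endpoint.
    multi_task_job = {
        "name": "orchestrated-pipeline",
        "tasks": [
            {"task_key": "ingest",
             "notebook_task": {"notebook_path": "/Repos/team/ingest"}},
            {"task_key": "transform",
             "depends_on": [{"task_key": "ingest"}],  # runs only after ingest succeeds
             "notebook_task": {"notebook_path": "/Repos/team/transform"}},
            {"task_key": "publish",
             "depends_on": [{"task_key": "transform"}],
             "notebook_task": {"notebook_path": "/Repos/team/publish"}},
        ],
    }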
We are using the Terraform Databricks provider, which starts a cluster and checks every mount (since there is no mount REST API!). Each mount takes 20 seconds to check, 99.9% of that time is idle waiting, and it starts a job per mount. If w...
Hi @Erik Parmann, it is possible to do, but you might also need to enable dynamic allocation at the cluster level to make sure your settings are applied at cluster creation. You can find more details here. As a best practice, we do not recom...
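A rough sketch of the cluster-level Spark settings that reply appears to refer to; the exact keys and values are assumptions and depend on the workload:

    # Spark conf entries for the cluster definition (for example the spark_conf block of a
    # databricks_cluster resource in Terraform, or the cluster UI); values are illustrative.
    dynamic_allocation_conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "1",
        "spark.dynamicAllocation.maxExecutors": "4",
        "spark.dynamicAllocation.executorIdleTimeout": "60s",  # release idle executors quickly
    }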
In general, one task per core is how Spark executes tasks. If we want to restrict the number of tasks submitted to the executor, to get a better task-to-memory ratio (more memory per task), how can we achieve that?
We can use a config called "spark.task.cpus". This specifies the number of cores to allocate for each task. The default value is 1. If we specify, say, 2, fewer tasks will be assigned to the executor.
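For instance, with 8 cores per executor, spark.task.cpus set to 2 caps the executor at 4 concurrent tasks, so each task gets a larger share of executor memory. A minimal standalone sketch (on Databricks you would put these keys in the cluster's Spark config instead):

    from pyspark.sql import SparkSession

    # spark.task.cpus has to be set when the session/cluster starts, not at runtime;
    # the core counts below are illustrative.
    spark = (
        SparkSession.builder
        .appName("fewer-tasks-per-executor")
        .config("spark.executor.cores", "8")  # 8 cores per executor
        .config("spark.task.cpus", "2")       # each task reserves 2 cores, so at most 4 tasks run at once
        .getOrCreate()
    )
    print(spark.conf.get("spark.task.cpus"))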