Data Engineering

Forum Posts

Sorted by:

by cmilligan • Contributor II

11-11-2022 9:09:24 AM

3470 Views
4 replies
3 kudos

Dropdown for parameters in a job

I want to be able to denote the type of run from a predetermined list of values that a user can choose from when kicking off a run using different parameters. Our team does standardized job runs on a weekly cadence but can have timeframes that change...

Data Engineering

3470 Views
4 replies
3 kudos

11-11-2022 9:09:24 AM

View Replies

Latest Reply

Leon_K
New Contributor II

a week ago

3 kudos

I'm looking to this too. Wonder if there a way to make as a drop down for a job parameter

3 kudos

a week ago

3 More Replies

by pgruetter • Contributor

04-18-2023 11:05:44 PM

6924 Views
7 replies
2 kudos

Run Task as Service Principal with Code in Azure DevOps Repo

Hi allI have a task of type Notebook, source is Git (Azure DevOps). This task runs fine with my user, but if I change the Owner to a service principal, I get the following error:Run result unavailable: run failed with error message Failed to checkout...

Data Engineering

6924 Views
7 replies
2 kudos

04-18-2023 11:05:44 PM

View Replies

Latest Reply

Anonymous
Not applicable

04-24-2023 8:43:16 PM

2 kudos

@pgruetter :To enable a service principal to access a specific Azure DevOps repository, you need to grant it the necessary permissions at both the organization and repository levels.Here are the steps to grant the service principal the necessary per...

2 kudos

04-24-2023 8:43:16 PM

6 More Replies

by Diego_MSFT • New Contributor II

08-05-2022 6:02:47 PM

4712 Views
1 replies
4 kudos

Automating the re run of job (with several Tasks) // automate the notification of a failed specific tasks after re trying // Error handling on azure data factory pipeline with DataBricks notebook

Hi DataBricks Experts:I'm using Databricks on Azure.... I'd like to understand the following:1) if there is way of automating the re run some specific failed tasks from a job (with several Tasks), for example if I have 4 tasks, and the task 1 and 2 h...

Data Engineering

4712 Views
1 replies
4 kudos

08-05-2022 6:02:47 PM

View Replies

Latest Reply

Lindberg
New Contributor II

04-20-2023 11:55:30 AM

4 kudos

You can use "retries".In Workflow, select your job, the task, and in the options below, configure retries.If so, you can also see more options at:https://learn.microsoft.com/pt-br/azure/databricks/dev-tools/api/2.0/jobs?source=recommendations

4 kudos

04-20-2023 11:55:30 AM

by Michael_Papadop • New Contributor II

04-07-2023 6:35:05 AM

10906 Views
3 replies
0 kudos

How can I set the status of a databricks job as skipped via python?

I have a basic 2 task job. The 1st notebook (task) checks whether the source file has changes and if so then refreshes a corresponding materialized view. In case we have no changes then I use dbutils.jobs.taskValues.set(key = "skip_job", value = 1) &...

Data Engineering

10906 Views
3 replies
0 kudos

04-07-2023 6:35:05 AM

View Replies

Latest Reply

karthik_p
Esteemed Contributor

04-07-2023 1:32:12 PM

0 kudos

@Michael Papadopoulos usually that should not be the case i think, as for task level we have 3 level notifications ( success, failure,start), where as whole job level skip option is available to discard notification . will see if some one from commu...

0 kudos

04-07-2023 1:32:12 PM

2 More Replies

by mmenjivar • New Contributor II

01-23-2023 2:45:17 PM

2216 Views
2 replies
0 kudos

How to get the run_id from a previous task in a Databricks jobs

Hi, is there any way to share the run_id from a task_A to a task_B within the same job when task_A is a dbt task?

Data Engineering

2216 Views
2 replies
0 kudos

01-23-2023 2:45:17 PM

View Replies

Latest Reply

Debayan
Databricks Employee

01-23-2023 10:49:54 PM

0 kudos

Hi, You can pass {job_id}} and {{run_id}} in Job arguments and print that information and save into wherever it is neededplease find below the documentation for the same:https://docs.databricks.com/data-engineering/jobs/jobs.html#task-parameter-varia...

0 kudos

01-23-2023 10:49:54 PM

1 More Replies

by Chanu • New Contributor II

01-11-2023 2:15:40 AM

1845 Views
2 replies
2 kudos

Databricks JAR task type functionality

Hi, I would like to understand Databricks JAR based workflow tasks. Can I interpret JAR based runs to be something like a spark-submit on a cluster? In the logs, I was expecting to see the spark-submit --class com.xyz --num-executors 4 etc., And, the...

Data Engineering

1845 Views
2 replies
2 kudos

01-11-2023 2:15:40 AM

View Replies

Latest Reply

Chanu
New Contributor II

01-12-2023 4:25:52 AM

2 kudos

Hi, I did try using the Workflows>Jobs>CreateTask>JarTaskType>UploadedMyJAR and Class and created JobCluster and tested this task. This JAR reads some tables as input, does some transformations and output as writing some other tables. I would like t...

2 kudos

01-12-2023 4:25:52 AM

1 More Replies

by Choolanadu • New Contributor

12-17-2022 11:16:54 AM

3272 Views
1 replies
0 kudos

Airflow - How to pull XComs value in the notebook task?

Using AIrflow, I have created a DAG with a sequence of notebook tasks. The first notebook returns a batch id; the subsequent notebook tasks need this batch_id.I am using the DatabricksSubmitRunOperator to run the notebook task. This operator pushes ...

Data Engineering

3272 Views
1 replies
0 kudos

12-17-2022 11:16:54 AM

View Replies

Latest Reply

daniel_sahal
Esteemed Contributor

12-22-2022 5:32:04 AM

0 kudos

From what I understand - you want to pass a run_id parameter to the second notebook task?You can: Create a widget param inside your databricks notebook (https://docs.databricks.com/notebooks/widgets.html) that will consume your run_idPass the paramet...

0 kudos

12-22-2022 5:32:04 AM

by rammy • Contributor III

12-15-2022 9:29:42 AM

8547 Views
5 replies
5 kudos

How I could read the Job id, run id and parameters in python cell?

I have tried following ways to get job parameters but none of the things are working.runId='{{run_id}}' jobId='{{job_id}}' filepath='{{filepath}}' print(runId," ",jobId," ",filepath) r1=dbutils.widgets.get('{{run_id}}') f1=dbutils.widgets.get('{{file...

Data Engineering

8547 Views
5 replies
5 kudos

12-15-2022 9:29:42 AM

View Replies

Latest Reply

rammy
Contributor III

12-17-2022 9:17:32 AM

5 kudos

Thanks for your response. I found the solution. The below code gives me all the job parametersall_args = dbutils.notebook.entry_point.getCurrentBindings()print(all_args)Thanks for your support

5 kudos

12-17-2022 9:17:32 AM

4 More Replies

by arthur_wang • New Contributor

08-25-2021 9:00:36 AM

3691 Views
2 replies
1 kudos

How does Task Orchestration compare to Airflow (for Databricks-only jobs)?

One of my clients has been orchestration Databricks notebooks using Airflow + REST API. They're curious about the pros/cons of switching these jobs to Databricks jobs with Task Orchestration.I know there are all sorts of considerations - for example,...

Data Engineering

3691 Views
2 replies
1 kudos

08-25-2021 9:00:36 AM

View Replies

Latest Reply

Shourya
New Contributor III

09-06-2022 7:34:33 AM

1 kudos

@Kaniz Fatma Hello Kaniz, I'm currently working with a major Enterprise Client looking to make the choice between the Airflow vs Databricks for Jobs scheduling. Our Entire code base is in Databricks and we are trying to figure out the complexities t...

1 kudos

09-06-2022 7:34:33 AM

1 More Replies

by Robbie • New Contributor III

06-09-2022 9:38:58 AM

3069 Views
2 replies
4 kudos

Resolved! Why can't I create new jobs? ("You are not entitled to run this type of task...")

This morning I encountered an issue when trying to create a new job using the Workflows UI (in browser). Never had this issue before.The error message that appears is:"You are not entitled to run this type of task, please contact your Databricks admi...

Data Engineering

3069 Views
2 replies
4 kudos

06-09-2022 9:38:58 AM

View Replies

Latest Reply

Robbie
New Contributor III

06-13-2022 12:17:20 PM

4 kudos

@Kaniz Fatma @Philip Nord, thanks!I was able to do what I needed by cloning an existing job & modifying. It's fine as a temporary fix for now.Thanks again for the response-- good to know you're aware of it & this isn't anything on my end.

4 kudos

06-13-2022 12:17:20 PM

1 More Replies

by User16826994223 • Honored Contributor III

06-17-2021 6:11:52 AM

4573 Views
2 replies
2 kudos

Mult task - restart of the failed jobs

Hi Team I am using Multitask and I am trying to restart only the failed task but seems like I have to restart complete workflow again and again , is there any way or workaround

Data Engineering

4573 Views
2 replies
2 kudos

06-17-2021 6:11:52 AM

View Replies

Latest Reply

TheOptimizer
Contributor

05-19-2022 9:33:10 AM

2 kudos

One way that works is to go to your task definition, click advanced options, and set retry policy. The task will restart per those instructions. Does that work for you?

2 kudos

05-19-2022 9:33:10 AM

1 More Replies

by saipujari_spark • Databricks Employee

09-22-2021 3:17:47 PM

7424 Views
1 replies
3 kudos

Resolved! How to restrict the number of tasks per executor?

In general, one task per core is how spark executes the tasks.If we want to restrict the number of tasks submitted to the executor to get more task to memory ratio, How can we achieve that?

Data Engineering

7424 Views
1 replies
3 kudos

09-22-2021 3:17:47 PM

View Replies

Latest Reply

saipujari_spark
Databricks Employee

09-22-2021 3:23:12 PM

3 kudos

We can use a config called "spark.task.cpus"This specifies the number of cores to allocate for each task.The default value is 1If we specify say 2, it means fewer tasks will be assigned to the executor.

3 kudos

09-22-2021 3:23:12 PM

by Anonymous • Not applicable

06-10-2021 9:16:12 PM

1806 Views
1 replies
0 kudos

Resolved! How long does a task have to be in the queue before the cluster autoscales?:

Data Engineering

1806 Views
1 replies
0 kudos

06-10-2021 9:16:12 PM

View Replies

Latest Reply

Ryan_Chynoweth
Esteemed Contributor

06-11-2021 1:46:00 AM

0 kudos

There are two types of auto scaling in Databricks: Standard and Optimized. In both scenarios when tasks are submitted the cluster will begin scaling to execute as many of them in parallel immediately.Scaling down is different. In optimized autoscalin...

0 kudos

06-11-2021 1:46:00 AM