Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I'm running a Databricks job involving multiple tasks and would like to run the job with different sets of task parameters. I can achieve that by editing each task and changing the parameter values. However, it gets very manual when I have a lot of tas...
Dear Team, for now I found a solution: disconnect the bundle source on Databricks, then edit the parameters you want to run with. After execution, redeploy your code from the repository.
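Another way to avoid editing tasks by hand is to override parameters at run time through the Jobs API, so the deployed bundle stays untouched. A minimal sketch, assuming Jobs API 2.1 and a hypothetical `job_id`, host, and token; `job_parameters` applies only to that one run:

```python
import json
import urllib.request

def build_run_now_payload(job_id: int, job_params: dict) -> dict:
    """Payload for POST /api/2.1/jobs/run-now; job_parameters overrides
    job-level parameters for this run only."""
    return {"job_id": job_id, "job_parameters": job_params}

def run_now(host: str, token: str, payload: dict) -> dict:
    """Trigger the run and return the API response (contains run_id)."""
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_run_now_payload(123, {"run_date": "2024-06-01"})
# run_now("https://<workspace-url>", "<token>", payload)
```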
I have a job running multiple tasks:
- Task 1 runs a machine learning pipeline from git repo 1
- Task 2 runs an ETL pipeline from git repo 1
Task 2 is actually a generic pipeline and should not be checked in to repo 1, and will be made available in another re...
Had this same problem. The fix was to have two workflows with no triggers, each pointing to its respective git repo, then set up a third workflow with the appropriate triggers/schedule which calls the first two. A workflow can run other workflows.
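The "workflow that runs workflows" pattern above maps to the `run_job_task` task type in the Jobs API. A sketch of the orchestrating job's definition, assuming Jobs API 2.1 and hypothetical child job IDs:

```python
def orchestrator_job(name: str, ml_job_id: int, etl_job_id: int) -> dict:
    """Job definition whose tasks simply trigger two existing jobs;
    each child job keeps its own git source and settings."""
    return {
        "name": name,
        "tasks": [
            {"task_key": "run_ml", "run_job_task": {"job_id": ml_job_id}},
            {
                "task_key": "run_etl",
                "run_job_task": {"job_id": etl_job_id},
                # drop depends_on to run the two children in parallel
                "depends_on": [{"task_key": "run_ml"}],
            },
        ],
        # schedule/triggers belong on this job only; the children have none
    }
```

Only the parent carries a trigger, so the two repos stay cleanly separated while still running on one schedule.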
I have a job with multiple tasks like Task1 -> Task2 -> Task3, and I am trying to call the job using the "run now" API. Task details are below:
- Task1 executes a notebook with some input parameters
- Task2 runs using "ABC.jar", so it is a JAR-based task ...
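For a run-now call against a mixed job like this, the API takes parameter overrides keyed by task type rather than by task. A sketch, assuming Jobs API 2.1; `notebook_params` reaches the notebook tasks and `jar_params` the JAR tasks (the parameter names and values here are hypothetical):

```python
def build_run_now_payload(job_id: int) -> dict:
    """run-now payload: notebook_params is a map applied to notebook tasks,
    jar_params is a positional argument list applied to spark_jar tasks."""
    return {
        "job_id": job_id,
        "notebook_params": {"input_date": "2024-06-01"},
        "jar_params": ["2024-06-01", "full"],
    }
```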
Hi, it would be a good feature to pass parameters at the task level. We have scenarios where we would like to create a job with multiple tasks (notebook/dbt) and pass parameters per task.
I have a job with multiple tasks running asynchronously, and I don't think it's leveraging all the nodes on the cluster, judging by the runtime. I open the Spark UI for the cluster, check the executors, and don't see any tasks on my worker nodes. How ca...
I am trying to run an incremental data processing job using a Python wheel. The job is scheduled to run, e.g., every hour. For my code to know which data increment to process, I inject it with the {{start_time}} as part of the command line, like so: ["end_dat...
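The wheel side of this can stay generic: a `python_wheel_task` passes its `parameters` as a flat argv list, so a key/value convention can be parsed in the entry point. A sketch; the assumption that the substituted {{start_time}} arrives as UTC epoch milliseconds should be verified against your runtime:

```python
import sys
from datetime import datetime, timezone

def parse_kv_args(argv: list[str]) -> dict:
    """Interpret ['key1', 'value1', 'key2', 'value2', ...] argv pairs."""
    if len(argv) % 2:
        raise ValueError("expected an even number of key/value arguments")
    return dict(zip(argv[0::2], argv[1::2]))

def main(argv=None):
    params = parse_kv_args(argv if argv is not None else sys.argv[1:])
    # {{start_time}} is substituted by the scheduler before launch; assuming
    # it resolves to UTC epoch milliseconds (check your runtime's docs)
    start = datetime.fromtimestamp(int(params["start_time"]) / 1000,
                                   tz=timezone.utc)
    return start
```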
We have multiple joins involving a large table (about 500 GB in size). The output of the joins is stored as many small files, each 800 KB-1.5 MB in size. Because of this, the job is split into many tasks and takes a long time to complete....
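Output files that small usually mean the write inherited a high shuffle partition count from the joins; repartitioning before the write (or lowering `spark.sql.shuffle.partitions`) consolidates them. A sketch of sizing the repartition, with a hypothetical 128 MB target file size:

```python
def target_partitions(total_size_bytes: int,
                      target_file_bytes: int = 128 * 1024**2) -> int:
    """Partitions needed so each output file lands near the target size."""
    return max(1, -(-total_size_bytes // target_file_bytes))  # ceiling division

n = target_partitions(500 * 1024**3)  # ~500 GB of join output -> 4000
# in the Spark job itself:
# df.repartition(n).write.parquet(path)
```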
I am trying to create a job which has 2 tasks as follows:
1. A Python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format).
2. A notebook which runs once for each of the dates from the d...
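The first task here is plain date arithmetic, and its output can be handed to the downstream task as a task value. A sketch of task 1 (pure Python; the commented `dbutils` call at the end is the assumed hand-off and only works inside a job):

```python
from datetime import date, timedelta

def date_list(start: str, n: int) -> list[str]:
    """n consecutive ISO dates beginning at start."""
    first = date.fromisoformat(start)
    return [(first + timedelta(days=i)).isoformat() for i in range(n)]

dates = date_list("2024-06-01", 5)
# Inside the job, publish the list for the downstream task:
# dbutils.jobs.taskValues.set(key="dates", value=dates)
```

The notebook task can then read the list and loop over it, or, if your workspace has the For Each task type, fan out one run per date.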
I created a Databricks job with multiple tasks. Is there a way to pass variable values from one task to another? For example, if I have tasks A and B as Databricks notebooks, can I create a variable (e.g. x) in notebook A and later use that value in ...
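Task values are the mechanism for exactly this A-to-B hand-off. The sketch below stands in a plain dict for `dbutils.jobs.taskValues` so it runs anywhere; the comments show the assumed real calls inside the two notebooks:

```python
# Stand-in store so the sketch runs outside Databricks.
_task_values: dict = {}

def set_task_value(key, value):
    # Notebook A on a cluster: dbutils.jobs.taskValues.set(key=key, value=value)
    _task_values[key] = value

def get_task_value(task_key, key, default=None):
    # Notebook B: dbutils.jobs.taskValues.get(taskKey=task_key, key=key,
    #                                         default=default)
    return _task_values.get(key, default)

set_task_value("x", 42)       # end of notebook A
x = get_task_value("A", "x")  # start of notebook B
```

Note that values passed this way should be small and serializable; for large data, write to a table or path in task A and pass only the location.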
You could also consider using an orchestration tool like Data Factory (Azure) or Glue (AWS); there you can inject and use parameters from notebooks. The job scheduling of Databricks also has the option to add parameters, but I do not know if yo...