Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to pass outputs from a python task to a notebook task

RJB
New Contributor II

I am trying to create a job which has 2 tasks as follows:

  1. A Python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format).
  2. A notebook task which runs once for each date in the list from the previous task. Each run of the notebook should take one element of the date list as input.
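Step 1 above is plain Python; a minimal sketch of the date-list generation (the function name and inputs are illustrative) might look like:

```python
from datetime import date, timedelta

def make_dates(start, n):
    """Return n consecutive ISO-format date strings starting at `start`."""
    return [(start + timedelta(days=i)).isoformat() for i in range(n)]

print(make_dates(date(2022, 1, 1), 5))
# ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05']
```

The hard part, as described below, is getting this list out of the Python task and into the notebook task.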

While this is relatively easy to do using Azure Pipelines (my current implementation), I am not able to do this from within Databricks Jobs. Basically, I don't know how to do the following:

  1. How to pass outputs from the Python task as inputs to the notebook task.
  2. How to structure a for loop from the Jobs interface. I understand that looping through the date list can easily be done within the notebook, but I would really like to know whether it is possible to create loops using Jobs and run the notebooks in parallel.

Please let me know if anything is not clear.

Edit:

I tried dbutils.jobs.taskValues.set() and get() to pass the values. However, the API seems to be disabled for my workspace; I get the error "com.databricks.common.client.DatabricksServiceHttpClientException: FEATURE_DISABLED: The task values API is disabled for this workspace", and I do not know how to enable it. I do have access to the admin console, but I have no clue where to look for this feature. Please let me know if you have some idea where to find it.
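For reference, here is a hedged sketch of how the Task Values API is meant to be used once enabled. The task key "generate_dates" is illustrative, and dbutils only exists inside a Databricks runtime, so the sketch falls back to a local list when run elsewhere:

```python
from datetime import date, timedelta

# Upstream Python task: build the date list (values must be small
# and JSON-serializable to pass through task values).
dates = [(date(2022, 1, 1) + timedelta(days=i)).isoformat() for i in range(5)]

try:
    # Publish the list for downstream tasks in the same job run.
    dbutils.jobs.taskValues.set(key="date_list", value=dates)
    # Downstream notebook task: read it back by upstream task key
    # ("generate_dates" is a hypothetical task name).
    received = dbutils.jobs.taskValues.get(taskKey="generate_dates", key="date_list")
except NameError:
    # dbutils is undefined outside Databricks; use the list directly
    # so the sketch stays self-contained.
    received = dates

print(received)
```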

Thanks,

1 ACCEPTED SOLUTION


BilalAslamDbrx
Honored Contributor III

@Rahul Bahadur, there are a few ways to pass values between tasks in a job:

  1. [New] We are previewing a new API for setting and getting small values (e.g. a few KB or less) between tasks in a job.
  2. Write the value to a table in one task and read it from another task.

So it appears you are trying to use No. 1. Please email me at bilal dot aslam at databricks dot com and I will get you enrolled in the preview.
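Option 2 above needs no preview enrollment. A minimal sketch, assuming a scratch table name like "job_scratch.date_list" (illustrative) and falling back to an in-memory list when no Spark session is available:

```python
dates = ["2022-01-01", "2022-01-02", "2022-01-03"]

try:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Task 1: persist the dates as a one-column table.
    (spark.createDataFrame([(d,) for d in dates], ["run_date"])
        .write.mode("overwrite").saveAsTable("job_scratch.date_list"))
    # Task 2: read them back (row order is not guaranteed, so sort).
    read_back = sorted(
        r.run_date for r in spark.table("job_scratch.date_list").collect()
    )
except Exception:
    # No Spark (or no metastore) available locally; keep the sketch
    # self-contained by using the list directly.
    read_back = list(dates)

print(read_back)
```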


6 REPLIES

-werners-
Esteemed Contributor III

I think that option was disabled when the new job functionality was introduced.

There is only one jobs setting in the admin panel: "Task orchestration in Jobs"

To pass parameters into a job you can use the Jobs API (https://docs.microsoft.com/en-us/azure/databricks/jobs) or the CLI.

Or use notebook workflows, where you can run notebooks in parallel.

With the latter you are not using Jobs at all; Jobs are just a way of scheduling notebooks (or JARs).

This will also be the most transparent imo.
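To illustrate the Jobs API route: the run-now endpoint accepts notebook parameters per run. The sketch below only constructs the request (the host, job_id, and token are placeholders; uncomment the last call on a real workspace):

```python
import json
import urllib.request

# Placeholders - substitute your workspace URL, job id, and a PAT.
host = "https://adb-1234567890.12.azuredatabricks.net"
payload = {
    "job_id": 123,
    "notebook_params": {"start_date": "2022-01-01", "num_dates": "5"},
}

req = urllib.request.Request(
    url=f"{host}/api/2.1/jobs/run-now",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer <token>",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req)  # actually triggers the run

print(req.get_method(), req.full_url)
```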

Hubert-Dudek
Esteemed Contributor III

"task which accepts a date and an integer from the user " how user enter this dates?, through web on external web server? or inside databricks through widget? or some other way?

RJB
New Contributor II

The input is through the UI, or it can also come from a .txt file. See the attached pic for details.


RJB
New Contributor II

Thanks @Bilal Aslam. This is exactly what I was looking for. Also, is there a way to create parallel loops for running a notebook concurrently? For example, running a notebook 5 times with 5 different values from a list which was entered as a parameter? I know I can put a for loop within a notebook, but that would mean running the code serially for the values in the list. I was wondering if it is possible to run the notebooks concurrently — is there a way to architect a parallel for loop from within the Jobs API or the Jobs UI? Thanks!
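One common pattern for this is to fan out dbutils.notebook.run calls from a driver notebook with a thread pool, one thread per date. A hedged sketch (the notebook path is hypothetical; a stub stands in for dbutils.notebook.run so the example is self-contained):

```python
from concurrent.futures import ThreadPoolExecutor

def run_for_date(run_fn, run_date):
    # run_fn(path, timeout_seconds, arguments) mirrors the
    # dbutils.notebook.run signature on Databricks.
    return run_fn("/Repos/etl/process_date", 3600, {"run_date": run_date})

def fan_out(run_fn, dates, max_workers=5):
    # One concurrent notebook run per date; map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda d: run_for_date(run_fn, d), dates))

# Local stub standing in for dbutils.notebook.run:
stub = lambda path, timeout, args: f"done:{args['run_date']}"
results = fan_out(stub, ["2022-01-01", "2022-01-02", "2022-01-03"])
print(results)
```

On a cluster you would pass dbutils.notebook.run as run_fn; note that each concurrent run competes for the same cluster's resources, so size the pool accordingly.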

BilalAslamDbrx
Honored Contributor III

Just a note that this feature, Task Values, has been generally available for a while.
