for each documentation is lacking in detail

bbg
New Contributor II

We are trying to use a for-each task to concurrently execute a spark_python_task with different input parameters.

Here is the flow:

The job JSON configuration defines a job cluster. task1 sets the payload, something like:

```python
from pyspark.dbutils import DBUtils

# Assumes an existing SparkSession named `spark`
dbutils = DBUtils(spark)

# Publish the iteration inputs for the downstream for_each_task to consume
dbutils.jobs.taskValues.set(
    key="pay_load",
    value=[],
)
```
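For illustration, the payload must be a JSON-serializable list; each element becomes one `{{input}}` for an iteration of the loop. A minimal sketch, assuming a hypothetical list of table mappings (these keys and table names are not from the original post):

```python
# Hypothetical payload: one entry per table mapping to process.
pay_load = [
    {"source_table": "raw.orders", "target_table": "curated.orders"},
    {"source_table": "raw.customers", "target_table": "curated.customers"},
]

dbutils.jobs.taskValues.set(key="pay_load", value=pay_load)
```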
 
The next task depends on task1 and defines a for-each task whose iteration task is a spark_python_task that performs the actual ETL:
```json
{
  "task_key": "loop_job",
  "depends_on": [
    {
      "task_key": "task1"
    }
  ],
  "run_if": "ALL_SUCCESS",
  "description": "Processes each table mapping using foreach input",
  "for_each_task": {
    "inputs": "{{tasks.XXXX.values.pay_load}}",
    "concurrency": 3,
    "task": {
      "task_key": "loop_job_iteration",
      "run_if": "ALL_SUCCESS",
      "spark_python_task": {
        "python_file": "file://XXXXXXXX",
        "parameters": [
          "--input1",
          "{{input}}"
        ]
      }
    }
  }
}
```
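For reference, the iteration script referenced by python_file would then read the per-iteration value from its --input1 argument. A minimal sketch, assuming dict-shaped payload elements that arrive as a JSON string (the argument handling and parsing are assumptions, not from the original post):

```python
import argparse
import json

# The for_each_task passes the rendered {{input}} value via --input1.
parser = argparse.ArgumentParser()
parser.add_argument("--input1", required=True)
args = parser.parse_args()

# If the payload elements are dicts, {{input}} arrives as a JSON string.
mapping = json.loads(args.input1)
print(f"Processing mapping: {mapping}")  # the actual ETL logic goes here
```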
 
When I try to deploy this task, I get the following error message:
 
```
HTTPError: 400 Client Error: Bad Request for url:
***/api/2.1/jobs/reset
Response from server:
{'details': [{'@type': 'type.googleapis.com/google.rpc.RequestInfo',
              'request_id': 'bec8b327-670a-4318-93ed-030025f20206',
              'serving_data': ''}],
 'error_code': 'INVALID_PARAMETER_VALUE',
 'message': 'For each does not support dependent libraries. Remove the '
            'dependent libraries and retry again.'}
```
 
 
Please help me understand how to fix this.
I have not specified any libraries in the task configuration.
I am using dbx to deploy the workflow to the target Databricks cluster.
 
1 REPLY

SP_6721
Contributor III

Hi @bbg,

Even if you haven't added libraries manually, tools like dbx can include dependencies from your project setup, which can trigger this error. Try removing all libraries and environment settings from your job, cluster, and task configs before deploying again.
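One way to confirm this is to inspect the job spec that was actually deployed and look for injected libraries. A hedged sketch using the Jobs API (the host, token, and job ID are placeholders; this assumes an earlier version of the job already exists in the workspace):

```python
import json
import requests

# Placeholders: your workspace URL, a personal access token, and the job's ID.
HOST = "https://<workspace-url>"
TOKEN = "<personal-access-token>"
JOB_ID = 123

resp = requests.get(
    f"{HOST}/api/2.1/jobs/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"job_id": JOB_ID},
)
resp.raise_for_status()
settings = resp.json()["settings"]

# Print any 'libraries' blocks that dbx may have injected into tasks.
for task in settings.get("tasks", []):
    if "libraries" in task:
        print(task["task_key"], "->", json.dumps(task["libraries"], indent=2))
```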

Also, just to note: dbx is deprecated. It's been replaced by Databricks Asset Bundles, which work better with features like for_each_task. If possible, consider switching to Bundles for smoother deployment.
