<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks orchestration job in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150488#M53432</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192995"&gt;@maikel&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Happy to help! By Lakeflow Spark Declarative Pipelines (SDP) I mean using the SDP framework instead of plain PySpark / SQL. Check here for more details:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/ldp/&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;LI-MESSAGE title="Spark Declarative Pipelines “How-To” Series. Part 1: How to Save Results Into A Table" uid="149180" url="https://community.databricks.com/t5/technical-blog/spark-declarative-pipelines-how-to-series-part-1-how-to-save/m-p/149180#U149180" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-blog-thread lia-fa-icon lia-fa-blog lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Best regards,&lt;/P&gt;</description>
    <pubDate>Tue, 10 Mar 2026 13:07:00 GMT</pubDate>
    <dc:creator>aleksandra_ch</dc:creator>
    <dc:date>2026-03-10T13:07:00Z</dc:date>
    <item>
      <title>Databricks orchestration job</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/148359#M52881</link>
      <description>&lt;P&gt;Hello Community,&lt;/P&gt;&lt;P&gt;We are currently building a system in Databricks where multiple tasks are combined into a single job that produces final output data.&lt;/P&gt;&lt;P&gt;So far, our approach is based on Python notebooks (with asset bundles) that orchestrate the workflow. Each notebook calls functions from separate Python modules responsible for smaller processing steps. We can unit test the Python modules without issues, but testing the notebook logic itself is challenging. At the moment, the only way to validate the full flow is to run everything directly in Databricks.&lt;/P&gt;&lt;P&gt;Because of this limitation, we are considering replacing notebooks with pure Python files. Before making this change, I have a few questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;How can variables be passed between tasks when using pure Python files?&lt;/STRONG&gt;&lt;BR /&gt;I’m familiar with passing variables between notebook tasks, but I’m unsure how this would work with Python scripts.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;What is the recommended approach for writing end-to-end (E2E) integration tests for a Databricks job consisting of multiple tasks?&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;What is the general recommendation — notebooks or pure Python files?&lt;/STRONG&gt;&lt;BR /&gt;Regardless of the option, what are the main benefits and trade-offs of each approach?&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I would appreciate any insights or best practices based on your experience.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Feb 2026 18:12:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/148359#M52881</guid>
      <dc:creator>maikel</dc:creator>
      <dc:date>2026-02-13T18:12:04Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks orchestration job</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/148639#M52937</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192995"&gt;@maikel&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;To pass dynamic parameters between Python script tasks:
&lt;OL&gt;
&lt;LI&gt;In the upstream task (named "&lt;STRONG&gt;task_1&lt;/STRONG&gt;"), set the dynamic parameter via dbutils:&lt;BR /&gt;&lt;LI-CODE lang="python"&gt;from databricks.sdk.runtime import *
dbutils.jobs.taskValues.set(key="fave_food", value="beans")&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;In the downstream task, set the input parameter from the upstream task, as explained &lt;A href="https://docs.databricks.com/aws/en/jobs/task-values#reference-task-values" target="_blank" rel="noopener"&gt;here:&lt;/A&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2026-02-17 at 17.25.03.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/24114i33B9B491A026F798/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2026-02-17 at 17.25.03.png" alt="Screenshot 2026-02-17 at 17.25.03.png" /&gt;&lt;/span&gt;&lt;/LI&gt;
&lt;LI&gt;In the downstream Python task itself, the dynamic parameter is passed as a command-line argument:&lt;BR /&gt;&lt;LI-CODE lang="python"&gt;import argparse

p = argparse.ArgumentParser()
p.add_argument("-input_dynamic_param")
args = p.parse_args()
print(args.input_dynamic_param)&lt;/LI-CODE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;/OL&gt;
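The wiring between the two snippets above can be sketched in the job definition itself (a hypothetical Jobs YAML fragment; task and file names are illustrative). The `{{tasks.task_1.values.fave_food}}` dynamic value reference resolves to the value set by the upstream task:

```yaml
# Illustrative job fragment: task_2 receives task_1's task value
# as a command-line parameter. Task and file names are made up.
tasks:
  - task_key: task_1
    spark_python_task:
      python_file: ./task_1.py
  - task_key: task_2
    depends_on:
      - task_key: task_1
    spark_python_task:
      python_file: ./task_2.py
      parameters:
        - "-input_dynamic_param"
        - "{{tasks.task_1.values.fave_food}}"
```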
&lt;/LI&gt;
&lt;LI&gt;A typical integration test of the workflow would be:
&lt;OL&gt;
&lt;LI&gt;Deploy the workflow via Databricks Asset Bundles (to a separate integration/staging workspace, or to a separate &lt;STRONG&gt;&lt;FONT face="courier new,courier"&gt;target&amp;nbsp;&lt;/FONT&gt;&lt;/STRONG&gt;in the DAB definition).&lt;/LI&gt;
&lt;LI&gt;Run the workflow on a subset of data.&lt;/LI&gt;
&lt;LI&gt;Output the result into a separate catalog / schema.&lt;/LI&gt;
&lt;LI&gt;Optionally, add a step to the workflow that compares results with the ground truth.&lt;/LI&gt;
&lt;LI&gt;Ensure that the workflow deployment, input and output data are isolated from other workloads.&lt;/LI&gt;
&lt;/OL&gt;
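The comparison step above can be sketched as a small helper (a minimal, pure-Python illustration; in a real job both row sets would be read from the output table and a ground-truth table):

```python
def diff_against_ground_truth(actual_rows, expected_rows):
    """Compare workflow output against a ground-truth snapshot.

    Rows are compared as sets of tuples, so ordering does not matter.
    Returns (missing, unexpected): rows absent from the output, and rows
    that should not be there. Both empty means the run matched.
    """
    actual, expected = set(actual_rows), set(expected_rows)
    return sorted(expected - actual), sorted(actual - expected)

# Example: one expected row is missing, one extra row appeared.
missing, unexpected = diff_against_ground_truth(
    actual_rows=[("a", 1), ("c", 3)],
    expected_rows=[("a", 1), ("b", 2)],
)
print(missing)      # rows the job failed to produce
print(unexpected)   # rows the job should not have produced
```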
&lt;/LI&gt;
&lt;LI&gt;There is no general recommendation on whether to choose Python scripts or Notebooks - it all depends on your team's habits and overall practices:
&lt;OL&gt;
&lt;LI&gt;Notebooks give a richer experience (Markdown, widgets, magic commands).&lt;/LI&gt;
&lt;LI&gt;Note also that you can &lt;A href="https://docs.databricks.com/aws/en/notebooks/notebook-format#notebook-formats" target="_blank" rel="noopener"&gt;save Notebooks as plain Python scripts&lt;/A&gt; and run them locally (if the code doesn't depend on notebook-only features).&lt;/LI&gt;
&lt;LI&gt;You can also run Databricks Notebooks directly from your local IDE with &lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-connect/python/" target="_blank" rel="noopener"&gt;Databricks Connect&lt;/A&gt;.&lt;/LI&gt;
&lt;LI&gt;Note that Lakeflow Spark Declarative Pipelines are different: Python files are strongly recommended over Notebooks in that case.&lt;/LI&gt;
&lt;/OL&gt;
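The notebook-format point above relies on how Databricks exports Notebooks: a Notebook saved as a .py file starts with a `# Databricks notebook source` marker and separates cells with `# COMMAND ----------` lines, so ordinary tooling can split it back into cells. A minimal sketch (the splitting helper is my own illustration, not a Databricks API):

```python
# A tiny Notebook in Databricks' exported .py source format.
NOTEBOOK_SOURCE = """\
# Databricks notebook source
print("cell one")

# COMMAND ----------

print("cell two")
"""

def split_cells(source):
    """Split Databricks notebook-format .py source into its cells."""
    body = source.replace("# Databricks notebook source\n", "", 1)
    return [cell.strip() for cell in body.split("# COMMAND ----------")]

cells = split_cells(NOTEBOOK_SOURCE)
print(len(cells))  # 2
```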
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Hope it helps.&lt;/P&gt;
&lt;P&gt;Best regards,&lt;/P&gt;</description>
      <pubDate>Tue, 17 Feb 2026 18:05:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/148639#M52937</guid>
      <dc:creator>aleksandra_ch</dc:creator>
      <dc:date>2026-02-17T18:05:55Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks orchestration job</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150465#M53423</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/102072"&gt;@aleksandra_ch&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks a lot for your response - very helpful! One thing I would like to ask:&amp;nbsp;&lt;SPAN&gt;by Lakeflow Spark Declarative Pipelines, do you mean a chain of jobs that performs data engineering operations?&lt;BR /&gt;&lt;BR /&gt;Thank you!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2026 08:09:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150465#M53423</guid>
      <dc:creator>maikel</dc:creator>
      <dc:date>2026-03-10T08:09:11Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks orchestration job</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150488#M53432</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192995"&gt;@maikel&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Happy to help! By Lakeflow Spark Declarative Pipelines (SDP) I mean using the SDP framework instead of plain PySpark / SQL. Check here for more details:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/ldp/&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;LI-MESSAGE title="Spark Declarative Pipelines “How-To” Series. Part 1: How to Save Results Into A Table" uid="149180" url="https://community.databricks.com/t5/technical-blog/spark-declarative-pipelines-how-to-series-part-1-how-to-save/m-p/149180#U149180" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-blog-thread lia-fa-icon lia-fa-blog lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Best regards,&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2026 13:07:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150488#M53432</guid>
      <dc:creator>aleksandra_ch</dc:creator>
      <dc:date>2026-03-10T13:07:00Z</dc:date>
    </item>
  </channel>
</rss>

