02-13-2026 10:12 AM
Hello Community,
We are currently building a system in Databricks where multiple tasks are combined into a single job that produces final output data.
So far, our approach is based on Python notebooks (with asset bundles) that orchestrate the workflow. Each notebook calls functions from separate Python modules responsible for smaller processing steps. We can unit test the Python modules without issues, but testing the notebook logic itself is challenging. At the moment, the only way to validate the full flow is to run everything directly in Databricks.
Because of this limitation, we are considering replacing notebooks with pure Python files. Before making this change, I have a few questions:
How can variables be passed between tasks when using pure Python files?
I’m familiar with passing variables between notebook tasks, but I’m unsure how this would work with Python scripts.
What is the recommended approach for writing end-to-end (E2E) integration tests for a Databricks job consisting of multiple tasks?
What is the general recommendation — notebooks or pure Python files?
Regardless of the option, what are the main benefits and trade-offs of each approach?
I would appreciate any insights or best practices based on your experience.
Thank you!
02-17-2026 10:05 AM
Hi @maikel ,
Task values work with pure Python files too. To set a value in one task:

```python
# Makes dbutils available in a plain Python file (outside a notebook)
from databricks.sdk.runtime import *

dbutils.jobs.taskValues.set(key="fave_food", value="beans")
```

A Python script task can also receive values as command-line parameters and parse them with argparse:

```python
import argparse

p = argparse.ArgumentParser()
p.add_argument("--input_dynamic_param")
args = p.parse_args()
print(args.input_dynamic_param)
```

Hope it helps.
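To close the loop on the downstream side, a sketch (the task key and parameter name below are hypothetical): the consuming task can either call `dbutils.jobs.taskValues.get(taskKey=..., key=...)`, or the job definition can reference the value with `{{tasks.<task_key>.values.<key>}}` and pass it as a command-line argument to the script:

```python
import argparse

def parse_task_args(argv=None):
    # In the job definition, the consuming task would receive, e.g.:
    #   "parameters": ["--fave_food", "{{tasks.set_food.values.fave_food}}"]
    # where "set_food" is a hypothetical upstream task key.
    parser = argparse.ArgumentParser()
    parser.add_argument("--fave_food")
    return parser.parse_args(argv)

# Local smoke test with an explicit argv, no Databricks needed:
args = parse_task_args(["--fave_food", "beans"])
print(args.fave_food)  # → beans
```

Keeping the parsing in a small function like `parse_task_args` means this part of the task is unit-testable outside Databricks, which is part of what you gain by moving away from notebooks.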
Best regards,
03-10-2026 01:09 AM
Hello @aleksandra_ch,
thanks a lot for your response! Very helpful! One thing I would like to ask: by Lakeflow Spark Declarative Pipelines, do you mean a chain of jobs that performs data engineering operations?
Thank you!
03-10-2026 06:07 AM
Hi @maikel ,
Happy to help! By Lakeflow Spark Declarative Pipelines (SDP) I mean using the SDP framework instead of plain PySpark / SQL. Check here for more details:
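For clarity, instead of imperative PySpark steps chained across job tasks, an SDP source file declares datasets and lets the framework resolve the dependency graph and run order. A minimal sketch of a pipeline definition fragment (table and source names are hypothetical; the `dlt` module and the global `spark` session are only available when the file runs as part of a Databricks pipeline, so this is not runnable locally):

```python
# Sketch of a declarative pipeline source file (hypothetical names).
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Orders with invalid amounts filtered out")
def orders_clean():
    return (
        spark.read.table("raw.orders")   # hypothetical source table
        .filter(F.col("amount") > 0)     # declarative transformation step
    )
```

The framework materializes `orders_clean` and tracks its lineage for you, rather than you orchestrating each write as a separate job task.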
Best regards,