<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks orchestration job in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150488#M53432</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192995"&gt;@maikel&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Happy to help! By Lakeflow Spark Declarative Pipelines (SDP) I mean using the SDP framework instead of plain PySpark / SQL. Check here for more details:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/ldp/&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;LI-MESSAGE title="Spark Declarative Pipelines “How-To” Series. Part 1: How to Save Results Into A Table" uid="149180" url="https://community.databricks.com/t5/technical-blog/spark-declarative-pipelines-how-to-series-part-1-how-to-save/m-p/149180#U149180" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-blog-thread lia-fa-icon lia-fa-blog lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Best regards,&lt;/P&gt;</description>
    <pubDate>Tue, 10 Mar 2026 13:07:00 GMT</pubDate>
    <dc:creator>aleksandra_ch</dc:creator>
    <dc:date>2026-03-10T13:07:00Z</dc:date>
    <item>
      <title>Databricks orchestration job</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/148359#M52881</link>
      <description>&lt;P&gt;Hello Community,&lt;/P&gt;&lt;P&gt;We are currently building a system in Databricks where multiple tasks are combined into a single job that produces final output data.&lt;/P&gt;&lt;P&gt;So far, our approach is based on Python notebooks (with asset bundles) that orchestrate the workflow. Each notebook calls functions from separate Python modules responsible for smaller processing steps. We can unit test the Python modules without issues, but testing the notebook logic itself is challenging. At the moment, the only way to validate the full flow is to run everything directly in Databricks.&lt;/P&gt;&lt;P&gt;Because of this limitation, we are considering replacing notebooks with pure Python files. Before making this change, I have a few questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;How can variables be passed between tasks when using pure Python files?&lt;/STRONG&gt;&lt;BR /&gt;I’m familiar with passing variables between notebook tasks, but I’m unsure how this would work with Python scripts.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;What is the recommended approach for writing end-to-end (E2E) integration tests for a Databricks job consisting of multiple tasks?&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;What is the general recommendation — notebooks or pure Python files?&lt;/STRONG&gt;&lt;BR /&gt;Regardless of the option, what are the main benefits and trade-offs of each approach?&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I would appreciate any insights or best practices based on your experience.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Feb 2026 18:12:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/148359#M52881</guid>
      <dc:creator>maikel</dc:creator>
      <dc:date>2026-02-13T18:12:04Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks orchestration job</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/148639#M52937</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192995"&gt;@maikel&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;To pass dynamic parameters between Python script tasks:
&lt;OL&gt;
&lt;LI&gt;In the upstream task (named "&lt;STRONG&gt;task_1&lt;/STRONG&gt;"), set the dynamic parameter via dbutils:&lt;BR /&gt;&lt;LI-CODE lang="python"&gt;from databricks.sdk.runtime import *
dbutils.jobs.taskValues.set(key="fave_food", value="beans")&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;In the downstream task, set the input parameter from the upstream task, as explained &lt;A href="https://docs.databricks.com/aws/en/jobs/task-values#reference-task-values" target="_blank" rel="noopener"&gt;here:&lt;/A&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2026-02-17 at 17.25.03.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/24114i33B9B491A026F798/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2026-02-17 at 17.25.03.png" alt="Screenshot 2026-02-17 at 17.25.03.png" /&gt;&lt;/span&gt;&lt;/LI&gt;
&lt;LI&gt;In the downstream Python task itself, the dynamic parameter is passed as a command-line argument:&lt;BR /&gt;&lt;LI-CODE lang="python"&gt;import argparse

p = argparse.ArgumentParser()
p.add_argument("-input_dynamic_param")
args = p.parse_args()
print(args.input_dynamic_param)&lt;/LI-CODE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;/OL&gt;
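The wiring between the two snippets above can be sketched in the job definition itself (a hypothetical Jobs YAML fragment; task and file names are illustrative). The `{{tasks.task_1.values.fave_food}}` dynamic value reference resolves to the value set by the upstream task:

```yaml
# Illustrative job fragment: task_2 receives task_1's task value
# as a command-line parameter. Task and file names are made up.
tasks:
  - task_key: task_1
    spark_python_task:
      python_file: ./task_1.py
  - task_key: task_2
    depends_on:
      - task_key: task_1
    spark_python_task:
      python_file: ./task_2.py
      parameters:
        - "-input_dynamic_param"
        - "{{tasks.task_1.values.fave_food}}"
```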
&lt;/LI&gt;
&lt;LI&gt;A typical integration test of the workflow would be:
&lt;OL&gt;
&lt;LI&gt;Deploy the workflow via Databricks Asset Bundles (to a separate integration/staging workspace, or to a separate &lt;STRONG&gt;&lt;FONT face="courier new,courier"&gt;target&amp;nbsp;&lt;/FONT&gt;&lt;/STRONG&gt;in the DAB definition).&lt;/LI&gt;
&lt;LI&gt;Run the workflow on a subset of data.&lt;/LI&gt;
&lt;LI&gt;Output the result into a separate catalog / schema.&lt;/LI&gt;
&lt;LI&gt;Optionally, add a step to the workflow that compares results with the ground truth.&lt;/LI&gt;
&lt;LI&gt;Ensure that the workflow deployment, input and output data are isolated from other workloads.&lt;/LI&gt;
&lt;/OL&gt;
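The comparison step above can be sketched as a small helper (a minimal, pure-Python illustration; in a real job both row sets would be read from the output table and a ground-truth table):

```python
def diff_against_ground_truth(actual_rows, expected_rows):
    """Compare workflow output against a ground-truth snapshot.

    Rows are compared as sets of tuples, so ordering does not matter.
    Returns (missing, unexpected): rows absent from the output, and rows
    that should not be there. Both empty means the run matched.
    """
    actual, expected = set(actual_rows), set(expected_rows)
    return sorted(expected - actual), sorted(actual - expected)

# Example: one expected row is missing, one extra row appeared.
missing, unexpected = diff_against_ground_truth(
    actual_rows=[("a", 1), ("c", 3)],
    expected_rows=[("a", 1), ("b", 2)],
)
print(missing)      # rows the job failed to produce
print(unexpected)   # rows the job should not have produced
```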
&lt;/LI&gt;
&lt;LI&gt;There is no general recommendation on whether to choose Python scripts or Notebooks - it all depends on your team's habits and overall practices:
&lt;OL&gt;
&lt;LI&gt;Notebooks give a richer experience (Markdown, widgets, magic commands).&lt;/LI&gt;
&lt;LI&gt;Note also that you can &lt;A href="https://docs.databricks.com/aws/en/notebooks/notebook-format#notebook-formats" target="_blank" rel="noopener"&gt;save Notebooks as plain Python scripts&lt;/A&gt; and run them locally (if the code doesn't depend on notebook-only features).&lt;/LI&gt;
&lt;LI&gt;You can also run Databricks Notebooks directly from your local IDE with &lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-connect/python/" target="_blank" rel="noopener"&gt;Databricks Connect&lt;/A&gt;.&lt;/LI&gt;
&lt;LI&gt;Note that Lakeflow Spark Declarative Pipelines are different: Python files are strongly recommended over Notebooks in that case.&lt;/LI&gt;
&lt;/OL&gt;
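The notebook-format point above relies on how Databricks exports Notebooks: a Notebook saved as a .py file starts with a `# Databricks notebook source` marker and separates cells with `# COMMAND ----------` lines, so ordinary tooling can split it back into cells. A minimal sketch (the splitting helper is my own illustration, not a Databricks API):

```python
# A tiny Notebook in Databricks' exported .py source format.
NOTEBOOK_SOURCE = """\
# Databricks notebook source
print("cell one")

# COMMAND ----------

print("cell two")
"""

def split_cells(source):
    """Split Databricks notebook-format .py source into its cells."""
    body = source.replace("# Databricks notebook source\n", "", 1)
    return [cell.strip() for cell in body.split("# COMMAND ----------")]

cells = split_cells(NOTEBOOK_SOURCE)
print(len(cells))  # 2
```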
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Hope it helps.&lt;/P&gt;
&lt;P&gt;Best regards,&lt;/P&gt;</description>
      <pubDate>Tue, 17 Feb 2026 18:05:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/148639#M52937</guid>
      <dc:creator>aleksandra_ch</dc:creator>
      <dc:date>2026-02-17T18:05:55Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks orchestration job</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150465#M53423</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/102072"&gt;@aleksandra_ch&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks a lot for your response - very helpful! One thing I would like to ask:&amp;nbsp;&lt;SPAN&gt;by Lakeflow Spark Declarative Pipelines, do you mean a chain of jobs that performs data engineering operations?&lt;BR /&gt;&lt;BR /&gt;Thank you!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2026 08:09:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150465#M53423</guid>
      <dc:creator>maikel</dc:creator>
      <dc:date>2026-03-10T08:09:11Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks orchestration job</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150488#M53432</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192995"&gt;@maikel&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Happy to help! By Lakeflow Spark Declarative Pipelines (SDP) I mean using the SDP framework instead of plain PySpark / SQL. Check here for more details:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/ldp/&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;LI-MESSAGE title="Spark Declarative Pipelines “How-To” Series. Part 1: How to Save Results Into A Table" uid="149180" url="https://community.databricks.com/t5/technical-blog/spark-declarative-pipelines-how-to-series-part-1-how-to-save/m-p/149180#U149180" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-blog-thread lia-fa-icon lia-fa-blog lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Best regards,&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2026 13:07:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-orchestration-job/m-p/150488#M53432</guid>
      <dc:creator>aleksandra_ch</dc:creator>
      <dc:date>2026-03-10T13:07:00Z</dc:date>
    </item>
  </channel>
</rss>

