<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DAB git - sometimes doesn't see modules in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156155#M54375</link>
    <description>&lt;P&gt;I'm sorry, but I'm not ready to accept this as a solution. I'm not saying you are wrong, though. The documentation is not clear on this; I would even say it contradicts itself.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/bundles/job-task-types" target="_blank"&gt;Add tasks to jobs in Declarative Automation Bundles | Databricks on AWS&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;"...&lt;SPAN&gt;, because local relative paths may not point to the same content in the Git repository."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I'm not using relative imports to import my shared modules.&lt;/P&gt;&lt;P&gt;"&lt;SPAN&gt;Instead, &lt;STRONG&gt;clone the repository locally&lt;/STRONG&gt; and set up your bundle project within this repository, so that the source for tasks are the workspace.&lt;/SPAN&gt;"&lt;/P&gt;&lt;P&gt;It was my understanding that when a job starts, it clones the repository onto the cluster, so it should behave correctly:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/jobs/git#using-a-git-repository-source-vs-using-git-folders" target="_blank"&gt;Use Git with Lakeflow Jobs | Databricks on AWS&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 05 May 2026 12:20:30 GMT</pubDate>
    <dc:creator>pepco</dc:creator>
    <dc:date>2026-05-05T12:20:30Z</dc:date>
    <item>
      <title>DAB git - sometimes doesn't see modules</title>
      <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156040#M54344</link>
      <description>&lt;P&gt;We are using DABs to deploy our jobs. The DABs have their source set to a Git branch or a Git tag, depending on the environment. The repository is structured as a mono repo. We don't use wheels for our modules. Sometimes when the jobs run, they "randomly" fail because some module is not found, i.e. "&lt;SPAN&gt;ModuleNotFoundError: No module named 'lib'&lt;/SPAN&gt;". A restart then runs without any issues.&lt;/P&gt;&lt;P&gt;I'm trying to understand what's happening, but it looks like PYTHONPATH is sometimes not set correctly.&lt;/P&gt;&lt;P&gt;Did anyone see this behavior?&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2026 06:35:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156040#M54344</guid>
      <dc:creator>pepco</dc:creator>
      <dc:date>2026-05-04T06:35:28Z</dc:date>
    </item>
    <item>
      <title>Re: DAB git - sometimes doesn't see modules</title>
      <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156046#M54345</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/154800"&gt;@pepco&lt;/a&gt;&amp;nbsp;Would you mind sharing your DAB yaml (hiding secrets)?&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2026 06:58:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156046#M54345</guid>
      <dc:creator>Sumit_7</dc:creator>
      <dc:date>2026-05-04T06:58:42Z</dc:date>
    </item>
    <item>
      <title>Re: DAB git - sometimes doesn't see modules</title>
      <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156114#M54359</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/154800"&gt;@pepco&lt;/a&gt;&amp;nbsp;!&lt;/P&gt;&lt;P&gt;I will share my personal experience with a very similar behaviour.&lt;/P&gt;&lt;P&gt;If you check the DBKS docs, you will find that git_source and task source: GIT are not recommended for DABs, because local relative paths may not point to the same content in the Git repo, and bundles expect the deployed job to run from the same files that were deployed from the local bundle copy.&lt;/P&gt;&lt;P&gt;You need to use the workspace source for bundle tasks instead: &lt;A title="https://docs.databricks.com/aws/en/dev-tools/bundles/job-task-types" href="https://docs.databricks.com/aws/en/dev-tools/bundles/job-task-types" target="_self"&gt;https://docs.databricks.com/aws/en/dev-tools/bundles/job-task-types&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In my case, this is what I had:&lt;/P&gt;&lt;PRE&gt;mono repo
DAB deployment
source = Git branch/tag
custom local modules
no wheels
imports like: import lib&lt;/PRE&gt;&lt;P&gt;I understood at the time that this combination can work most of the time, but it depends heavily on what DBKS puts into the cwd or sys.path for that specific task run.&lt;/P&gt;&lt;P&gt;So I took the time to understand what happens behind the scenes. In reality, when a task uses the Git source, DBKS retrieves the notebook or Python file from the Git repo at runtime. For Python script tasks, DBKS says that workspace paths must be absolute while Git paths are relative, and if source is empty the task defaults to GIT when git_source is defined.&lt;/P&gt;&lt;P&gt;My task file was found, but my shared lib folder was not consistently on sys.path, so Python failed with the famous "No module named 'lib'" error.&lt;/P&gt;&lt;P&gt;A retry can succeed because the run may start in a slightly different initialized state, or because the cluster already has path state from a previous successful run. That does not mean the setup is deterministic.&lt;/P&gt;&lt;P&gt;What I have done is avoid the runtime Git source for Python imports and use source: WORKSPACE.&lt;/P&gt;&lt;P&gt;(You can also remove git_source and let the bundle deploy the code into the workspace; that works fine.)&lt;/P&gt;&lt;P&gt;If you have a mono repo, you can use sync.paths so the shared code is deployed together with the bundle.&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2026 21:44:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156114#M54359</guid>
      <dc:creator>amirabedhiafi</dc:creator>
      <dc:date>2026-05-04T21:44:08Z</dc:date>
    </item>
    <item>
      <title>Re: DAB git - sometimes doesn't see modules</title>
      <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156131#M54361</link>
      <description>&lt;P&gt;job.yml&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;resources:
  jobs:
    pdm_general_ledger_details_hub_job:
      name: org_team_pdm_general_ledger_details_hub_job
      description: "Load General Ledger Details reference hub table"
      email_notifications:
        on_failure:
          - dl@redacted.com
      performance_target: PERFORMANCE_OPTIMIZED
      tasks:
        - task_key: load_ref_pah_general_ledger_details_hub
          timeout_seconds: "${var.timeout}"
          notebook_task:
            notebook_path: notebooks/run_query
            base_parameters:
              pipeline_name: ref_pah_general_ledger_details_hub
              target_table: schema.table
              query_file: ../pipelines/pdm_general_ledger_details_hub/src/load_ref_pah_general_ledger_details_hub.sql
            source: GIT
      git_source:
        git_url: https://github.com/redacted/redacted.git
        git_provider: gitHub
      tags:
        app-ci-id: ${var.configuration_item}
        job-type: child
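      # Hedged sketch (not part of the original post): per the replies in this
      # thread, the documented DAB pattern would drop the git_source block
      # above and use the workspace source, so the bundle-deployed files are
      # what actually runs, e.g. (the exact relative path is an assumption):
      #   notebook_task:
      #     notebook_path: ../notebooks/run_query
      #     source: WORKSPACE
      # and then let `databricks bundle deploy` sync the code to the workspace.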
&lt;/LI-CODE&gt;&lt;P&gt;databricks.yml&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;bundle:
  name: pdm_ref_general_ledger_details

# These are any additional configuration files to include.
include:
  - bundle/jobs/*.yml
  - bundle/variables/*.yml

non_production_job_permissions: &amp;amp;non_prod_job_permissions
  permissions:
    - level: CAN_MANAGE
      group_name: redacted
    - level: CAN_MANAGE
      service_principal_name: ${var.service_account}

production_job_permissions: &amp;amp;prod_job_permissions
  permissions:
    - level: CAN_MANAGE_RUN
      group_name: redacted
    - level: CAN_MANAGE
      service_principal_name: ${var.service_account}
    - level: CAN_MANAGE
      service_principal_name: ${var.snow_service_account}

non_production_job_notifications: &amp;amp;non_prod_job_notifications
  email_notifications:
    on_failure:
      - dl@redacted

production_job_notifications: &amp;amp;prod_job_notifications
  email_notifications:
    on_failure:
      - dl@redacted
  webhook_notifications:
    on_failure:
      - id: ${var.snow_webhook_id}

targets:
  test:
    mode: production
    default: false
    presets:
      trigger_pause_status: UNPAUSED
      jobs_max_concurrent_runs: 1
    workspace:
      host: https://redacted.cloud.databricks.com
      root_path: /Workspace/org/team/.bundle/${bundle.target}/${var.developer_id}/${bundle.name}
    resources:
      jobs:
        pdm_general_ledger_details_hub_job:
          git_source:
            git_branch: ${var.git_branch}
          &amp;lt;&amp;lt;:
            - *non_prod_job_permissions
            - *non_prod_job_notifications

variables:
  uc_catalog:
    description: Unity Catalog prefix.
    default: "tst"
  configuration_item:
    default: redacted
&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 05 May 2026 06:40:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156131#M54361</guid>
      <dc:creator>pepco</dc:creator>
      <dc:date>2026-05-05T06:40:45Z</dc:date>
    </item>
    <item>
      <title>Re: DAB git - sometimes doesn't see modules</title>
      <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156132#M54362</link>
      <description>&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I'm aware of the description in the documentation. I have observed this problem only with serverless clusters; with job clusters it has never failed in the 16+ months we have used bundles with Git.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;If I moved to the workspace, I would need to add the workspace path to the Python path (according to all the posts on the forum), which brings other problems to the table.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 05 May 2026 06:41:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156132#M54362</guid>
      <dc:creator>pepco</dc:creator>
      <dc:date>2026-05-05T06:41:41Z</dc:date>
    </item>
    <item>
      <title>Re: DAB git - sometimes doesn't see modules</title>
      <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156141#M54367</link>
      <description>&lt;P&gt;Hi again&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/154800"&gt;@pepco&lt;/a&gt;&amp;nbsp;as I explained in my earlier answer, the issue is probably caused by using source: GIT / git_source together with DABs. DBKS does not recommend this pattern for bundles, because the job runs from Git at runtime instead of from the workspace files deployed by the bundle.&lt;/P&gt;&lt;P&gt;In a mono repo, this can make imports like "import lib" unreliable. You should remove git_source and source: GIT, deploy the code with the bundle, use workspace source paths, and include shared folders through sync.paths. Also, don't forget to make the repo root explicit in sys.path, or package the shared code as a wheel.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2026 09:54:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156141#M54367</guid>
      <dc:creator>amirabedhiafi</dc:creator>
      <dc:date>2026-05-05T09:54:28Z</dc:date>
    </item>
    <item>
      <title>Re: DAB git - sometimes doesn't see modules</title>
      <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156155#M54375</link>
      <description>&lt;P&gt;I'm sorry, but I'm not ready to accept this as a solution. I'm not saying you are wrong, though. The documentation is not clear on this; I would even say it contradicts itself.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/bundles/job-task-types" target="_blank"&gt;Add tasks to jobs in Declarative Automation Bundles | Databricks on AWS&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;"...&lt;SPAN&gt;, because local relative paths may not point to the same content in the Git repository."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I'm not using relative imports to import my shared modules.&lt;/P&gt;&lt;P&gt;"&lt;SPAN&gt;Instead, &lt;STRONG&gt;clone the repository locally&lt;/STRONG&gt; and set up your bundle project within this repository, so that the source for tasks are the workspace.&lt;/SPAN&gt;"&lt;/P&gt;&lt;P&gt;It was my understanding that when a job starts, it clones the repository onto the cluster, so it should behave correctly:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/jobs/git#using-a-git-repository-source-vs-using-git-folders" target="_blank"&gt;Use Git with Lakeflow Jobs | Databricks on AWS&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2026 12:20:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156155#M54375</guid>
      <dc:creator>pepco</dc:creator>
      <dc:date>2026-05-05T12:20:30Z</dc:date>
    </item>
    <item>
      <title>Re: DAB git - sometimes doesn't see modules</title>
      <link>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156156#M54376</link>
      <description>&lt;P&gt;I think you can only observe this on serverless compute. The same DAB with the Git source setup has been stable on job clusters for over a year, so my understanding is that the issue lies in how the Git repo root is added to the Python import path. As a workaround, resolve the repo root from Path.cwd() and add it to sys.path at the start of the notebook, instead of hardcoding a /Workspace/... path or moving everything to the workspace source.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2026 12:20:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dab-git-sometimes-doesn-t-see-modules/m-p/156156#M54376</guid>
      <dc:creator>amirabedhiafi</dc:creator>
      <dc:date>2026-05-05T12:20:42Z</dc:date>
    </item>
  </channel>
</rss>

