Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Bug in Asset Bundle Sync

max_eg
New Contributor

I think I found a bug in the way asset bundles sync/deploy, or at least I have a question about whether I understood it correctly.

My Setup:

I have an asset bundle, consisting of a notebook nb1.py and a utils module utils.py.

nb1.py imports functions from utils.py. 

I develop locally and push the bundle via the Databricks CLI with "databricks bundle deploy" and "databricks bundle run".

My Workflow:

1. I run the bundle

2. The job fails for some reason, and I do some debugging in the notebook directly in the bundle folder in Databricks.

3. I find the error, which was located in utils.py, and fix utils.py locally. At this point I have edited utils.py locally and nb1.py in Databricks.

4. I run "databricks bundle deploy" and "databricks bundle run".

5. I observe that my local changes to utils.py were successfully synced to the file in the online bundle, but my online edits to nb1.py were not overwritten (as I expected) by my unchanged local nb1.py, so the code I wrote for debugging remained.

6. My workaround: make a dummy change locally in nb1.py, e.g. add a line "print('hello world')", then deploy again; the online nb1.py is then identical to the local one again.

My question:

I guess this is due to the way the sync works: it only looks for local edits, and a file is synced only if one is found. Is this correct?
Is there a better solution to my problem, for example cleaning all files online before I deploy?

Thanks!

1 ACCEPTED SOLUTION


bianca_unifeye
New Contributor II

Hi @max_eg

What you’re seeing is expected with Asset Bundles.
databricks bundle deploy computes what changed locally and only uploads those files. If you edited nb1.py in the workspace (not locally), the deploy won’t “see” a local delta for that file, so it leaves the remote copy as-is. That’s why your local tweak to utils.py synced, but your debug edits in the remote nb1.py persisted until you made a dummy local change.
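Conceptually, the one-way, local-delta sync can be pictured like this (a minimal simulation, not the actual CLI implementation; the file names and dict-based "workspace" are purely illustrative):

```python
"""Sketch: deploy compares current local file hashes against a snapshot
taken at the last deploy, and uploads only files whose LOCAL hash changed.
Edits made only in the remote workspace are invisible to this comparison."""
import hashlib

def fingerprint(content: str) -> str:
    return hashlib.sha256(content.encode()).hexdigest()

local = {"nb1.py": "print('job')", "utils.py": "def f(): return 1"}
remote = dict(local)                                       # state after first deploy
snapshot = {p: fingerprint(c) for p, c in local.items()}   # last-deploy hashes

def deploy():
    """Upload only files whose local content changed since the snapshot."""
    for path, content in local.items():
        if fingerprint(content) != snapshot.get(path):
            remote[path] = content
            snapshot[path] = fingerprint(content)

# 1) debug edit made directly in the workspace (remote only)
remote["nb1.py"] = "print('job')  # debug"
# 2) real fix made locally
local["utils.py"] = "def f(): return 2"

deploy()
print(remote["utils.py"])  # local fix was uploaded
print(remote["nb1.py"])    # remote debug edit persists: no local delta for nb1.py

# 3) the dummy-change workaround: nudge the local fingerprint
local["nb1.py"] = "print('hello world')\nprint('job')"
deploy()
print(remote["nb1.py"] == local["nb1.py"])  # True: file re-uploaded
```

This is why the "hello world" trick works: it changes the local fingerprint, which is the only thing the delta computation looks at.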

Recommended practices / options

  1. Treat the workspace copy as read-only.
    Do your debugging locally (or in a separate scratch area in the workspace, not under the bundle’s target path), then deploy. With bundles, your local repo is the source of truth.

  2. Clean the target path before deploy (hard reset).
    If you need the remote to exactly mirror local, delete the deployed files and redeploy:

     # find the bundle’s workspace path in your bundle config (workspace.root_path)
     databricks workspace rm -r /Workspace/…/your-bundle-path   # deletes files under that path
     databricks bundle deploy

This removes any “drift” introduced by editing in the UI.

  3. Package shared code as a wheel instead of loose .py files.
    Put utils.py into a small package (e.g., setup.cfg/pyproject.toml) and reference it in the job libraries. That way your job uses a versioned artifact and you avoid partial sync quirks.
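As a minimal sketch of that packaging step (the package name and version here are hypothetical; check the current Asset Bundles docs for how to wire the built wheel into your job's libraries):

```toml
# pyproject.toml — package utils.py as an installable wheel
[project]
name = "my_utils"          # hypothetical package name
version = "0.1.0"

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```

The job then depends on the built, versioned artifact rather than on loose synced files, so workspace edits to individual .py files can no longer drift from what the job actually imports.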

  4. If you must edit online:
    Copy the notebook to a scratch folder (outside the bundle path), debug there, then bring the fix back to local and redeploy. Avoid editing the deployed bundle path directly.

Your workaround (adding a local “hello world” so the file re-uploads) works because it nudges the local fingerprint, but the approaches above are cleaner and repeatable for CI/CD.

Hope that helps!


