11-20-2023 01:43 PM
I'm looking at this page (Databricks Asset Bundles development work tasks) in the Databricks documentation.
When repo assets are deployed to a Databricks workspace, it is not clear whether "databricks bundle deploy" will remove files from the target workspace that aren't in the source repo. For example, if a repo contained a notebook named "test1.py" that had been deployed, and then "test1.py" was removed from the repo and a new notebook "test2.py" was created, what does the target workspace contain after the next deploy? I believe it will contain both "test1.py" and "test2.py".
Secondly, the description of "databricks bundle destroy" does not indicate that it would remove all files from the workspace - only that it will remove all the artifacts referenced by the bundle. So when the "test1.py" file has been removed from the repo, and the "databricks bundle destroy" is run, will it only remove "test2.py" (which has not yet been deployed)?
I am trying to determine how to ensure that the shared workspace contains only the files that are in the repo - that whatever I do in a release pipeline, I will only have the latest assets in the workspace that are in the repo, and none of the old files that were previously in the repo.
The semantics of "databricks bundle deploy" (in particular the term "deploy") would indicate to me that it should do a clean up of assets in the target workspace as part of the deployment.
But if that is not the case, then if I did a "databricks bundle destroy" prior to the "databricks bundle deploy", would that adequately clean up the target workspace? Or do I need to do something with "databricks fs rm" to delete all the files in the target workspace folder prior to the bundle deploy?
11-20-2023 09:38 PM
Hi @xhead ,
When deploying repo assets to a Databricks workspace using the “databricks bundle deploy” command, it’s essential to understand how it interacts with existing files in the target workspace.
Let’s address your concerns:
The behaviour of “databricks bundle deploy”:
“databricks bundle destroy”:
Ensuring Workspace Consistency:
Semantic Implications:
Remember to tailor your approach based on your specific requirements and workflow. Happy bundling!
03-18-2024 04:18 PM
With the newer Databricks CLI (v0.215.0) this seems to be broken. Now I can't destroy a bundle if it doesn't exist, whereas destroy used to be idempotent. Now I get this error (I shortened my deploy area to <ws> below):
Starting plan computation
Planning complete and persisted at <ws>/dab-stage/pytest/.databricks/bundle/new-cluster/terraform/plan
No resources to destroy in plan. Skipping destroy!
Error: open <ws>/dab-stage/pytest/.databricks/bundle/new-cluster/terraform/terraform.tfstate: no such file or directory
make: *** [test-on-cluster] Error 1
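One possible workaround until this is fixed is to wrap the destroy call and treat the missing state file as "nothing to destroy". The Python below is only a sketch under assumptions: the helper name `destroy_bundle` is made up, it assumes the `--auto-approve` and `-t` flags are available in your CLI version, and it matches on the error text from the log above, which could change between releases.

```python
import subprocess

# Text taken from the error in the log above; an assumption that the CLI
# keeps emitting this exact wording.
MISSING_STATE_MARKER = "terraform.tfstate: no such file or directory"


def destroy_bundle(target, run=subprocess.run):
    """Run `databricks bundle destroy` and tolerate a never-deployed bundle.

    `run` is injectable so the logic can be exercised without the CLI.
    Returns True when the bundle is gone (or was never deployed);
    raises RuntimeError for any other failure.
    """
    cmd = ["databricks", "bundle", "destroy", "--auto-approve", "-t", target]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    # The error may land on either stream, so check both.
    output = (result.stdout or "") + (result.stderr or "")
    if MISSING_STATE_MARKER in output:
        # No state file means nothing was deployed: treat as success.
        return True
    raise RuntimeError(f"bundle destroy failed: {output.strip()}")
```

In a Makefile or test script, calling `destroy_bundle("<your-target>")` before the deploy step would restore the old idempotent behaviour, at the cost of depending on the error message.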
03-27-2024 09:54 AM
Will you add a synchronization option that does not remove existing jobs and pipelines?
We are using DAB for DBT and generally it works well; however, lifecycling models is a bit of an issue at the moment 🙂
08-09-2024 08:21 AM
Quick update on this: if you now remove a file locally (or from Git in the case of CI/CD) and run "bundle deploy" from the CLI, it will remove the corresponding file from your Databricks workspace.
e.g.
1. Add new file locally, run "bundle deploy"
2. File appears in Databricks workspace
3. Remove file locally, run "bundle deploy"
4. File is removed automatically from the Databricks workspace
Therefore, I don't think there's a need to manually do a cleanup of files.
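To make the steps above concrete, here is a toy model in Python of the behaviour described. It is not the CLI's actual implementation: `plan_sync` is a hypothetical helper, and it assumes the deploy compares the set of files recorded from the previous deploy against the current local files.

```python
def plan_sync(previously_deployed, local_files):
    """Model of a deploy: upload what is local, delete what was dropped.

    previously_deployed: set of paths recorded by the last deploy
    local_files: set of paths currently in the repo
    """
    return {
        "upload": sorted(local_files),
        # Files deployed earlier but no longer present locally get removed.
        "delete": sorted(previously_deployed - local_files),
    }


# Scenario from the original question: test1.py was deployed, then
# replaced by test2.py in the repo.
plan = plan_sync({"test1.py"}, {"test2.py"})
print(plan)  # {'upload': ['test2.py'], 'delete': ['test1.py']}
```

In this model, files that were never deployed by the bundle are not in `previously_deployed`, so they never show up in the delete list.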
11-21-2023 06:53 AM
One further question:
Which bundle configuration files? The ones in the repo? Or are there bundle configuration files in the target workspace location that are used? If the previous version of the bundle contained a reference to test1.py and it has been deployed to a shared workspace, and the new version of the repo no longer contains test1.py, will the destroy command remove test1.py from the shared workspace?
08-12-2024 03:33 AM
xhead, I think the configuration files it's referring to are the local ones in your repo. It checks these against what has been deployed in the workspace and will remove anything that you've got rid of in your repo in the new version. Behind the scenes it uses a Terraform state file to keep track of what has been deployed, which is saved in the workspace along with the other files in your bundle.
In your example, yes it should remove test1.py from the shared workspace.