Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Behavior of the Databricks Asset Bundle using Github Actions

Nmtc9to5
New Contributor II

Hi everyone, I am new to the Databricks Asset Bundles world, so I need to understand how the .databricks directory works.

1. I know that it is created when the databricks bundle deploy command is executed, and that it is where metadata and the current state of a project are saved.

2. When I deploy using GitHub Actions, a GitHub virtual machine is provisioned. This virtual machine clones the repository and executes the databricks command, which creates the .databricks directory locally; that directory then contains the current state of the deployed project. However, when the GitHub workflow finishes, everything on the virtual machine is removed, including the .databricks directory.

3. So, when I commit some updates to the repository and then deploy them with GitHub Actions, since the .databricks directory doesn't exist (the VM's state is fresh), are the bundles deployed again from scratch? Is that how it works? How can I update only the changed bundles instead of re-deploying everything?

1 ACCEPTED SOLUTION

Accepted Solutions

Pat
Esteemed Contributor

You're right that everything is ephemeral on the GitHub runner, but that does not mean a "full redeploy from scratch" every time in the workspace. The .databricks directory is local state + cache, and the real, durable state lives in the Databricks workspace (in the bundle's state_path).

What the .databricks directory actually is
On each databricks bundle deploy the CLI creates a .databricks/ folder next to your databricks.yml that holds things like:

  • Rendered bundle config (all variables, target overrides, includes resolved).
  • Local representation of what was last deployed (resource IDs, mapping between logical names and workspace objects).
  • Some caches for faster subsequent commands on the same machine.

That directory is only for the CLI process on that machine. It is not the authoritative "truth" of your deployment.
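Because it is machine-local cache, a common hygiene step (an assumption about your repo layout, not something the bundle requires) is to keep it out of version control:

```shell
# .databricks/ is a machine-local cache, not the source of truth,
# so exclude it from Git (adjust the path if your bundle lives in
# a subdirectory -- this is a repo-layout assumption).
echo ".databricks/" >> .gitignore
```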

Where the real state lives
In your databricks.yml, under workspace, you have a state_path (or it defaults under workspace.root_path). For example:

workspace:
  root_path: /Shared/bundles/my_project
  state_path: /Shared/bundles/my_project/.state

Databricks stores bundle state in the workspace under that path (jobs, pipelines, IDs, checksums etc.).
When you run databricks bundle deploy again (even from a fresh VM), the CLI:

  1. Reads your local bundle definition (databricks.yml + included files).

  2. Reads the previous state from the workspace state_path.

  3. Computes a diff and applies only what changed (create/update/delete resources incrementally).

So the incremental behavior depends on the workspace state, not on the GitHub runnerโ€™s .databricks directory.
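To make that concrete, here is a minimal GitHub Actions workflow sketch for deploying a bundle from a fresh runner. The workflow name, target name (prod), and secret names are assumptions for illustration; databricks/setup-cli is the official action for installing the Databricks CLI, but adapt everything to your own setup:

```yaml
name: deploy-bundle  # hypothetical workflow name

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Official action that installs the Databricks CLI on the runner
      - uses: databricks/setup-cli@main

      # Deploy: the CLI reads the previous state from the workspace
      # state_path, so only changed resources are applied even though
      # the runner (and its .databricks/ cache) is brand new.
      - run: databricks bundle deploy --target prod  # target name is an assumption
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}    # assumed secret names
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```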

