Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Asset bundle vs terraform

ismaelhenzel
Contributor II

I would like to understand the differences between Terraform and Asset Bundles, especially since in some cases they can do the same thing. I'm not talking about provisioning storage, networking, or the Databricks workspace itself; I know that is Terraform's role. However, since Bundles started supporting things like creating all-purpose compute, secrets, databases, schemas, and apps, these are all things Terraform can do as well. I've seen Asset Bundles excel at creating workflows, but now I'm unsure when to use Bundles and when to use Terraform, given that they share similar capabilities. Are there best practices? For example: should pipelines be handled by Bundles, while clusters, secrets, and infrastructure pieces are left to Terraform? Should I use them together? I would love to hear some opinions.


4 REPLIES

Coffee77
Contributor III

First, DAB uses Terraform in the background. Having said that, my recommendation is to use DAB for every component it already supports, and to use other IaC tools only for what is not supported yet or is not Databricks-specific (private VNets, external storage, etc.). This is what I'm using in real-life applications with Databricks.
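As a rough illustration of that rule of thumb, here is a small Python sketch. The resource-kind labels and both sets are hypothetical names based only on the examples mentioned in this thread; they are not an official support matrix, so check the current bundle documentation before relying on them.

```python
# Hypothetical labels for resource kinds mentioned in this thread.
# This is NOT an official list of what bundles support.
BUNDLE_SUPPORTED = {"job", "pipeline", "cluster", "secret_scope", "schema", "app"}

# Things that are not Databricks-specific (cloud networking, storage, the workspace itself).
NON_DATABRICKS = {"private_vnet", "external_storage", "workspace"}


def choose_tool(kind: str) -> str:
    """Return 'bundle' when the bundle already covers the resource, else 'other-iac'."""
    if kind in NON_DATABRICKS:
        return "other-iac"  # e.g. Terraform
    if kind in BUNDLE_SUPPORTED:
        return "bundle"
    # Databricks resource that bundles do not cover yet: fall back to other IaC.
    return "other-iac"


if __name__ == "__main__":
    for kind in ("job", "schema", "private_vnet", "some_new_resource"):
        print(kind, "->", choose_tool(kind))
```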


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData

Thanks for answering!

I really like this approach, but how do I separate who can only develop code in the repository from who can change the bundle? Let's suppose that I'm creating a schema and granting permissions with the bundle. Since the bundle lives in the same repository as the code, developers can see and change the file. This could be blocked with code review and by denying merge privileges on the branches where CI/CD runs with a service principal, but that seems easier to bypass than a separate Terraform repository that is open only to people who have admin rights on the platform and know Terraform. On the other hand, a separate repository will make development more bureaucratic. What do you think?
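One way to picture the code-review option is a CI check that flags pull requests touching bundle files unless the author is a platform admin. The sketch below is only an illustration: the file patterns, the admin list, and the function names are hypothetical, and a real setup would more likely rely on the Git host's branch protection or code-owner rules.

```python
from fnmatch import fnmatch

# Hypothetical repository layout: bundle definition files that carry grants.
PROTECTED_PATTERNS = ["databricks.yml", "resources/*.yml"]

# Hypothetical list of platform admins allowed to change those files freely.
PLATFORM_ADMINS = {"alice"}


def protected_changes(changed_files):
    """Return the changed paths that match a protected pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatch(path, pattern) for pattern in PROTECTED_PATTERNS)
    ]


def needs_admin_review(author, changed_files):
    """True when a non-admin author touched at least one protected file."""
    return author not in PLATFORM_ADMINS and bool(protected_changes(changed_files))


if __name__ == "__main__":
    print(needs_admin_review("bob", ["src/etl.py", "resources/schemas.yml"]))
```

A check like this only reports; it is as strong as the merge rules around it, which is the bypass risk raised above.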

You can keep a separate repository with Databricks CLI scripts and/or Terraform IaC code for specific tasks, such as assigning permissions, that you do not want to share with developers. Your CI/CD pipelines can then access both repositories and run those scripts in the order you need. So, you should include in DAB everything it supports that meets your security (or other) requirements, and keep in the other repo the privileged scripts that need to be applied along with DAB. Keep in mind that, in the end, DAB is only a subset of commands performed via the CLI, so you can manage all of this as tasks in CI/CD.
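A minimal sketch of such a CI/CD step in Python, assuming the privileged Terraform code and the bundle are checked out into two directories. The directory names, the target name, and the order (privileged Terraform first, bundle second) are assumptions for illustration; adjust them to the order you actually need. By default the script only lists the commands; pass `dry_run=False` to execute them.

```python
import subprocess


def build_plan(privileged_dir, bundle_dir, target):
    """Return ordered (command, working_directory) pairs for the deployment."""
    return [
        # Privileged repo: Terraform code owned by platform admins.
        (["terraform", f"-chdir={privileged_dir}", "init", "-input=false"], None),
        (["terraform", f"-chdir={privileged_dir}", "apply", "-auto-approve", "-input=false"], None),
        # Developer repo: the asset bundle kept next to the application code.
        (["databricks", "bundle", "validate", "-t", target], bundle_dir),
        (["databricks", "bundle", "deploy", "-t", target], bundle_dir),
    ]


def run_plan(plan, dry_run=True):
    """Run each step in order; with dry_run=True only collect the command lines."""
    executed = []
    for cmd, cwd in plan:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)
    return executed


if __name__ == "__main__":
    # "infra", "app" and "prod" are hypothetical names.
    for line in run_plan(build_plan("infra", "app", "prod")):
        print(line)
```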



That is a very clear explanation; I understand it well now. Bundles seem better for simplicity and a repo-oriented workflow, making it easy to manage multiple repos where everyone creates their own Databricks resources. However, when I need stricter security, it makes sense to use a separate repository to deploy those sensitive resources. In that case, I think Terraform works very well, especially if the foundational infrastructure (like networks, buckets, etc.) was already created with it.

I appreciate your time 😄