Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Modularization of Databricks Workflows

jaznarro
New Contributor II

Given that a Workflow may become too big to manage in a single Terraform project, what would you recommend as a best practice for managing and deploying workflows via code so that results remain predictable across environments?

Would it be best to break the Workflow into multiple workflows that reference each other, or is there a better option?
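For context, the kind of reusable definition I have in mind looks roughly like this (a sketch using the Databricks Terraform provider; the module path, variable names, and job layout are hypothetical):

```hcl
# modules/databricks_job/main.tf -- one reusable job definition
variable "job_name"      { type = string }
variable "notebook_path" { type = string }
variable "cluster_id"    { type = string }

resource "databricks_job" "this" {
  name = var.job_name

  task {
    task_key            = "main"
    existing_cluster_id = var.cluster_id
    notebook_task {
      notebook_path = var.notebook_path
    }
  }
}

output "job_id" {
  value = databricks_job.this.id
}
```

Each environment would then instantiate the module with its own values, e.g.:

```hcl
module "ingest_job" {
  source        = "./modules/databricks_job"
  job_name      = "ingest-${var.environment}"   # hypothetical naming scheme
  notebook_path = "/Repos/etl/ingest"
  cluster_id    = var.shared_cluster_id
}
```

This keeps individual jobs DRY, but it does not by itself answer how many jobs should live in one project.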

1 REPLY

Hello @Retired_mod,

Thank you for your prompt response. I truly appreciate it.

Regarding Terraform matters, I believe your insights adequately address the best practices for effectively handling extensive projects through reusable code.

However, I'm still uncertain about Databricks Workflows themselves. Even with reusable code via Terraform modules or workspaces, managing an overwhelming number of job/task definitions within a single project becomes a concern as the Databricks Workflow grows to 100, 200, or even more jobs.

Would you suggest dividing the Databricks Workflow into multiple workflows from the outset (assuming it will grow beyond 100 jobs), or is it advisable to keep a single Workflow housing hundreds of jobs?

In the context of Terraform, various strategies exist for segmenting definitions to keep the code manageable. However, my central concern is ease of management for the developer shaping the workflow. Personally, overseeing hundreds of jobs within a single Workflow strikes me as unwieldy.

One alternative that comes to mind is a "Main Workflow" responsible for orchestrating the other individual workflows. This approach seems more manageable and also maps elegantly onto the Terraform code: distinct projects correspond to separate workflows, and using Terraform data sources, each project can cross-reference workflows created by other Terraform projects.
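To make the idea concrete, the orchestrator project might look something like this (a sketch only; the job names are hypothetical, and I'm assuming a provider version where the `databricks_job` data source and the `run_job_task` block are available):

```hcl
# Orchestrator project: look up jobs created by other Terraform
# projects by name, then chain them as tasks of a "Main Workflow".
data "databricks_job" "ingest" {
  job_name = "ingest-prod"       # hypothetical job name
}

data "databricks_job" "transform" {
  job_name = "transform-prod"    # hypothetical job name
}

resource "databricks_job" "main_workflow" {
  name = "main-workflow-prod"

  task {
    task_key = "ingest"
    run_job_task {
      job_id = data.databricks_job.ingest.id
    }
  }

  task {
    task_key = "transform"
    depends_on {
      task_key = "ingest"
    }
    run_job_task {
      job_id = data.databricks_job.transform.id
    }
  }
}
```

With this layout, each child workflow stays a small, independently planned Terraform project, and only the orchestrator needs to know the other jobs exist.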

Is there a perspective I'm perhaps overlooking? Your guidance would be greatly appreciated.
