Modularization of Databricks Workflows
08-22-2023 10:58 AM
Given that a Workflow may become too big to manage in a single Terraform project, what would you recommend as a best practice for managing and deploying workflows via code so that results stay predictable between environments?
Would it be best to break the Workflow into multiple workflows that reference each other, or is there a better approach?
08-23-2023 12:20 PM
Hello @Retired_mod,
Thank you for your prompt response. I truly appreciate it.
Regarding Terraform matters, I believe your insights adequately address the best practices for effectively handling extensive projects through reusable code.
However, I am still uncertain about Databricks Workflows. Even with reusable code via Terraform Modules or Workspaces, I am concerned about managing an overwhelming number of job/task definitions within a single project as the Databricks Workflow grows to 100, 200, or even more jobs.
Would you suggest dividing the Databricks Workflow into multiple workflows right from the outset (assuming growth beyond 100 jobs), or is it considered advisable to keep a single Workflow containing hundreds of jobs?
In the context of Terraform, various strategies exist for segmenting definitions to enhance code manageability. However, my central focus is on the ease of management for the developer shaping the workflow. Personally, the concept of overseeing hundreds of jobs within a singular Workflow strikes me as a potentially intricate undertaking.
One alternative that comes to mind is establishing a "Main Workflow" responsible for orchestrating other individual workflows. This approach appears more manageable and could also map cleanly onto the Terraform code, with each Terraform project corresponding to a separate workflow. Using Terraform "data" sources, these projects could then reference workflows created by other Terraform projects.
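To make the "Main Workflow" idea concrete, here is a minimal sketch of what the orchestrating project might look like. It assumes the child workflows were already deployed by other Terraform projects, that the Databricks provider version in use supports `run_job_task`, and that each job name is unique in the workspace. The job names (`ingest-workflow`, `transform-workflow`, `main-workflow`) and task keys are hypothetical placeholders, not anything from an actual setup.

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Look up child workflows created by other Terraform projects.
# The job names are hypothetical; the lookup assumes exactly one
# job with each name exists in the workspace.
data "databricks_job" "ingest" {
  job_name = "ingest-workflow"
}

data "databricks_job" "transform" {
  job_name = "transform-workflow"
}

# The "Main Workflow": it contains no business logic of its own,
# only tasks that trigger the child workflows in order.
resource "databricks_job" "main" {
  name = "main-workflow"

  task {
    task_key = "run_ingest"

    run_job_task {
      job_id = data.databricks_job.ingest.id
    }
  }

  task {
    task_key = "run_transform"

    # Start only after the ingest workflow has finished.
    depends_on {
      task_key = "run_ingest"
    }

    run_job_task {
      job_id = data.databricks_job.transform.id
    }
  }
}
```

With this layout, each child workflow lives in its own project with its own state, and the orchestrator only depends on the child jobs existing under the expected names. One trade-off to keep in mind is that the lookup is by name, so renaming a child job in its own project would break the orchestrator until the name here is updated as well.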
Is there a perspective I'm perhaps overlooking? Your guidance would be greatly appreciated.

