We're currently designing our Medallion Architecture pipelines using Lakeflow Jobs, and I wanted to get some opinions on orchestration best practices.
Right now, our approach is essentially one job per target table: each Bronze, Silver, and Gold table has its own dedicated Lakeflow job. The idea is to keep pipelines isolated, modular, and easy to troubleshoot.
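To make the question concrete, here's roughly what that pattern looks like if expressed with the Databricks SDK for Python (table and notebook names are made up for illustration, and compute config is omitted, assuming serverless):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# One dedicated single-task job per target table.
# Table and notebook paths below are hypothetical examples.
for table in ["bronze_orders", "silver_orders", "gold_orders_daily"]:
    w.jobs.create(
        name=f"load_{table}",
        tasks=[
            jobs.Task(
                task_key=table,
                notebook_task=jobs.NotebookTask(
                    notebook_path=f"/pipelines/{table}"
                ),
            )
        ],
    )
```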
However, I'm wondering about the long-term tradeoffs:
- Is this considered a good practice for scalability and maintainability?
- Could having a very large number of small jobs become inefficient in the future (job scheduling overhead, monitoring complexity, cost, etc.)?
- At what point does it make more sense to group multiple tables into a single multi-task workflow/job instead (see the sketch after this list for what I mean by grouping)?
- How do teams usually balance modularity vs orchestration overhead in a Medallion Architecture setup?
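For the grouping question, here's a rough sketch of the alternative we're considering: collapsing one table chain into a single multi-task job, with `depends_on` wiring Bronze into Silver into Gold. Same SDK, same hypothetical names and omitted compute config as above:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# One job orchestrating a Bronze -> Silver -> Gold chain for a single domain.
# Task keys and notebook paths are illustrative only.
w.jobs.create(
    name="orders_medallion",
    tasks=[
        jobs.Task(
            task_key="bronze_orders",
            notebook_task=jobs.NotebookTask(notebook_path="/pipelines/bronze_orders"),
        ),
        jobs.Task(
            task_key="silver_orders",
            depends_on=[jobs.TaskDependency(task_key="bronze_orders")],
            notebook_task=jobs.NotebookTask(notebook_path="/pipelines/silver_orders"),
        ),
        jobs.Task(
            task_key="gold_orders_daily",
            depends_on=[jobs.TaskDependency(task_key="silver_orders")],
            notebook_task=jobs.NotebookTask(notebook_path="/pipelines/gold_orders_daily"),
        ),
    ],
)
```

The appeal of this shape is one run, one schedule, and one place to see failures per domain, at the cost of the per-table isolation we have today.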
Would love to hear how others structure their pipelines in production environments, especially for Databricks/Lakeflow-based architectures.