<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Best Practices: 1 job per 1 target table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-practices-1-job-per-1-target-table/m-p/156914#M54491</link>
    <description>&lt;P&gt;We typically organize our workloads with &lt;STRONG&gt;one job per catalog&lt;/STRONG&gt;, and then use &lt;STRONG&gt;one or more pipelines to load tables into the appropriate schemas&lt;/STRONG&gt;. Since our data engineers handle raw ingestion separately, this structure applies primarily to the &lt;STRONG&gt;Silver and Gold layers&lt;/STRONG&gt; of our architecture.&lt;/P&gt;&lt;P&gt;For example, when loading Salesforce data, we might structure it like this:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;salesforce_silver (job)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;sales (schema)&lt;/STRONG&gt; → Pipeline&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Sales-related tables (as needed within the schema)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;procurement (schema)&lt;/STRONG&gt; → Pipeline&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Procurement-related tables (as needed within the schema)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This same job-and-pipeline pattern carries into the Gold layer, though the structure often evolves there, since Gold datasets may combine data across multiple catalogs and schemas.&lt;/P&gt;&lt;P&gt;Ultimately, &lt;STRONG&gt;your naming conventions and structure should reflect your specific design and use cases&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Larissa&lt;/P&gt;</description>
    <pubDate>Thu, 14 May 2026 13:46:04 GMT</pubDate>
    <dc:creator>LBoydston</dc:creator>
    <dc:date>2026-05-14T13:46:04Z</dc:date>
    <item>
      <title>Best Practices: 1 job per 1 target table</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-1-job-per-1-target-table/m-p/156905#M54489</link>
      <description>&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="4"&gt;We’re currently designing our Medallion Architecture pipelines using Lakeflow Jobs, and I wanted to get some opinions on orchestration best practices.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="4"&gt;Right now, our approach is essentially 1 job per target table (for example, each Bronze/Silver/Gold table has its own dedicated Lakeflow job). The idea is to keep pipelines isolated, modular, and easier to troubleshoot.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="4"&gt;However, I’m wondering about the long-term tradeoffs:&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="arial,helvetica,sans-serif" size="4"&gt;Is this considered a good practice for scalability and maintainability?&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="arial,helvetica,sans-serif" size="4"&gt;Could having a very large number of small jobs become inefficient in the future (job scheduling overhead, monitoring complexity, cost, etc.)?&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="arial,helvetica,sans-serif" size="4"&gt;At what point does it make more sense to group multiple tables into a single workflow/job instead?&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="arial,helvetica,sans-serif" size="4"&gt;How do teams usually balance modularity vs. orchestration overhead in a Medallion Architecture setup?&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="4"&gt;Would love to hear how others structure their pipelines in production environments, especially for Databricks/Lakeflow-based architectures.&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2026 11:47:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-1-job-per-1-target-table/m-p/156905#M54489</guid>
      <dc:creator>DazzaiDe</dc:creator>
      <dc:date>2026-05-14T11:47:54Z</dc:date>
    </item>
    <item>
      <title>Re: Best Practices: 1 job per 1 target table</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-1-job-per-1-target-table/m-p/156909#M54490</link>
      <description>&lt;P&gt;I am assuming you are talking about the jobs that load Bronze and Silver tables. Having one job per table seems like a bad idea: at scale you will most likely start hitting workspace limits, on top of the operational overhead of &lt;EM&gt;maintaining, monitoring, deploying, and wasting compute on&lt;/EM&gt; so many jobs. Typically you would use a metadata table or YAML file to define the configuration, and then group your tables into different pipelines based on factors such as business domain, trigger, schedule, volume, and velocity.&lt;/P&gt;&lt;P&gt;Gold tables may warrant their own pipelines if they have complex dependencies, but Bronze and Silver should be fairly straightforward metadata-driven pipelines/jobs.&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2026 12:55:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-1-job-per-1-target-table/m-p/156909#M54490</guid>
      <dc:creator>pradeep_singh</dc:creator>
      <dc:date>2026-05-14T12:55:06Z</dc:date>
    </item>
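The metadata-driven grouping described in the reply above can be sketched in a few lines. This is an illustrative example, not Databricks API code: the table entries, field names, and grouping keys are all hypothetical, and in practice the configuration would live in a control table or YAML file rather than an inline list.

```python
# Sketch of metadata-driven pipeline grouping: table configs are defined in
# one place, and tables are bucketed into pipelines by business domain and
# trigger schedule instead of creating one job per target table.
from collections import defaultdict

# Hypothetical metadata; in production this would come from a control
# table or a YAML file checked into the repo.
TABLE_CONFIG = [
    {"table": "sales_orders",   "domain": "sales",       "schedule": "hourly"},
    {"table": "sales_invoices", "domain": "sales",       "schedule": "hourly"},
    {"table": "po_headers",     "domain": "procurement", "schedule": "daily"},
    {"table": "po_lines",       "domain": "procurement", "schedule": "daily"},
]

def group_into_pipelines(configs):
    """Bucket tables into pipelines keyed by (domain, schedule).

    Each resulting key corresponds to one pipeline/job, so four tables
    here collapse into two pipelines rather than four separate jobs.
    """
    pipelines = defaultdict(list)
    for cfg in configs:
        pipelines[(cfg["domain"], cfg["schedule"])].append(cfg["table"])
    return dict(pipelines)
```

Extending the grouping key with volume or velocity tiers, as the reply suggests, is just a matter of adding fields to each metadata record.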
    <item>
      <title>Re: Best Practices: 1 job per 1 target table</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-1-job-per-1-target-table/m-p/156914#M54491</link>
      <description>&lt;P&gt;We typically organize our workloads with &lt;STRONG&gt;one job per catalog&lt;/STRONG&gt;, and then use &lt;STRONG&gt;one or more pipelines to load tables into the appropriate schemas&lt;/STRONG&gt;. As our data engineers ingest raw data, this structure is primarily applied in the &lt;STRONG&gt;Silver and Gold layers&lt;/STRONG&gt; of our architecture.&lt;/P&gt;&lt;P&gt;For example, when loading Salesforce data, we might structure it like this:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;salesforce_silver (job)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;sales (schema)&lt;/STRONG&gt; → Pipeline&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Sales-related tables (as needed within the schema)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;procurement (schema)&lt;/STRONG&gt; → Pipeline&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Procurement-related tables (as needed within the schema)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This same job-and-pipeline pattern is carried into the Gold layer. However, the structure often evolves there, since Gold datasets may combine data across multiple catalogs and schemas.&lt;/P&gt;&lt;P&gt;Ultimately, &lt;STRONG&gt;your naming conventions and structure should reflect your specific design and use cases&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Larissa&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2026 13:46:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-1-job-per-1-target-table/m-p/156914#M54491</guid>
      <dc:creator>LBoydston</dc:creator>
      <dc:date>2026-05-14T13:46:04Z</dc:date>
    </item>
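The one-job-per-catalog layout in the reply above can be expressed as a small mapping from which pipeline names are derived mechanically. The catalog, schema, and table names here are hypothetical, and the naming scheme is only one possible convention, in line with the reply's point that naming should reflect your own design.

```python
# Sketch of the one-job-per-catalog layout: each catalog gets a job, and
# each schema within it gets its own pipeline that loads that schema's tables.
CATALOG_LAYOUT = {
    "salesforce_silver": {                              # job (one per catalog)
        "sales": ["opportunities", "accounts"],         # schema -> its tables
        "procurement": ["purchase_orders", "vendors"],
    },
}

def pipeline_names(layout):
    """Derive one pipeline name per schema under each catalog-level job."""
    return {
        job: [f"{job}__{schema}_pipeline" for schema in schemas]
        for job, schemas in layout.items()
    }
```

Because the mapping is data, the same structure can be fed to whatever deployment tooling the team uses, keeping the job/pipeline hierarchy consistent across catalogs.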
  </channel>
</rss>

