<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Best-practice structure for config.yaml, utils, and databricks.yaml in ML project (Databricks) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152102#M53762</link>
    <description>&lt;P&gt;Hey Anil,&lt;/P&gt;&lt;P&gt;I don’t have direct ML experience, but since this question is primarily architectural, here’s my perspective:&lt;/P&gt;&lt;P&gt;1. Keep separate configs per environment:&lt;BR /&gt;- base.yaml (defaults) and dev.yaml/stage.yaml/prod.yaml (per environment)&lt;BR /&gt;2. Avoid putting random helpers or business logic in utils; instead keep focused modules such as:&lt;BR /&gt;- logging.py, constants.py, validation.py&lt;BR /&gt;3. Use separate workspaces/targets for multi-env deployments; see MLOps Stacks:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/machine-learning/mlops/mlops-stacks" target="_blank"&gt;https://docs.databricks.com/aws/en/machine-learning/mlops/mlops-stacks&lt;/A&gt;&lt;BR /&gt;4. For folder structure, see:&amp;nbsp;&lt;A href="https://academiatoindustry.substack.com/p/why-your-ml-project-looks-like-a" target="_blank"&gt;https://academiatoindustry.substack.com/p/why-your-ml-project-looks-like-a&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Hope this helps, thanks.&lt;/P&gt;</description>
    <pubDate>Thu, 26 Mar 2026 07:58:23 GMT</pubDate>
    <dc:creator>Sumit_7</dc:creator>
    <dc:date>2026-03-26T07:58:23Z</dc:date>
    <item>
      <title>Best-practice structure for config.yaml, utils, and databricks.yaml in ML project (Databricks)</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152088#M53757</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I’m working on an ML project in Databricks and want to design a clean, scalable, and production-ready project structure. I’d really appreciate guidance from those with real-world experience.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":small_blue_diamond:"&gt;🔹&lt;/span&gt; My Requirement&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to organize my project with:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- A "config.yaml" file for managing parameters (paths, model configs, environment-specific settings, etc.)&lt;/P&gt;&lt;P&gt;- A "utils" module/package for reusable code (data loading, logging, validation, helpers)&lt;/P&gt;&lt;P&gt;- A "databricks.yaml" file (for asset bundles / deployment setup)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":small_blue_diamond:"&gt;🔹&lt;/span&gt; What I’m Looking For&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to follow industry best practices for:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Structuring "config.yaml"&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- How do you separate dev/stage/prod configs?&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- Do you recommend a single config or multiple layered configs?&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- How do you handle secrets (avoid hardcoding)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2. Designing the "utils" layer&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- What kind of functions/classes should go here vs elsewhere?&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- How do you avoid making it a “dumping ground”?&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- Any recommended folder structure?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;3. 
Using "databricks.yaml"&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- How should I structure it for multi-environment deployments?&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- Best way to integrate with CI/CD pipelines?&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- How do you manage job definitions and parameters cleanly?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;4. Overall project structure&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- Example folder structure for a production-grade ML project in Databricks&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;- How do you organize notebooks vs Python modules?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":small_blue_diamond:"&gt;🔹&lt;/span&gt; Context&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Using Databricks (Asset Bundles / Jobs)&lt;/P&gt;&lt;P&gt;- ML workflow (data preprocessing → training → evaluation → deployment)&lt;/P&gt;&lt;P&gt;- Looking for scalable, maintainable design (team collaboration friendly)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":small_blue_diamond:"&gt;🔹&lt;/span&gt; Bonus (if possible)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Sample repo / GitHub reference&lt;/P&gt;&lt;P&gt;- Common mistakes to avoid&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance! I’m especially interested in real-world patterns used in production, not just theoretical suggestions.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Mar 2026 04:03:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152088#M53757</guid>
      <dc:creator>AnilKumarM</dc:creator>
      <dc:date>2026-03-26T04:03:13Z</dc:date>
    </item>
    <item>
      <title>Re: Best-practice structure for config.yaml, utils, and databricks.yaml in ML project (Databricks)</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152102#M53762</link>
      <description>&lt;P&gt;Hey Anil,&lt;/P&gt;&lt;P&gt;I don’t have direct ML experience, but since this question is primarily architectural, here’s my perspective:&lt;/P&gt;&lt;P&gt;1. Keep separate configs per environment:&lt;BR /&gt;- base.yaml (defaults) and dev.yaml/stage.yaml/prod.yaml (per environment)&lt;BR /&gt;2. Avoid putting random helpers or business logic in utils; instead keep focused modules such as:&lt;BR /&gt;- logging.py, constants.py, validation.py&lt;BR /&gt;3. Use separate workspaces/targets for multi-env deployments; see MLOps Stacks:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/machine-learning/mlops/mlops-stacks" target="_blank"&gt;https://docs.databricks.com/aws/en/machine-learning/mlops/mlops-stacks&lt;/A&gt;&lt;BR /&gt;4. For folder structure, see:&amp;nbsp;&lt;A href="https://academiatoindustry.substack.com/p/why-your-ml-project-looks-like-a" target="_blank"&gt;https://academiatoindustry.substack.com/p/why-your-ml-project-looks-like-a&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Hope this helps, thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Mar 2026 07:58:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152102#M53762</guid>
      <dc:creator>Sumit_7</dc:creator>
      <dc:date>2026-03-26T07:58:23Z</dc:date>
    </item>
    <item>
      <title>Re: Best-practice structure for config.yaml, utils, and databricks.yaml in ML project (Databricks)</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152294#M53809</link>
      <description>&lt;P&gt;IMO there is no such thing as a single best practice, as there are many possibilities.&lt;BR /&gt;What works in one company may not work in another.&lt;BR /&gt;For example, we are a small team and use a monorepo + 2 Databricks workspaces with a shared UC metastore.&lt;BR /&gt;What we have built here is probably not the way to go for large teams or companies.&lt;/P&gt;&lt;P&gt;One thing that is very important to know beforehand is this:&lt;BR /&gt;if you have multiple workspaces, do they share the same UC metastore or not?&lt;BR /&gt;If not, you have to make your code workspace-aware concerning table names (you will have a schema or catalog for dev, qa and prod).&lt;/P&gt;</description>
      <pubDate>Fri, 27 Mar 2026 11:01:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152294#M53809</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2026-03-27T11:01:24Z</dc:date>
    </item>
    <item>
      <title>Re: Best-practice structure for config.yaml, utils, and databricks.yaml in ML project (Databricks)</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152349#M53813</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/182169"&gt;@AnilKumarM&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Agree with&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/14792"&gt;@-werners-&lt;/a&gt;&amp;nbsp;here. There isn’t a single 'one true' repo layout we mandate, but there are a few public references that show the patterns Databricks recommends.&lt;/P&gt;
&lt;P&gt;For bundles/databricks.yml + multi‑env, you may want to check the&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/bundles" target="_self"&gt;Declarative Automation Bundles&lt;/A&gt; (DABs)&amp;nbsp;for the&lt;SPAN&gt;&amp;nbsp;concept and YAML structure as a starting point. The reference provided by&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175319"&gt;@Sumit_7&lt;/a&gt;&amp;nbsp;for &lt;A href="https://docs.databricks.com/aws/en/machine-learning/mlops/mlops-stacks" target="_self"&gt;MLOps Stacks&lt;/A&gt; is also very good. It is &lt;/SPAN&gt;&lt;SPAN&gt;an opinionated ML project template built on bundles, including repo layout, bundle config, and CI/CD. You can also look at &lt;A href="https://docs.databricks.com/aws/en/dev-tools/bundles/mlops-stacks" target="_self"&gt;this&lt;/A&gt; to understand how to scaffold a stack project.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Those docs + the &lt;SPAN aria-expanded="false" aria-haspopup="dialog" data-base-ui-click-trigger=""&gt;&lt;A href="https://github.com/databricks/mlops-stacks" rel="noreferrer" target="_blank"&gt;databricks/mlops-stacks&lt;/A&gt;&lt;/SPAN&gt; repo effectively give you a reference implementation for where databricks.yml lives (project root), how to define targets/resources, how to separate ML code (src/... and notebooks) from resource YAML (resources/...), and how to structure env‑specific config inside the bundle rather than hard‑coding it.&lt;/P&gt;
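&lt;P&gt;To make the "env‑specific config rather than hard‑coding" idea concrete, here is a minimal sketch of layering a per-environment config over shared defaults, in the spirit of the base.yaml + dev/stage/prod.yaml split suggested earlier in the thread. The two dictionaries stand in for already-parsed YAML files; their keys and values (catalog, model, max_depth) and the deep_merge helper are illustrative assumptions, not something taken from MLOps Stacks.&lt;/P&gt;

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where override wins and nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of base.yaml and dev.yaml after parsing (assumed values).
BASE = {"catalog": "main", "model": {"name": "churn", "max_depth": 6}}
DEV = {"catalog": "dev", "model": {"max_depth": 3}}

config = deep_merge(BASE, DEV)
print(config)
```

&lt;P&gt;The merge returns a new dictionary and leaves the defaults untouched, so the same base can be combined with each environment file in turn. Secrets would typically stay out of these files entirely.&lt;/P&gt;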
&lt;P&gt;For CI/CD and repo structure more generally, try this &lt;A href="https://docs.databricks.com/aws/en/dev-tools/ci-cd/best-practices" target="_self"&gt;link&lt;/A&gt;. It gives some&amp;nbsp;patterns for "single repo with code + bundle config" vs. "separate repos", with concrete examples.&lt;/P&gt;
&lt;P&gt;For code vs.&amp;nbsp;utils vs. notebooks, this &lt;A href="https://docs.databricks.com/aws/en/notebooks/best-practices" target="_self"&gt;page&lt;/A&gt;&amp;nbsp;walks through putting notebooks in Git, extracting shared code into modules, and testing it.&lt;/P&gt;
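&lt;P&gt;As a rough illustration of what "extracting shared code into modules and testing it" can look like, the sketch below shows a small helper that could live in a module instead of a notebook, plus a plain test for it. It also touches the workspace-aware table naming point raised earlier in the thread. The catalog names, the CATALOG_BY_ENV mapping, and the table_name function are all hypothetical.&lt;/P&gt;

```python
# Hypothetical mapping from environment to catalog name (assumed, adjust to your setup).
CATALOG_BY_ENV = {"dev": "dev_catalog", "qa": "qa_catalog", "prod": "prod_catalog"}


def table_name(env, schema, table):
    """Build a three-level table name for the given environment."""
    if env not in CATALOG_BY_ENV:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{CATALOG_BY_ENV[env]}.{schema}.{table}"


def test_table_name():
    """Check the happy path and the error path without needing a cluster."""
    assert table_name("dev", "features", "customers") == "dev_catalog.features.customers"
    try:
        table_name("staging", "features", "customers")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for unknown environment")


test_table_name()
```

&lt;P&gt;Because the helper is pure Python with no Spark dependency, it can be tested locally or in CI, while notebooks simply import and call it.&lt;/P&gt;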
&lt;P&gt;Taken together, these do not specify your exact config.yaml / utils layout, but they do illustrate the structures Databricks uses internally for production ML projects and how to connect that to databricks.yml and CI/CD.&lt;/P&gt;
&lt;P&gt;I hope this provides some guidance.&lt;/P&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Mar 2026 21:13:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-structure-for-config-yaml-utils-and-databricks/m-p/152349#M53813</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-03-27T21:13:52Z</dc:date>
    </item>
  </channel>
</rss>

