<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Best practice for creating configuration YAML files for each workspace environment? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/109991#M43452</link>
    <description>&lt;P&gt;&lt;STRONG&gt;Hi Community,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;My team and I are working on refactoring our DAB repository, and we’re considering creating a configuration folder based on our environments—Dev, Staging, and Production workspaces.&lt;/P&gt;&lt;P&gt;What would be a common and best practice for structuring these configuration files by environment? For example, organizing settings for different cluster types, job configurations, and other environment-specific parameters.&lt;/P&gt;&lt;P&gt;Any suggestions or recommendations?&lt;/P&gt;</description>
    <pubDate>Wed, 12 Feb 2025 13:39:19 GMT</pubDate>
    <dc:creator>jeremy98</dc:creator>
    <dc:date>2025-02-12T13:39:19Z</dc:date>
    <item>
      <title>Best practice for creating configuration YAML files for each workspace environment?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/109991#M43452</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Hi Community,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;My team and I are working on refactoring our DAB repository, and we’re considering creating a configuration folder based on our environments—Dev, Staging, and Production workspaces.&lt;/P&gt;&lt;P&gt;What would be a common and best practice for structuring these configuration files by environment? For example, organizing settings for different cluster types, job configurations, and other environment-specific parameters.&lt;/P&gt;&lt;P&gt;Any suggestions or recommendations?&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2025 13:39:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/109991#M43452</guid>
      <dc:creator>jeremy98</dc:creator>
      <dc:date>2025-02-12T13:39:19Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for creating configuration YAML files for each workspace environment?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112351#M44187</link>
      <description>&lt;P&gt;Heh, nobody answered in a month &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt; I have a similar question. I've seen some teams store config data in a SQL database, but that seems overcomplicated to me. I'm looking for better ways to do it, but having a bunch of config files is questionable as well...&lt;/P&gt;</description>
      <pubDate>Wed, 12 Mar 2025 11:44:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112351#M44187</guid>
      <dc:creator>dmytro_starov</dc:creator>
      <dc:date>2025-03-12T11:44:06Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for creating configuration YAML files for each workspace environment?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112358#M44189</link>
      <description>&lt;P&gt;How about a different YAML file per environment, living within the repo for each dataset/workflow?&lt;/P&gt;</description>
      <pubDate>Wed, 12 Mar 2025 12:29:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112358#M44189</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-03-12T12:29:38Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for creating configuration YAML files for each workspace environment?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112431#M44206</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133094"&gt;@jeremy98&lt;/a&gt;&amp;nbsp;and all,&lt;/P&gt;
&lt;P&gt;I agree with &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;&amp;nbsp;. Having configuration files for each deployment target is a very convenient and manageable solution. Since I couldn't find a plain example showing the project structure, I created one here.&amp;nbsp;&lt;A href="https://github.com/koji-kawamura-db/dab_targets_sample" target="_blank"&gt;https://github.com/koji-kawamura-db/dab_targets_sample&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The databricks.yml file can be straightforward. It just has the "include" setting. The base job configurations are defined in the resources/dab_targets_sample.job.yml. The targets dir contains dev.yml and prod.yml files that override task and cluster configurations.&lt;/P&gt;
&lt;PRE&gt;include:&lt;BR /&gt;- resources/*.yml&lt;BR /&gt;- targets/*.yml&lt;/PRE&gt;
&lt;P&gt;Job configuration in the dev target:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="koji_kawamura_0-1741835489656.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15372i246D05A450ACC815/image-size/medium?v=v2&amp;amp;px=400" role="button" title="koji_kawamura_0-1741835489656.png" alt="koji_kawamura_0-1741835489656.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In the prod target:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="koji_kawamura_1-1741835706607.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15373i750B338FEB47417C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="koji_kawamura_1-1741835706607.png" alt="koji_kawamura_1-1741835706607.png" /&gt;&lt;/span&gt;&lt;/P&gt;
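&lt;P&gt;For anyone who can't view the screenshots, a target override in targets/dev.yml looks roughly like the sketch below. The resource and cluster keys here are illustrative; check the linked sample project for the actual names.&lt;/P&gt;
&lt;PRE&gt;targets:
  dev:
    resources:
      jobs:
        dab_targets_sample_job:
          job_clusters:
            - job_cluster_key: job_cluster
              new_cluster:
                num_workers: 1&lt;/PRE&gt;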
&lt;P&gt;I hope this helps!&lt;/P&gt;</description>
      <pubDate>Thu, 13 Mar 2025 03:26:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112431#M44206</guid>
      <dc:creator>koji_kawamura</dc:creator>
      <dc:date>2025-03-13T03:26:16Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for creating configuration YAML files for each workspace environment?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112450#M44213</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90461"&gt;@koji_kawamura&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;It seems like you're talking about cluster and working-environment configuration. But I guess&amp;nbsp;@jeremy98&amp;nbsp;is asking about job-related configs, like table/column names for Spark queries, paths to source/target files, the name of the column used for partitioning the resulting Delta table, etc.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Mar 2025 08:48:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112450#M44213</guid>
      <dc:creator>dmytro_starov</dc:creator>
      <dc:date>2025-03-13T08:48:03Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for creating configuration YAML files for each workspace environment?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112458#M44217</link>
      <description>&lt;P&gt;Hi dmy,&lt;BR /&gt;Yes, something more general, not only cluster configurations! But we have created a custom example for setting this up, and it's working fine :). Btw, thanks Koji! Thanks all &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Mar 2025 09:30:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112458#M44217</guid>
      <dc:creator>jeremy98</dc:creator>
      <dc:date>2025-03-13T09:30:29Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for creating configuration YAML files for each workspace environment?</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112459#M44218</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149083"&gt;@dmytro_starov&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133094"&gt;@jeremy98&lt;/a&gt;&amp;nbsp;asked "&lt;SPAN&gt;organizing settings for different cluster types, job configurations, and other environment-specific parameters" so I provided the example showing how to change cluster configurations (I changed the number of worker nodes) and also task parameters based on environment ("dev" and "prod").&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The job or &lt;A href="https://docs.databricks.com/aws/en/jobs/task-parameters" target="_blank"&gt;task parameters&lt;/A&gt; can be used to specify environment-specific table/column names or source/target file paths if needed. I updated the example project to illustrate how to utilize these env-specific parameters from the notebook executed by the job. Please see the screenshot below as a quick reference.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="koji_kawamura_0-1741858144835.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15376iF19C013AB7AE1CB0/image-size/medium?v=v2&amp;amp;px=400" role="button" title="koji_kawamura_0-1741858144835.png" alt="koji_kawamura_0-1741858144835.png" /&gt;&lt;/span&gt;&lt;/P&gt;
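&lt;P&gt;For example (the parameter name and path below are hypothetical, not taken from the sample project), a target can override a notebook task's base_parameters, and the notebook then reads them via dbutils.widgets.get("source_path"):&lt;/P&gt;
&lt;PRE&gt;targets:
  prod:
    resources:
      jobs:
        dab_targets_sample_job:
          tasks:
            - task_key: main
              notebook_task:
                base_parameters:
                  source_path: /Volumes/prod/raw/events&lt;/PRE&gt;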
&lt;P&gt;&lt;SPAN&gt;To manage data organization across multiple environments, we can also (or I should say we should) utilize &lt;A href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/best-practices" target="_blank"&gt;Unity Catalog&lt;/A&gt;. Separating catalogs into dev/staging/prod and switching catalog names based on the execution target environment is a common pattern.&lt;/SPAN&gt;&lt;/P&gt;
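&lt;P&gt;One common pattern (sketched here with an assumed variable name) is to declare the catalog as a bundle variable in databricks.yml and override it per target; jobs and notebooks then reference ${var.catalog} instead of hard-coding catalog names:&lt;/P&gt;
&lt;PRE&gt;variables:
  catalog:
    description: Unity Catalog used by this target
    default: dev

targets:
  dev:
    variables:
      catalog: dev
  prod:
    variables:
      catalog: prod&lt;/PRE&gt;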
&lt;P&gt;&lt;SPAN&gt;I hope the example covers the original&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133094"&gt;@jeremy98&lt;/a&gt;&amp;nbsp;question, and also potentially covers what&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149083"&gt;@dmytro_starov&lt;/a&gt;&amp;nbsp;needs. If not, please elaborate. Thanks!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Mar 2025 09:32:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-on-how-to-create-a-configuration-yaml-files-for/m-p/112459#M44218</guid>
      <dc:creator>koji_kawamura</dc:creator>
      <dc:date>2025-03-13T09:32:50Z</dc:date>
    </item>
  </channel>
</rss>

