<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks Asset Bundles and MLOps Structure for different model training -1 model per DAB or 1 in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-asset-bundles-and-mlops-structure-for-different-model/m-p/97152#M39436</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/128380"&gt;@mlopsuser&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;For organizing Databricks Asset Bundles (DABs) in your scenario with two separate regression models and datasets, it is generally recommended to create one DAB per model and dataset. This approach aligns with best practices for modularity and maintainability, allowing each model and its associated preprocessing steps to be managed independently. Here are some detailed steps and considerations:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Create Separate DABs&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Modularity&lt;/STRONG&gt;: By creating separate DABs for each model and dataset, you ensure that changes in one model or dataset do not inadvertently affect the other. This modular approach simplifies debugging and enhances the clarity of your project structure.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Scalability&lt;/STRONG&gt;: Independent DABs make it easier to scale and manage each model's lifecycle, including training, evaluation, and deployment.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Structuring the MLOps Pipeline&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Model Registry&lt;/STRONG&gt;: Use MLflow to register each model independently. This allows you to track versions, manage metadata, and monitor performance metrics for each model separately.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Deployment&lt;/STRONG&gt;: Deploy each model using its respective DAB. This ensures that the deployment process is isolated and can be tailored to the specific requirements of each model.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Monitoring&lt;/STRONG&gt;: Set up monitoring for each model independently. This includes tracking performance metrics, data drift, and other relevant indicators to ensure each model remains performant over time.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Monorepo Considerations&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Directory Structure&lt;/STRONG&gt;: Organize your monorepo with clear directory structures for each DAB. For example: &lt;CODE&gt;
/monorepo
├── model1
│   ├── databricks.yml
│   ├── src/
│   └── tests/
└── model2
    ├── databricks.yml
    ├── src/
    └── tests/
&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;CI/CD Integration&lt;/STRONG&gt;: Implement CI/CD pipelines that can handle multiple DABs. Ensure that each pipeline is capable of independently validating, testing, and deploying the respective DAB.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Best Practices&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Version Control&lt;/STRONG&gt;: Use version control to manage changes to each DAB. This includes tracking changes to preprocessing steps, model training code, and deployment configurations.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Documentation&lt;/STRONG&gt;: Maintain comprehensive documentation for each DAB, detailing the preprocessing steps, model architecture, and deployment process. This aids in collaboration and future maintenance.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Fri, 01 Nov 2024 05:22:55 GMT</pubDate>
    <dc:creator>NandiniN</dc:creator>
    <dc:date>2024-11-01T05:22:55Z</dc:date>
    <item>
      <title>Databricks Asset Bundles and MLOps Structure for different model training -1 model per DAB or 1 DAB</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-asset-bundles-and-mlops-structure-for-different-model/m-p/95328#M39082</link>
      <description>&lt;P&gt;I have two different datasets that will be used to train two separate regression models. Each dataset has its own preprocessing steps, and the models will have independent training pipelines.&lt;/P&gt;&lt;P&gt;What is the best-practice approach for organizing Databricks Asset Bundles (DABs) in this scenario? Specifically, I’m wondering whether it’s better to create one DAB per model and dataset or to combine everything into a single DAB for simplicity.&lt;/P&gt;&lt;P&gt;Additionally, any insights on structuring the MLOps pipeline for model registry, deployment, and monitoring in such a setup would be greatly appreciated.&lt;/P&gt;&lt;P&gt;The DABs will live in a monorepo for a new use case.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 16:33:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-asset-bundles-and-mlops-structure-for-different-model/m-p/95328#M39082</guid>
      <dc:creator>mlopsuser</dc:creator>
      <dc:date>2024-10-21T16:33:47Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Asset Bundles and MLOps Structure for different model training -1 model per DAB or 1</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-asset-bundles-and-mlops-structure-for-different-model/m-p/97152#M39436</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/128380"&gt;@mlopsuser&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;For organizing Databricks Asset Bundles (DABs) in your scenario with two separate regression models and datasets, it is generally recommended to create one DAB per model and dataset. This approach aligns with best practices for modularity and maintainability, allowing each model and its associated preprocessing steps to be managed independently. Here are some detailed steps and considerations:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Create Separate DABs&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Modularity&lt;/STRONG&gt;: By creating separate DABs for each model and dataset, you ensure that changes in one model or dataset do not inadvertently affect the other. This modular approach simplifies debugging and enhances the clarity of your project structure.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Scalability&lt;/STRONG&gt;: Independent DABs make it easier to scale and manage each model's lifecycle, including training, evaluation, and deployment.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Structuring the MLOps Pipeline&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Model Registry&lt;/STRONG&gt;: Use MLflow to register each model independently. This allows you to track versions, manage metadata, and monitor performance metrics for each model separately.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Deployment&lt;/STRONG&gt;: Deploy each model using its respective DAB. This ensures that the deployment process is isolated and can be tailored to the specific requirements of each model.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Monitoring&lt;/STRONG&gt;: Set up monitoring for each model independently. This includes tracking performance metrics, data drift, and other relevant indicators to ensure each model remains performant over time.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Monorepo Considerations&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Directory Structure&lt;/STRONG&gt;: Organize your monorepo with clear directory structures for each DAB. For example: &lt;CODE&gt;
/monorepo
├── model1
│   ├── databricks.yml
│   ├── src/
│   └── tests/
└── model2
    ├── databricks.yml
    ├── src/
    └── tests/
&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;CI/CD Integration&lt;/STRONG&gt;: Implement CI/CD pipelines that can handle multiple DABs. Ensure that each pipeline is capable of independently validating, testing, and deploying the respective DAB.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Best Practices&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Version Control&lt;/STRONG&gt;: Use version control to manage changes to each DAB. This includes tracking changes to preprocessing steps, model training code, and deployment configurations.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Documentation&lt;/STRONG&gt;: Maintain comprehensive documentation for each DAB, detailing the preprocessing steps, model architecture, and deployment process. This aids in collaboration and future maintenance.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
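&lt;P&gt;To make the CI/CD point above more concrete, here is a minimal sketch, assuming the monorepo layout shown earlier (one folder per model, each with its own databricks.yml). The helper names find_bundles, bundles_to_deploy and plan_commands are hypothetical, the "dev" target name is an assumption, and the list of changed files is assumed to come from your CI system (for example a git diff). The script only builds the "databricks bundle validate" and "databricks bundle deploy" commands; it does not run them.&lt;/P&gt;

```python
import tempfile
from pathlib import Path


def find_bundles(repo_root: Path) -> list[Path]:
    """Return every directory under repo_root that contains a databricks.yml."""
    return sorted(p.parent for p in repo_root.rglob("databricks.yml"))


def bundles_to_deploy(bundles: list[Path], changed_files: list[Path]) -> list[Path]:
    """Keep only the bundles that contain at least one changed file."""
    return [b for b in bundles if any(f.is_relative_to(b) for f in changed_files)]


def plan_commands(target: str) -> list[list[str]]:
    """Build the CLI calls for one bundle; run them with cwd set to the bundle folder."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
    ]


if __name__ == "__main__":
    # Build a throwaway copy of the example layout so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("model1", "model2"):
            (root / name / "src").mkdir(parents=True)
            (root / name / "databricks.yml").write_text("bundle:\n  name: " + name + "\n")

        # Assumption: CI reports that only model2's training code changed.
        changed = [root / "model2" / "src" / "train.py"]

        for bundle in bundles_to_deploy(find_bundles(root), changed):
            for cmd in plan_commands("dev"):
                print(bundle.name, "->", " ".join(cmd))
```

&lt;P&gt;In a real pipeline you would pass each planned command to subprocess.run with cwd set to the bundle folder, so a change to model1 never triggers a deployment of model2.&lt;/P&gt;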
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 01 Nov 2024 05:22:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-asset-bundles-and-mlops-structure-for-different-model/m-p/97152#M39436</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2024-11-01T05:22:55Z</dc:date>
    </item>
  </channel>
</rss>

