<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: When is it time to change from ETL in notebooks to whl/py? in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/when-is-it-time-to-change-from-etl-in-notebooks-to-whl-py/m-p/108480#M9212</link>
    <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146994"&gt;@Forssen&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;My advice:&lt;/P&gt;&lt;P class=""&gt;Using &lt;SPAN class=""&gt;.py&lt;/SPAN&gt; files and &lt;SPAN class=""&gt;.whl&lt;/SPAN&gt; packages is generally more secure and scalable, especially when working in a team. One of the key advantages is that &lt;SPAN class=""&gt;&lt;STRONG&gt;code reviews and version control&lt;/STRONG&gt;&lt;/SPAN&gt; are much more efficient with &lt;SPAN class=""&gt;.py&lt;/SPAN&gt; files, as changes can be properly tracked via &lt;SPAN class=""&gt;&lt;STRONG&gt;pull requests&lt;/STRONG&gt;&lt;/SPAN&gt;.&lt;/P&gt;&lt;P class=""&gt;While notebooks support read permissions and version control, they are often harder to manage in collaborative environments. A common issue is that people forget to remove leftover &lt;SPAN class=""&gt;display()&lt;/SPAN&gt; or &lt;SPAN class=""&gt;collect()&lt;/SPAN&gt; calls. These make reviewing and debugging easier in a notebook but are considered &lt;SPAN class=""&gt;&lt;STRONG&gt;bad practice&lt;/STRONG&gt;&lt;/SPAN&gt; in production, where they trigger extra computation and, in the case of &lt;SPAN class=""&gt;collect()&lt;/SPAN&gt;, pull every row onto the driver.
In addition, a single stray "," typed into a notebook by accident can make your production job fail.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Advantages of .py and .whl over notebooks:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Better version control &amp;amp; code reviews&lt;/STRONG&gt;&lt;/SPAN&gt; (easier to track changes and enforce coding standards).&lt;BR /&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Better modularization &amp;amp; reusability&lt;/STRONG&gt;&lt;/SPAN&gt; (separating logic into reusable components).&lt;BR /&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Easier CI/CD integration&lt;/STRONG&gt;&lt;/SPAN&gt; (you can automate testing, packaging, and deployment).&lt;BR /&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;More structured and maintainable codebase&lt;/STRONG&gt;&lt;/SPAN&gt; (better organization and scalability).&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Disadvantages:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Harder debugging compared to notebooks&lt;/STRONG&gt;&lt;/SPAN&gt; (notebooks allow quick testing and visualization).&lt;BR /&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Steeper learning curve for new users&lt;/STRONG&gt;&lt;/SPAN&gt; who are used to interactive workflows.&lt;/P&gt;&lt;P class=""&gt;Given your current setup, where you use notebooks only as &lt;SPAN class=""&gt;&lt;STRONG&gt;orchestrators&lt;/STRONG&gt;&lt;/SPAN&gt; and keep your logic in &lt;SPAN class=""&gt;.py&lt;/SPAN&gt; modules, you already have a &lt;SPAN class=""&gt;&lt;STRONG&gt;good balance&lt;/STRONG&gt;&lt;/SPAN&gt;. The next step could be &lt;SPAN class=""&gt;fully transitioning orchestration to a workflow tool (such as Airflow or Databricks Jobs) and packaging your code into .whl files&lt;/SPAN&gt; for better maintainability.&lt;BR /&gt;&lt;BR /&gt;&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 02 Feb 2025 23:17:06 GMT</pubDate>
    <dc:creator>Isi</dc:creator>
    <dc:date>2025-02-02T23:17:06Z</dc:date>
    <item>
      <title>When is it time to change from ETL in notebooks to whl/py?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/when-is-it-time-to-change-from-etl-in-notebooks-to-whl-py/m-p/107910#M9210</link>
      <description>&lt;P&gt;Hi!&lt;BR /&gt;I would like some input and tips from the community on when it is time to move from a working solution in notebooks to something more "stable", like whl/py files.&lt;/P&gt;&lt;P&gt;What are the pros and cons of notebooks compared to whl/py?&lt;/P&gt;&lt;P&gt;The way I have structured things now is that I use notebooks as an orchestrator. The code is built as modules in py files and simply imported into the notebook. Everything the ETL needs to run comes from a config file (YAML or JSON), so nothing is hardcoded.&lt;/P&gt;&lt;P&gt;Thanks in advance &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jan 2025 18:44:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/when-is-it-time-to-change-from-etl-in-notebooks-to-whl-py/m-p/107910#M9210</guid>
      <dc:creator>Forssen</dc:creator>
      <dc:date>2025-01-30T18:44:09Z</dc:date>
    </item>
    <item>
      <title>Re: When is it time to change from ETL in notebooks to whl/py?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/when-is-it-time-to-change-from-etl-in-notebooks-to-whl-py/m-p/108480#M9212</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146994"&gt;@Forssen&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;My advice:&lt;/P&gt;&lt;P class=""&gt;Using &lt;SPAN class=""&gt;.py&lt;/SPAN&gt; files and &lt;SPAN class=""&gt;.whl&lt;/SPAN&gt; packages is generally more secure and scalable, especially when working in a team. One of the key advantages is that &lt;SPAN class=""&gt;&lt;STRONG&gt;code reviews and version control&lt;/STRONG&gt;&lt;/SPAN&gt; are much more efficient with &lt;SPAN class=""&gt;.py&lt;/SPAN&gt; files, as changes can be properly tracked via &lt;SPAN class=""&gt;&lt;STRONG&gt;pull requests&lt;/STRONG&gt;&lt;/SPAN&gt;.&lt;/P&gt;&lt;P class=""&gt;While notebooks support read permissions and version control, they are often harder to manage in collaborative environments. A common issue is that people forget to remove leftover &lt;SPAN class=""&gt;display()&lt;/SPAN&gt; or &lt;SPAN class=""&gt;collect()&lt;/SPAN&gt; calls. These make reviewing and debugging easier in a notebook but are considered &lt;SPAN class=""&gt;&lt;STRONG&gt;bad practice&lt;/STRONG&gt;&lt;/SPAN&gt; in production, where they trigger extra computation and, in the case of &lt;SPAN class=""&gt;collect()&lt;/SPAN&gt;, pull every row onto the driver.
In addition, a single stray "," typed into a notebook by accident can make your production job fail.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Advantages of .py and .whl over notebooks:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Better version control &amp;amp; code reviews&lt;/STRONG&gt;&lt;/SPAN&gt; (easier to track changes and enforce coding standards).&lt;BR /&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Better modularization &amp;amp; reusability&lt;/STRONG&gt;&lt;/SPAN&gt; (separating logic into reusable components).&lt;BR /&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Easier CI/CD integration&lt;/STRONG&gt;&lt;/SPAN&gt; (you can automate testing, packaging, and deployment).&lt;BR /&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;More structured and maintainable codebase&lt;/STRONG&gt;&lt;/SPAN&gt; (better organization and scalability).&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Disadvantages:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Harder debugging compared to notebooks&lt;/STRONG&gt;&lt;/SPAN&gt; (notebooks allow quick testing and visualization).&lt;BR /&gt;• &lt;SPAN class=""&gt;&lt;STRONG&gt;Steeper learning curve for new users&lt;/STRONG&gt;&lt;/SPAN&gt; who are used to interactive workflows.&lt;/P&gt;&lt;P class=""&gt;Given your current setup, where you use notebooks only as &lt;SPAN class=""&gt;&lt;STRONG&gt;orchestrators&lt;/STRONG&gt;&lt;/SPAN&gt; and keep your logic in &lt;SPAN class=""&gt;.py&lt;/SPAN&gt; modules, you already have a &lt;SPAN class=""&gt;&lt;STRONG&gt;good balance&lt;/STRONG&gt;&lt;/SPAN&gt;. The next step could be &lt;SPAN class=""&gt;fully transitioning orchestration to a workflow tool (such as Airflow or Databricks Jobs) and packaging your code into .whl files&lt;/SPAN&gt; for better maintainability.&lt;BR /&gt;&lt;BR /&gt;&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Feb 2025 23:17:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/when-is-it-time-to-change-from-etl-in-notebooks-to-whl-py/m-p/108480#M9212</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-02-02T23:17:06Z</dc:date>
    </item>
    <item>
      <title>Re: When is it time to change from ETL in notebooks to whl/py?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/when-is-it-time-to-change-from-etl-in-notebooks-to-whl-py/m-p/112838#M9213</link>
      <description>&lt;P&gt;Hi!&lt;BR /&gt;Thanks for the reply and information!&lt;BR /&gt;I think I might keep some parts as notebooks, but only in workflows, since workflow variables can't be set any other way &lt;span class="lia-unicode-emoji" title=":confused_face:"&gt;😕&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Mar 2025 18:04:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/when-is-it-time-to-change-from-etl-in-notebooks-to-whl-py/m-p/112838#M9213</guid>
      <dc:creator>Forssen</dc:creator>
      <dc:date>2025-03-17T18:04:41Z</dc:date>
    </item>
  </channel>
</rss>

