<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Dear experts, need urgent help on logic. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dear-experts-need-urgent-help-on-logic/m-p/110496#M43593</link>
    <description>&lt;P&gt;&lt;STRONG&gt;Dear experts,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I am having difficulty developing PySpark automation logic for &lt;STRONG&gt;“automatically deleting the display() and cache() method calls used in scripts across multiple Databricks notebooks (tasks)”.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Kindly advise on how to develop this automation script.&lt;/STRONG&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 18 Feb 2025 14:30:54 GMT</pubDate>
    <dc:creator>shubham_007</dc:creator>
    <dc:date>2025-02-18T14:30:54Z</dc:date>
    <item>
      <title>Dear experts, need urgent help on logic.</title>
      <link>https://community.databricks.com/t5/data-engineering/dear-experts-need-urgent-help-on-logic/m-p/110496#M43593</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Dear experts,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I am having difficulty developing PySpark automation logic for &lt;STRONG&gt;“automatically deleting the display() and cache() method calls used in scripts across multiple Databricks notebooks (tasks)”.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Kindly advise on how to develop this automation script.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 14:30:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dear-experts-need-urgent-help-on-logic/m-p/110496#M43593</guid>
      <dc:creator>shubham_007</dc:creator>
      <dc:date>2025-02-18T14:30:54Z</dc:date>
    </item>
    <item>
      <title>Re: Dear experts, need urgent help on logic.</title>
      <link>https://community.databricks.com/t5/data-engineering/dear-experts-need-urgent-help-on-logic/m-p/137039#M50694</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;To automate the removal of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;display()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;cache()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;method calls from multiple PySpark scripts in Databricks notebooks, develop a script that programmatically processes exportable notebook source files (usually in&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.dbc&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.ipynb&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;format) using text-based search and replace logic. This approach is effective and well-suited for bulk Notebook modifications, enabling automation across many files without manual intervention.​&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Automation Approach: Step-by-Step&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Export all relevant Databricks notebooks to source file formats (&lt;CODE&gt;.ipynb&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.dbc&lt;/CODE&gt;). Store these in a local or cloud workspace dedicated for batch processing.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Develop a Python script (or use tools like Notepad++ or sed/awk for shell scripting) that reads each notebook as a text file and searches for lines containing&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;display(&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.cache(&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Remove, comment out, or replace these lines according to your policy (for instance, remove visualization logic for production, or replace with logging/debug output).​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;After transformation, re-import cleaned notebooks back into Databricks using the CLI, API, or web interface for downstream use.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;(Optional) Version the changes using Git integration or maintain backup copies of the raw notebooks to prevent accidental data loss.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
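&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The steps above can be chained in one driver script. The sketch below is only an illustration: it assumes the newer Databricks CLI, where &lt;CODE&gt;databricks workspace export-dir&lt;/CODE&gt; and &lt;CODE&gt;databricks workspace import-dir&lt;/CODE&gt; copy a folder recursively (check the flags with &lt;CODE&gt;--help&lt;/CODE&gt; on your CLI version). The function names, paths, and the &lt;CODE&gt;clean_file&lt;/CODE&gt; callback are hypothetical. With &lt;CODE&gt;dry_run=True&lt;/CODE&gt; it only returns the commands, so you can inspect them before anything runs.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;
```python
import os
import shutil
import subprocess


def build_commands(workspace_dir, local_dir):
    """Return the CLI argument lists for the export and re-import steps (assumed CLI syntax)."""
    export_cmd = ["databricks", "workspace", "export-dir", workspace_dir, local_dir, "--overwrite"]
    import_cmd = ["databricks", "workspace", "import-dir", local_dir, workspace_dir, "--overwrite"]
    return export_cmd, import_cmd


def run_pipeline(workspace_dir, local_dir, backup_dir, clean_file, dry_run=True):
    """Export, back up, clean every .py notebook, then re-import.

    clean_file is any callable that rewrites one file in place (hypothetical hook).
    With dry_run=True the CLI commands are returned instead of executed.
    """
    export_cmd, import_cmd = build_commands(workspace_dir, local_dir)
    if dry_run:
        return [export_cmd, import_cmd]
    subprocess.run(export_cmd, check=True)
    # Back up the raw export outside local_dir so import-dir never uploads it
    shutil.copytree(local_dir, backup_dir, dirs_exist_ok=True)
    for root, _dirs, files in os.walk(local_dir):
        for fname in files:
            if fname.endswith(".py"):
                clean_file(os.path.join(root, fname))
    subprocess.run(import_cmd, check=True)
    return [export_cmd, import_cmd]
```
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The backup is copied outside the export folder on purpose, so that the re-import step never uploads it into the workspace.&lt;/P&gt;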
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Example Python Script Snippet&lt;/H2&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;import os
import re

# A display(...) statement alone on its line; lines that also contain ';' or '#'
# are skipped so other code or comments on the same line are never deleted
DISPLAY_LINE = re.compile(r'^([ \t]*)display\([^;#\n]*\)[ \t]*$', re.MULTILINE)
# A chained .cache() call, e.g. df = spark.table("t").cache()
CACHE_CALL = re.compile(r'\.cache\(\)')

def clean_notebook(notebook_path):
    with open(notebook_path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Replace display() statements with 'pass' so an if/for/def body never ends up empty
    cleaned = DISPLAY_LINE.sub(r'\1pass  # display() removed', content)
    # Drop .cache() from method chains: df = df.cache() becomes df = df
    cleaned = CACHE_CALL.sub('', cleaned)

    # Rewrite the file only if something changed
    if cleaned != content:
        with open(notebook_path, 'w', encoding='utf-8') as f:
            f.write(cleaned)

# Apply recursively to notebooks exported in SOURCE format (.py files)
target_dir = "path/to/notebooks"
for root, _dirs, files in os.walk(target_dir):
    for fname in files:
        if fname.endswith('.py'):
            clean_notebook(os.path.join(root, fname))
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This basic logic parses each notebook and removes the designated method calls. For&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.ipynb&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;files, you may want to process JSON cell structures instead of flat text for higher precision.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Best Practices &amp;amp; Notes&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Always backup original notebooks before bulk modification.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Use regular expressions judiciously. Ensure that function calls spanning multiple lines, or embedded in longer statements, are thoroughly detected and removed.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Test the automation logic on a small sample before full application, to avoid accidental removal of valid code.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Consider using the Databricks CLI for notebook export/import operations programmatically for large scale or repeated runs.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;For advanced workflows, integrate the logic into your CI/CD pipeline using Python, PowerShell, or Bash scripting for automated enforcement.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
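&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The multi-line, sample-testing, and CI/CD points can all be served by a read-only check built on Python's standard &lt;CODE&gt;ast&lt;/CODE&gt; module, which finds every &lt;CODE&gt;display(...)&lt;/CODE&gt; call and &lt;CODE&gt;.cache(...)&lt;/CODE&gt; method call with its full line range, however it is formatted. This is a sketch under the assumption that the notebooks were exported in SOURCE format, where magic-command cells appear as &lt;CODE&gt;# MAGIC&lt;/CODE&gt; comment lines and the file therefore parses as Python; files that fail to parse are counted as findings rather than skipped. The names &lt;CODE&gt;find_calls&lt;/CODE&gt;, &lt;CODE&gt;check_tree&lt;/CODE&gt;, and &lt;CODE&gt;main&lt;/CODE&gt; are hypothetical.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;
```python
import ast
import os


def find_calls(source, filename="notebook.py"):
    """Return sorted (start_line, end_line, name) tuples for display(...) and .cache() calls."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id == "display":
            hits.append((node.lineno, node.end_lineno, "display"))
        elif isinstance(func, ast.Attribute) and func.attr == "cache":
            hits.append((node.lineno, node.end_lineno, "cache"))
    return sorted(hits)


def check_tree(root):
    """Print every finding in the .py files under root; return the number of findings."""
    count = 0
    for dirpath, _dirs, files in os.walk(root):
        for fname in sorted(files):
            if not fname.endswith(".py"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8") as f:
                source = f.read()
            try:
                hits = find_calls(source, path)
            except SyntaxError as err:
                print(f"{path}: could not parse ({err.msg})")
                count += 1
                continue
            for start, end, name in hits:
                print(f"{path}:{start}-{end}: {name}() call")
                count += 1
    return count


def main(argv):
    """CLI entry point: returns 1 (failing a CI job) when anything is found, else 0."""
    root = argv[1] if argv[1:] else "."
    return 1 if check_tree(root) else 0
```
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;For a trial run, point &lt;CODE&gt;check_tree&lt;/CODE&gt; at a folder holding a few exported notebooks; in CI, call &lt;CODE&gt;sys.exit(main(sys.argv))&lt;/CODE&gt; from a small wrapper script. Because the checker never edits files, it is safe to run before and after the cleanup to confirm nothing was missed.&lt;/P&gt;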
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This method streamlines the removal of visualization and caching calls from PySpark scripts, making your Databricks pipelines cleaner and more production-ready.​&lt;/P&gt;</description>
      <pubDate>Fri, 31 Oct 2025 15:22:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dear-experts-need-urgent-help-on-logic/m-p/137039#M50694</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-10-31T15:22:04Z</dc:date>
    </item>
  </channel>
</rss>

