<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT and Modularity (best practices?) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20362#M13733</link>
    <description>&lt;P&gt;Hi @Greg Galloway,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much for your reply. I followed your suggestion, but now I face another error when executing the pipeline.&lt;/P&gt;&lt;P&gt;Can you please advise?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;Ruben&lt;/P&gt;</description>
    <pubDate>Mon, 09 Jan 2023 16:17:40 GMT</pubDate>
    <dc:creator>hardy1982</dc:creator>
    <dc:date>2023-01-09T16:17:40Z</dc:date>
    <item>
      <title>DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20355#M13726</link>
      <description>&lt;P&gt;I have very recently started using DLT for the first time. One of the challenges I have run into is how to include other "modules" within my pipelines. I missed the documentation stating that magic commands (with the exception of %pip) are ignored, and I was unpleasantly surprised when running the workflow for the first time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What is the best practice for including common modules within workflows?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In my particular case, I would like to create a separate module that can dynamically generate a dict of expectations for a specific table, and I definitely do not want to include this in all of my notebooks (DRY). Any ideas, suggestions, or best practices for a newbie on how to accomplish this?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for the help and guidance!&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2022 19:57:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20355#M13726</guid>
      <dc:creator>jeremy1</dc:creator>
      <dc:date>2022-05-17T19:57:47Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20356#M13727</link>
      <description>&lt;P&gt;Hello @Jeremy Colson, thank you for reaching out to the Databricks Community Forum.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could you please give this a try if you already have a repo linked in the workspace?&lt;/P&gt;&lt;P&gt;I think Engineering is working on some improvements on this front.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/repos/index.html" alt="https://docs.databricks.com/repos/index.html" target="_blank"&gt;https://docs.databricks.com/repos/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1853i301F710281AA940E/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The code snippet below shows a simple example. You can implement your own logic and try to import it in the DLT pipeline.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import sys
# Make the repo folder importable (adjust the path to your own repo)
sys.path.append("/Workspace/Repos/arvind.ravish@databricks.com/arvindravish/dlt_import")

# my_file.py must be an arbitrary file in that folder, not a notebook
from my_file import myClass

newClass = myClass(5)
val = newClass.getVal()
print(val * 5)
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Please provide your feedback so we can add any improvements to our product backlog.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Jun 2022 07:46:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20356#M13727</guid>
      <dc:creator>User16764241763</dc:creator>
      <dc:date>2022-06-11T07:46:23Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20358#M13729</link>
      <description>&lt;P&gt;I like the approach @Arvind Ravish&amp;nbsp;shared since you can't currently use %run in DLT pipelines. However, it took a little testing to be clear on how exactly to make it work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;First, ensure in the Admin Console that the repos feature is configured as follows:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1852i52A52DF85BD43A55/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;Then create a new arbitrary file named Import.py using the following menu option. (Note, it does not work with a Notebook.)&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="create arbitrary file"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1849iAF17723B310DB20E/image-size/large?v=v2&amp;amp;px=999" role="button" title="create arbitrary file" alt="create arbitrary file" /&gt;&lt;/span&gt;The file should contain code like the following:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;MYVAR1 = "hi"
MYVAR2 = 99
MYVAR3 = "hello"

def factorial(num):
    # Iteratively compute num! (returns 1 for num = 0)
    fact = 1
    for i in range(1, num + 1):
        fact = fact * i
    return fact&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In the DLT notebook, the following code loads Import.py and executes the Python code in it. Then MYVAR1, MYVAR2, MYVAR3, and the factorial function will be available for reference downstream in the pipeline.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import pyspark.sql.functions as f
# Read Import.py from the repo as text, one row per line (adjust the path to your repo)
txt = spark.read.text("file:/Workspace/Repos/FolderName/RepoName/Import.py")

# concatenate all lines of the file into a single string
singlerow = txt.agg(f.concat_ws("\r\n", f.collect_list(txt.value)))
data = singlerow.collect()[0][0]

# execute that string of Python
exec(data)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This appears to work in both Current and Preview channel DLT pipelines at the moment.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Unfortunately, the os.getcwd() command &lt;B&gt;doesn't&lt;/B&gt; appear to be working in DLT pipelines (as it returns /databricks/driver even when the DLT pipeline notebook is in a Repo) so I haven't figured out a way to use a relative path even if your calling notebook is also in Repos. The following currently fails and Azure support case 2211240040000106 has been opened:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import os
import pyspark.sql.functions as f
# Fails in DLT: os.getcwd() returns /databricks/driver, not the repo folder
txt = spark.read.text(f"file:{os.getcwd()}/Import.py")

# concatenate all lines of the file into a single string
singlerow = txt.agg(f.concat_ws("\r\n", f.collect_list(txt.value)))
data = singlerow.collect()[0][0]

# execute that string of Python
exec(data)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm also having trouble using the import example from aravish without using a hardcoded path like sys.path.append("/Workspace/Repos/TopFolder/RepoName") when running in a DLT pipeline. The aravish approach is useful if you want to import function definitions but not execute any Python code and not define any variables which will be visible in the calling notebook's Spark session.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Note: Edited from a previous post where I made a few mistakes.&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2022 18:14:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20358#M13729</guid>
      <dc:creator>Greg_Galloway</dc:creator>
      <dc:date>2022-10-20T18:14:13Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20359#M13730</link>
      <description>&lt;P&gt;Hi Arvind,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I did the configuration as per your description, but it still fails.&lt;/P&gt;&lt;P&gt;Here are my screenshots:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1855i9FC65D657C03D4E0/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1861i8358AB45C64BEC24/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1854iD07BF6EFBEF78B87/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you please suggest how to proceed?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;Ruben&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jan 2023 16:35:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20359#M13730</guid>
      <dc:creator>hardy1982</dc:creator>
      <dc:date>2023-01-05T16:35:20Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20360#M13731</link>
      <description>&lt;P&gt;@Ruben Hartenstein&amp;nbsp;notice the difference between the icons in your screenshot (they are notebooks) and the icons in Arvind's post. You need to use this menu option to create an arbitrary file, not a notebook:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1867i096E800DE79F6522/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jan 2023 16:42:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20360#M13731</guid>
      <dc:creator>Greg_Galloway</dc:creator>
      <dc:date>2023-01-05T16:42:38Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20361#M13732</link>
      <description>&lt;P&gt;My setup is two workspaces, one dev and one prod. The Repos folder that the DLT pipelines run from is called dev in the dev workspace and prod in the prod workspace. I use secret scopes to retrieve the appropriate text so I can build the path for the correct environment, call sys.path.append("some/path/to") on the folder that contains my.py, and then use from my import method/class.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2023 06:14:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20361#M13732</guid>
      <dc:creator>Jfoxyyc</dc:creator>
      <dc:date>2023-01-06T06:14:51Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20362#M13733</link>
      <description>&lt;P&gt;Hi @Greg Galloway,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much for your reply. I followed your suggestion, but now I face another error when executing the pipeline.&lt;/P&gt;&lt;P&gt;Can you please advise?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;Ruben&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2023 16:17:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20362#M13733</guid>
      <dc:creator>hardy1982</dc:creator>
      <dc:date>2023-01-09T16:17:40Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20363#M13734</link>
      <description>&lt;P&gt;@Ruben Hartenstein​&amp;nbsp; I don't see any @dlt.table mentions in your code. I'm assuming that error means the pipeline evaluated your code and didn't find any DLT tables in it. Maybe study a few of the &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/workflows/delta-live-tables/delta-live-tables-quickstart" alt="https://learn.microsoft.com/en-us/azure/databricks/workflows/delta-live-tables/delta-live-tables-quickstart" target="_blank"&gt;samples&lt;/A&gt; as a template? &lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2023 18:15:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20363#M13734</guid>
      <dc:creator>Greg_Galloway</dc:creator>
      <dc:date>2023-01-09T18:15:43Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20364#M13735</link>
      <description>&lt;P&gt;@Greg Galloway&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there no way to run the pipeline without @dlt?&lt;/P&gt;&lt;P&gt;I just want to use Hive tables in my code.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jan 2023 08:01:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20364#M13735</guid>
      <dc:creator>hardy1982</dc:creator>
      <dc:date>2023-01-10T08:01:04Z</dc:date>
    </item>
    <item>
      <title>Re: DLT and Modularity (best practices?)</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20365#M13736</link>
      <description>&lt;P&gt;If you don't want any Delta Live Tables, then just use the Jobs tab under the Workflows tab. Or there are plenty of other ways of just running a notebook in whatever orchestration tool you use (e.g. Azure Data Factory, etc.) @Ruben Hartenstein​&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jan 2023 14:49:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-and-modularity-best-practices/m-p/20365#M13736</guid>
      <dc:creator>Greg_Galloway</dc:creator>
      <dc:date>2023-01-10T14:49:27Z</dc:date>
    </item>
  </channel>
</rss>

