<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT Pipeline and Job Cluster in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14238#M8765</link>
    <description>&lt;P&gt;Does the DLT pipeline give you an error specifically on the %pip command, or does it fail in some other way?&lt;/P&gt;&lt;P&gt;If it's the former, could you share the path format that you're using in the %pip command?&lt;/P&gt;</description>
    <pubDate>Fri, 08 Jul 2022 17:01:28 GMT</pubDate>
    <dc:creator>tomasz</dc:creator>
    <dc:date>2022-07-08T17:01:28Z</dc:date>
    <item>
      <title>DLT Pipeline and Job Cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14236#M8763</link>
      <description>&lt;P&gt;We have written a few Python functions (methods within a class) and packaged them as a wheel library.&lt;/P&gt;&lt;P&gt;In the current (as-is) setup, we install that wheel library on an all-purpose cluster that we have already created. It works fine.&lt;/P&gt;&lt;P&gt;In the target (to-be) setup with Delta Live Tables, we want this wheel library to be installed as part of the DLT pipeline execution, because a DLT pipeline creates its own job cluster when it runs.&lt;/P&gt;&lt;P&gt;We use a lot of Python functions for the transformations between the Silver and Gold layers, so we want the wheel library (which contains all the UDFs) to be installed on the job cluster that the DLT pipeline creates.&lt;/P&gt;&lt;P&gt;When we execute %pip install &amp;lt;wheel library location in DBFS&amp;gt; as the first step in the DLT notebook, it does not seem to work.&lt;/P&gt;&lt;P&gt;But when we run %pip install numpy, it works.&lt;/P&gt;&lt;P&gt;It's important for us to have the wheel library installed on the job cluster created by the DLT pipeline.&lt;/P&gt;&lt;P&gt;Are we missing something?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2022 15:28:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14236#M8763</guid>
      <dc:creator>Deepak_Goldwyn</dc:creator>
      <dc:date>2022-07-08T15:28:30Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline and Job Cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14237#M8764</link>
      <description>&lt;P&gt;Are you sure that the DLT cluster can see your DBFS?&lt;/P&gt;&lt;P&gt;Alternatively, you can use "files in repos" instead.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2022 16:55:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14237#M8764</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-07-08T16:55:46Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline and Job Cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14238#M8765</link>
      <description>&lt;P&gt;Does the DLT pipeline give you an error specifically on the %pip command, or does it fail in some other way?&lt;/P&gt;&lt;P&gt;If it's the former, could you share the path format that you're using in the %pip command?&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2022 17:01:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14238#M8765</guid>
      <dc:creator>tomasz</dc:creator>
      <dc:date>2022-07-08T17:01:28Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline and Job Cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14239#M8766</link>
      <description>&lt;P&gt;@Tomasz Bacewicz&lt;/P&gt;&lt;P&gt;Thanks for your reply!&lt;/P&gt;&lt;P&gt;We are using the command below as the first command (cell) in the DLT notebook:&lt;/P&gt;&lt;P&gt;%pip install /dbfs/dist/abnamro_acdpt_centraldatapoint-0.12.0.dev24-py3-none-any.whl&lt;/P&gt;&lt;P&gt;FYI:&lt;/P&gt;&lt;P&gt;When we manually install the same wheel on the job cluster that the DLT pipeline creates, it installs successfully.&lt;/P&gt;&lt;P&gt;When we run the same pip install command on the all-purpose cluster, it also installs successfully.&lt;/P&gt;&lt;P&gt;It fails only when it is run from the DLT pipeline.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Jul 2022 12:24:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14239#M8766</guid>
      <dc:creator>Deepak_Goldwyn</dc:creator>
      <dc:date>2022-07-11T12:24:12Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline and Job Cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14240#M8767</link>
      <description>&lt;P&gt;Makes sense; good to know that it works manually. Can you also share the error that you get?&lt;/P&gt;</description>
      <pubDate>Mon, 11 Jul 2022 13:52:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14240#M8767</guid>
      <dc:creator>tomasz</dc:creator>
      <dc:date>2022-07-11T13:52:32Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline and Job Cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14241#M8768</link>
      <description>&lt;P&gt;It said "it could not find the whl file".&lt;/P&gt;&lt;P&gt;Upon investigation, we found that our library sits in Nexus and that a cluster environment variable needed to be set up.&lt;/P&gt;&lt;P&gt;When we added the following to the DLT pipeline settings JSON, it worked:&lt;/P&gt;&lt;P&gt;"spark_env_vars": {&lt;/P&gt;&lt;P&gt;  "PIP_INDEX_URL": "&amp;lt;URL for our repository&amp;gt;"&lt;/P&gt;&lt;P&gt;},&lt;/P&gt;</description>
      <pubDate>Mon, 11 Jul 2022 13:57:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-and-job-cluster/m-p/14241#M8768</guid>
      <dc:creator>Deepak_Goldwyn</dc:creator>
      <dc:date>2022-07-11T13:57:40Z</dc:date>
    </item>
  </channel>
</rss>

