<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Adding maven dependency to ETL pipeline in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139458#M51207</link>
    <description>&lt;P&gt;Hello guys,&lt;/P&gt;&lt;P&gt;I'm building an ETL pipeline and need to access the HANA data lake file system. To do that, I need the&amp;nbsp;&lt;SPAN&gt;sap-hdlfs library in the compute environment. The library is available in the Maven repository.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My job will have multiple notebook tasks and an ETL pipeline. From what I've researched, the notebook tasks will use the same compute as the job, but the ETL pipeline will have its own compute, and I cannot see where in the UI to add a library to it.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="anhnnguyen_0-1763437214864.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21790i13BB24E03090335A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="anhnnguyen_0-1763437214864.png" alt="anhnnguyen_0-1763437214864.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Could anyone confirm whether my understanding is correct, and explain how to add a library to the ETL pipeline compute?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Tue, 18 Nov 2025 03:48:19 GMT</pubDate>
    <dc:creator>anhnnguyen</dc:creator>
    <dc:date>2025-11-18T03:48:19Z</dc:date>
    <item>
      <title>Adding maven dependency to ETL pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139458#M51207</link>
      <description>&lt;P&gt;Hello guys,&lt;/P&gt;&lt;P&gt;I'm building an ETL pipeline and need to access the HANA data lake file system. To do that, I need the&amp;nbsp;&lt;SPAN&gt;sap-hdlfs library in the compute environment. The library is available in the Maven repository.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My job will have multiple notebook tasks and an ETL pipeline. From what I've researched, the notebook tasks will use the same compute as the job, but the ETL pipeline will have its own compute, and I cannot see where in the UI to add a library to it.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="anhnnguyen_0-1763437214864.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21790i13BB24E03090335A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="anhnnguyen_0-1763437214864.png" alt="anhnnguyen_0-1763437214864.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Could anyone confirm whether my understanding is correct, and explain how to add a library to the ETL pipeline compute?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2025 03:48:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139458#M51207</guid>
      <dc:creator>anhnnguyen</dc:creator>
      <dc:date>2025-11-18T03:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: Adding maven dependency to ETL pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139473#M51210</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198085"&gt;@anhnnguyen&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Unfortunately, Scala and Java libraries are not supported in Lakeflow Declarative Pipelines (ETL pipelines), so you need to use a regular job if you want to install Maven dependencies.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/ldp/developer/external-dependencies#can-i-use-scala-or-java-libraries-in-pipelines" target="_blank"&gt;Manage Python dependencies for pipelines | Databricks on AWS&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1763447590675.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21796i331B7DE44A09CECE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1763447590675.png" alt="szymon_dybczak_0-1763447590675.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2025 06:33:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139473#M51210</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-11-18T06:33:25Z</dc:date>
    </item>
    <item>
      <title>Re: Adding maven dependency to ETL pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139547#M51224</link>
      <description>&lt;P&gt;DLT doesn’t have a UI for library installation, but you can use the libraries configuration in the pipeline JSON or YAML spec:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;{
  "libraries": [
    {
      "maven": {
        "coordinates": "com.sap.hana.hadoop:sap-hdlfs:&amp;lt;version&amp;gt;"
      }
    }
  ]
}&lt;/LI-CODE&gt;&lt;P&gt;Or, if you’re using &lt;STRONG&gt;Python&lt;/STRONG&gt;, add the dependency in your &lt;STRONG&gt;requirements.txt&lt;/STRONG&gt; and reference it in the pipeline settings.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2025 15:55:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139547#M51224</guid>
      <dc:creator>nayan_wylde</dc:creator>
      <dc:date>2025-11-18T15:55:51Z</dc:date>
    </item>
    <item>
      <title>Re: Adding maven dependency to ETL pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139584#M51239</link>
      <description>&lt;P&gt;I tried it and it does not work: after I save the config, Databricks reverts it.&lt;/P&gt;&lt;P&gt;The one way that seems possible is to load the library from an init script, but as&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;mentioned, that's not a good approach since it will cause unexpected behavior.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Nov 2025 02:03:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139584#M51239</guid>
      <dc:creator>anhnnguyen</dc:creator>
      <dc:date>2025-11-19T02:03:20Z</dc:date>
    </item>
    <item>
      <title>Re: Adding maven dependency to ETL pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139585#M51240</link>
      <description>&lt;P&gt;Hey &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198085"&gt;@anhnnguyen&lt;/a&gt;, you can add libraries in a few ways when building a notebook-based ETL pipeline:&lt;/P&gt;
&lt;P&gt;The scalable, best-practice approach to adding libraries across multiple workloads or clusters is to use policy-scoped libraries. Any compute that uses the cluster policy you define will have the policy's libraries installed at runtime. Check this: &lt;A title="policy-scoped-libraries" href="https://learn.microsoft.com/en-us/azure/databricks/admin/clusters/policies" target="_blank" rel="noopener"&gt;Policy-scoped libraries&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you only need to add libraries to a single workload or cluster, use compute-scoped libraries. &lt;BR /&gt;Check this: &lt;A title="compute-scoped-libraries" href="https://learn.microsoft.com/en-us/azure/databricks/libraries/cluster-libraries" target="_blank" rel="noopener"&gt;Compute-scoped Libraries&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Nov 2025 02:05:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139585#M51240</guid>
      <dc:creator>XP</dc:creator>
      <dc:date>2025-11-19T02:05:24Z</dc:date>
    </item>
    <item>
      <title>Re: Adding maven dependency to ETL pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139587#M51241</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/142356"&gt;@XP&lt;/a&gt;, it worked like a charm.&lt;/P&gt;&lt;P&gt;I actually did try a policy before, but the one I tried was a usage policy, so I could not find where to add the library lol&lt;/P&gt;</description>
      <pubDate>Wed, 19 Nov 2025 02:30:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-maven-dependency-to-etl-pipeline/m-p/139587#M51241</guid>
      <dc:creator>anhnnguyen</dc:creator>
      <dc:date>2025-11-19T02:30:16Z</dc:date>
    </item>
  </channel>
</rss>

