<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using a proxy server to install packages from PyPI in Azure Databricks in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/using-a-proxy-server-to-install-packages-from-pypi-in-azure/m-p/113357#M3169</link>
    <description>&lt;P&gt;Thanks Isi, this is great info. I'll update once I've tried it.&lt;/P&gt;</description>
    <pubDate>Sat, 22 Mar 2025 19:18:24 GMT</pubDate>
    <dc:creator>mzs</dc:creator>
    <dc:date>2025-03-22T19:18:24Z</dc:date>
    <item>
      <title>Using a proxy server to install packages from PyPI in Azure Databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/using-a-proxy-server-to-install-packages-from-pypi-in-azure/m-p/112743#M3128</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm setting up a workspace in Azure and would like to put some restrictions in place on outbound Internet access to reduce the risk of data exfiltration from notebooks and jobs. I plan to use VNet Injection and SCC + back-end private link for compute to control plane traffic. I understand that means the compute subnets can be set up without direct outbound Internet access.&lt;/P&gt;&lt;P&gt;I've seen guides like&amp;nbsp;&lt;A href="https://www.databricks.com/blog/data-exfiltration-protection-with-azure-databricks" target="_blank" rel="noopener"&gt;https://www.databricks.com/blog/data-exfiltration-protection-with-azure-databricks&lt;/A&gt;&amp;nbsp;where a network virtual appliance like Azure Firewall is used to allow traffic to certain domains (.pypi.org, .pythonhosted.org, etc.).&lt;/P&gt;&lt;P&gt;As an alternative to Azure Firewall, I'd like to use an explicit HTTP proxy to better align with other infrastructure. I know in general pip can work behind a proxy if the http_proxy / https_proxy environment variables are set.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is there a way to configure a compute cluster to use an HTTP proxy for installing libraries? In particular, I'm interested in making it easy for a user to install notebook-scoped Python libraries from PyPI using a normal %pip command. Is there something I could do in a cluster-scoped init script to set the environment variables http_proxy and https_proxy so they're available to notebooks? 
Would I need to add anything to no_proxy, to allow normal connections to the control plane via the back-end private link?&lt;/LI&gt;&lt;LI&gt;Are there other outbound connections needed for normal job / notebook execution, other than package repositories like PyPI?&lt;/LI&gt;&lt;LI&gt;If a user clones a repo from GitHub in the workspace UI, is that traffic coming from the control plane or compute plane?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 16 Mar 2025 19:42:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/using-a-proxy-server-to-install-packages-from-pypi-in-azure/m-p/112743#M3128</guid>
      <dc:creator>mzs</dc:creator>
      <dc:date>2025-03-16T19:42:31Z</dc:date>
    </item>
    <item>
      <title>Re: Using a proxy server to install packages from PyPI in Azure Databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/using-a-proxy-server-to-install-packages-from-pypi-in-azure/m-p/112848#M3135</link>
      <description>&lt;P&gt;Hey &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/153687"&gt;@mzs&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;If I understood correctly, you want to configure a &lt;SPAN class=""&gt;Databricks compute cluster&lt;/SPAN&gt; to use an &lt;SPAN class=""&gt;HTTP proxy&lt;/SPAN&gt; for installing libraries via &lt;SPAN class=""&gt;%pip install&lt;/SPAN&gt;, instead of using Azure Firewall.&lt;/P&gt;&lt;P class=""&gt;Yes, this should be possible by setting the &lt;SPAN class=""&gt;&lt;STRONG&gt;http_proxy&lt;/STRONG&gt;&lt;/SPAN&gt; and &lt;SPAN class=""&gt;&lt;STRONG&gt;https_proxy&lt;/STRONG&gt;&lt;/SPAN&gt; environment variables in an &lt;SPAN class=""&gt;&lt;STRONG&gt;init script&lt;/STRONG&gt;&lt;/SPAN&gt;. This way, requests from tools on the compute plane that honor these variables (such as pip installing packages from PyPI) should go through the proxy.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;You can try adding the following &lt;SPAN class=""&gt;&lt;STRONG&gt;init script&lt;/STRONG&gt;&lt;/SPAN&gt; to your cluster:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;#!/bin/bash
echo "export http_proxy=http://&amp;lt;proxy-address&amp;gt;:&amp;lt;port&amp;gt;" &amp;gt;&amp;gt; /etc/environment
echo "export https_proxy=http://&amp;lt;proxy-address&amp;gt;:&amp;lt;port&amp;gt;" &amp;gt;&amp;gt; /etc/environment
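# Leading-dot suffixes rather than *. wildcards: pip (via requests) matches NO_PROXY entries by suffix and does not expand wildcards.
echo "export NO_PROXY=169.254.169.254,.azuredatabricks.net,.blob.core.windows.net,.dfs.core.windows.net,.table.core.windows.net,.queue.core.windows.net,.service.signalr.net" &amp;gt;&amp;gt; /etc/environment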
source /etc/environment&lt;/LI-CODE&gt;&lt;P class=""&gt;With this in place:&lt;BR /&gt;• &lt;SPAN class=""&gt;%pip install&lt;/SPAN&gt; should pick up the proxy automatically.&lt;BR /&gt;• Internal traffic to Azure services and the &lt;SPAN class=""&gt;Databricks control plane&lt;/SPAN&gt; should still work, bypassing the proxy (via &lt;SPAN class=""&gt;NO_PROXY&lt;/SPAN&gt;).&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;I’ve never tested this exact setup before, so if you try it out, I’d really appreciate it if you could share your results.&lt;BR /&gt;&lt;BR /&gt;Hope this helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/P&gt;</description>
      <pubDate>Mon, 17 Mar 2025 21:10:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/using-a-proxy-server-to-install-packages-from-pypi-in-azure/m-p/112848#M3135</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-03-17T21:10:03Z</dc:date>
    </item>
    <item>
      <title>Re: Using a proxy server to install packages from PyPI in Azure Databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/using-a-proxy-server-to-install-packages-from-pypi-in-azure/m-p/113357#M3169</link>
      <description>&lt;P&gt;Thanks Isi, this is great info. I'll update once I've tried it.&lt;/P&gt;</description>
      <pubDate>Sat, 22 Mar 2025 19:18:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/using-a-proxy-server-to-install-packages-from-pypi-in-azure/m-p/113357#M3169</guid>
      <dc:creator>mzs</dc:creator>
      <dc:date>2025-03-22T19:18:24Z</dc:date>
    </item>
  </channel>
</rss>

