
Using a proxy server to install packages from PyPI in Azure Databricks

mzs
Visitor

Hi,

I'm setting up a workspace in Azure and would like to restrict outbound Internet access to reduce the risk of data exfiltration from notebooks and jobs. I plan to use VNet injection and secure cluster connectivity (SCC) plus back-end Private Link for compute-to-control-plane traffic. As I understand it, that means the compute subnets can be set up without direct outbound Internet access.

I've seen guides like https://www.databricks.com/blog/data-exfiltration-protection-with-azure-databricks where a network virtual appliance like Azure Firewall is used to allow traffic to certain domains (.pypi.org, .pythonhosted.org, etc.).

As an alternative to Azure Firewall, I'd like to use an explicit HTTP proxy to better align with other infrastructure. I know in general pip can work behind a proxy if the http_proxy / https_proxy environment variables are set.
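
To illustrate the general pip behavior referred to above (this is not Databricks-specific and not a confirmed cluster configuration), here is a minimal sketch that runs pip in a subprocess with the proxy variables set. The proxy URL, the bypass hosts, and the helper names `proxy_env`, `pip_install_command`, and `pip_install` are all hypothetical placeholders. Which hosts actually belong in `no_proxy` for a given workspace is exactly the open question in this post.

```python
import os
import subprocess
import sys

# Hypothetical values for illustration only; replace with your own.
PROXY_URL = "http://proxy.example.internal:3128"
NO_PROXY_HOSTS = ["localhost", "127.0.0.1"]


def proxy_env(proxy_url, no_proxy_hosts, base_env=None):
    """Return a copy of the environment with proxy variables set."""
    env = dict(os.environ if base_env is None else base_env)
    no_proxy = ",".join(no_proxy_hosts)
    # Set both lower- and upper-case names, since tools differ in which they read.
    for name in ("http_proxy", "https_proxy"):
        env[name] = proxy_url
        env[name.upper()] = proxy_url
    env["no_proxy"] = no_proxy
    env["NO_PROXY"] = no_proxy
    return env


def pip_install_command(package, proxy_url=None):
    """Build a pip install command, optionally using pip's --proxy option."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if proxy_url:
        cmd += ["--proxy", proxy_url]
    cmd.append(package)
    return cmd


def pip_install(package):
    """Run pip with the proxy variables in its environment (needs network access)."""
    subprocess.run(
        pip_install_command(package),
        env=proxy_env(PROXY_URL, NO_PROXY_HOSTS),
        check=True,
    )
```

Calling `pip_install("requests")` would route the download through the proxy via the environment variables. Passing `proxy_url` to `pip_install_command` instead uses pip's own `--proxy` flag and leaves the environment untouched.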

  1. Is there a way to configure a compute cluster to use an HTTP proxy for installing libraries? In particular, I'm interested in making it easy for a user to install notebook-scoped Python libraries from PyPI using a normal %pip command. Is there something I could do in a cluster-scoped init script to set the environment variables http_proxy and https_proxy so they're available to notebooks? Would I need to add anything to no_proxy, to allow normal connections to the control plane via the back-end private link?
  2. Are there other outbound connections needed for normal job / notebook execution, other than package repositories like PyPI?
  3. If a user clones a repo from GitHub in the workspace UI, is that traffic coming from the control plane or compute plane?

Thanks!

 

0 REPLIES
