Hey @mzs,
If I understood correctly, you want to configure a Databricks compute cluster to use an HTTP proxy for installing libraries via %pip install, instead of using Azure Firewall.
Yes, this should be possible by setting the http_proxy and https_proxy environment variables in an init script. Tools that honor these variables, including pip, will then send their outbound requests from the compute plane (such as package downloads from PyPI) through the proxy.
You can try adding the following init script to your cluster:
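#!/bin/bash
# Replace <proxy-address> and <port> with your proxy's host and port.
echo "export http_proxy=http://<proxy-address>:<port>" >> /etc/environment
echo "export https_proxy=http://<proxy-address>:<port>" >> /etc/environment
# Hosts that should bypass the proxy (instance metadata, Databricks, Azure Storage and related services).
echo "export NO_PROXY=169.254.169.254,*.azuredatabricks.net,*.blob.core.windows.net,*.dfs.core.windows.net,*.table.core.windows.net,*.queue.core.windows.net,*.service.signalr.net" >> /etc/environment
# Load the variables into this script's own shell as well.
source /etc/environment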
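With this in place, the expected behavior is:
• %pip install should pick up the proxy automatically.
• Internal traffic to Azure services and the Databricks control plane should keep working, because those hosts are listed in NO_PROXY. Note that some tools don't understand the *. wildcard form and expect a leading dot instead (for example .azuredatabricks.net), so adjust the list if the exclusions aren't honored.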
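To confirm the variables actually reached the node, a quick check along these lines might help. This is only a sketch: check_proxy_env is a hypothetical helper name I made up (not a Databricks command), and it only reports what the current shell can see. It does not prove that traffic really flows through the proxy.

```shell
#!/bin/bash
# Sketch: report which proxy-related variables are visible to this shell.
# check_proxy_env is a hypothetical helper, not part of Databricks or pip.
check_proxy_env() {
  local missing=0
  local var
  for var in http_proxy https_proxy NO_PROXY; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -n "${!var}" ]; then
      echo "$var=${!var}"
    else
      echo "$var is not set"
      missing=1
    fi
  done
  # Return non-zero if any of the three variables is missing or empty.
  return "$missing"
}

check_proxy_env || echo "Proxy environment is incomplete in this shell."
```

You could paste this into a %sh notebook cell after the cluster restarts. Keep in mind that a %sh cell runs on the driver only, so it says nothing about the workers.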
I’ve never tested this exact setup before, so if you try it out, I’d really appreciate it if you could share your results.
Hope this helps 🙂
Isi