Connect to Databricks using Java SDK through proxy
02-12-2024 06:03 AM - edited 02-12-2024 06:05 AM
I'm trying to connect to Databricks from Java using the Java SDK and get cluster/SQL warehouse state. I'm able to connect and get cluster state from my local machine, but once I deploy it to the server, my company's network blocks the connection. We need to go through a proxy here, but I'm not sure how to configure a proxy with the Databricks Java SDK.
Below is the code that works in local env:
DatabricksConfig config = new DatabricksConfig()
    .setHost("https://name.databricks.com")
    .setToken("myToken")
    .resolve();
WorkspaceClient wc = new WorkspaceClient(config);
wc.clusters().get("myClusterId").getState().toString();
Any hint or suggestion would be very helpful.
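Not an official SDK answer, but one option is to set the standard JVM proxy properties programmatically before the client is constructed, assuming the SDK's underlying HTTP client honors them (worth verifying for your SDK version). The proxy host `proxy.mycorp.example` and port `8080` below are placeholders:

```java
// Sketch: route outbound traffic through a proxy via standard JVM properties.
// Assumes the Databricks SDK's HTTP client respects http(s).proxyHost/Port;
// "proxy.mycorp.example" and 8080 are placeholder values.
public class ProxyConfigSketch {
    static void configureProxy(String host, int port) {
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", String.valueOf(port));
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        // Keep internal hosts off the proxy.
        System.setProperty("http.nonProxyHosts", "localhost|127.*|[::1]");
    }

    public static void main(String[] args) {
        configureProxy("proxy.mycorp.example", 8080);
        // Then build the client exactly as before:
        // DatabricksConfig config = new DatabricksConfig()
        //     .setHost("https://name.databricks.com")
        //     .setToken("myToken")
        //     .resolve();
        // WorkspaceClient wc = new WorkspaceClient(config);
        System.out.println(System.getProperty("https.proxyHost"));
    }
}
```

The same properties can equivalently be passed on the command line as `-Dhttps.proxyHost=... -Dhttps.proxyPort=...`.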
02-16-2024 03:58 AM
Hi @Retired_mod
Thanks for the reply. I tried adding the arguments in the manifest.yml file like this:
JAVA_OPTS: "-Dhttp.proxyHost='your_proxy_host' -Dhttp.proxyPort='your_proxy_port' -Dhttps.proxyHost='your_proxy_host' -Dhttps.proxyPort='your_proxy_port'"
Still, I'm getting connection refused to Databricks when I deploy this to PCF.
Is using WorkspaceClient the right way? Or, do I need to use AccountClient?
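For what it's worth, the quoting in JAVA_OPTS is easy to get wrong on Cloud Foundry; the single quotes around the values end up as literal characters in the property. A minimal sketch of the env block (app name, proxy host, and port are placeholders) might look like:

```yaml
# Hedged sketch of a PCF manifest.yml env block; all values are placeholders.
applications:
  - name: my-databricks-app
    env:
      JAVA_OPTS: "-Dhttps.proxyHost=proxy.mycorp.example -Dhttps.proxyPort=8080 -Dhttp.proxyHost=proxy.mycorp.example -Dhttp.proxyPort=8080"
```

Also note that "connection refused" can indicate network-level blocking (e.g. platform security groups not allowing egress to the proxy) rather than a JVM misconfiguration, so that is worth ruling out too.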
02-14-2024 02:13 AM
You can use the init script below in order to use a proxy server with a Databricks cluster. The content of the init script can be added at "Workspace/shared/setproxy.sh".
==================================================
val proxy = "http://localhost:8888" // set this to your actual proxy
val proxy_host = "localhost"
val proxy_port = "8888"
val no_proxy = "127.0.0.1,.local,169.254.169.254,s3.amazonaws.com,s3.us-east-1.amazonaws.com" // make sure to update no proxy as needed (e.g. for S3 region or any other internal domains)
val java_no_proxy = "localhost|127.*|[::1]|169.254.169.254|s3.amazonaws.com|*.s3.amazonaws.com|s3.us-east-1.amazonaws.com|*.s3.us-east-1.amazonaws.com|10.*" // replace 10.* with your cluster IP range
dbutils.fs.put("Workspace/shared/setproxy.sh", s"""#!/bin/bash
echo "export http_proxy=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export https_proxy=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export no_proxy=$no_proxy" >> /databricks/spark/conf/spark-env.sh
echo "export HTTP_PROXY=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export HTTPS_PROXY=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export NO_PROXY=$no_proxy" >> /databricks/spark/conf/spark-env.sh
echo "export _JAVA_OPTIONS=\"-Dhttps.proxyHost=${proxy_host} -Dhttps.proxyPort=${proxy_port} -Dhttp.proxyHost=${proxy_host} -Dhttp.proxyPort=${proxy_port} -Dhttp.nonProxyHosts=${java_no_proxy}\"" >> /databricks/spark/conf/spark-env.sh
echo "http_proxy=$proxy" >> /etc/environment
echo "https_proxy=$proxy" >> /etc/environment
echo "no_proxy=$no_proxy" >> /etc/environment
echo "HTTP_PROXY=$proxy" >> /etc/environment
echo "HTTPS_PROXY=$proxy" >> /etc/environment
echo "NO_PROXY=$no_proxy" >> /etc/environment
cat >> /etc/R/Renviron << EOF
http_proxy=$proxy
https_proxy=$proxy
no_proxy=$no_proxy
EOF
""", true)
==================================================
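The script's append pattern can be sanity-checked locally before touching a real cluster; a minimal sketch using a scratch file in place of /databricks/spark/conf/spark-env.sh:

```shell
# Mimic the init script's append pattern against a temp file so the quoting
# can be verified before the script runs on a real cluster.
proxy="http://localhost:8888"
conf="$(mktemp)"
echo "export http_proxy=$proxy"  >> "$conf"
echo "export https_proxy=$proxy" >> "$conf"
# Each appended line should start with "export" and carry the proxy URL.
grep -c '^export' "$conf"
```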
Please test it carefully in your environment.
Let me know if that helps.

