
Connect to Databricks using Java SDK through proxy

Nagasundaram
New Contributor II

I'm trying to connect to Databricks from Java using the Java SDK to get the cluster/SQL warehouse state. I'm able to connect and get the cluster state from my local machine, but once I deploy to the server, my company's network blocks the connection. We need to use a proxy here, but I'm not sure how to configure one with the Databricks Java SDK.

Below is the code that works in local env:

DatabricksConfig config = new DatabricksConfig()
    .setHost("https://name.databricks.com")
    .setToken("myToken")
    .resolve();

WorkspaceClient wc = new WorkspaceClient(config);
wc.clusters().get("myClusterId").getState().toString();

 

Any hint or suggestion would be very helpful.

3 REPLIES

Kaniz
Community Manager

Hi @Nagasundaram

To connect to Databricks from Java using the Databricks SDK and handle proxy settings, follow these steps:

  1. Add Databricks SDK Dependency: In your project’s pom.xml file (if using Maven), add the Databricks SDK for Java as a dependency. Replace 0.0.1 with the latest version of the SDK:

    <dependencies>
        <dependency>
            <groupId>com.databricks</groupId>
            <artifactId>databricks-sdk-java</artifactId>
            <version>0.0.1</version>
        </dependency>
    </dependencies>
    

    Make sure to reload your project in your IDE (e.g., IntelliJ IDEA) after adding the dependency.

  2. Initialize WorkspaceClient: Create a WorkspaceClient instance and authenticate it with your Databricks account or workspace. Here’s an example code snippet:

    import com.databricks.sdk.WorkspaceClient;
    import com.databricks.sdk.service.compute.ClusterInfo;
    import com.databricks.sdk.service.compute.ListClustersRequest;
    
    public class Main {
        public static void main(String[] args) {
            WorkspaceClient workspaceClient = new WorkspaceClient();
            for (ClusterInfo clusterInfo : workspaceClient.clusters().list(new ListClustersRequest())) {
                System.out.println("Cluster Name: " + clusterInfo.getClusterName());
            }
        }
    }
    
  3. Proxy Configuration: To use a proxy with the Databricks SDK, set the standard JVM proxy system properties (http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort). If your company’s network requires a proxy, you can pass these settings as JVM command-line arguments, e.g. -Dhttps.proxyHost=... -Dhttps.proxyPort=..., or set them programmatically before creating the WorkspaceClient.
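As a minimal sketch of step 3, the proxy properties can also be set programmatically before the WorkspaceClient is created. The host "proxy.mycompany.com" and port "8080" below are placeholder values, not real settings; substitute your company's actual proxy details:

```java
// Sketch: configure the standard java.net proxy system properties.
// "proxy.mycompany.com" and "8080" are placeholders -- replace them with
// your company's actual proxy host and port. Equivalent command-line form:
//   java -Dhttps.proxyHost=proxy.mycompany.com -Dhttps.proxyPort=8080 ...
public class ProxySetup {
    public static void configureProxy() {
        System.setProperty("http.proxyHost", "proxy.mycompany.com");
        System.setProperty("http.proxyPort", "8080");
        System.setProperty("https.proxyHost", "proxy.mycompany.com");
        System.setProperty("https.proxyPort", "8080");
        // Hosts that should bypass the proxy; http.nonProxyHosts applies
        // to both HTTP and HTTPS connections.
        System.setProperty("http.nonProxyHosts", "localhost|127.0.0.1");
    }

    public static void main(String[] args) {
        // Set the proxy properties first, then create the WorkspaceClient
        // as shown in step 2.
        configureProxy();
        System.out.println("Proxy: " + System.getProperty("https.proxyHost")
                + ":" + System.getProperty("https.proxyPort"));
    }
}
```

The key point is ordering: the properties must be in place before the SDK opens its first HTTP connection.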

Remember to adjust the proxy settings according to your company’s network configuration. Once you’ve set up the proxy, your Java application should be able to connect to Databricks even when deployed on the server. 🚀

 

Nagasundaram
New Contributor II

Hi @Kaniz 

Thanks for the reply. I tried adding the arguments in manifest.yml file like this:

JAVA_OPTS: "-Dhttp.proxyHost='your_proxy_host' -Dhttp.proxyPort='your_proxy_port' -Dhttps.proxyHost='your_proxy_host' -Dhttps.proxyPort='your_proxy_port'"

 

Still, I'm getting "connection refused" errors to Databricks when I deploy this to PCF.

Is using WorkspaceClient the right way? Or, do I need to use AccountClient?

AlliaKhosla
New Contributor III

Hi  @Nagasundaram 

 

You can use the init script below in order to use a proxy server with a Databricks cluster. The following Scala notebook snippet writes the init script content to "Workspace/shared/setproxy.sh":

==================================================

val proxy = "http://localhost:8888" // set this to your actual proxy
val proxy_host = "localhost"
val proxy_port = "8888"
// update no_proxy as needed (e.g. for your S3 region or any internal domains)
val no_proxy = "127.0.0.1,.local,169.254.169.254,s3.amazonaws.com,s3.us-east-1.amazonaws.com"
// replace 10.* with your cluster's IP range
val java_no_proxy = "localhost|127.*|[::1]|169.254.169.254|s3.amazonaws.com|*.s3.amazonaws.com|s3.us-east-1.amazonaws.com|*.s3.us-east-1.amazonaws.com|10.*"

dbutils.fs.put("Workspace/shared/setproxy.sh", s"""#!/bin/bash
echo "export http_proxy=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export https_proxy=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export no_proxy=$no_proxy" >> /databricks/spark/conf/spark-env.sh
echo "export HTTP_PROXY=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export HTTPS_PROXY=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export NO_PROXY=$no_proxy" >> /databricks/spark/conf/spark-env.sh
echo "export _JAVA_OPTIONS=\"-Dhttps.proxyHost=${proxy_host} -Dhttps.proxyPort=${proxy_port} -Dhttp.proxyHost=${proxy_host} -Dhttp.proxyPort=${proxy_port} -Dhttp.nonProxyHosts=${java_no_proxy}\"" >> /databricks/spark/conf/spark-env.sh

echo "http_proxy=$proxy" >> /etc/environment
echo "https_proxy=$proxy" >> /etc/environment
echo "no_proxy=$no_proxy" >> /etc/environment
echo "HTTP_PROXY=$proxy" >> /etc/environment
echo "HTTPS_PROXY=$proxy" >> /etc/environment
echo "NO_PROXY=$no_proxy" >> /etc/environment

cat >> /etc/R/Renviron << EOF
http_proxy=$proxy
https_proxy=$proxy
no_proxy=$no_proxy
EOF
""", true)

==================================================

Please test it carefully in your environment.

 

Let me know if that helps. 

 
