
Connect to Databricks using Java SDK through proxy

Nagasundaram
New Contributor II

I'm trying to connect to Databricks from Java using the Java SDK to get the cluster/SQL warehouse state. I'm able to connect and get the cluster state from my local machine, but once I deploy to the server, my company's network blocks the connection. We need to use a proxy here, but I'm not sure how to configure one with the Databricks Java SDK.

Below is the code that works in local env:

DatabricksConfig config = new DatabricksConfig()
    .setHost("https://name.databricks.com")
    .setToken("myToken")
    .resolve();

WorkspaceClient wc = new WorkspaceClient(config);
wc.clusters().get("myClusterId").getState().toString();

 

Any hint or suggestion would be very helpful.

3 REPLIES

Kaniz
Community Manager

Hi @Nagasundaram

To connect to Databricks from Java using the Databricks SDK and handle proxy settings, follow these steps:

  1. Add Databricks SDK Dependency: In your project’s pom.xml file (if using Maven), add the Databricks SDK for Java as a dependency. Replace 0.0.1 with the latest version of the SDK:

    <dependencies>
        <dependency>
            <groupId>com.databricks</groupId>
            <artifactId>databricks-sdk-java</artifactId>
            <version>0.0.1</version>
        </dependency>
    </dependencies>
    

    Make sure to reload your project in your IDE (e.g., IntelliJ IDEA) after adding the dependency.

  2. Initialize WorkspaceClient: Create a WorkspaceClient instance and authenticate it with your Databricks account or workspace. Here’s an example code snippet:

    import com.databricks.sdk.WorkspaceClient;
    import com.databricks.sdk.service.compute.ClusterInfo;
    import com.databricks.sdk.service.compute.ListClustersRequest;
    
    public class Main {
        public static void main(String[] args) {
            WorkspaceClient workspaceClient = new WorkspaceClient();
            for (ClusterInfo clusterInfo : workspaceClient.clusters().list(new ListClustersRequest())) {
                System.out.println("Cluster Name: " + clusterInfo.getClusterName());
            }
        }
    }
    
  3. Proxy Configuration: To use a proxy with the Databricks SDK, set the standard JVM proxy system properties (http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort). If your company’s network requires a proxy, you can pass these settings as JVM command-line arguments, e.g. -Dhttps.proxyHost=... -Dhttps.proxyPort=..., or set them programmatically before creating the WorkspaceClient.
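As a minimal sketch of step 3, the proxy properties can also be set programmatically before the WorkspaceClient is created. The host "proxy.mycompany.com" and port "8080" below are placeholder values, not real settings; substitute your company's actual proxy details:

```java
// Sketch: configure the standard java.net proxy system properties.
// "proxy.mycompany.com" and "8080" are placeholders -- replace them with
// your company's actual proxy host and port. Equivalent command-line form:
//   java -Dhttps.proxyHost=proxy.mycompany.com -Dhttps.proxyPort=8080 ...
public class ProxySetup {
    public static void configureProxy() {
        System.setProperty("http.proxyHost", "proxy.mycompany.com");
        System.setProperty("http.proxyPort", "8080");
        System.setProperty("https.proxyHost", "proxy.mycompany.com");
        System.setProperty("https.proxyPort", "8080");
        // Hosts that should bypass the proxy; http.nonProxyHosts applies
        // to both HTTP and HTTPS connections.
        System.setProperty("http.nonProxyHosts", "localhost|127.0.0.1");
    }

    public static void main(String[] args) {
        // Set the proxy properties first, then create the WorkspaceClient
        // as shown in step 2.
        configureProxy();
        System.out.println("Proxy: " + System.getProperty("https.proxyHost")
                + ":" + System.getProperty("https.proxyPort"));
    }
}
```

The key point is ordering: the properties must be in place before the SDK opens its first HTTP connection.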

Remember to adjust the proxy settings according to your company’s network configuration. Once you’ve set up the proxy, your Java application should be able to connect to Databricks even when deployed on the server. 🚀

 

Nagasundaram
New Contributor II

Hi @Kaniz 

Thanks for the reply. I tried adding the arguments in manifest.yml file like this:

JAVA_OPTS: "-Dhttp.proxyHost='your_proxy_host' -Dhttp.proxyPort='your_proxy_port' -Dhttps.proxyHost='your_proxy_host' -Dhttps.proxyPort='your_proxy_port'"

 

Still, I'm getting "connection refused" errors to Databricks when I deploy this to PCF.

Is using WorkspaceClient the right way? Or, do I need to use AccountClient?

AlliaKhosla
New Contributor III

Hi  @Nagasundaram 

 

You can use the init script below in order to use a proxy server with a Databricks cluster. The following Scala notebook snippet writes the init script content to "Workspace/shared/setproxy.sh":

==================================================

val proxy = "http://localhost:8888" // set this to your actual proxy
val proxy_host = "localhost"
val proxy_port = "8888"
// update no_proxy as needed (e.g. for your S3 region or any internal domains)
val no_proxy = "127.0.0.1,.local,169.254.169.254,s3.amazonaws.com,s3.us-east-1.amazonaws.com"
// replace 10.* with your cluster's IP range
val java_no_proxy = "localhost|127.*|[::1]|169.254.169.254|s3.amazonaws.com|*.s3.amazonaws.com|s3.us-east-1.amazonaws.com|*.s3.us-east-1.amazonaws.com|10.*"

dbutils.fs.put("Workspace/shared/setproxy.sh", s"""#!/bin/bash
echo "export http_proxy=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export https_proxy=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export no_proxy=$no_proxy" >> /databricks/spark/conf/spark-env.sh
echo "export HTTP_PROXY=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export HTTPS_PROXY=$proxy" >> /databricks/spark/conf/spark-env.sh
echo "export NO_PROXY=$no_proxy" >> /databricks/spark/conf/spark-env.sh
echo "export _JAVA_OPTIONS=\"-Dhttps.proxyHost=${proxy_host} -Dhttps.proxyPort=${proxy_port} -Dhttp.proxyHost=${proxy_host} -Dhttp.proxyPort=${proxy_port} -Dhttp.nonProxyHosts=${java_no_proxy}\"" >> /databricks/spark/conf/spark-env.sh

echo "http_proxy=$proxy" >> /etc/environment
echo "https_proxy=$proxy" >> /etc/environment
echo "no_proxy=$no_proxy" >> /etc/environment
echo "HTTP_PROXY=$proxy" >> /etc/environment
echo "HTTPS_PROXY=$proxy" >> /etc/environment
echo "NO_PROXY=$no_proxy" >> /etc/environment

cat >> /etc/R/Renviron << EOF
http_proxy=$proxy
https_proxy=$proxy
no_proxy=$no_proxy
EOF
""", true)

==================================================

Please test it carefully in your environment.

 

Let me know if that helps. 

 
