Cannot connect to Databricks on Azure Machine Learning Compute Cluster (Data Engineering)

Etyr — Tue, 30 Jan 2024 10:20:37 GMT

Hello,

I'm having an issue. I have three environments:

A local machine running WSL 1
- Python 3.8 and 3.10
- OpenJDK 19.0.1 (version "build 19.0.1+10-21")
Compute Instance in Azure Machine Learning
- Python 3.8
- OpenJDK 8 (version "1.8.0_392")
Compute Cluster in Azure Machine Learning with a custom Dockerfile
- Python 3.10
- OpenJDK 19.0.1 (version "build 19.0.1+10-21")

I cannot launch PySpark on the compute cluster, although it works on the other two machines. Here is how I install OpenJDK on the compute cluster (Dockerfile) and in local WSL:

    RUN wget https://download.java.net/java/GA/jdk19.0.1/afdd2e245b014143b62ccb916125e3ce/10/GPL/openjdk-19.0.1_linux-x64_bin.tar.gz \
        && tar xvf openjdk-19.0.1_linux-x64_bin.tar.gz \
        && mv jdk-19.0.1 /opt/ \
        && rm openjdk-19.0.1_linux-x64_bin.tar.gz
    ENV JAVA_HOME /opt/jdk-19.0.1
    ENV PATH="${PATH}:$JAVA_HOME/bin"

In both of them, `java --version` prints:

    openjdk 19.0.1 2022-10-18
    OpenJDK Runtime Environment (build 19.0.1+10-21)
    OpenJDK 64-Bit Server VM (build 19.0.1+10-21, mixed mode, sharing)

I did not install OpenJDK 8 on the compute instance; it was preinstalled by Azure on the VM.

The compute instance and the compute cluster are in the same subnet in Azure, so network access to Databricks is not the problem (all private endpoints are working).

Here is the error I get when running a simple Spark command on the compute cluster:

    Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$5(SparkSubmitArguments.scala:163)
        at scala.Option.orElse(Option.scala:447)
        at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:163)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:118)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1063)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1072)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.net.UnknownHostException: 019325cc430b495e91604bf9052029ac000000: 019325cc430b495e91604bf9052029ac000000: Name or service not known
        at java.base/java.net.InetAddress.getLocalHost(InetAddress.java:1776)
        at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:1211)
        at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:1204)
        at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:1204)
        at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:1261)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:1261)
        at org.apache.spark.internal.config.package$.<init>(package.scala:1080)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        ... 10 more
    Caused by: java.net.UnknownHostException: 019325cc430b495e91604bf9052029ac000000: Name or service not known
        at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Inet6AddressImpl.java:52)
        at java.base/java.net.InetAddress$PlatformResolver.lookupByName(InetAddress.java:1059)
        at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1668)
        at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:1003)
        at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1658)
        at java.base/java.net.InetAddress.getLocalHost(InetAddress.java:1771)
        ... 18 more
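The root cause in this trace is the final `UnknownHostException`: Spark calls `InetAddress.getLocalHost()` (via `Utils.findLocalInetAddress`), and the machine's own hostname, `019325cc430b495e91604bf9052029ac000000`, cannot be resolved. A quick way to check this from Python on each machine is to try resolving the local hostname through the system resolver, which roughly mirrors what Java does here. This is a minimal sketch; the function name `hostname_resolves` is hypothetical, and Python's resolver path may not match the JVM's in every detail.

```python
import socket
from typing import Optional


def hostname_resolves(hostname: Optional[str] = None) -> bool:
    """Return True if `hostname` (default: this machine's hostname) resolves to at least one address."""
    name = hostname or socket.gethostname()
    try:
        # getaddrinfo asks the system resolver (typically /etc/hosts, then DNS)
        # for IPv4 and IPv6 addresses, similar to Java's name-service lookup.
        socket.getaddrinfo(name, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    local_name = socket.gethostname()
    print(f"{local_name!r} resolves: {hostname_resolves(local_name)}")
```

If this prints `False` on the compute cluster and `True` on the compute instance, the problem is name resolution of the local hostname rather than anything Databricks-specific.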

From the compute cluster, I can curl the Databricks API to generate a personal access token (PAT).
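For reference, the same PAT request can be made from Python with only the standard library. This is a sketch, not the author's `DatabricksTokenManager`: it assumes the Databricks Token API endpoint `POST /api/2.0/token/create` and an Azure AD access token for the workspace that has already been obtained (for example from the managed identity). The helper names and the default lifetime and comment are hypothetical.

```python
import json
import urllib.request


def build_token_create_request(
    workspace_host: str,
    aad_token: str,
    lifetime_seconds: int = 3600,  # hypothetical default lifetime
    comment: str = "aml-compute-cluster",  # hypothetical comment
) -> urllib.request.Request:
    """Build (but do not send) a Token API request that creates a Databricks PAT."""
    # strip("/") tolerates a host given as "adb-....azuredatabricks.net/"
    url = "https://" + workspace_host.strip("/") + "/api/2.0/token/create"
    body = json.dumps({"lifetime_seconds": lifetime_seconds, "comment": comment}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {aad_token}",
            "Content-Type": "application/json",
        },
    )


def create_pat(workspace_host: str, aad_token: str) -> str:
    """Send the request and return the new PAT from the response's `token_value` field."""
    request = build_token_create_request(workspace_host, aad_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["token_value"]
```

Splitting request construction from sending keeps the URL and payload easy to inspect without network access.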

I also wrote a class that automatically generates an OAuth2 token from Azure, uses it to generate a Databricks PAT, and then sets up databricks-connect:

    import subprocess

    # Answers to the prompts of `databricks-connect configure`, in order:
    # host, token, cluster ID, org ID, port.
    stdin_list = [
        "https://" + settings.databricks_address,
        DatabricksTokenManager(settings.databricks_address).pat,
        settings.databricks_cluster_id,
        settings.databricks_org_id,
        str(settings.databricks_port),
    ]
    stdin_string = "\n".join(stdin_list)

    # Pipe the answers into `databricks-connect configure` through `echo -e`.
    with subprocess.Popen(["echo", "-e", stdin_string], stdout=subprocess.PIPE) as echo:
        subprocess.check_output(("databricks-connect", "configure"), stdin=echo.stdout)
        echo.wait()

`settings.databricks_address` holds a string of the form "adb-xxxxxxxxxxxx.x.azuredatabricks.net/".

`settings.databricks_cluster_id` is taken from the cluster's URL in Databricks, and the same goes for the organisation ID and port.

This gives the following configuration:

    {
      "host": "https://adb-xxxxxxxxxxxxxxx.x.azuredatabricks.net",
      "token": "dapixxxxxxxxxxxxxxxxxxxxxxx-2",
      "cluster_id": "0119-xxxxxx-xxxxxxx",
      "org_id": "542xxxxxxxxxxx",
      "port": "15001"
    }
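As a side note, the `echo -e` pipe in the configuration code can be replaced by passing the answers directly on stdin with `subprocess.run`. This sketch keeps the same five answers in the same order as the original code and, like it, assumes `databricks-connect configure` reads them line by line from stdin. The function names are hypothetical.

```python
import subprocess


def build_configure_input(host: str, token: str, cluster_id: str, org_id: str, port: int) -> str:
    """Join the answers for `databricks-connect configure`, one per line (same order as the echo version)."""
    answers = ["https://" + host, token, cluster_id, org_id, str(port)]
    # echo terminates its output with a newline, so do the same here.
    return "\n".join(answers) + "\n"


def configure_databricks_connect(answers: str) -> str:
    """Run `databricks-connect configure` with `answers` on stdin; raise CalledProcessError on failure."""
    result = subprocess.run(
        ["databricks-connect", "configure"],
        input=answers,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the names from the original code, this would be called as `configure_databricks_connect(build_configure_input(settings.databricks_address, DatabricksTokenManager(settings.databricks_address).pat, settings.databricks_cluster_id, settings.databricks_org_id, settings.databricks_port))`. Feeding stdin directly avoids the extra `echo` process and stops `echo -e` from interpreting any backslash sequences that might appear in the answers.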

So I cannot understand why it works everywhere except on the compute cluster, with the same Python code and the same OpenJDK/Python versions.

Etyr — Tue, 30 Jan 2024 10:54:32 GMT

Some additional information I forgot to mention:

The compute instance has a user-assigned managed identity in Azure, and a service principal was created in Databricks with its application ID. The same goes for the compute cluster: it has its own user-assigned managed identity, which is also registered as a service principal in Databricks.

Both of them have the correct roles/rights to access clusters.

Locally, I run `az login` with my personal account, which also exists in Databricks as a regular user.

On the compute cluster, I printed the .databricks-connect file and copied it to my local computer, then ran the Python code locally, and it worked. In the Databricks cluster logs I could see that the connection used the compute cluster's service principal (its managed identity in Azure), so the service principal has the correct rights.

Etyr — Tue, 30 Jan 2024 14:25:45 GMT

I managed to reproduce the error on the compute instance by deleting the /etc/hosts file:

    Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$5(SparkSubmitArguments.scala:163)
        at scala.Option.orElse(Option.scala:447)
        at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:163)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:118)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1063)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1072)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.net.UnknownHostException: my-compute-instance: my-compute-instance: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1432)
        at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:1211)
        at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:1204)
        at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:1204)
        at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:1261)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:1261)
        at org.apache.spark.internal.config.package$.<init>(package.scala:1080)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        ... 10 more
    Caused by: java.net.UnknownHostException: my-compute-instance: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:867)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:815)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1291)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1427)
        ... 18 more

Here is the /etc/hosts file on the compute instance:

    127.0.0.1 localhost my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance
    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts

And here is the /etc/hosts file on the compute cluster:

    127.0.0.1 localhost
    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts
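Comparing the two files: the compute instance maps its own hostname to 127.0.0.1, while the compute cluster's file does not list its hostname at all, which matches the `UnknownHostException` above. One possible workaround, assuming the job runs with permission to write /etc/hosts, is to add the missing entry at startup, before the first Spark command. Docker generally generates /etc/hosts when the container starts, so adding the entry in a Dockerfile `RUN` step probably would not persist. This is a minimal sketch; the function names and the choice of 127.0.0.1 (copied from the compute instance's file) are assumptions.

```python
import socket
from typing import Optional

HOSTS_PATH = "/etc/hosts"


def add_hostname_to_hosts(hosts_text: str, hostname: str, ip: str = "127.0.0.1") -> str:
    """Return hosts_text with an `ip hostname` line appended, unless hostname is already mapped."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split into IP + names
        if hostname in fields[1:]:
            return hosts_text  # already resolvable via the hosts file
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip} {hostname}\n"


def patch_hosts_file(path: str = HOSTS_PATH, hostname: Optional[str] = None) -> bool:
    """Add this machine's hostname to the hosts file; return True if the file was changed."""
    name = hostname or socket.gethostname()
    with open(path) as f:
        original = f.read()
    updated = add_hostname_to_hosts(original, name)
    if updated == original:
        return False
    # Rewrite in place (same inode), since /etc/hosts is often bind-mounted into containers.
    with open(path, "w") as f:
        f.write(updated)
    return True


if __name__ == "__main__":
    changed = patch_hosts_file()
    print("hosts file updated" if changed else "hostname already present")
```

Alternatively, Spark reads the `SPARK_LOCAL_IP` environment variable before falling back to `InetAddress.getLocalHost()`, so setting it (for example to 127.0.0.1) in the job environment might avoid the failing lookup; this has not been tested on a compute cluster.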