Cannot connect to Databricks from an Azure Machine Learning Compute Cluster

Etyr
Contributor

Hello,

I'm having an issue with the following setup:

  • A local machine in WSL 1
    • Python 3.8 and 3.10
    • OpenJDK 19.0.1 (version "build 19.0.1+10-21")
  • A Compute Instance in Azure Machine Learning
    • Python 3.8
    • OpenJDK 8 (version "1.8.0_392")
  • A Compute Cluster in Azure Machine Learning with a custom Dockerfile
    • Python 3.10
    • OpenJDK 19.0.1 (version "build 19.0.1+10-21")

I cannot access/launch PySpark on the Compute Cluster, while on the other two I can. Here is how I install OpenJDK on the Compute Cluster (Dockerfile) and on local WSL:

# Download OpenJDK 19.0.1, unpack it, and move it to /opt
RUN wget https://download.java.net/java/GA/jdk19.0.1/afdd2e245b014143b62ccb916125e3ce/10/GPL/openjdk-19.0.1_linux-x64_bin.tar.gz \
    && tar xvf openjdk-19.0.1_linux-x64_bin.tar.gz \
    && mv jdk-19.0.1 /opt/ \
    && rm openjdk-19.0.1_linux-x64_bin.tar.gz

# Point JAVA_HOME at the new JDK and expose its binaries on PATH
ENV JAVA_HOME=/opt/jdk-19.0.1
ENV PATH="${PATH}:${JAVA_HOME}/bin"

On both of these, `java --version` produces the same output:

openjdk 19.0.1 2022-10-18
OpenJDK Runtime Environment (build 19.0.1+10-21)
OpenJDK 64-Bit Server VM (build 19.0.1+10-21, mixed mode, sharing)

I did not install OpenJDK 8 on the Compute Instance myself; it came preinstalled on the Azure VM.

Both the Compute Instance and the Compute Cluster are in the same Azure subnet, so there is no network issue reaching Databricks (all private endpoints are working).

Here is the error I get when running a simple Spark command on the Compute Cluster:

Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$5(SparkSubmitArguments.scala:163)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:163)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:118)
	at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1063)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1072)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: 019325cc430b495e91604bf9052029ac000000: 019325cc430b495e91604bf9052029ac000000: Name or service not known
	at java.base/java.net.InetAddress.getLocalHost(InetAddress.java:1776)
	at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:1211)
	at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:1204)
	at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:1204)
	at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:1261)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:1261)
	at org.apache.spark.internal.config.package$.<init>(package.scala:1080)
	at org.apache.spark.internal.config.package$.<clinit>(package.scala)
	... 10 more
Caused by: java.net.UnknownHostException: 019325cc430b495e91604bf9052029ac000000: Name or service not known
	at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Inet6AddressImpl.java:52)
	at java.base/java.net.InetAddress$PlatformResolver.lookupByName(InetAddress.java:1059)
	at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1668)
	at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:1003)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1658)
	at java.base/java.net.InetAddress.getLocalHost(InetAddress.java:1771)
	... 18 more

From the Compute Cluster, I can curl the Databricks API to generate a Personal Access Token.

I also wrote a class that automatically obtains an OAuth2 token from Azure, uses it to generate a Databricks PAT, and then configures databricks-connect:

import subprocess

# Answers to the interactive `databricks-connect configure` prompts,
# in the order the tool asks for them.
stdin_list = [
    "https://" + settings.databricks_address,
    DatabricksTokenManager(settings.databricks_address).pat,
    settings.databricks_cluster_id,
    settings.databricks_org_id,
    str(settings.databricks_port),
]

stdin_string = "\n".join(stdin_list)

# Pipe the answers into `databricks-connect configure` via echo.
with subprocess.Popen(
    ["echo", "-e", stdin_string], stdout=subprocess.PIPE
) as echo:
    subprocess.check_output(
        ("databricks-connect", "configure"), stdin=echo.stdout
    )
    echo.wait()
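
As an aside, the same thing can be done without the intermediate echo process by feeding the answers to stdin directly; a minimal equivalent sketch, assuming the same stdin_list as above:

import subprocess

# Same effect without the echo subprocess: hand the prompt answers
# to `databricks-connect configure` directly on stdin.
subprocess.run(
    ["databricks-connect", "configure"],
    input="\n".join(stdin_list) + "\n",
    text=True,
    check=True,
)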

settings.databricks_address holds a string of the format "adb-xxxxxxxxxxxx.x.azuredatabricks.net/".

settings.databricks_cluster_id is taken from the Databricks cluster URL, and the same goes for the organization ID and the port. The generated configuration ends up looking like this:

{
  "token": "dapixxxxxxxxxxxxxxxxxxxxxxx-2",
  "cluster_id": "0119-xxxxxx-xxxxxxx",
  "org_id": "542xxxxxxxxxxx",
  "port": "15001"
}
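
For context, here is a minimal sketch of roughly how such a token manager can work on an Azure VM with a managed identity. This is an illustration, not the actual DatabricksTokenManager code; the function names are made up, while the IMDS endpoint, the Azure Databricks resource ID, and the /api/2.0/token/create API are the standard Azure/Databricks mechanisms:

import requests

# Well-known Azure AD resource ID for Azure Databricks
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
# Azure Instance Metadata Service endpoint, reachable from any Azure VM
IMDS_URL = "http://169.254.169.254/metadata/identity/oauth2/token"

def get_aad_token() -> str:
    # Ask IMDS for an AAD token scoped to Databricks, using the VM's
    # managed identity (no stored credentials needed).
    resp = requests.get(
        IMDS_URL,
        params={"api-version": "2018-02-01", "resource": DATABRICKS_RESOURCE_ID},
        headers={"Metadata": "true"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def create_databricks_pat(workspace_host: str) -> str:
    # Exchange the AAD token for a workspace PAT via the Databricks Token API.
    resp = requests.post(
        f"https://{workspace_host}/api/2.0/token/create",
        headers={"Authorization": f"Bearer {get_aad_token()}"},
        json={"comment": "AML compute", "lifetime_seconds": 3600},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["token_value"]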

So I cannot understand why it works everywhere except the Compute Cluster, given the same Python code and the same OpenJDK/Python versions.

2 REPLIES

Etyr
Contributor

Additional information I forgot to include.

The Compute Instance has a user-assigned managed identity in Azure, and a service principal is registered in Databricks with that identity's Application ID. The same goes for the Compute Cluster: it has its own user-assigned managed identity, which is also registered as a service principal in Databricks.

Both of them have the correct roles/rights to access clusters.

Locally, I run "az login" with my personal account, which also exists in Databricks, but as a regular user.

On the Compute Cluster, I printed the contents of the .databricks-connect file and copied it to my local machine, then ran the Python code there, and it worked. In the Databricks cluster logs I could see the connection coming from the Compute Cluster's service principal, i.e. its Azure managed identity. So the service principal has the correct rights.

I managed to reproduce the error on the Compute Instance by deleting the /etc/hosts file:

Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$5(SparkSubmitArguments.scala:163)
        at scala.Option.orElse(Option.scala:447)
        at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:163)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:118)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1063)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1072)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: my-compute-instance: my-compute-instance: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1432)
        at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:1211)
        at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:1204)
        at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:1204)
        at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:1261)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:1261)
        at org.apache.spark.internal.config.package$.<init>(package.scala:1080)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        ... 10 more
Caused by: java.net.UnknownHostException: my-compute-instance: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:867)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:815)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1291)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1427)
        ... 18 more

Here is the /etc/hosts file on the Compute Instance:

127.0.0.1 localhost my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance my-compute-instance

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

And here is the /etc/hosts file on the Compute Cluster:

127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
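
So the Compute Cluster's /etc/hosts has no entry for the container's own hostname, which is exactly what InetAddress.getLocalHost() fails on. A workaround sketch (my assumption, not a verified fix): make the hostname resolvable, or point Spark at an explicit bind address, before the Spark JVM starts:

import os
import socket

hostname = socket.gethostname()

# Option 1: map the container's hostname to loopback in /etc/hosts
# (requires root, which is usually the case inside the AML container).
with open("/etc/hosts", "a") as hosts:
    hosts.write(f"\n127.0.0.1 {hostname}\n")

# Option 2: skip the hostname lookup entirely by telling Spark which
# address to bind to; must be set before the Spark JVM is launched.
os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"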
