Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks Connect error

neeth
New Contributor III

Hello, 

I am new to Databricks and Scala. I created a Scala application on my local machine and tried to connect to the cluster in my Databricks workspace using Databricks Connect, as per the documentation. My cluster uses Databricks Runtime version 16.0 (includes Apache Spark 3.5.0, Scala 2.12).

I have added the dependencies below to my build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1"
libraryDependencies += "com.databricks" % "databricks-connect" % "16.0.0"
 
I create the Spark session with:
val spark = DatabricksSession.builder().remote().getOrCreate()
I have a .databrickscfg configuration file with a DEFAULT profile containing host and cluster_id values.
When I run the application, I get the error below:

Exception in thread "sbt-bg-threads-7" java.lang.NoSuchMethodError: 'org.apache.spark.sql.SparkSession$Builder org.apache.spark.sql.SparkSession$Builder.client(org.apache.spark.sql.connect.client.SparkConnectClient)'
    at com.databricks.connect.DatabricksSession$Builder.fromSparkClientConf(DatabricksSession.scala:522)
    at com.databricks.connect.DatabricksSession$Builder.fromSdkConfig(DatabricksSession.scala:515)
    at com.databricks.connect.DatabricksSession$Builder.getOrCreate(DatabricksSession.scala:446)
    at leakagetest.Main$.createSparkSession(Main.scala:41)
    at leakagetest.Main$.delayedEndpoint$leakagetest$Main$1(Main.scala:27)
    at leakagetest.Main$delayedInit$body.apply(Main.scala:18)
    at scala.Function0.apply$mcV$sp(Function0.scala:39)
    at scala.Function0.apply$mcV$sp$(Function0.scala:39)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
    at scala.App.$anonfun$main$1$adapted(App.scala:80)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at scala.App.main(App.scala:80)
    at scala.App.main$(App.scala:78)
    at leakagetest.Main$.main(Main.scala:18)
    at leakagetest.Main.main(Main.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at sbt.Run.invokeMain(Run.scala:144)
    at sbt.Run.execute$1(Run.scala:94)
    at sbt.Run.$anonfun$runWithLoader$5(Run.scala:121)
    at sbt.Run$.executeSuccess(Run.scala:187)
    at sbt.Run.runWithLoader(Run.scala:121)
    at sbt.Defaults$.$anonfun$bgRunTask$6(Defaults.scala:1988)
    at sbt.Defaults$.$anonfun$termWrapper$2(Defaults.scala:1927)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at sbt.internal.BackgroundThreadPool$BackgroundRunnable.run(DefaultBackgroundJobService.scala:367)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)

Could someone please help me solve this error?

9 REPLIES

Walter_C
Databricks Employee

The error you are encountering, java.lang.NoSuchMethodError: 'org.apache.spark.sql.SparkSession$Builder org.apache.spark.sql.SparkSession$Builder.client(org.apache.spark.sql.connect.client.SparkConnectClient)', suggests that there is a mismatch between the versions of the libraries you are using.

Check library versions: make sure the versions of spark-core and databricks-connect you are using are compatible with each other. The databricks-connect package version must match the Databricks Runtime version. For example, if you are using Databricks Runtime 12.2 LTS, you should use databricks-connect==12.2.*
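
To make the version check concrete, here is a hedged build.sbt sketch. The method named in the error, SparkSession$Builder.client(SparkConnectClient), takes a Spark Connect client type, so it belongs to the Spark Connect flavor of SparkSession rather than the classic spark-sql one. One possible cause (an assumption, not something confirmed in this thread) is that the separately declared spark-sql and spark-core jars shadow the classes that databricks-connect expects. Under that assumption, the build declares only the Databricks Connect artifact. The Scala patch version is also an assumption; any 2.12.x release that matches the cluster should do.

```scala
// build.sbt (sketch; versions are assumptions based on the cluster described in the question)
scalaVersion := "2.12.18" // assumed 2.12.x patch release; DBR 16.0 is built on Scala 2.12

// Declare only the Databricks Connect client, matched to the cluster's runtime version.
libraryDependencies += "com.databricks" % "databricks-connect" % "16.0.0"

// Assumption: spark-sql and spark-core are intentionally NOT declared here, so that
// the classic org.apache.spark.sql.SparkSession cannot shadow the Spark Connect one.
```

If another library in the project pulls in Spark transitively, the same conflict could reappear, so it may be worth checking the resolved classpath after the change.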

neeth
New Contributor III

I am using Databricks Runtime version 16.0.0, so databricks-connect is 16.0.0, and I changed spark-core to 3.5.0 as per the documentation. I am still getting the same error.

 

Walter_C
Databricks Employee

If you try a lower DBR version, does it work, or do you hit the exact same issue?

 

neeth
New Contributor III

I got the same error with a lower DBR version.

I am using OAuth user-to-machine (U2M) authentication and a configuration profile named DEFAULT, which contains host, clusterId and auth_type and is saved in the .databrickscfg file. I logged in using the Databricks CLI before running the code.

To create the SparkSession I used:

 val spark = DatabricksSession.builder().remote().getOrCreate()

While debugging, I could see that cluster_id, host, sdkConfig and token are all None.

I followed this tutorial: https://docs.databricks.com/en/dev-tools/databricks-connect/scala/index.html#tutorial

Walter_C
Databricks Employee

So the issue is happening at Step 4. Is this the only workspace you have synced in the CLI, or have you worked with any other workspace as well?

 

neeth
New Contributor III

This is the only workspace that I have synced.

Walter_C
Databricks Employee

Can you try creating another profile instead of the DEFAULT one and use that? It seems the cluster details are not being picked up, but I wanted to check with a new profile.

 

neeth
New Contributor III

I tried creating another profile and used the code below:

val config = new DatabricksConfig().setProfile("myprofile")
val spark = DatabricksSession.builder().sdkConfig(config).getOrCreate()
I got the same error. While debugging, I could see that DatabricksConfig is always returned empty.
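
Since the profile appears to load as empty, it may help to check what the file itself contains before the SDK is involved. Below is a minimal stand-alone sketch that uses only the Scala standard library. ProfileCheck, parseProfiles and missingKeys are hypothetical helper names, and the required keys (host, cluster_id) are an assumption based on the profile described earlier in this thread. This is not how the Databricks SDK reads the file; it only shows which profiles and keys are physically present.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical diagnostic helper; not part of any Databricks library.
object ProfileCheck {
  private val Section  = """\[(.+)\]""".r
  private val KeyValue = """([^=]+)=(.*)""".r

  /** Parse INI-style text into profile name -> (key -> value). */
  def parseProfiles(text: String): Map[String, Map[String, String]] = {
    val start = ("", Map.empty[String, Map[String, String]])
    val (_, profiles) = text.split("\n").foldLeft(start) {
      case ((current, acc), raw) =>
        raw.trim match {
          // A [name] header starts (or re-opens) a profile.
          case Section(name) =>
            val n = name.trim
            (n, acc.updated(n, acc.getOrElse(n, Map.empty[String, String])))
          // Skip blank lines and comments.
          case line if line.isEmpty || line.startsWith("#") || line.startsWith(";") =>
            (current, acc)
          // key = value inside a profile; the value may itself contain '='.
          case KeyValue(k, v) if current.nonEmpty =>
            (current, acc.updated(current, acc(current).updated(k.trim, v.trim)))
          case _ => (current, acc)
        }
    }
    profiles
  }

  /** Keys from `required` that the profile does not define (assumed key names). */
  def missingKeys(profile: Map[String, String],
                  required: Seq[String] = Seq("host", "cluster_id")): Seq[String] =
    required.filterNot(profile.contains)

  def main(args: Array[String]): Unit = {
    val path = Paths.get(sys.props("user.home"), ".databrickscfg")
    if (!Files.exists(path)) println(s"No file found at $path")
    else {
      val text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
      parseProfiles(text).foreach { case (name, kv) =>
        // Print key names only, so secrets stored in the file are not echoed.
        println(s"[$name] keys: ${kv.keys.mkString(", ")}; missing: ${missingKeys(kv).mkString(", ")}")
      }
    }
  }
}
```

For instance, a profile that spells the key clusterId instead of cluster_id would be reported as missing cluster_id, which would be one thing worth ruling out here.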

saurabh18cs
Valued Contributor III

Try passing the connection parameters explicitly once (Python shown here):

 

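from databricks.connect import DatabricksSession
from pyspark.sql import SparkSession

def get_remote_spark(host: str, cluster_id: str, token: str) -> SparkSession:
    # Pass the connection details explicitly instead of relying on a config profile.
    return DatabricksSession.builder.remote(host=host, cluster_id=cluster_id, token=token).getOrCreate()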
 
 
OR 
 

Run the following command from the CLI to configure Databricks Connect to use the .databrickscfg file:

databricks-connect configure
 
 
 
