Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Why does adding the package 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1' fail on runtime 9.1.x-scala2.12 but succeed on runtime 8.2.x-scala2.12?

raymund
New Contributor III

Using a Databricks spark-submit job with a new cluster:

1] "spark_version": "8.2.x-scala2.12" => OK, works fine

2] "spark_version": "9.1.x-scala2.12" => FAIL, with the errors below

Exception in thread "main" java.lang.ExceptionInInitializerError
	at com.databricks.backend.daemon.driver.WSFSCredentialForwardingHelper.withWSFSCredentials(WorkspaceLocalFileSystem.scala:156)
	at com.databricks.backend.daemon.driver.WSFSCredentialForwardingHelper.withWSFSCredentials$(WorkspaceLocalFileSystem.scala:155)
	at com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem.withWSFSCredentials(WorkspaceLocalFileSystem.scala:30)
	at com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem.getFileStatus(WorkspaceLocalFileSystem.scala:63)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
	at org.apache.spark.deploy.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:192)
	at org.apache.spark.deploy.DependencyUtils$.$anonfun$resolveGlobPaths$2(DependencyUtils.scala:147)
	at org.apache.spark.deploy.DependencyUtils$.$anonfun$resolveGlobPaths$2$adapted(DependencyUtils.scala:145)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:145)
	at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$4(SparkSubmit.scala:363)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:363)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
	at com.databricks.backend.daemon.driver.WsfsDriverHttpClient.<init>(WSFSDriverHttpClient.scala:26)
	at com.databricks.backend.daemon.driver.WSFSCredentialForwardingHelper$.<init>(WorkspaceLocalFileSystem.scala:277)
	at com.databricks.backend.daemon.driver.WSFSCredentialForwardingHelper$.<clinit>(WorkspaceLocalFileSystem.scala)
	... 28 more

1 ACCEPTED SOLUTION


raymund
New Contributor III

This has been resolved by adding the following spark_conf entry (not through --conf):

"spark.hadoop.fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem"

example:

"new_cluster": {
    "spark_version": "9.1.x-scala2.12",
    ...
    "spark_conf": {
        "spark.hadoop.fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem"
    }
},
"spark_submit_task": {
    "parameters": [
        "--packages",
        "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1",
        ...
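For reference, a minimal sketch of assembling this run specification programmatically. Only the spark_conf workaround and the --packages coordinate come from this thread; the run name, node type, worker count, and entry-point path are hypothetical placeholders.

```python
import json

# Sketch: build a cluster/task spec that carries the workaround.
# Only "spark.hadoop.fs.file.impl" is the fix from this thread; the
# remaining fields are illustrative placeholders.
payload = {
    "run_name": "kafka-stream-job",  # hypothetical name
    "new_cluster": {
        "spark_version": "9.1.x-scala2.12",
        "node_type_id": "i3.xlarge",  # placeholder node type
        "num_workers": 2,
        "spark_conf": {
            # Workaround: force the default local filesystem implementation
            # so spark-submit's glob resolution does not hit the WSFS
            # NullPointerException from the stack trace above.
            "spark.hadoop.fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem"
        },
    },
    "spark_submit_task": {
        "parameters": [
            "--packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1",
            "dbfs:/path/to/job.py",  # hypothetical entry point
        ]
    },
}

body = json.dumps(payload, indent=2)
print(body)
```

The resulting JSON can be submitted as the body of a job or run definition in the usual way.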


7 REPLIES

Anonymous
Not applicable

@Raymund Beltran - So everything works as expected now, is that right? If yes, would you be happy to mark your answer as best so others can find it easily?

raymund
New Contributor III

@Piper Wilson - I removed my comment saying it was working; the issue still exists on 9.1. The Kafka streaming jars don't exist on the cluster, so an error is thrown when the PySpark code uses Kafka streaming and the jars are not provided. When the Kafka streaming jars are explicitly provided as PySpark packages, it throws the same error as the original issue above.

Anonymous
Not applicable

@Raymund Beltran - Thanks for letting us know. Let's see what the community has to say about this. We'll circle back if we need to.

raymund
New Contributor III

Additional info:

Using the Databricks spark-submit API with PySpark:

"spark_submit_task": {
    "parameters": [
        "--packages",
        "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1",
        ...
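One thing worth double-checking with this parameter list: the runtime string encodes the cluster's Scala version ("9.1.x-scala2.12") and the Kafka package's Maven coordinate carries a Scala suffix ("_2.12"), and the two must agree. A small stdlib-only sanity check (the helper is hypothetical, not part of any Databricks tooling):

```python
# Hypothetical helper: verify that the Scala suffix in a --packages Maven
# coordinate matches the Scala version baked into the runtime string.
def scala_versions_match(coordinate: str, spark_version: str) -> bool:
    group, artifact, version = coordinate.split(":")
    # Artifacts cross-built for Scala end in "_<scala version>", e.g. "_2.12".
    artifact_scala = artifact.rsplit("_", 1)[-1]
    # Databricks runtime strings end in "scala<version>", e.g. "9.1.x-scala2.12".
    runtime_scala = spark_version.rsplit("scala", 1)[-1]
    return artifact_scala == runtime_scala

print(scala_versions_match(
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1",
    "9.1.x-scala2.12",
))  # True: both sides resolve to Scala 2.12
```

A mismatch here (e.g. a "_2.11" artifact on a scala2.12 runtime) produces its own class-loading failures, so it is worth ruling out before digging into the filesystem error above.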

Anonymous
Not applicable

Thank you. I'm passing the information on. Thanks for your patience!


Thank you for sharing the solution to this issue. I think I saw another question with the same error message.
