06-23-2021 11:39 PM
Using OSS jars always causes classpath issues when running the job on Databricks. The same job works fine on EMR/on-premises.
Accepted Solutions
06-24-2021 01:33 AM
Databricks does not host its jars in its own Maven repository. The OSS jars can be used to compile the application, but they should not be used at execution time; i.e., the application jar should be a thin jar, not a fat jar that bundles Spark.
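For example, an sbt build can keep the application jar thin by marking the Spark dependencies as "provided", so they are available at compile time but not packaged; a minimal sketch (the Scala and Spark versions are illustrative, not from this thread; match them to your cluster's Databricks Runtime):

```scala
// build.sbt -- a minimal sketch of a "thin jar" build.
ThisBuild / scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // "provided": on the compile classpath, but NOT bundled into the jar,
  // so the cluster's own Spark classes are used at execution time.
  "org.apache.spark" %% "spark-core" % "3.2.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.2.1" % "provided"
)
```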
If internal Spark APIs are used, the Databricks version of those classes may have different method signatures, and compiling your application against the OSS jars will cause classpath issues. In such scenarios, the jars available on the cluster should be downloaded to your local machine and used for compilation. Instructions for obtaining them are here:
https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java
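When compiling against the cluster's jars rather than the OSS ones, one approach in sbt is to add the directory printed by `databricks-connect get-jar-dir` as unmanaged jars; a sketch, with a hypothetical path you would replace with the command's actual output:

```scala
// build.sbt -- a sketch of compiling against the Databricks Connect jars.
Compile / unmanagedJars ++= {
  // Hypothetical path; substitute the output of `databricks-connect get-jar-dir`.
  val jarDir = file("/path/to/databricks-connect/jars")
  // Put every jar in that directory on the compile classpath.
  (jarDir ** "*.jar").classpath
}
```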
07-27-2022 10:10 AM
I followed https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java to obtain the spark-avro jar, since Databricks has its own custom from_avro method for use with the Kafka Schema Registry, but I am not able to find the spark-avro jar using Databricks Connect. We need that jar for compilation. Any ideas why the spark-avro jar is missing when we fetch jars using "databricks-connect get-jar-dir"?
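To confirm what the tool actually ships, one can list the directory returned by `databricks-connect get-jar-dir` and look for avro-related jars; a minimal sketch, with a hypothetical path:

```scala
import java.io.File

object ListAvroJars extends App {
  // Hypothetical path; substitute the output of `databricks-connect get-jar-dir`.
  val jarDir = new File("/path/to/databricks-connect/jars")

  // Print any jar whose name mentions avro, to check whether spark-avro is present.
  Option(jarDir.listFiles()).getOrElse(Array.empty[File])
    .filter(f => f.getName.endsWith(".jar") && f.getName.toLowerCase.contains("avro"))
    .foreach(f => println(f.getName))
}
```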

