Data Engineering

Cannot load spark-avro jars with databricks-connect version 10.4

Lazloo
New Contributor III

I have been facing an issue since the `databricks-connect` runtime on our cluster was updated to 10.4: since then, I cannot load the jars for spark-avro anymore. By running the following code

from pyspark.sql import SparkSession
 
spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0").getOrCreate()

I get the following error:

The jars for the packages stored in: C:\Users\lazlo\.ivy2\jars
org.apache.spark#spark-avro_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-dc011dfd-9d25-4d6f-9d0e-354626e7c1f8;1.0
   confs: [default]
   found org.apache.spark#spark-avro_2.12;3.3.0 in central
   found org.tukaani#xz;1.8 in central
   found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 156ms :: artifacts dl 4ms
   :: modules in use:
   org.apache.spark#spark-avro_2.12;3.3.0 from central in [default]
   org.spark-project.spark#unused;1.0.0 from central in [default]
   org.tukaani#xz;1.8 from central in [default]
   ---------------------------------------------------------------------
   |                  |            modules            ||   artifacts   |
   |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
   ---------------------------------------------------------------------
   |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
   ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-dc011dfd-9d25-4d6f-9d0e-354626e7c1f8
   confs: [default]
   0 artifacts copied, 3 already retrieved (0kB/5ms)
22/08/16 13:15:57 WARN Shell: Did not find winutils.exe: {}

...

Traceback (most recent call last):
  File "C:/Aifora/repositories/test_poetry/tmp_jars.py", line 4, in <module>
    spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0").getOrCreate()
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\sql\session.py", line 229, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 400, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 147, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 210, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 337, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\py4j\java_gateway.py", line 1568, in __call__
    return_value = get_return_value(
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
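One detail worth flagging: `databricks-connect` 10.4 tracks Databricks Runtime 10.4 LTS, which is based on Spark 3.2.x, so requesting `spark-avro_2.12:3.3.0` may be a plain client/runtime version mismatch. Below is a minimal sketch of the version-aligned call, assuming the bundled Spark is 3.2.1 (an assumption; `spark.version` on the cluster is the authority):

# Sketch, not a verified fix: align spark-avro with the Spark version that
# DBR 10.4 LTS ships (3.2.1 is an assumption; confirm with spark.version).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1")
    .getOrCreate()
)
print(spark.version)

Note also that with `databricks-connect` the code executes against the remote cluster, so a Maven package may need to be installed on the cluster itself (for example via the cluster's Libraries tab) rather than only resolved on the client; whether that applies here is an assumption.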

In case it matters: I use a Windows machine (Windows 11) and manage the packages via Poetry. Here is my pyproject.toml:

[tool.poetry]
name = "test_poetry"
version = "1.37.5"
description = ""
authors = [
    "lazloo xp ",
]

[[tool.poetry.source]]
name = "***_nexus"
url = "https://nexus.infrastructure.xxxx.net/repository/pypi-all/simple/"
default = true

[tool.poetry.dependencies]
python = "==3.8.*"
databricks-connect = "^10.4"
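Separately, the `WARN Shell: Did not find winutils.exe` line in the output above points at the usual Hadoop-on-Windows prerequisite. Whether it is the root cause of the Py4JJavaError is unclear, but the standard remedy is to make winutils.exe reachable before the JVM starts; a sketch follows (the `C:\hadoop` location is illustrative, not from the post):

# Sketch: expose winutils.exe via HADOOP_HOME before creating the SparkSession.
# C:\hadoop is an illustrative path -- place winutils.exe in C:\hadoop\bin.
import os

os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] = os.environ["PATH"] + os.pathsep + r"C:\hadoop\bin"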
 
 

0 REPLIES
