
Cannot load spark-avro jars with databricks-connect version 10.4

Lazloo
New Contributor III

I am currently facing an issue since the `databricks-connect` runtime on our cluster was updated to 10.4: since then, I can no longer load the jars for spark-avro. Running the following code

from pyspark.sql import SparkSession
 
spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0").getOrCreate()

I get the following error:

The jars for the packages stored in: C:\Users\lazlo\.ivy2\jars
org.apache.spark#spark-avro_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-dc011dfd-9d25-4d6f-9d0e-354626e7c1f8;1.0
   confs: [default]
   found org.apache.spark#spark-avro_2.12;3.3.0 in central
   found org.tukaani#xz;1.8 in central
   found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 156ms :: artifacts dl 4ms
   :: modules in use:
   org.apache.spark#spark-avro_2.12;3.3.0 from central in [default]
   org.spark-project.spark#unused;1.0.0 from central in [default]
   org.tukaani#xz;1.8 from central in [default]
   ---------------------------------------------------------------------
   |                  |            modules            ||   artifacts   |
   |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
   ---------------------------------------------------------------------
   |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
   ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-dc011dfd-9d25-4d6f-9d0e-354626e7c1f8
   confs: [default]
   0 artifacts copied, 3 already retrieved (0kB/5ms)
22/08/16 13:15:57 WARN Shell: Did not find winutils.exe: {}

...

Traceback (most recent call last):
  File "C:/Aifora/repositories/test_poetry/tmp_jars.py", line 4, in <module>
    spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0").getOrCreate()
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\sql\session.py", line 229, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 400, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 147, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 210, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 337, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\py4j\java_gateway.py", line 1568, in __call__
    return_value = get_return_value(
  File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
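Not a fix yet, but a sketch of what I plan to try next (assuming I can edit the cluster): move the dependency to the cluster itself so the remote JVM resolves it at startup, either via Libraries > Install new > Maven with the same coordinates, or via the cluster's Spark config:

```
spark.jars.packages org.apache.spark:spark-avro_2.12:3.3.0
```

With the package on the cluster, the local session builder would not need `spark.jars.packages` at all.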

In case it matters: I use a Windows machine (Windows 11) and manage packages via Poetry. Here is my pyproject.toml:

[tool.poetry]
name = "test_poetry"
version = "1.37.5"
description = ""
authors = [
    "lazloo xp ",
]

[[tool.poetry.source]]
name = "***_nexus"
url = "https://nexus.infrastructure.xxxx.net/repository/pypi-all/simple/"
default = true

[tool.poetry.dependencies]
python = "==3.8.*"
databricks-connect = "^10.4"

0 REPLIES