Installed Library / Module not found through Databricks Connect LTS 12.2

maartenvr
New Contributor III

Hi all,

We recently upgraded our Databricks compute cluster from runtime version 10.4 LTS to 12.2 LTS.
After the upgrade, one of our Python scripts suddenly fails with a module-not-found error, indicating that our custom module "xml_parser" is not found on the Spark executors. This is strange, since we did install the module / library through the Databricks UI on the new, upgraded cluster, in exactly the same way as on the old cluster, and everything was running fine on the old LTS runtime. I am therefore wondering what causes this issue.
Has anything changed between the two runtimes? Am I missing a new setting?

FYI:
- Our Spark jobs run from scripts using Databricks Connect (not through Databricks notebooks), and we have updated all the databricks-connect packages from 10.4.x to 12.2.x.
- We upload a Python wheel file through the UI, which gets stored on DBFS to be picked up by the cluster.
The installation shows a success mark in the UI.

The error message is as follows:


```
Exception has occurred: Py4JJavaError
An error occurred while calling o52.save. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (TID 19) (10.161.130.19 executor 1): org.apache.spark.api.python.PythonException: 'pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
    File "/databricks/spark/python/pyspark/serializers.py", line 188, in _read_with_length return self.loads(obj)
    File "/databricks/spark/python/pyspark/serializers.py", line 540, in loads return cloudpickle.loads(obj, encoding=encoding)
    ModuleNotFoundError: No module named 'xml_parser''.
    Full traceback below:
    Traceback (most recent call last): File "/databricks/spark/python/pyspark/serializers.py", line 188, in _read_with_length return self.loads(obj) File "/databricks/spark/python/pyspark/serializers.py", line 540, in loads return cloudpickle.loads(obj, encoding=encoding) ModuleNotFoundError: No module named 'xml_parser'
```

Kaniz
Community Manager

Hi @maartenvr, a Python version mismatch can cause module-not-found errors after upgrading the Databricks compute cluster.
- Check the Python version and set the PYSPARK_PYTHON environment variable accordingly (see the sketch after this list)
- Library upgrades between runtime versions can cause compatibility issues
- Conflicting installations of databricks-connect and PySpark can cause errors; uninstall PySpark before installing databricks-connect
- Double-check the installation process of the custom module "xml_parser" on the new cluster
- If the problem persists, file a Databricks support ticket for assistance.
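
A minimal sketch of the first check, assuming PYSPARK_PYTHON is set on the client machine that runs the Databricks Connect script, before the session is created; the interpreter chosen here is only an example and must match the cluster's Python version (3.9.5 in this thread):

```
# Minimal sketch (assumption): set PYSPARK_PYTHON on the client machine
# before creating the Databricks Connect session.
import os
import sys

# Point PySpark at the interpreter running this script; it should match
# the cluster's Python version.
os.environ["PYSPARK_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
```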

Debayan
Esteemed Contributor III

Hi, this looks like a package dependency issue. Could you also please try updating Databricks Connect to its latest version and trying again?

Also, please tag @Debayan with your next response so that I will get notified. Thanks.

maartenvr
New Contributor III

Hi @Debayan ,

We were already on the latest databricks-connect version (12.2.12) compatible with LTS 12.2.
After your comment I also tried running the code with a downgraded version (12.2.10), but that didn't do the trick.

@Kaniz, thanks for the suggestions / checks.
We double-checked all these points, and everything is fine except for the first one.
I am sure our application runs on the same Python version as the cluster (3.9.5), but we have not set the PYSPARK_PYTHON environment variable. Where do we need to set it? On the machine making the connection to the cluster, or inside the Databricks compute cluster itself?
Just for my information, isn't the databricks-connect package responsible for this?
We didn't set it before either, while everything was working fine.
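
To narrow this down, a small executor-side probe can help; a minimal sketch, assuming an active Databricks Connect session, that checks whether the executors can import the custom xml_parser module:

```
# Minimal sketch (assumption: an active Databricks Connect session);
# runs a trivial task on an executor to check whether "xml_parser"
# is importable there.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def probe(_):
    try:
        import xml_parser
        return getattr(xml_parser, "__file__", "importable")
    except ImportError as exc:
        return f"ImportError: {exc}"

print(spark.sparkContext.parallelize([0], numSlices=1).map(probe).collect())
```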

Debayan
Esteemed Contributor III

Hi, also, what happens if you try with DBR version 13.x?

maartenvr
New Contributor III

Going to 13.3 (LTS) unfortunately requires quite some extra work for our team, as we would need to start using and configuring Unity Catalog.

For now I have opened a ticket with the Databricks support team.
If I find any solution I will post it here.

maartenvr
New Contributor III

FYI: for now we have found a workaround.
We are adding the package as a ZIP file to the current Spark session with .addPyFile.
So after creating a Spark session using Databricks Connect, we run the following:
spark.sparkContext.addPyFile("C:/path/to/custom_package.zip")
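
As a fuller illustration of this workaround, a minimal sketch that zips the package source and ships it to the executors; the source directory and ZIP path below are assumptions for illustration only:

```
# Minimal sketch of the workaround; the source directory "C:/src/xml_parser"
# and the ZIP path are illustrative assumptions.
import shutil
from pyspark.sql import SparkSession

# Zip the package directory so that "xml_parser/" sits at the ZIP root.
zip_path = shutil.make_archive(
    "C:/path/to/custom_package", "zip", root_dir="C:/src", base_dir="xml_parser"
)

spark = SparkSession.builder.getOrCreate()
spark.sparkContext.addPyFile(zip_path)  # distributes the ZIP to every executor
```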

We still have the question open with the Databricks team as to why our installed package is no longer found by the Spark workers.