Hello. I'm having an issue that I can't understand, and I haven't been able to find an adequate workaround for it. My team has recently been migrating our Python code from Databricks notebooks into regular Python modules. We build these modules into wheel files, upload them to our organization's Artifactory instance, and install the wheel with a pip command in a common notebook that most of our downstream data transformation notebooks call via %run. Most of the modules in this wheel import the SparkSession and DBUtils objects from databricks-sdk on their very first line, using the following import statement:
from databricks.sdk.runtime import spark, dbutils
Some of our modules also depend on other modules in the same directory.
This was working yesterday while I was iterating on migrating code to Python modules, building the wheel, and uploading it to Artifactory. Today, when I try to run a particular cell in one of the transformation notebooks I've been using for testing, I get the following error:
Notebook exited: PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
    return self.loads(obj)
           ^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 572, in loads
    return cloudpickle.loads(obj, encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/sc_data_estate_lib/table.py", line 1, in <module>
    from databricks.sdk.runtime import spark, dbutils
ModuleNotFoundError: No module named 'databricks.sdk'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 1964, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker.py", line 1851, in read_udfs
    read_single_udf(
  File "/databricks/spark/python/pyspark/worker.py", line 802, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker_util.py", line 70, in read_command
    command = serializer._read_with_length(file)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 196, in _read_with_length
    raise SerializationError("Caused by " + traceback.format_exc())
pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
    return self.loads(obj)
           ^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 572, in loads
    return cloudpickle.loads(obj, encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/sc_data_estate_lib/table.py", line 1, in <module>
    from databricks.sdk.runtime import spark, dbutils
ModuleNotFoundError: No module named 'databricks.sdk'
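The traceback shows the failure happening inside cloudpickle.loads on a Python worker, while it re-imports sc_data_estate_lib/table.py. Below is a minimal sketch of that mechanism. It assumes that a module-level function is serialized by reference, which is true of the standard library's pickle and, as I understand it, of cloudpickle for functions defined in an importable module (as opposed to the notebook's __main__). Under that assumption, the payload records only the module and function name, so deserializing it re-runs the module's top-level imports in the receiving process. The names demo_lib and driver_only_pkg are hypothetical stand-ins for table.py and databricks.sdk.runtime, and the driver/worker split is simulated inside a single process.

```python
import importlib
import pickle
import sys
import tempfile
import types
from pathlib import Path


def demo():
    # Write a hypothetical module whose first line imports a package that
    # only exists in the "driver" environment (stand-in for databricks.sdk.runtime).
    tmp = tempfile.mkdtemp()
    Path(tmp, "demo_lib.py").write_text(
        "import driver_only_pkg\n"
        "\n"
        "def transform(x):\n"
        "    return x + 1\n"
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    # "Driver": the dependency is importable, so the module loads fine.
    sys.modules["driver_only_pkg"] = types.ModuleType("driver_only_pkg")
    demo_lib = importlib.import_module("demo_lib")
    # pickle stores the function by reference: just "demo_lib" + "transform".
    payload = pickle.dumps(demo_lib.transform)

    # "Worker": neither module is loaded and the dependency is absent.
    del sys.modules["demo_lib"]
    del sys.modules["driver_only_pkg"]
    try:
        # Unpickling imports demo_lib, which re-runs its top-level import.
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return exc.name
    finally:
        sys.path.remove(tmp)
    return None


print(demo())  # driver_only_pkg
```

If this is what's happening, the Python environment on the worker is what matters, not the notebook's, which would be consistent with the import succeeding when run directly in the notebook.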
We're running Databricks Runtime 15.4 LTS. The function that raises this error in turn calls spark.sql() to execute our SCD type-2 logic.
I've tried many combinations of options without success. I can import databricks.sdk.runtime without any problem from the testing notebook mentioned above, and pip show databricks-sdk confirms that version 0.20.0 is installed. I also tried upgrading to the latest available version (0.36.0) with pip install --upgrade databricks-sdk, but the error persists. The most frustrating part is that this worked yesterday and no longer does.
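The import test and pip show above both run in the driver's Python process. A small helper like the hypothetical module_available below reports whether a module can be found by whichever interpreter executes it. It could be used to compare the driver against the executors. It uses importlib.util.find_spec rather than actually importing the module, to avoid side effects.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be found by the current interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # For a dotted name, find_spec imports the parent package first;
        # a missing parent (e.g. no "databricks" package at all) lands here.
        return False


def environment_report(name: str) -> dict:
    """Hypothetical helper: which interpreter ran this, and can it see `name`?"""
    return {
        "python": sys.executable,
        "module": name,
        "available": module_available(name),
    }


print(environment_report("json"))
```

On a cluster where the RDD API is permitted, something like spark.sparkContext.parallelize(range(4), 4).map(lambda _: environment_report("databricks.sdk")).collect() would run the check on the executors. This is an untested suggestion, and some cluster access modes may restrict access to sparkContext.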
If anyone can point me in the right direction, I'd greatly appreciate it. I've been wrestling with this for several days now and would love to get things up and running again. Thank you.