ModuleNotFoundError: No module named 'databricks.sdk' in module installed via Pip
11-12-2024 07:26 PM
Hello. I'm currently having an issue that I can't understand or find an adequate workaround for. My team recently migrated our Python code from Databricks notebooks into regular Python modules. We build those modules into wheel files, upload them to our organization's Artifactory instance, and install the wheel via a pip command that lives in a common notebook, which most of our downstream data-transformation notebooks invoke with a %run command. Most of the Python modules in this wheel import the databricks-sdk SparkSession and DBUtils objects first thing, using the following import statement:
from databricks.sdk.runtime import spark, dbutils
It should be noted that some of our modules have dependencies on other modules within the same directory.
This was working yesterday during my various iterations of migrating code to Python modules, building them into the wheel, uploading to Artifactory, etc. Today, upon logging on, when attempting to run a particular cell within one of our transformation notebooks that I've been using for testing, I'm greeted with the following error:
Notebook exited: PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
return self.loads(obj)
^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 572, in loads
return cloudpickle.loads(obj, encoding=encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/sc_data_estate_lib/table.py", line 1, in <module>
from databricks.sdk.runtime import spark, dbutils
ModuleNotFoundError: No module named 'databricks.sdk'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/worker.py", line 1964, in main
func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/worker.py", line 1851, in read_udfs
read_single_udf(
File "/databricks/spark/python/pyspark/worker.py", line 802, in read_single_udf
f, return_type = read_command(pickleSer, infile)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/worker_util.py", line 70, in read_command
command = serializer._read_with_length(file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 196, in _read_with_length
raise SerializationError("Caused by " + traceback.format_exc())
pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
return self.loads(obj)
^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 572, in loads
return cloudpickle.loads(obj, encoding=encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/sc_data_estate_lib/table.py", line 1, in <module>
from databricks.sdk.runtime import spark, dbutils
ModuleNotFoundError: No module named 'databricks.sdk'
We're currently running using the Databricks Runtime version 15.4 LTS. It should be noted that the function that is being called that produces this error is in-turn calling the spark.sql() function, which executes our SCD type-2 logic.
I've tried myriad combinations of options to try to regain functionality, to no avail. I'm able to import the databricks.sdk.runtime package just fine from the aforementioned testing notebook, and using pip show databricks-sdk I can verify that version 0.20.0 of the package is installed. I've also tried upgrading to the latest available version (0.36.0) using pip install --upgrade databricks-sdk, again to no avail. Perhaps the most frustrating piece of all this is that it worked yesterday, but no longer does.
If anyone can point me in the right direction, I'd greatly appreciate it. I've been wrestling with this for several days now, and would love to get things up-and-running again. Thank you.
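One way to narrow down errors like this is to check where (or whether) the driver can resolve the package at all, and then run the same check inside a Spark task to see what the executors resolve. This is a diagnostic sketch; `module_location` is a helper I'm introducing for illustration, not a Databricks API:

```python
import importlib.util

def module_location(name: str):
    """Return the file a module would be imported from, or None if unresolvable."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # raised when a parent package (e.g. `databricks`) cannot be found
        return None
    return getattr(spec, "origin", None) if spec else None

# On the driver:
print(module_location("databricks.sdk"))
```

To see what the workers resolve, the same function can be run inside a task, e.g. `spark.range(1).rdd.mapPartitions(lambda it: [module_location("databricks.sdk")]).collect()`. If the driver prints a path but the workers return None, the package exists only in the driver's environment.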
11-12-2024 07:38 PM
Maybe I should also mention that when doing pip install --upgrade databricks-sdk, not only is the version increased from 0.20.0 to 0.36.0, but the location of the package changes from /databricks/python3/lib/python3.11/site-packages to /local_disk0/.ephemeral_nfs/envs/pythonEnv-<guid>/lib/python3.11/site-packages. Not sure if this is significant or not.
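That path change is likely significant: my understanding is that a `%pip install` in a notebook creates a notebook-scoped environment (the `/local_disk0/.ephemeral_nfs/envs/pythonEnv-<guid>` path) whose site-packages is placed ahead of the cluster-wide `/databricks/python3` install on the driver's `sys.path`, while executors resolve imports from their own environment, so the two can disagree. A quick way to see the resolution order on the driver:

```python
import sys

# Earlier entries win: `import databricks` resolves from the first directory
# on sys.path that contains it, so this ordering shows which install
# (ephemeral notebook env vs. /databricks/python3) is actually in effect.
for i, path in enumerate(sys.path):
    print(i, path)
```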
12-03-2024 11:20 PM
I've hit a similar issue, and I suspect it's related to Spark UDFs. In my case, I load a scikit-learn model and use a Spark UDF to speed up model prediction, which raises a similar error. If I don't use a UDF, it works.
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/worker.py", line 1964, in main
func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/worker.py", line 1851, in read_udfs
read_single_udf(
File "/databricks/spark/python/pyspark/worker.py", line 802, in read_single_udf
f, return_type = read_command(pickleSer, infile)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/worker_util.py", line 72, in read_command
command = serializer.loads(command.value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 572, in loads
return cloudpickle.loads(obj, encoding=encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'utils'
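The `No module named 'utils'` error above is consistent with how PySpark ships UDFs: cloudpickle serializes a function that was imported from a module *by reference* (module name plus attribute name, not the code), so every worker must be able to import that module when it unpickles the UDF. A minimal, plain-pickle illustration of the principle:

```python
import json
import pickle

# Pickling a function imported from a module stores the module and attribute
# names, not the function's code. Whoever unpickles it must be able to
# `import json` -- on a Spark worker, an unavailable module at this point
# surfaces as ModuleNotFoundError during deserialization.
payload = pickle.dumps(json.dumps)
print(b"json" in payload)  # the module name travels inside the pickle
```

If the helpers live in a local `utils.py` that isn't installed on the cluster, shipping it to the workers (e.g. with `spark.sparkContext.addPyFile("utils.py")`) or installing the wheel as a cluster library, rather than only in the notebook environment, should make the module importable at unpickle time.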
02-27-2025 01:21 AM
Hi Alex,
I am now facing a similar problem. Did you ever find a solution to this?
BR
03-02-2025 12:44 AM
Hi Team,
I'm also facing the same issue. My project has two scripts:
1. load_data.py
2. extract_data.py
When running extract_data.py, I get the error below.
a week ago
Did anyone make any progress here?
I seem to have the same issue.
It works in an interactive shell, but doesn't work in my code.
File "/home/ubuntu/change-detection-inference/liveeo/flows/change_detection_inference/flow.py", line 8, in <module>
from liveeo.flows.change_detection_inference.tasks import (
File "/home/ubuntu/change-detection-inference/liveeo/flows/change_detection_inference/tasks.py", line 15, in <module>
import mlflow
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/__init__.py", line 42, in <module>
from mlflow import (
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/artifacts/__init__.py", line 12, in <module>
from mlflow.tracking import _get_store
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/tracking/__init__.py", line 8, in <module>
from mlflow.tracking._model_registry.utils import (
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/tracking/_model_registry/utils.py", line 4, in <module>
from mlflow.store.db.db_types import DATABASE_ENGINES
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/store/__init__.py", line 1, in <module>
from mlflow.store import _unity_catalog # noqa: F401
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/store/_unity_catalog/__init__.py", line 1, in <module>
from mlflow.store._unity_catalog import registry # noqa: F401
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/__init__.py", line 1, in <module>
from mlflow.store._unity_catalog.registry import rest_store, uc_oss_rest_store # noqa: F401
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py", line 70, in <module>
from mlflow.store.artifact.databricks_sdk_models_artifact_repo import (
File "/home/ubuntu/.cache/pypoetry/virtualenvs/change-detection-inference-V1cXAqSc-py3.10/lib/python3.10/site-packages/mlflow/store/artifact/databricks_sdk_models_artifact_repo.py", line 4, in <module>
from databricks.sdk.errors.platform import NotFound
ModuleNotFoundError: No module named 'databricks.sdk'; 'databricks' is not a package
It seems to happen when importing mlflow.
For reference, I have databricks-sdk version 0.48.0 installed, and mlflow 2.17.2 (I also tried 2.21).
In a shell I can import mlflow and even run
from databricks.sdk.errors.platform import NotFound
without issue.
a week ago - last edited a week ago
Hello again everyone, and sorry for the late response. It took a while to understand, but the cause of my issue was creating ("promoting") Spark UDFs from functions that depended on classes or objects in the databricks.sdk.runtime package. The issue has been resolved in our solution for quite a while now. If I remember correctly, we moved those functions (the ones being promoted to UDFs) into the "common" notebook mentioned in my original post: rather than depending on the spark or dbutils objects from the runtime package, those functions now use the local versions of those objects that are globally available within the session they run in. Hopefully this provides some clarity to others experiencing this issue.
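A minimal sketch of that kind of refactor (all names here are illustrative, not the actual modules): the broken version had `from databricks.sdk.runtime import spark, dbutils` at module level, so unpickling any UDF defined in that module forced the workers to import databricks.sdk. The worker-safe version keeps module import free of that dependency and resolves the runtime lazily, only in driver-side code paths:

```python
def get_spark():
    """Driver-only helper: resolve the ambient SparkSession when first needed."""
    # Deferred import: this line never runs during module import,
    # so unpickling functions from this module never touches databricks.sdk.
    from databricks.sdk.runtime import spark
    return spark

def clean_key(value: str) -> str:
    """Worker-safe logic: no databricks.sdk dependency, safe to promote to a UDF."""
    return value.strip().lower()
```

Only `clean_key`-style functions get wrapped as UDFs; anything needing `spark` or `dbutils` stays on the driver (or, as we did, lives in the common notebook where those objects are ambient).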
Monday
Lol, OK, so in my case it was because I had a file called databricks.py, which clashed with the installed databricks package.
Renaming my file to databricks_utils.py solved it.
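That matches the `'databricks' is not a package` hint in the earlier traceback: if `databricks` resolves to a single .py file rather than a package directory, nothing under it (like `databricks.sdk`) can be imported. A small diagnostic sketch (`is_plain_module` is a helper introduced here for illustration):

```python
import importlib.util

def is_plain_module(name: str) -> bool:
    """True if `name` resolves to a single .py file rather than a package.

    A plain file module cannot contain submodules, so if "databricks"
    resolves this way, a local databricks.py is shadowing the installed
    `databricks` package and `databricks.sdk` becomes unimportable.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False
    origin = spec.origin or ""
    return spec.submodule_search_locations is None and origin.endswith(".py")
```

Running `is_plain_module("databricks")` from the failing script's directory should return True when a local databricks.py is shadowing the real package.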

