Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

ModuleNotFoundError: No module named 'databricks.sdk' in module installed via Pip

alex_crow
New Contributor II

Hello. I'm currently having an issue that I can't understand or find an adequate workaround for. My team recently migrated our Python code from Databricks notebooks into regular Python modules. We build these modules into wheel files, upload them to our organization's Artifactory instance, and install the wheel via a pip command that lives in a common notebook, which most of our downstream data transformation notebooks invoke with a %run command. Most of the Python modules in this wheel import the databricks-sdk SparkSession and DBUtils objects first thing, using the following import statement:

from databricks.sdk.runtime import spark, dbutils

It should be noted that some of our modules have dependencies on other modules within the same directory.
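One possible mitigation (a hypothetical sketch, not something from this thread) is to make that import optional at module load time, so the module can still be imported in an environment where databricks-sdk is absent, such as a Spark worker. The helper name `optional_import` is made up for illustration:

```python
import importlib


def optional_import(module_path, names):
    """Import `names` from `module_path`, returning None placeholders
    when the module is missing (e.g. an environment without
    databricks-sdk). Hypothetical helper, not part of the SDK."""
    try:
        mod = importlib.import_module(module_path)
        return tuple(getattr(mod, name) for name in names)
    except ModuleNotFoundError:
        return tuple(None for _ in names)


# At module top level this never raises, even where the SDK is absent:
spark, dbutils = optional_import("databricks.sdk.runtime", ("spark", "dbutils"))
```

Code that actually needs `spark` or `dbutils` can then check for None (or fetch the active SparkSession lazily) instead of failing at import time.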

This was working yesterday during my various iterations of migrating code to Python modules, building them into the wheel, uploading to Artifactory, etc. Today, upon logging on, when attempting to run a particular cell within one of our transformation notebooks that I've been using for testing, I'm greeted with the following error:

Notebook exited: PythonException:

  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
    return self.loads(obj)
           ^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 572, in loads
    return cloudpickle.loads(obj, encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/sc_data_estate_lib/table.py", line 1, in <module>
    from databricks.sdk.runtime import spark, dbutils
ModuleNotFoundError: No module named 'databricks.sdk'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 1964, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker.py", line 1851, in read_udfs
    read_single_udf(
  File "/databricks/spark/python/pyspark/worker.py", line 802, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker_util.py", line 70, in read_command
    command = serializer._read_with_length(file)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 196, in _read_with_length
    raise SerializationError("Caused by " + traceback.format_exc())
pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
    return self.loads(obj)
           ^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 572, in loads
    return cloudpickle.loads(obj, encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/sc_data_estate_lib/table.py", line 1, in <module>
    from databricks.sdk.runtime import spark, dbutils
ModuleNotFoundError: No module named 'databricks.sdk'

We're currently running Databricks Runtime 15.4 LTS. It should be noted that the function producing this error in turn calls spark.sql(), which executes our SCD type-2 logic.
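For anyone puzzled by why an import in an installed library surfaces inside a Spark worker: when a Python UDF is shipped to executors, functions defined in installed modules are serialized *by reference* (module name plus attribute name), so every worker must be able to re-import that module, which re-runs its top-level imports. The stdlib `pickle` behaves the same way, so the failure mode can be illustrated without Spark (the module name `fake_lib` below is invented for the demo):

```python
import pickle
import sys
import types

# Build a throwaway module standing in for the installed library.
mod = types.ModuleType("fake_lib")
exec("def f(x):\n    return x + 1", mod.__dict__)
sys.modules["fake_lib"] = mod

# Functions from installed modules are pickled by reference
# (module + qualified name), not by value.
payload = pickle.dumps(mod.f)

# Simulate a worker whose environment lacks the library.
del sys.modules["fake_lib"]
try:
    pickle.loads(payload)  # tries to re-import "fake_lib" and fails
    error = None
except ModuleNotFoundError as exc:
    error = exc
```

This matches the traceback above: the error is raised in the worker's deserializer while re-importing `sc_data_estate_lib.table`, whose first line imports `databricks.sdk.runtime` -- so the real question is why the worker's Python environment no longer has databricks-sdk.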

I've tried myriad combinations of options to regain functionality, to no avail. I can import the databricks.sdk.runtime package just fine from the aforementioned testing notebook, and pip show databricks-sdk confirms that version 0.20.0 is installed. I've also tried upgrading to the latest available version (0.36.0) with pip install --upgrade databricks-sdk, again to no avail. Perhaps the most frustrating part of all this is that it worked yesterday, but no longer does.

If anyone can point me in the right direction, I'd greatly appreciate it. I've been wrestling with this for several days now, and would love to get things up-and-running again. Thank you.

1 REPLY

alex_crow
New Contributor II

Maybe I should also mention that when doing pip install --upgrade databricks-sdk, not only is the version increased from 0.20.0 to 0.36.0, but the location of the package changes from /databricks/python3/lib/python3.11/site-packages to /local_disk0/.ephemeral_nfs/envs/pythonEnv-<guid>/lib/python3.11/site-packages. Not sure if this is significant or not.
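That location change suggests two copies of the package may now exist on the driver (the runtime's copy and a notebook-scoped one). A quick, hypothetical way to check which copy the interpreter will actually resolve, without importing the package, is `importlib.util.find_spec` (the helper name `module_origin` is invented here):

```python
import importlib.util


def module_origin(name):
    """Return the file path a module would be loaded from, or None if
    it isn't importable -- handy for spotting which site-packages copy
    of a package the current interpreter will pick up."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # A dotted name whose parent package is missing raises instead
        # of returning None; normalize both cases.
        return None
    return spec.origin if spec else None


# On the cluster one could compare, e.g.:
#   module_origin("databricks.sdk")
# against the Location reported by `pip show databricks-sdk`.
```

Note this only inspects the driver's environment; the worker processes resolve imports from their own environment, which is where the ModuleNotFoundError is being raised.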
