<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>UDF fails with &quot;No module named 'dbruntime'&quot; when using dbutils in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122402#M10211</link>
    <description>Databricks Community thread on dbutils failing inside UDFs (&quot;No module named 'dbruntime'&quot;).</description>
    <pubDate>Fri, 20 Jun 2025 21:40:18 GMT</pubDate>
    <dc:creator>cgrant</dc:creator>
    <dc:date>2025-06-20T21:40:18Z</dc:date>
    <item>
      <title>UDF fails with "No module named 'dbruntime'" when using dbutils</title>
      <link>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/121214#M10155</link>
      <description>&lt;P&gt;I've got a UDF which I call using&amp;nbsp;&lt;SPAN&gt;applyInPandas&lt;/SPAN&gt;.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The UDF distributes API calls.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;It uses my custom .py library files that make these calls.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Everything worked until I used `dbutils.widgets.get` and `dbutils.secrets.get` inside these libraries.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;It throws a huge stack trace.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So the question is: how do I either configure those libraries or get dbutils working?&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;PythonException: Traceback (most recent call last):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 106.0 failed 4 times, most recent failure: Lost task 0.3 in stage 106.0 (TID 230) (10.139.64.4 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 79, in &amp;lt;module&amp;gt;
    from dbruntime import UserNamespaceInitializer
ModuleNotFoundError: No module named 'dbruntime'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 473, in init_auth
    self._header_factory = self._credentials_strategy(self)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/credentials_provider.py", line 703, in __call__
    raise ValueError(
ValueError: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 123, in __init__
    self.init_auth()
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 478, in init_auth
    raise ValueError(f'{self._credentials_strategy.auth_type()} auth: {e}') from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
    return self.loads(obj)
           ^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads
    return cloudpickle.loads(obj, encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle.py", line 649, in subimport
    __import__(name)
  File "/Workspace/Shared/sparky/lib/graphql/shopify_stock_graphql.py", line 2, in &amp;lt;module&amp;gt;
    import lib.configuration as conf
  File "/Workspace/Shared/sparky/lib/configuration.py", line 1, in &amp;lt;module&amp;gt;
    from databricks.sdk.runtime import *
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 172, in &amp;lt;module&amp;gt;
    dbutils = RemoteDbUtils()
              ^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/dbutils.py", line 194, in __init__
    self._config = Config() if not config else config
                   ^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 127, in __init__
    raise ValueError(message) from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 2212, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker.py", line 1893, in read_udfs
    arg_offsets, f = read_single_udf(
                     ^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker.py", line 909, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker_util.py", line 71, in read_command
    command = serializer._read_with_length(file)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 196, in _read_with_length
    raise SerializationError("Caused by " + traceback.format_exc())
pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
  [inner traceback omitted - identical to the exception chain shown above]

..........&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jun 2025 03:49:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/121214#M10155</guid>
      <dc:creator>Dimitry</dc:creator>
      <dc:date>2025-06-09T03:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: UDF fails with "No module named 'dbruntime'" when using dbutils</title>
      <link>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122402#M10211</link>
      <description>&lt;P&gt;Currently, dbutils cannot be used inside a UDF. For secrets, instead of reading the secret inside the UDF, define it as a free variable outside the UDF; it is then captured in the UDF's closure and shipped to the executors, as shown below.&lt;/P&gt;
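The mechanism behind this fix is closure capture: cloudpickle (the serializer Spark uses for UDFs) ships the function together with the values it references, so a secret read once on the driver travels with the UDF, while a dbutils call inside the body would execute on the worker, where dbutils does not exist. A simplified plain-Python sketch of the capture (no Spark or dbutils needed; the value "s3cr3t" is a stand-in):

```python
# Plain-Python sketch of the closure capture that makes the
# free-variable pattern work (no Spark/dbutils required).

secret = "s3cr3t"  # stand-in for dbutils.secrets.get("scope", "secret")

def make_udf():
    captured = secret  # read once, "driver side"

    def udf(value: int) -> int:
        # 'captured' is a free variable: it is bundled with the
        # function object and travels wherever the function goes.
        return value + len(captured)

    return udf

f = make_udf()
print(f.__closure__[0].cell_contents)  # the captured value
print(f(10))
```

This is only an illustration of the mechanism; on Databricks the real example below applies unchanged.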
&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import pandas_udf, col
from pyspark.sql.types import *
import pandas as pd

secret = dbutils.secrets.get("scope", "secret")

@pandas_udf(LongType())
def example_udf(value: pd.Series) -&amp;gt; pd.Series:
    print(secret)
    return value

spark.range(1).select(example_udf(col("id"))).display()&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 20 Jun 2025 21:40:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122402#M10211</guid>
      <dc:creator>cgrant</dc:creator>
      <dc:date>2025-06-20T21:40:18Z</dc:date>
    </item>
    <item>
      <title>Re: UDF fails with "No module named 'dbruntime'" when using dbutils</title>
      <link>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122735#M10220</link>
      <description>&lt;P&gt;What about creating a function like this?&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;CREATE OR REPLACE FUNCTION geocode_address(address STRING)
RETURNS STRUCT&amp;lt;latitude: DOUBLE, longitude: DOUBLE&amp;gt;
LANGUAGE PYTHON
ENVIRONMENT (
  dependencies = '["requests"]',
  environment_version = "None"
)
AS $$
import requests

api_key = dbutils.secrets.get("my-secret-scope", "google-maps-geocoding-api-key")
url = f"https://maps.googleapis.com/maps/api/geocode/json?address={address}&amp;amp;key={api_key}"
response = requests.get(url)

if response.status_code != 200:
    return None

try:
    data = response.json()
    if data['status'] == 'OK':
        location = data['results'][0]['geometry']['location']
        return (location['lat'], location['lng'])
    else:
        return None
except (KeyError, ValueError):
    return None
$$&lt;/LI-CODE&gt;&lt;P&gt;...and then testing it like this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT geocode_address('1600 Amphitheatre Parkway, Mountain View, CA');&lt;/LI-CODE&gt;&lt;P&gt;Currently it results in the following error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;NameError: name 'dbutils' is not defined&lt;/LI-CODE&gt;&lt;P&gt;What's the recommended way of retrieving the secret in this case?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jun 2025 19:31:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122735#M10220</guid>
      <dc:creator>df_dbx</dc:creator>
      <dc:date>2025-06-24T19:31:30Z</dc:date>
    </item>
    <item>
      <title>Re: UDF fails with "No module named 'dbruntime'" when using dbutils</title>
      <link>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122738#M10221</link>
      <description>&lt;P&gt;Answering my own question. Similar to the original response, the answer was to pass in the secret as a function argument:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;CREATE OR REPLACE FUNCTION geocode_address(address STRING, api_key STRING)
RETURNS STRUCT&amp;lt;latitude: DOUBLE, longitude: DOUBLE&amp;gt;
LANGUAGE PYTHON
AS $$
import requests

url = f"https://maps.googleapis.com/maps/api/geocode/json?address={address}&amp;amp;key={api_key}"
response = requests.get(url)

if response.status_code != 200:
    return None

try:
    data = response.json()
    if data['status'] == 'OK':
        location = data['results'][0]['geometry']['location']
        return (location['lat'], location['lng'])
    else:
        return None
except (KeyError, ValueError):
    return None
$$&lt;/LI-CODE&gt;&lt;P&gt;And then here is how to call it:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT geocode_address('1600 Amphitheatre Parkway, Mountain View, CA', secret("my-secret-scope", "google-maps-geocoding-api-key"));&lt;/LI-CODE&gt;&lt;P&gt;Note: this won't work on a Serverless Warehouse (or serverless compute), as by default they restrict outbound traffic.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jun 2025 20:46:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122738#M10221</guid>
      <dc:creator>df_dbx</dc:creator>
      <dc:date>2025-06-24T20:46:33Z</dc:date>
    </item>
    <item>
      <title>Re: UDF fails with "No module named 'dbruntime'" when using dbutils</title>
      <link>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122743#M10222</link>
      <description>&lt;P&gt;I ran outbound GraphQL calls on serverless, but on the Azure version of Databricks; Azure VMs don't restrict this.&lt;/P&gt;&lt;P&gt;My problem with serverless is&amp;nbsp;&lt;A href="https://community.databricks.com/t5/get-started-discussions/how-to-quot-python-versions-in-the-spark-connect-client-and/td-p/121213" target="_blank"&gt;How to "Python versions in the Spark Connect clien... - Databricks Community - 121213&lt;/A&gt;, so serverless is still unusable for UDFs.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jun 2025 23:17:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/udf-fails-with-quot-no-module-named-dbruntime-quot-when-using/m-p/122743#M10222</guid>
      <dc:creator>Dimitry</dc:creator>
      <dc:date>2025-06-24T23:17:14Z</dc:date>
    </item>
  </channel>
</rss>

