06-08-2025 08:49 PM
I've got a UDF which I call using applyInPandas.
The UDF's job is to distribute API calls.
It uses my custom .py library files that make these calls.
Everything worked until I started using `dbutils.widgets.get` and `dbutils.secrets.get` inside those libraries.
Now it throws a huge stack trace.
So the question is: how do I either reconfigure those libraries or get dbutils working inside the UDF?
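For reference, here is a minimal sketch of the pattern that fails; the library path matches my project, but the fetch_stock function, the sample DataFrame, and the schemas are just placeholders:

# Minimal sketch of the failing pattern (function name, sample data and schemas are placeholders).
#
# /Workspace/Shared/sparky/lib/configuration.py does, at import time:
#     from databricks.sdk.runtime import *                      # brings in dbutils
#     API_TOKEN = dbutils.secrets.get("my-scope", "api-token")
#
# Driver notebook:
import pandas as pd
import lib.graphql.shopify_stock_graphql as api   # transitively imports lib.configuration

def call_api(pdf: pd.DataFrame) -> pd.DataFrame:
    # Each group issues its own API calls through the custom library.
    return api.fetch_stock(pdf)                   # placeholder entry point

df = spark.createDataFrame([(1, "shop-a"), (2, "shop-b")], "id long, shop string")

# cloudpickle re-imports the library on each worker, where 'from databricks.sdk.runtime import *'
# tries to construct RemoteDbUtils and cannot find default credentials.
df.groupBy("shop").applyInPandas(call_api, schema="id long, shop string").display()

Here is the full stack trace: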
PythonException: Traceback (most recent call last):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 106.0 failed 4 times, most recent failure: Lost task 0.3 in stage 106.0 (TID 230) (10.139.64.4 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 79, in <module>
from dbruntime import UserNamespaceInitializer
ModuleNotFoundError: No module named 'dbruntime'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 473, in init_auth
self._header_factory = self._credentials_strategy(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/credentials_provider.py", line 703, in __call__
raise ValueError(
ValueError: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 123, in __init__
self.init_auth()
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 478, in init_auth
raise ValueError(f'{self._credentials_strategy.auth_type()} auth: {e}') from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
return self.loads(obj)
^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads
return cloudpickle.loads(obj, encoding=encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle.py", line 649, in subimport
__import__(name)
File "/Workspace/Shared/sparky/lib/graphql/shopify_stock_graphql.py", line 2, in <module>
import lib.configuration as conf
File "/Workspace/Shared/sparky/lib/configuration.py", line 1, in <module>
from databricks.sdk.runtime import *
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 172, in <module>
dbutils = RemoteDbUtils()
^^^^^^^^^^^^^^^
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/dbutils.py", line 194, in __init__
self._config = Config() if not config else config
^^^^^^^^
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 127, in __init__
raise ValueError(message) from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/worker.py", line 2212, in main
func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/worker.py", line 1893, in read_udfs
arg_offsets, f = read_single_udf(
^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/worker.py", line 909, in read_single_udf
f, return_type = read_command(pickleSer, infile)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/worker_util.py", line 71, in read_command
command = serializer._read_with_length(file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 196, in _read_with_length
raise SerializationError("Caused by " + traceback.format_exc())
pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 79, in <module>
from dbruntime import UserNamespaceInitializer
ModuleNotFoundError: No module named 'dbruntime'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 473, in init_auth
self._header_factory = self._credentials_strategy(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/credentials_provider.py", line 703, in __call__
raise ValueError(
ValueError: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 123, in __init__
self.init_auth()
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 478, in init_auth
raise ValueError(f'{self._credentials_strategy.auth_type()} auth: {e}') from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
return self.loads(obj)
^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads
return cloudpickle.loads(obj, encoding=encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle.py", line 649, in subimport
__import__(name)
File "/Workspace/Shared/sparky/lib/graphql/shopify_stock_graphql.py", line 2, in <module>
import lib.configuration as conf
File "/Workspace/Shared/sparky/lib/configuration.py", line 1, in <module>
from databricks.sdk.runtime import *
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 172, in <module>
dbutils = RemoteDbUtils()
^^^^^^^^^^^^^^^
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/dbutils.py", line 194, in __init__
self._config = Config() if not config else config
^^^^^^^^
File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 127, in __init__
raise ValueError(message) from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.
..........
3 weeks ago
Currently, dbutils cannot be used inside UDFs. For secrets, instead of getting the secret inside the UDF, you can define it as a free variable outside the UDF; it will then be captured with the function and shipped to the workers, like this:
from pyspark.sql.functions import pandas_udf, col
from pyspark.sql.types import *
import pandas as pd

secret = dbutils.secrets.get("scope", "secret")

@pandas_udf(LongType())
def example_udf(value: pd.Series) -> pd.Series:
    print(secret)
    return value

spark.range(1).select(example_udf(col("id"))).display()
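For the applyInPandas case from the question, the same idea applies, with the extra caveat that the imported library itself must not run `from databricks.sdk.runtime import *` (or any dbutils call) at module import time, since that import is re-executed on the workers. A sketch, where fetch_stock(pdf, url=..., token=...) is a hypothetical entry point that accepts the values as arguments:

import pandas as pd

# Resolved once on the driver, where dbutils is available.
api_token = dbutils.secrets.get("scope", "api-token")
shop_url = dbutils.widgets.get("shop_url")

def call_api(pdf: pd.DataFrame) -> pd.DataFrame:
    # The captured values are plain strings, so nothing executed on the
    # workers needs dbutils or default SDK credentials.
    import lib.graphql.shopify_stock_graphql as api              # must not import databricks.sdk.runtime
    return api.fetch_stock(pdf, url=shop_url, token=api_token)   # hypothetical signature

df = spark.createDataFrame([(1, "shop-a"), (2, "shop-b")], "id long, shop string")
df.groupBy("shop").applyInPandas(call_api, schema="id long, shop string").display()

In other words, the library receives widget and secret values as ordinary function arguments instead of reading them itself.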
3 weeks ago - last edited 3 weeks ago
What about creating a function like this?
CREATE OR REPLACE FUNCTION geocode_address(address STRING)
RETURNS STRUCT<latitude: DOUBLE, longitude: DOUBLE>
LANGUAGE PYTHON
ENVIRONMENT (
  dependencies = '["requests"]',
  environment_version = "None"
)
AS $$
import requests

api_key = dbutils.secrets.get("my-secret-scope", "google-maps-geocoding-api-key")
url = f"https://maps.googleapis.com/maps/api/geocode/json?address={address}&key={api_key}"
response = requests.get(url)
if response.status_code != 200:
    return None
try:
    data = response.json()
    if data['status'] == 'OK':
        location = data['results'][0]['geometry']['location']
        return (location['lat'], location['lng'])
    else:
        return None
except (KeyError, ValueError):
    return None
$$
...and then testing it like this:
SELECT geocode_address('1600 Amphitheatre Parkway, Mountain View, CA');
Currently it results in the following error:
NameError: name 'dbutils' is not defined
What's the recommended way of retrieving the secret in this case?
3 weeks ago
Answering my own question. Similar to the original response, the answer was to pass in the secret as a function argument:
CREATE OR REPLACE FUNCTION geocode_address(address STRING, api_key STRING)
RETURNS STRUCT<latitude: DOUBLE, longitude: DOUBLE>
LANGUAGE PYTHON
AS $$
import requests

url = f"https://maps.googleapis.com/maps/api/geocode/json?address={address}&key={api_key}"
response = requests.get(url)
if response.status_code != 200:
    return None
try:
    data = response.json()
    if data['status'] == 'OK':
        location = data['results'][0]['geometry']['location']
        return (location['lat'], location['lng'])
    else:
        return None
except (KeyError, ValueError):
    return None
$$
And then here is how to call it:
SELECT geocode_address('1600 Amphitheatre Parkway, Mountain View, CA', secret("my-secret-scope", "google-maps-geocoding-api-key"));
Note: this won't work on a Serverless Warehouse (or Serverless compute) as by default they restrict outbound traffic.
3 weeks ago
I've run outbound GraphQL calls on serverless, but on the Azure version of Databricks; Azure doesn't restrict that traffic by default.
My problem with serverless is the one described in "Python versions in the Spark Connect clien..." (Databricks Community post 121213), so serverless is still unusable for my UDFs.