
UDF fails with "No module named 'dbruntime'" when using dbutils

Dimitry
Contributor

I've got a UDF which I call using applyInPandas.

The UDF's job is to distribute API calls.

It uses my custom .py library files that make these calls.

Everything worked until I used `dbutils.widgets.get` and `dbutils.secrets.get` inside these libraries.

It now throws a huge stack trace.

So the question is: how do I either configure those libraries differently or get dbutils working?

PythonException: Traceback (most recent call last):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 106.0 failed 4 times, most recent failure: Lost task 0.3 in stage 106.0 (TID 230) (10.139.64.4 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 79, in <module>
    from dbruntime import UserNamespaceInitializer
ModuleNotFoundError: No module named 'dbruntime'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 473, in init_auth
    self._header_factory = self._credentials_strategy(self)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/credentials_provider.py", line 703, in __call__
    raise ValueError(
ValueError: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 123, in __init__
    self.init_auth()
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 478, in init_auth
    raise ValueError(f'{self._credentials_strategy.auth_type()} auth: {e}') from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
    return self.loads(obj)
           ^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads
    return cloudpickle.loads(obj, encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle.py", line 649, in subimport
    __import__(name)
  File "/Workspace/Shared/sparky/lib/graphql/shopify_stock_graphql.py", line 2, in <module>
    import lib.configuration as conf
  File "/Workspace/Shared/sparky/lib/configuration.py", line 1, in <module>
    from databricks.sdk.runtime import *
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 172, in <module>
    dbutils = RemoteDbUtils()
              ^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/dbutils.py", line 194, in __init__
    self._config = Config() if not config else config
                   ^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 127, in __init__
    raise ValueError(message) from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 2212, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker.py", line 1893, in read_udfs
    arg_offsets, f = read_single_udf(
                     ^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker.py", line 909, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/worker_util.py", line 71, in read_command
    command = serializer._read_with_length(file)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 196, in _read_with_length
    raise SerializationError("Caused by " + traceback.format_exc())
pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 79, in <module>
    from dbruntime import UserNamespaceInitializer
ModuleNotFoundError: No module named 'dbruntime'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 473, in init_auth
    self._header_factory = self._credentials_strategy(self)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/credentials_provider.py", line 703, in __call__
    raise ValueError(
ValueError: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 123, in __init__
    self.init_auth()
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 478, in init_auth
    raise ValueError(f'{self._credentials_strategy.auth_type()} auth: {e}') from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
    return self.loads(obj)
           ^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads
    return cloudpickle.loads(obj, encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle.py", line 649, in subimport
    __import__(name)
  File "/Workspace/Shared/sparky/lib/graphql/shopify_stock_graphql.py", line 2, in <module>
    import lib.configuration as conf
  File "/Workspace/Shared/sparky/lib/configuration.py", line 1, in <module>
    from databricks.sdk.runtime import *
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/runtime/__init__.py", line 172, in <module>
    dbutils = RemoteDbUtils()
              ^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/dbutils.py", line 194, in __init__
    self._config = Config() if not config else config
                   ^^^^^^^^
  File "/databricks/python/lib/python3.12/site-packages/databricks/sdk/config.py", line 127, in __init__
    raise ValueError(message) from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.


..........
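For reference, a minimal sketch of the kind of setup described above (the library path comes from the trace; the schema, column, and function names are hypothetical stand-ins):

# /Workspace/Shared/sparky/lib/configuration.py (simplified) does, at module level:
#     from databricks.sdk.runtime import *   # this is what pulls in dbutils
# and lib/graphql/shopify_stock_graphql.py imports it.

import pandas as pd
from pyspark.sql.types import StructType, StructField, LongType, StringType
import lib.graphql.shopify_stock_graphql as shopify  # the custom library

schema = StructType([StructField("id", LongType()), StructField("result", StringType())])

def make_calls(pdf: pd.DataFrame) -> pd.DataFrame:
    # fetch_stock is a hypothetical library function that makes the API call
    pdf["result"] = pdf["id"].apply(shopify.fetch_stock)
    return pdf

df = spark.range(10)
df.groupBy("id").applyInPandas(make_calls, schema=schema).display()
# The job fails while the worker unpickles make_calls: cloudpickle re-imports
# the shopify module on the executor, which imports lib.configuration, which
# tries to build dbutils via databricks.sdk.runtime and hits the auth error above.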

 


4 REPLIES

cgrant
Databricks Employee

Currently, dbutils cannot be used inside of UDFs. For secrets, instead of getting the secret inside the UDF, you can define it as a free variable outside the UDF and it will be passed in properly, like the example below:

from pyspark.sql.functions import pandas_udf, col
from pyspark.sql.types import *
import pandas as pd

# Resolve the secret on the driver; it is captured in the UDF's closure
# and shipped to the executors as a plain string.
secret = dbutils.secrets.get("scope", "secret")

@pandas_udf(LongType())
def example_udf(value: pd.Series) -> pd.Series:
    print(secret)  # no dbutils call needed on the worker
    return value

spark.range(1).select(example_udf(col("id"))).display()
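
Adapted to the applyInPandas setup from the original post, the same pattern would look roughly like this (the library import and the fetch_stock signature are hypothetical; the key points are that the library module no longer calls dbutils at import time, and that widget/secret values are resolved on the driver and passed in as arguments):

import pandas as pd
from pyspark.sql.types import StructType, StructField, LongType, StringType
import lib.graphql.shopify_stock_graphql as shopify  # refactored to take config as arguments

# Resolved once on the driver; captured by the closure and shipped to executors as plain strings.
api_key = dbutils.secrets.get("scope", "secret")
endpoint = dbutils.widgets.get("endpoint")

schema = StructType([StructField("id", LongType()), StructField("result", StringType())])

def make_calls(pdf: pd.DataFrame) -> pd.DataFrame:
    # fetch_stock (hypothetical signature) receives the endpoint and key as
    # arguments instead of reading them via dbutils on the worker.
    pdf["result"] = pdf["id"].apply(lambda i: shopify.fetch_stock(i, endpoint, api_key))
    return pdf

spark.range(10).groupBy("id").applyInPandas(make_calls, schema=schema).display()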

df_dbx
New Contributor II

What about creating a function like this?

CREATE OR REPLACE FUNCTION geocode_address(address STRING)
RETURNS STRUCT<latitude: DOUBLE, longitude: DOUBLE>
LANGUAGE PYTHON
ENVIRONMENT (
  dependencies = '["requests"]',
  environment_version = "None"
)
AS $$
import requests

api_key = dbutils.secrets.get("my-secret-scope", "google-maps-geocoding-api-key")
url = f"https://maps.googleapis.com/maps/api/geocode/json?address={address}&key={api_key}"
response = requests.get(url)

if response.status_code != 200:
    return None

try:
    data = response.json()
    if data['status'] == 'OK':
        location = data['results'][0]['geometry']['location']
        return (location['lat'], location['lng'])
    else:
        return None
except (KeyError, ValueError):
    return None
$$

...and then testing it like this:

SELECT geocode_address('1600 Amphitheatre Parkway, Mountain View, CA');

Currently it results in the following error:

NameError: name 'dbutils' is not defined

What's the recommended way of retrieving the secret in this case?

df_dbx
New Contributor II

Answering my own question. Similar to the original response, the answer was to pass in the secret as a function argument:

CREATE OR REPLACE FUNCTION geocode_address(address STRING, api_key STRING)
RETURNS STRUCT<latitude: DOUBLE, longitude: DOUBLE>
LANGUAGE PYTHON
AS $$
import requests

url = f"https://maps.googleapis.com/maps/api/geocode/json?address={address}&key={api_key}"
response = requests.get(url)

if response.status_code != 200:
    return None

try:
    data = response.json()
    if data['status'] == 'OK':
        location = data['results'][0]['geometry']['location']
        return (location['lat'], location['lng'])
    else:
        return None
except (KeyError, ValueError):
    return None
$$

And then here is how to call it:

SELECT geocode_address('1600 Amphitheatre Parkway, Mountain View, CA', secret("my-secret-scope", "google-maps-geocoding-api-key"));

Note: this won't work on a Serverless Warehouse (or Serverless compute) as by default they restrict outbound traffic.

I ran outbound GraphQL calls on serverless, but on the Azure version of Databricks; Azure VMs don't restrict this.

My problem with serverless is described in "Python versions in the Spark Connect clien... - Databricks Community - 121213", so serverless is still unusable for my UDFs.
