Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Python mocking dbutils in unittests

confused_dev
New Contributor II

I am trying to write some unit tests using pytest, but I am running into the problem of how to mock dbutils when it isn't defined in my notebook.

Is there a way to do this so that I can unit test individual functions that are utilizing dbutils?

7 REPLIES

fermin_vicente
New Contributor III

Hi,

You can mock dbutils. An easy way to do it is to:

  • Receive dbutils as a parameter in your functions (inject it) instead of using it globally. This way your code is more testable and you won't need to do patching, which is a bit more cumbersome.
  • Use a mock library. unittest.mock is the simplest approach.
  • Call your function, passing in a mock instead of the actual dbutils.

Example:

  • your library under test:
def my_function(dbutils):
    ...
    dbutils.fs.ls("/tmp")  # this uses the local dbutils received as a parameter
    ...
  • your notebook:
from my_library import my_function

...
my_function(dbutils)  # this refers to the global dbutils variable
  • your test:
from unittest.mock import MagicMock

from my_library import my_function


def test_my_function_calls_dbutils():
    mock_dbutils = MagicMock()

    my_function(dbutils=mock_dbutils)

    mock_dbutils.fs.ls.assert_called_once_with("/tmp")

xiangzhu
Contributor III

The above response is a pure mock. Below is another example from dbx of a fixture using the local filesystem; you can also add a mock for dbutils.secrets (see the sketch after the link):

https://github.com/databrickslabs/dbx/blob/b2989213b5f67e2b7ccf8adeba97da70e88ffff2/dbx/templates/pr...
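
For illustration, a secrets mock could look something like this (the scope and key names here are made up):

import pytest
from unittest.mock import MagicMock


@pytest.fixture
def dbutils():
    dbutils = MagicMock()
    # Any call to dbutils.secrets.get(...) returns this canned value.
    dbutils.secrets.get.return_value = "fake-secret-value"
    return dbutils


def test_reads_secret(dbutils):
    # Code under test would receive this mock and call, for example:
    value = dbutils.secrets.get(scope="my-scope", key="my-key")
    assert value == "fake-secret-value"
    dbutils.secrets.get.assert_called_once_with(scope="my-scope", key="my-key")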

Anonymous
Not applicable

Hi @Jake P

Hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

drorata
New Contributor II

How can I add a type annotation locally to the example provided above by @fermin_vicente?

The type of `dbutils` seems to be `dbruntime.dbutils.DBUtils` but I'm not sure how to get it available in local dev.

I personally went without a type hint for the dbutils param to avoid extra dependencies just for that, but I think you can actually get that type from the Databricks SDK, according to the docs:

https://docs.databricks.com/en/dev-tools/databricks-connect/python/databricks-utilities.html

https://pypi.org/project/databricks-sdk/#description

In that case I would take care to match the SDK version to the DBR you're using so the type validations line up (although I guess dbutils doesn't evolve much).

Hope this helps
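
For illustration, one way to get the annotation without adding a runtime dependency is a TYPE_CHECKING guard; the import path below is the DBR one mentioned above, so treat it as an assumption for your environment:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers (mypy, pyright), never at runtime,
    # so local test runs don't need the dbruntime package installed.
    from dbruntime.dbutils import DBUtils


def my_function(dbutils: "DBUtils") -> None:
    dbutils.fs.ls("/tmp")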

saurabh18cs
Contributor II

import unittest
from unittest.mock import MagicMock, patch

# Import the function to be tested
from your_notebook import my_function


class TestMyFunction(unittest.TestCase):
    @patch('your_notebook.dbutils')
    def test_my_function(self, mock_dbutils):
        # Create a mock for dbutils.fs
        mock_fs = MagicMock()
        mock_dbutils.fs = mock_fs

        # Define the behavior of the mock methods
        mock_fs.mkdirs.return_value = None
        mock_fs.ls.return_value = ["file1", "file2"]

        # Call the function to be tested
        result = my_function()

        # Assertions (for example, check the mocked listing was used)
        mock_fs.ls.assert_called_once()


if __name__ == '__main__':
    unittest.main()
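
One thing worth noting about this pattern: @patch('your_notebook.dbutils') patches the name where it is looked up (in your_notebook's namespace), not where dbutils is originally defined. That is the standard unittest.mock "where to patch" rule, and it's why the decorator points at the module under test.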

pavlosskev
New Contributor III

@fermin_vicente's answer is pretty good already. Below is how you can do something similar with conftest.py:

# conftest.py
import pytest
from unittest.mock import MagicMock
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def dbutils():
    # A MagicMock needs no teardown, so we can just return it.
    return MagicMock()

# Also for the SparkSession
@pytest.fixture(scope="session")
def spark():
    spark = (
        SparkSession.builder.master("local[*]").appName("app-name").getOrCreate()
    )
    yield spark
    spark.stop()

Then for the testing code:

# test_config stands in for whatever config your QueryGenerator expects
# (the value here is purely illustrative)
test_config = {"table": "community.databricks.com"}


def test_query_generation(spark, dbutils):
    """
    Test example
    """
    test_obj = QueryGenerator(test_config, spark, dbutils)

    expected_query = "SELECT * from community.databricks.com"
    assert test_obj.get_query() == expected_query

Your class has to take a dbutils object as an input; a minimal sketch of such a class is below. The same can be done if you just use functions.
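
For completeness, an illustrative QueryGenerator matching the test above could look like this (the config handling is made up):

class QueryGenerator:
    """Illustrative stand-in for the class used in the test above."""

    def __init__(self, config, spark, dbutils):
        self.config = config
        self.spark = spark
        self.dbutils = dbutils  # injected, so tests can pass a MagicMock

    def get_query(self):
        # Real code might read paths or secrets via self.dbutils here.
        return f"SELECT * from {self.config['table']}"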
