
How to run the test of the default Python asset bundle, and how to run things from the terminal in general

dermoritz
New Contributor III

I created an asset bundle from the default Python template. I am able to "Upload and Run File" on main.py from within VS Code:

 

5/23/2025, 9:01:29 AM - Uploading assets to databricks workspace...
5/23/2025, 9:01:31 AM - Creating execution context on cluster 0508-112319-jqxbhz37 ...
5/23/2025, 9:01:46 AM - Running src/processing/main.py ...

 

+--------------------+---------------------+-------------+-----------+----------+-----------+
|tpep_pickup_datetime|tpep_dropoff_datetime|trip_distance|fare_amount|pickup_zip|dropoff_zip|
+--------------------+---------------------+-------------+-----------+----------+-----------+
| 2016-02-13 21:47:53|  2016-02-13 21:57:15|          1.4|        8.0|     10103|      10110|
| 2016-02-13 18:29:09|  2016-02-13 18:37:23|         1.31|        7.5|     10023|      10023|
| 2016-02-06 19:40:58|  2016-02-06 19:52:32|          1.8|        9.5|     10001|      10018|
| 2016-02-12 19:06:43|  2016-02-12 19:20:54|          2.3|       11.5|     10044|      10111|
| 2016-02-23 10:27:56|  2016-02-23 10:58:33|          2.6|       18.5|     10199|      10022|
+--------------------+---------------------+-------------+-----------+----------+-----------+
only showing top 5 rows
5/23/2025, 9:02:03 AM - Done (took 33813ms)


But if I do the same with "main_test.py", I get:

5/23/2025, 9:04:48 AM - Uploading assets to databricks workspace...
5/23/2025, 9:04:50 AM - Creating execution context on cluster 0508-112319-jqxbhz37 ...
5/23/2025, 9:04:56 AM - Running tests/main_test.py ...

 

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File <frozen runpy>:286, in run_path(path_name, init_globals, run_name)
File <frozen runpy>:98, in _run_module_code(code, init_globals, mod_name, mod_spec, pkg_name, script_name)
File <frozen runpy>:88, in _run_code(code, run_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
File /home/moritz/workspace/processing/tests/main_test.py:1
----> 1 from processing.main import get_taxis, get_spark
      3 def test_main():
      4     taxis = get_taxis(get_spark())
ModuleNotFoundError: No module named 'processing'
5/23/2025, 9:04:59 AM - Done (took 11291ms)


How do I get the test to run?

And more generally, how do I get things to run from the terminal? In the terminal I already:
- created a venv
- uncommented the line "databricks-connect>=15.4,<15.5" and ran pip install -r requirements-dev.txt
- ran export DATABRICKS_CONFIG_PROFILE=adb-xyz
Running either main.py or main_test.py then yields:

session.py", line 525, in _from_sdkconfig
    raise Exception("Cluster id or serverless are required but were not specified.")
Exception: Cluster id or serverless are required but were not specified.


So how do I run the test from VS Code, and how do I run anything from the terminal?

 
9 REPLIES

-werners-
Esteemed Contributor III

The test script resides in another location (the tests subdir) than main.py.

So what is probably going on is that the test script cannot find the necessary modules.
Make sure that either the modules are installed or your Python path is set.
It is also important to check where the tests are run: locally or on a Databricks cluster.

dermoritz
New Contributor III

As described, I first want to get it to run remotely, using the "Upload and Run File" action:

[screenshot: dermoritz_0-1747987164650.png]

How do I make this work for the test? Or how do I use the Databricks extension correctly so that it uploads everything that is needed and sets the path correctly?
Or, asked more generally: how is the test generated by "databricks bundle init" supposed to run at all? I would also like to run it from the terminal, but I guess certain env variables must be set there for databricks-connect to work?

All the documentation I found stops at running main.py, which at least works from VS Code.

-werners-
Esteemed Contributor III

For databricks-connect to run, you do indeed need to set up the connection(s) to your Databricks workspace.
Personally I do this with a .databrickscfg file in my home folder (on Linux). But you can also pass credentials in the SparkSession/DatabricksSession (https://docs.databricks.com/aws/en/dev-tools/databricks-connect/python, but you are probably aware of this).
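A minimal sketch of what such a profile could look like in ~/.databrickscfg, assuming the adb-xyz profile name from your post (host and token values are placeholders):

[adb-xyz]
host = https://adb-1234567890123456.7.azuredatabricks.net
token = dapi-your-personal-access-token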


The reason why your test script fails but your main.py does not is that the test script resides in another location (the /tests subdir, whereas main.py resides in /src/default_python or similar).
So the first thing your test script does is import the functions written in main.py (from default_python.main import get_taxis, get_spark).
Whether this works or not depends on how your Python path is set.
If it does not work, you should add the main.py location to your Python path, e.g. with sys.path.append (or by setting it via export PYTHONPATH).
This in fact has nothing to do with Databricks or DABs but with the way Python works.
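A minimal sketch of the sys.path variant, assuming the src/processing layout from your logs; this goes at the top of tests/main_test.py, before the import:

# make <project>/src importable so "from processing.main import ..." resolves
import sys
from pathlib import Path

sys.path.append(str(Path(__file__).parent.parent / "src"))

from processing.main import get_taxis, get_spark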

dermoritz
New Contributor III

As I wrote, the connection setup works, since I am able to run main.py. I have 2 profiles configured (one at workspace and one at cluster level), and both are "valid".
Does your second part suggest that there are errors in the default-python template provided by Databricks? The code I am referring to is completely generated (main and test), and my question is how to get it working as it is delivered/generated.
My assumption is that it should work without changes to the code. Is this assumption wrong?

-werners-
Esteemed Contributor III

This assumption is wrong.
If you look into the directory structure, there is a pytest.ini file. It should contain something like:

[pytest]
testpaths = tests
pythonpath = src

pytest.ini is read when you run pytest, so pytest will add the src dir (where main.py resides) to the Python path.
But AFAIK this only works if you run pytest from the directory where pytest.ini resides.
So you first have to navigate to the correct directory (cd <project>) and run pytest from there.
pytest.ini will then be picked up and it should work.
There are other ways to make it work, e.g. moving main_test.py to the same dir as main.py, but I think that navigating to the correct dir first should do it.

dermoritz
New Contributor III

I have 2 different kinds of problems, or rather I see 2 ways to run stuff:
- using "Upload and Run File": this works with main.py, but not with main_test.py. Is main_test.py supposed to work via "Upload and Run File" at all? For that, I think all needed files would have to be uploaded, not only the file to run?
- using Python locally (here I know how to set the path, and pytest.ini is set up correctly). Here I get, for both "python main.py" and "pytest tests", the error I showed in the first post:

session.py", line 525, in _from_sdkconfig
    raise Exception("Cluster id or serverless are required but were not specified.")
Exception: Cluster id or serverless are required but were not specified.

To fix this error, I guess I have to set an env variable to specify the cluster id. Within VS Code this is done via the GUI of the Databricks plugin, but how do I set the cluster id for python/pytest? (As said, I was able to set the Databricks profile via DATABRICKS_CONFIG_PROFILE.)

-werners-
Esteemed Contributor III

- Using upload and run, your main_test.py will not work. The reason is that Databricks executes the script from its current location, and that throws the import error. You also need main.py uploaded, btw.
A fix for this is to move the test script to the same directory as main.py (and change the import statement to reflect this, see the sketch below). But tests are typically not executed like this.
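If you did move it, the import at the top of main_test.py would change to something like this (a sketch, assuming both files end up in the same directory):

from main import get_taxis, get_spark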


For the second issue: you indeed have to pass a cluster id (or use serverless). You can add the cluster id to your profile; that is the easiest way.
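A sketch of what that could look like, extending the ~/.databrickscfg profile from earlier with a cluster_id line (the id is the one from your logs; host and token remain placeholders):

[adb-xyz]
host = https://adb-1234567890123456.7.azuredatabricks.net
token = dapi-your-personal-access-token
cluster_id = 0508-112319-jqxbhz37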

dermoritz
New Contributor III

First, let me say thanks for all your effort :-).

Can you explain how to pass the cluster id, or how to add it to the profile?

Is there a way to put these settings, specifically the cluster, in code or in something like a .env file? Because I am about to push this to a repo.

-werners-
Esteemed Contributor III

I figured it out from the docs:
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect/cluster-config#clust...
I don't put the cluster in code myself, because all my code runs in Jobs (on job clusters). I only use databricks-connect for development purposes, so for me it is easiest to put the cluster id in the profile.
But an env var is also an option, as you can see.
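A sketch of both options, with the cluster id from this thread as a placeholder: either set the DATABRICKS_CLUSTER_ID environment variable before running python/pytest, or set the cluster in code via the DatabricksSession builder:

# in code, e.g. inside get_spark(); profile name and cluster id are placeholders
from databricks.connect import DatabricksSession

spark = (
    DatabricksSession.builder
    .profile("adb-xyz")                 # profile from ~/.databrickscfg
    .clusterId("0508-112319-jqxbhz37")  # cluster to attach to
    .getOrCreate()
)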
