Debugging using vscode & databricks connect

EijayK
New Contributor

Hi all

I'm having some difficulties using Databricks Connect to debug my ML solution. Long story short, I want to inspect a few variables after training has run. With the debugger, I can simply place a breakpoint on the line I want to inspect. However, that only partially works...

You may assume that I have installed everything correctly, since I followed pretty much every guideline I could find.

I've defined a custom package with Poetry whose dependencies are aligned with those of the cluster (Unity Catalog enabled). Based on what I read here, I concluded that my package would also be available on the cluster (is that correct?). The package is defined in VS Code (a simple file structure with an __init__.py).
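Whether the local environment really matches the cluster can be checked mechanically. Below is a minimal sketch, assuming you have a list of the package versions installed on the cluster. The CLUSTER_PINS values are illustrative placeholders, not taken from any particular Databricks Runtime, and find_version_mismatches is a hypothetical helper. It reports every distribution whose local version differs or is missing. It compares version strings literally and says nothing about whether your own package is installed on the cluster, which is a separate question.

```python
from __future__ import annotations

from importlib import metadata


def find_version_mismatches(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Compare locally installed distributions with the versions expected on the cluster.

    Returns {distribution: (expected_version, local_version_or_None)} for every
    distribution whose local version string differs or that is not installed locally.
    """
    mismatches: dict[str, tuple[str, str | None]] = {}
    for name, wanted in expected.items():
        try:
            local = metadata.version(name)
        except metadata.PackageNotFoundError:
            local = None  # not installed in the local (VS Code) environment
        if local != wanted:  # literal string comparison, no version normalization
            mismatches[name] = (wanted, local)
    return mismatches


# Hypothetical pins: illustrative values only. Replace them with the versions
# your cluster actually reports.
CLUSTER_PINS = {
    "pandas": "1.5.3",
    "scikit-learn": "1.1.1",
}

if __name__ == "__main__":
    for dist, (wanted, local) in find_version_mismatches(CLUSTER_PINS).items():
        print(f"{dist}: cluster={wanted} local={local or 'not installed'}")
```

An empty result means every pinned distribution matches locally; it does not prove the cluster side has them.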

I import a lot from my package when I start executing my code, and those imports have never failed at the start of the program. That means the code is either running on the cluster or running locally. Based on this blog, I believe it must be running locally, but when I ask the Databricks Assistant, it assures me that everything is running on the cluster.
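One way to settle "locally or on the cluster" is to ask the code itself. Here is a minimal sketch with a hypothetical helper, environment_probe (standard library only), that reports the host name, the Python interpreter, and whether a given top-level package is importable wherever the function runs. Called from the script you are debugging in VS Code, it describes the machine the debugger is attached to. The package name my_ml_pkg is a placeholder for your own package.

```python
from __future__ import annotations

import importlib.util
import platform
import sys


def environment_probe(package_name: str) -> dict[str, object]:
    """Report which machine this runs on and whether a top-level package is importable there.

    package_name must be a top-level name (e.g. "my_ml_pkg", not "my_ml_pkg.sub"),
    because find_spec raises instead of returning None when a parent package is missing.
    """
    spec = importlib.util.find_spec(package_name)
    return {
        "host": platform.node(),
        "python": platform.python_version(),
        "executable": sys.executable,
        "package_importable": spec is not None,
        "package_origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # Run from the debugged script, this describes the local side.
    print(environment_probe("my_ml_pkg"))  # hypothetical package name
```

To get the workers' view, the same function could be called inside code that Spark executes on the executors, for example a function passed to DataFrame.mapInPandas. Defining the probe in the script itself, rather than in the custom package, likely matters: as far as I understand cloudpickle, functions defined in __main__ are serialized by value, so the probe can still run where the package is missing. That wiring is an untested assumption.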

I'm running the training with Spark, and it uses some custom classes from my package that convert the Spark DataFrame into a pandas-on-Spark DataFrame via pandas_api(), so that I can easily run my code locally and also execute it in parallel on the server side (i.e., the cluster).

This works well, but at a certain point it complains that the workers cannot access my custom package...
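A likely explanation for the worker error, sketched with the standard library: Python's pickle records classes and functions imported from an installed package by module name instead of copying their code. As far as I know, the cloudpickle-based serializer PySpark uses for UDFs does the same for importable modules. So anything from the custom package that ends up in code executed on the workers (as I understand it, this includes functions passed to pandas-on-Spark operations such as apply or transform_batch) must be importable on each worker. The example uses a hypothetical package, my_ml_pkg, created in a temporary directory. It simulates the "worker" by removing that package from sys.path and sys.modules before unpickling.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def simulate_worker_unpickle() -> str:
    """Pickle an object from a locally importable package, then try to unpickle
    it where that package is not importable (a stand-in for a Spark worker)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the custom Poetry package.
        pkg_dir = Path(tmp) / "my_ml_pkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(
            "class Preprocessor:\n"
            "    def __init__(self, scale):\n"
            "        self.scale = scale\n"
        )
        sys.path.insert(0, tmp)
        try:
            pkg = importlib.import_module("my_ml_pkg")
            # "Driver" side: the package is importable, so pickling succeeds.
            payload = pickle.dumps(pkg.Preprocessor(scale=2.0))
        finally:
            # Simulate the worker: the package is neither loaded nor on sys.path.
            sys.path.remove(tmp)
            sys.modules.pop("my_ml_pkg", None)

    # The payload holds the reference "my_ml_pkg" / "Preprocessor", not the class code.
    assert b"my_ml_pkg" in payload
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return f"worker-side failure: {exc}"
    return "unpickled fine"


if __name__ == "__main__":
    print(simulate_worker_unpickle())
    # prints: worker-side failure: No module named 'my_ml_pkg'
```

The local imports succeeding at startup and the worker failing later are consistent with this picture: the two sides resolve the package in different environments.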

Can I assume that my package is not installed on the cluster when I run the debugger? (The first link caught me a bit off guard.) Maybe I'm looking at this all wrong; I'm just hoping someone can clarify it a bit.

Have a nice day

1 REPLY

Kaniz_Fatma
Community Manager

Hi @EijayK, Ensure that the package is installed on the cluster itself, which you can verify through the cluster's library installation logs. Additionally, make sure your cluster meets all Databricks Connect requirements, including proper configuration and runtime compatibility. Use the databricks-connect test command to validate your setup.

Have a nice day, and I hope this helps! 
