Debugging using VS Code & Databricks Connect
08-06-2024 11:33 AM
Hi all
I'm facing some difficulties when I use Databricks Connect to debug my ML solution. Long story short, I want to inspect a few variables after training has finished. With the debugger, I can simply place a breakpoint on the line I want to inspect, but that only works partially...
You may assume that I have installed everything correctly, as I pretty much followed every guideline that I could find.
I've defined a custom package with Poetry whose dependencies are aligned with those of the cluster (Unity Catalog enabled). Based on what I read here, I concluded that my package will also be available on the cluster (is that correct?). The package is defined in VS Code (simple file structure, with __init__.py).
I import a lot from my package when my source code starts executing, and those imports have never failed at the start of my program. That means either that the code is running correctly on the cluster, or that it is running locally. For some reason I believe it must be running locally, according to this blog, but when I check the Databricks Assistant, it assures me that everything runs on the cluster.
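If you want to check where those imports actually resolve, you can ask Python for the module's origin. This is a minimal sketch assuming the Spark Connect based Databricks Connect (version 13 and above): as I understand that design, your script and its top-level imports run in the local interpreter that VS Code debugs, and only the Spark operations are executed on the cluster, which would explain why the imports never fail at start-up. The helper module_origin and the package name my_ml_pkg are hypothetical.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the location `name` would be imported from, or None if it cannot be found.

    For regular modules and packages this is a file path; built-in modules
    report "built-in".
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # find_spec raises this when a parent package of a dotted name is missing.
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Replace "my_ml_pkg" (hypothetical) with your Poetry package's import name.
    print(module_origin("my_ml_pkg"))
```

A path inside your VS Code workspace or local virtualenv means the import was satisfied by the local environment, regardless of what is installed on the cluster.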
I'm doing my training with Spark, using some custom classes from my package that convert the Spark DataFrame into a pandas-on-Spark DataFrame with pandas_api(), so that I can easily run my code locally and execute it in parallel on the server side (i.e., the cluster).
This works well, but at a certain point it complains that the workers have no access to my custom package...
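A plausible explanation for the worker error, as I understand PySpark: functions that end up running on the executors (for example through pandas-on-Spark apply or transform_batch, mapInPandas, or pandas UDFs) are serialized with cloudpickle on the client and deserialized on each worker. A function defined in an importable module is pickled by reference, that is, as its module and function name, so the worker has to import that module itself. This would also explain why breakpoints inside such functions never trigger in the local debugger. The sketch below reproduces the failure mode locally with the standard-library pickle, which stores module-level functions the same way; my_ml_pkg and to_features are hypothetical names.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def simulate_worker_unpickle() -> Tuple[bytes, str]:
    """Pickle a function from a throwaway package, then unpickle it without the package.

    Returns the pickle payload and the outcome of the "worker-side" unpickle.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "my_ml_pkg"  # hypothetical package name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(
            "def to_features(x):\n    return x * 2\n"
        )

        sys.path.insert(0, tmp)
        try:
            pkg = importlib.import_module("my_ml_pkg")
            # Only the module and function names are stored, not the function's code.
            payload = pickle.dumps(pkg.to_features)
        finally:
            # Mimic an executor that never had the package installed.
            sys.path.remove(tmp)
            sys.modules.pop("my_ml_pkg", None)

    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return payload, f"worker-side failure: {exc}"
    return payload, "unpickled successfully"


if __name__ == "__main__":
    print(simulate_worker_unpickle()[1])
```

The reply below points to the usual fix: install the package on the cluster itself, for example by building a wheel with poetry build and installing it as a cluster library.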
Can I assume that my package is not installed on the cluster when running the debugger? (The first link caught me a bit off guard.) Maybe I'm seeing this all wrong; I'm just hoping someone can clarify it a bit.
Have a nice day
08-08-2024 04:15 AM
Hi @EijayK, ensure that the package is installed on the cluster itself, which you can verify through the cluster's library installation logs. Additionally, make sure your cluster meets all Databricks Connect requirements, including proper configuration and runtime compatibility. Use the databricks-connect test command to validate your setup.
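Besides installing a wheel (e.g. one produced by poetry build) as a cluster library, PySpark's Spark Connect client (3.5 and later) provides SparkSession.addArtifacts(..., pyfile=True), which ships .py, .zip or .egg files and places them on the Python path used by executors. Whether your Databricks Connect and Databricks Runtime versions accept this is an assumption you should verify before relying on it. The sketch below only builds such a zip from a package directory and checks that Python can import from it; zip_package, import_from_zip and the package names are hypothetical, and the Spark call is left as a comment.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path
from types import ModuleType


def zip_package(pkg_dir: Path, out_zip: Path) -> Path:
    """Zip a package so that '<pkg_name>/__init__.py' sits at the archive root."""
    pkg_dir = Path(pkg_dir).resolve()
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(pkg_dir.rglob("*.py")):
            # Keep the package folder in the archive name, e.g. "my_ml_pkg/model.py".
            zf.write(path, arcname=path.relative_to(pkg_dir.parent).as_posix())
    return Path(out_zip)


def import_from_zip(zip_path: Path, module_name: str) -> ModuleType:
    """Import `module_name` from `zip_path`, mimicking a pyfile on an executor's sys.path."""
    sys.modules.pop(module_name, None)  # make sure the import really comes from the archive
    sys.path.insert(0, str(zip_path))
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(zip_path))


def self_check() -> int:
    """Build a throwaway package, zip it, and import it back from the archive."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "zipped_demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("VALUE = 42\n")
        archive = zip_package(pkg, Path(tmp) / "deps.zip")
        return import_from_zip(archive, "zipped_demo_pkg").VALUE


# Hypothetical usage with a Databricks Connect session (not executed here):
# zip_path = zip_package(Path("src/my_ml_pkg"), Path("dist/my_ml_pkg.zip"))
# spark.addArtifacts(str(zip_path), pyfile=True)
```

Installing the wheel as a cluster library is the more conventional route; the zip approach mainly helps during iterative debugging, if your setup supports it.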
Have a nice day, and I hope this helps!

