Hi all
I'm facing some difficulties using Databricks Connect to debug my ML solution. Long story short: I want to inspect a few variables after training has finished. With the debugger at hand, I can simply place a breakpoint on the line I want to inspect. But that only works partially...
You may assume that everything is installed correctly, as I pretty much followed every guideline I could find.
I've defined a custom package with Poetry whose dependencies align with those of the cluster (Unity Catalog enabled). Based on what I read here, I concluded that my package would also be available on the cluster (is that correct?). The package is defined in VS Code (simple file structure, with __init__.py).
I do many imports from my package at the start of my source code, and those imports have never failed when my program starts. That means either that the code is running correctly on the cluster, or that it's running locally. For some reason I believe it must be running locally, according to this blog, but if I check the Databricks Assistant, it assures me that everything is running on the cluster.
I'm doing my training with Spark, using some custom classes from my package that translate the Spark DataFrame into a pandas(-on-Spark) DataFrame via pandas_api, so that I can easily run my code locally while still executing it in parallel fashion server-side (i.e. on the cluster).
This works well, but at a certain point it complains that the worker has no access to my custom package...
Can I assume that my package is not installed on the cluster when running the debugger? (The first link put me a bit off guard.) Maybe I'm seeing this all wrong; just hoping that someone can clarify it a bit.
Have a nice day