Debug UDFs using VSCode extension
08-06-2024 02:06 PM
I am trying to debug my Python script using the Databricks VSCode extension. I am using udf and pandas_udf in my script. Everything works fine until execution reaches the udf and pandas_udf usages, at which point it complains: "SparkContext or SparkSession should be created first." I did some research, and it looks like SparkContext is not supported in Databricks Connect (see https://docs.databricks.com/en/dev-tools/databricks-connect/python/limitations.html). On the other hand, I also found https://docs.databricks.com/en/dev-tools/databricks-connect/python/udf.html, which says you can use UDFs with Databricks Connect. So I am confused. If you cannot debug UDFs with the VSCode extension, how do you typically do this? Thanks for your help.
08-08-2024 04:09 AM
Hi @SeyedA, to resolve this, first make sure your SparkSession is properly initialized in your script. Be aware of the limitations of Databricks Connect, which might affect UDFs, and consider running UDFs locally in a simple Spark environment for debugging. If VSCode becomes too challenging, Databricks Notebooks offer a more integrated environment.
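As a rough illustration of the advice above, here is a minimal sketch that keeps the UDF logic in a plain Python function, so it can be stepped through in the debugger without Spark, and wraps it with `udf` only after the session has been created. It assumes the error comes from the UDF being built before a session exists, which is a guess rather than a confirmed diagnosis. The names `add_one`, `build_udf`, and `main` are hypothetical, and `databricks-connect` is assumed to be installed and configured.

```python
def add_one(x):
    """Plain Python logic: set breakpoints here and call it directly."""
    if x is None:
        return None
    return x + 1


def build_udf():
    """Wrap the plain function as a Spark UDF.

    Call this only after a session exists. The imports are deferred so
    that add_one can be debugged without pyspark being importable.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import LongType

    return udf(add_one, LongType())


def main():
    # Assumption: databricks-connect is installed and configured
    # (for example through the VSCode extension's cluster settings).
    from databricks.connect import DatabricksSession

    # Create the session first, then build the UDF.
    spark = DatabricksSession.builder.getOrCreate()
    add_one_udf = build_udf()

    df = spark.range(3)
    df.select(add_one_udf("id").alias("id_plus_one")).show()
```

Calling `add_one(1)` directly in the debugger exercises the UDF body without any cluster. Calling `main()` from the script's entry point runs the same logic through Databricks Connect. The same split should apply to a `pandas_udf`, with the plain function taking and returning a pandas Series.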
If you have any specific parts of your code that you’d like to share, I can help you troubleshoot further. How does that sound?

