Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Debug UDFs using VSCode extension

SeyedA
New Contributor

I am trying to debug my Python script using the Databricks VSCode extension. The script uses udf and pandas_udf. Everything works fine until execution reaches the udf and pandas_udf usages, at which point it fails with "SparkContext or SparkSession should be created first.". I did some research, and it looks like SparkContext is not supported in Databricks Connect (see https://docs.databricks.com/en/dev-tools/databricks-connect/python/limitations.html). On the other hand, I also found https://docs.databricks.com/en/dev-tools/databricks-connect/python/udf.html, which says you can use UDFs with Databricks Connect. So I am confused. If you cannot debug UDFs with the VSCode extension, how do you guys typically do this? Thanks for your help.

1 REPLY

Retired_mod
Esteemed Contributor III

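Hi @SeyedA, to resolve this, first make sure a session exists before any UDF is defined or called. With Databricks Connect, create it with DatabricksSession.builder.getOrCreate() rather than through SparkContext, which Databricks Connect does not support. Also keep the Databricks Connect limitations in mind: UDFs are supported, but their bodies are serialized and executed on the Databricks cluster, not in your local VSCode process, so breakpoints set inside a UDF will not be hit. For debugging, keep the UDF logic in a plain Python function that you can call and step through locally, or run the UDFs in a simple local Spark environment. If VSCode becomes too cumbersome, Databricks notebooks offer a more integrated environment.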
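A minimal sketch of that layout follows. It assumes databricks-connect 13 or later (where DatabricksSession is available); the functions clean_name, add_tax and run_on_cluster, the 10% rate and the sample rows are all hypothetical. The idea is that the UDF logic lives in plain functions you can debug locally with breakpoints, and the session is created before the UDFs are wrapped and used.

```python
import pandas as pd


# Plain Python/pandas logic: this runs in the local VSCode process,
# so breakpoints here are hit when you call the functions directly.
def clean_name(name):
    """Hypothetical row-level logic that will back a regular udf."""
    if name is None:
        return None
    return name.strip().title()


def add_tax(price: pd.Series) -> pd.Series:
    """Hypothetical vectorized logic for a pandas_udf (10% rate assumed)."""
    return price * 1.1


def run_on_cluster():
    # Imported here so the functions above can be tested without
    # databricks-connect or pyspark installed.
    from databricks.connect import DatabricksSession
    from pyspark.sql.functions import col, pandas_udf, udf
    from pyspark.sql.types import DoubleType, StringType

    # Create the session first, before any UDF is defined or called.
    spark = DatabricksSession.builder.getOrCreate()

    # Wrap the already-debugged functions; their bodies run on the cluster.
    clean_name_udf = udf(clean_name, StringType())
    add_tax_udf = pandas_udf(add_tax, DoubleType())

    df = spark.createDataFrame([(" alice ", 10.0), ("BOB", 20.0)], ["name", "price"])
    df.select(clean_name_udf(col("name")), add_tax_udf(col("price"))).show()


if __name__ == "__main__":
    # Step through the logic locally first; call run_on_cluster() once it works.
    print(clean_name("  ada lovelace "))
    print(add_tax(pd.Series([10.0, 20.0])).tolist())
```

Because the wrapped functions are the same objects you tested locally, a bug that only shows up on the cluster usually points to data or environment differences rather than the logic itself.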

If you have any specific parts of your code that you’d like to share, I can help you troubleshoot further. How does that sound?