Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

spark.sql makes debugger freeze

Sega2
New Contributor III

I have just created a simple bundle with Databricks, and I am using Databricks Connect to debug locally. This is my script:

from pyspark.sql import SparkSession, DataFrame

def get_taxis(spark: SparkSession) -> DataFrame:
    return spark.read.table("samples.nyctaxi.trips")


# Create a new Databricks Connect session. If this fails,
# check that you have configured Databricks Connect correctly.
# See https://docs.databricks.com/dev-tools/databricks-connect.html.
def get_spark() -> SparkSession:
    try:
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    except ImportError:
        return SparkSession.builder.getOrCreate()

def test_connection():
    try:
        print("Attempting to create Spark session...")
        spark = get_spark()
        print("Successfully created Spark session")
        
        # Test with a simple query first
        print("Testing with a simple query...")
        test_query = "SELECT 1 as test"
        test_df = spark.sql(test_query)
        print("Simple query successful")
        
        # If simple query works, try listing tables
        print("Attempting to list tables...")
        spark.sql("SHOW DATABASES").show()
        
        return spark
        
    except Exception as e:
        print(f"Error type: {type(e).__name__}")
        print(f"Error message: {str(e)}")
        print(f"Error location: {e.__traceback__.tb_frame.f_code.co_filename}:{e.__traceback__.tb_lineno}")
        raise

def main():
    try:
        # First test the connection
        spark = test_connection()
        print("Connection test completed successfully")
        
        # If connection works, proceed with the original code
        print("Proceeding with main query...")
        
        # Define your SQL query
        sql_query = """
        select * from supermarket_dev.streaming_bronze.source_setting where source_application = 'iban'
        """
        
        print(f"Executing query: {sql_query}")
        # Execute the SQL query and convert the results into a DataFrame
        df = spark.sql(sql_query)
        
        print("Query executed successfully")
        print(f"DataFrame is empty: {df.isEmpty()}")
        print(f"DataFrame schema: {df.schema}")
        
        # Show the DataFrame contents
        first = df.first()
        print(f"First row: {first}")

    except Exception as e:
        print(f"Error type: {type(e).__name__}")
        print(f"Error message: {str(e)}")
        print(f"Error location: {e.__traceback__.tb_frame.f_code.co_filename}:{e.__traceback__.tb_lineno}")
        raise

if __name__ == '__main__':
    main()

Every time I call spark.sql, the debugger freezes and VS Code just hangs like this:

Sega2_0-1739520074229.png

If I deploy it, I can see it runs through successfully:

Sega2_1-1739520103137.png

Any pointers on what to do or what could cause this?

 

1 REPLY

NandiniN
Databricks Employee

Ensure that Databricks Connect is properly set up and that the installed databricks-connect Python package version is compatible with your cluster's Databricks Runtime. For VS Code, any mismatch between the databricks-connect package version and the cluster runtime can lead to freezes or errors.

Also, add detailed logging to your code to help identify where the freeze happens. You can log around the spark.sql operations to monitor query execution and catch errors, if any.
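One way to do that (a hedged sketch; `timed_sql` is a hypothetical helper, not a Databricks API) is to wrap each spark.sql call with log lines and a timer:

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger(__name__)


def timed_sql(spark, query: str):
    """Run spark.sql(query), logging before and after plus the elapsed time.

    If the "returned" line never appears, the hang is inside the spark.sql
    round-trip to the cluster rather than in your own code.
    """
    log.info("Executing: %s", query.strip())
    start = time.monotonic()
    df = spark.sql(query)
    log.info("Returned in %.2f s", time.monotonic() - start)
    return df
```

In the script above, you would then call `timed_sql(spark, sql_query)` wherever you currently call `spark.sql(sql_query)`.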

 

 
