Data Engineering
Debugger freezes when calling spark.sql with dbx connect

Sega2
New Contributor III

I have just created a simple bundle with Databricks and am using Databricks Connect to debug locally. This is my script:

from pyspark.sql import SparkSession, DataFrame

def get_taxis(spark: SparkSession) -> DataFrame:
    return spark.read.table("samples.nyctaxi.trips")

# Create a new Databricks Connect session. If this fails,
# check that you have configured Databricks Connect correctly.
# See https://docs.databricks.com/dev-tools/databricks-connect.html.
def get_spark() -> SparkSession:
    try:
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    except ImportError:
        return SparkSession.builder.getOrCreate()

def test_connection():
    try:
        print("Attempting to create Spark session...")
        spark = get_spark()
        print("Successfully created Spark session")
        
        # Test with a simple query first
        print("Testing with a simple query...")
        test_query = "SELECT 1 as test"
        test_df = spark.sql(test_query)
        print("Simple query successful")
        
        # If simple query works, try listing tables
        print("Attempting to list tables...")
        spark.sql("SHOW DATABASES").show()
        
        return spark
        
    except Exception as e:
        print(f"Error type: {type(e).__name__}")
        print(f"Error message: {str(e)}")
        print(f"Error location: {e.__traceback__.tb_frame.f_code.co_filename}:{e.__traceback__.tb_lineno}")
        raise

def main():
    try:
        # First test the connection
        spark = test_connection()
        print("Connection test completed successfully")
        
        # If connection works, proceed with the original code
        print("Proceeding with main query...")
        
        # Define your SQL query
        sql_query = """
        select * from supermarket_dev.streaming_bronze.source_setting where source_application = 'iban'
        """
        
        print(f"Executing query: {sql_query}")
        # Execute the SQL query and convert the results into a DataFrame
        df = spark.sql(sql_query)
        
        print("Query executed successfully")
        print(f"DataFrame is empty: {df.isEmpty()}")
        print(f"DataFrame schema: {df.schema}")
        
        # Show the DataFrame contents
        first = df.first()
        print(f"First row: {first}")

    except Exception as e:
        print(f"Error type: {type(e).__name__}")
        print(f"Error message: {str(e)}")
        print(f"Error location: {e.__traceback__.tb_frame.f_code.co_filename}:{e.__traceback__.tb_lineno}")
        raise

if __name__ == '__main__':
    main()

Every time I call spark.sql, the debugger freezes and VS Code just hangs (screenshot attached).

If I deploy it, I can see it runs through successfully (screenshot attached).

Any pointers on what to do, or what could cause this?

(Screenshots attached to the original post, not reproduced here: Sega2_1-1740135258051.png, Sega2_0-1740135225882.png)

2 REPLIES

cln
New Contributor II

I am experiencing a similar issue. Have you ever managed to find a solution?

mark_ott
Databricks Employee

The issue you're experiencing, where your script freezes in VS Code when running spark.sql locally through Databricks Connect but works when deployed, can result from several common causes: Databricks Connect configuration, networking, environment mismatches, and limitations of interactive debugging setups.

Key Possible Causes

1. Databricks Connect Misconfiguration

  • If Databricks Connect isn't fully or correctly configured, local commands like spark.sql may get stuck waiting for remote execution that never completes.

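  • Ensure that the major.minor version of the databricks-connect package matches your cluster's Databricks Runtime version (for example, databricks-connect 14.3.x for Databricks Runtime 14.3 LTS), and that your local Python minor version matches the cluster's.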
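As a quick check for the version point above, here is a minimal sketch. The helper names `major_minor` and `connect_version_problems` are hypothetical, not part of any Databricks API. It assumes the common rule of thumb that the databricks-connect package's major.minor version should equal the cluster's Databricks Runtime major.minor, and reads the installed version with the standard library's `importlib.metadata`.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple


def major_minor(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '14.3.1' or '14.3.x-scala2.12'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"cannot parse a version from {text!r}")
    return int(match.group(1)), int(match.group(2))


def connect_version_problems(dbr_version: str, connect_version: Optional[str] = None) -> List[str]:
    """Return human-readable problems; an empty list means major.minor line up.

    If connect_version is not given, it is read from the databricks-connect
    distribution installed in the current Python environment.
    """
    if connect_version is None:
        try:
            connect_version = version("databricks-connect")
        except PackageNotFoundError:
            return ["databricks-connect is not installed in this environment"]
    # Assumed rule of thumb: package major.minor == Databricks Runtime major.minor.
    if major_minor(connect_version) != major_minor(dbr_version):
        return [
            f"databricks-connect {connect_version} does not match "
            f"Databricks Runtime {dbr_version} (major.minor differ)"
        ]
    return []
```

Pass the runtime string from your cluster's configuration, for example `connect_version_problems("14.3.x-scala2.12")`; an empty list means the versions line up under that assumption.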

2. Network and Firewall Issues

  • Databricks Connect for Databricks Runtime 13.3 LTS and above is built on Spark Connect and talks to the cluster over gRPC on HTTPS (port 443). Local firewall, VPN, or proxy settings might block or slow down this traffic.

  • Check that you can reach your workspace from your local machine and that no network interruptions occur during debugging.
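To rule out basic connectivity, a sketch like this one (hypothetical helpers `host_from_url` and `can_reach`) tries a plain TCP connection to the workspace host on port 443. A success does not prove that a proxy passes the gRPC traffic Databricks Connect relies on, but a failure or a long timeout points clearly at the network.

```python
import socket
from urllib.parse import urlparse


def host_from_url(url: str) -> str:
    """Accept 'https://adb-123.azuredatabricks.net/' or a bare hostname."""
    parsed = urlparse(url if "://" in url else f"https://{url}")
    if not parsed.hostname:
        raise ValueError(f"no hostname in {url!r}")
    return parsed.hostname


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    DNS failures, refused connections, and timeouts are all OSError subclasses,
    so any of them yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach(host_from_url(os.environ["DATABRICKS_HOST"]))`, assuming your workspace URL is in the `DATABRICKS_HOST` environment variable (it may instead live in a profile in `~/.databrickscfg`).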

3. Python Environment Mismatch

  • Incompatible library versions (PySpark, Databricks Connect, etc.) or environment differences between local and cluster can cause local jobs to hang.

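  • Make sure your local Python minor version and databricks-connect package match your cluster, and do not install a standalone pyspark package alongside databricks-connect: the two conflict, so uninstall pyspark before installing databricks-connect.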
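A sketch of an environment check along those lines. `environment_problems` is a hypothetical helper; it assumes the distributions are named `pyspark` and `databricks-connect` on PyPI, and you supply the cluster's Python version yourself (from the Databricks Runtime release notes), since nothing here looks it up.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, List, Optional, Tuple


def installed_version(dist: str) -> Optional[str]:
    """Version of an installed distribution, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def environment_problems(
    cluster_python: Tuple[int, int],
    local_python: Optional[Tuple[int, int]] = None,
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """List local-environment issues that commonly break Databricks Connect."""
    local = tuple(local_python or sys.version_info[:2])
    problems = []
    if local != tuple(cluster_python):
        problems.append(
            f"local Python {local[0]}.{local[1]} differs from cluster Python "
            f"{cluster_python[0]}.{cluster_python[1]}"
        )
    if get_version("databricks-connect") is None:
        problems.append("databricks-connect is not installed")
    # A separately installed pyspark distribution conflicts with databricks-connect.
    if get_version("pyspark") is not None:
        problems.append(
            "a standalone pyspark package is installed; it conflicts with "
            "databricks-connect and should be uninstalled"
        )
    return problems
```

`environment_problems(cluster_python=(3, 11))` inspects the current interpreter and installed packages; the `get_version` parameter only exists so the check can be exercised without touching the real environment.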

4. VS Code Interactive Debugger Limitations

  • Debugging code that makes remote calls can appear to freeze: the debugger cannot step into execution that happens on the cluster, so while a remote call is blocked, VS Code just sits on that line with no feedback.

  • Try running the script without the debugger in a terminal (python script.py) to see if it works; if so, the issue may be specific to interactive debugging and not Databricks Connect itself.
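When the debugger sits on a spark.sql line, it helps to see where the process is actually blocked. A minimal sketch using the standard library's `faulthandler` module: `hang_watchdog` (a hypothetical name) dumps every thread's stack if the wrapped code runs longer than a given time, without stopping the process. It behaves the same with or without the debugger attached, so you can compare both runs.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(seconds: float, file=None):
    """Dump the stack of every thread to `file` if the body runs longer than
    `seconds`. The process keeps running; this only shows where it is stuck.

    `file` must have a real file descriptor (fileno()); by default the
    original stderr is used, because debuggers may replace sys.stderr.
    """
    target = file if file is not None else sys.__stderr__
    faulthandler.dump_traceback_later(seconds, repeat=False, file=target, exit=False)
    try:
        yield
    finally:
        # Cancel the pending dump if the body finished in time.
        faulthandler.cancel_dump_traceback_later()
```

For example, wrap the query from the original script as `with hang_watchdog(60): df = spark.sql(sql_query)`. If the dump shows the main thread waiting inside the Databricks Connect client's network code, the problem is the remote call rather than the debugger.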

5. Resource Initialization Delays or Deadlocks

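  • Creating a SparkSession with Databricks Connect has some initialization overhead. If connecting or authenticating takes too long or hits an internal error, the call blocks and VS Code appears frozen. For example, if the target cluster is terminated, the first remote call may wait several minutes while the cluster starts.

  • Enable client-side debug logging (for example, by setting the SPARK_CONNECT_LOG_LEVEL environment variable to debug) and check the cluster's state and driver logs in the workspace for errors.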
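To tell a slow start-up apart from a true hang, here is a sketch that times session creation and a trivial query separately. It assumes the `SPARK_CONNECT_LOG_LEVEL` environment variable read by the Spark Connect client that Databricks Connect is built on; as far as I know it is read when the client module is first imported, so it is set at the top, before anything imports `databricks.connect`. `timed` and `diagnose` are hypothetical helpers.

```python
import os
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")

# Assumption: the Spark Connect client reads this when its module is first
# imported, so set it before databricks.connect is imported. setdefault keeps
# any value you already exported in the shell.
os.environ.setdefault("SPARK_CONNECT_LOG_LEVEL", "debug")


def timed(label: str, fn: Callable[[], T]) -> T:
    """Run fn(), printing when it starts and how long it took."""
    start = time.monotonic()
    print(f"[{label}] started", flush=True)
    result = fn()
    print(f"[{label}] finished in {time.monotonic() - start:.1f}s", flush=True)
    return result


def diagnose(make_session: Callable[[], Any]) -> Any:
    """Time session creation and a trivial query as separate steps.

    collect() forces a full round trip to the cluster, so a slow cluster
    start or authentication step shows up as a long but finite step
    rather than an indefinite hang.
    """
    spark = timed("create session", make_session)
    return timed("SELECT 1", lambda: spark.sql("SELECT 1 AS test").collect())
```

With the original script, `diagnose(get_spark)` works because `get_spark()` only imports `databricks.connect` when it is called. A step that finishes after a few minutes suggests a cluster or authentication delay; one that never finishes points back at the network or environment checks.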

Troubleshooting Steps

  • Validate Databricks Connect setup: Use databricks-connect test and check that all tests pass.

  • Check network access: Try a simple API call, such as listing clusters with the Databricks CLI (databricks clusters list).

  • Run outside of debugger: Execute the script in a standard terminal session.

  • Upgrade/downgrade libraries: Ensure all required libraries are compatible and up-to-date.

  • Increase logging/debug output: Set the SPARK_CONNECT_LOG_LEVEL environment variable to debug to get more verbose client-side logs from Databricks Connect.

  • Clean/reinstall Databricks Connect: Sometimes a fresh install solves hidden dependency issues.
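The command-line checks above (validating Databricks Connect and listing clusters with the Databricks CLI) can be scripted so each one runs outside the debugger with a time limit. This is a sketch: `run_check` and `run_all` are hypothetical helpers, and `script.py` stands for the path to your own script. A missing command or a timeout is treated as a failure instead of waiting forever.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_check(cmd: Sequence[str], timeout: float = 120.0) -> Tuple[bool, str]:
    """Run cmd in a child process (outside any debugger) and report
    (succeeded, combined stdout+stderr). Missing commands and timeouts
    count as failures instead of blocking forever."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, f"command not found: {cmd[0]}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s: {' '.join(cmd)}"
    return proc.returncode == 0, proc.stdout + proc.stderr


# Commands to try; "script.py" is a placeholder for your own script's path.
CHECKS: List[List[str]] = [
    ["databricks-connect", "test"],
    ["databricks", "clusters", "list"],
    [sys.executable, "script.py"],
]


def run_all(checks: Sequence[Sequence[str]] = CHECKS, timeout: float = 120.0) -> bool:
    """Run every check, print a one-line status for each, and return True
    only if all of them succeeded."""
    all_ok = True
    for cmd in checks:
        ok, output = run_check(cmd, timeout)
        print(f"{'OK  ' if ok else 'FAIL'} {' '.join(cmd)}")
        if not ok:
            # Keep only the tail of long outputs.
            print(output.strip()[-2000:])
            all_ok = False
    return all_ok
```

Call `run_all()` from a terminal; the tail of each failing command's output is printed so errors are not lost.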

Additional Resources

  • [Databricks Connect troubleshooting documentation]

  • [Databricks Community discussions for hanging issues]

Summary:
When spark.sql calls freeze under the VS Code debugger with Databricks Connect, the usual suspects are misconfiguration, networking issues, or a Python environment mismatch. Work through each point above, and run the script outside the debugger to narrow down the cause.
