Dear Community,
I am testing PySpark code with pytest in VS Code using Databricks Connect.
The SparkSession is created via Databricks Connect:
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
Every time I call the 'SparkSession.sql()' method, I receive an error.
# module.py
def create_catalog(spark_session):
    """Doc string"""
    spark_session.sql("""CREATE CATALOG IF NOT EXISTS test_catalog""")
# test_module.py
import pytest

from module import create_catalog


@pytest.fixture(scope="session")
def spark_session():
    """Creates SparkSession."""
    global spark
    try:
        spark
    except NameError:
        from databricks.connect import DatabricksSession
        spark = DatabricksSession.builder.getOrCreate()
    yield spark


def test_create_catalog(spark_session):
    """Doc string"""
    create_catalog(spark_session)
I receive the following error message:
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
E status = StatusCode.UNIMPLEMENTED
E details = "Method not found: spark.connect.SparkConnectService/ReattachExecute"
E   debug_error_string = "UNKNOWN:Error received from peer {created_time:"2023-11-04T16:14:26.2187837+00:00", grpc_status:12, grpc_message:"Method not found: spark.connect.SparkConnectService/ReattachExecute"}"
The issue also occurs when I use the SparkSession directly, not via the fixture.
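For example, even a minimal standalone script (no pytest involved; the filename is just for illustration) fails with the same exception:

# repro.py (hypothetical filename)
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# The next call raises SparkConnectGrpcException
# (Method not found: spark.connect.SparkConnectService/ReattachExecute)
spark.sql("CREATE CATALOG IF NOT EXISTS test_catalog")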
I have also tested that SparkSession.sql() from databricks.connect works correctly when I run the code via 'Run file as a Workflow on Databricks' from VS Code.
Thank you in advance for any help,
Rafal