Hello All,
Our Azure Databricks cluster runs under the "Legacy Shared Compute" policy with the 15.3 runtime. One of our Python notebooks connects to an Azure SQL database to read and insert data. The snippet of code below is responsible for running queries that insert/update data in the Azure SQL database and for executing stored procedures. All of this works without any issues. However, we have now upgraded our environment to Unity Catalog and want to start using Unity Catalog instead of the hive_metastore. To write data to a Unity Catalog catalog, the cluster must run under the "Shared Compute" policy rather than "Legacy Shared Compute". Unfortunately, running the cluster in this mode is a problem because, per the documentation, sparkContext is not supported on this cluster type on runtime 14.0 and above.
So the following line of code errors out, since _sc is not available:
driver_manager = spark_session._sc._gateway.jvm.java.sql.DriverManager
I have looked through the documentation but haven't found anything to replace this code so that it can run on a UC-enabled cluster. I could use a DataFrame to insert the data into Azure SQL, but that becomes tricky when I want to return something back, e.g. the newly inserted identity value from that operation. There are also additional concerns with the DataFrame approach, such as code structure and difficulty of reuse.
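For reference, the DataFrame-based insert I mean would look roughly like this (a minimal sketch; the target table and column names are placeholders, not our real schema). It shows why returning the identity value is the awkward part:

# Rough sketch of the DataFrame approach; table/column names are placeholders.
df = spark_session.createDataFrame(
    [(int(application_id), current_date_time, current_date_time, application_execution_context)],
    ["application_id", "start_date_time", "end_date_time", "application_execution_context"],
)

(df.write
    .format("jdbc")
    .option("url", "MY DB URL")          # same JDBC URL as in the snippet below
    .option("dbtable", "dbo.MyTable")    # placeholder target table
    .mode("append")
    .save())

# The write itself succeeds, but it returns nothing, so the newly generated
# identity value would have to be fetched with a separate round trip.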
I wanted to know if there is a different approach to achieve the features below on UC-enabled clusters running on the 15.3 runtime or above. Also, are there any plans to support sparkContext in future runtime versions? If so, I can just wait for the supported runtime to be released. If there are no plans, then I will need to find an alternate way.
I appreciate any help in this matter.
def get_sqlserver_jdbc_connection(spark_session, server_name, database_name):
    # Uses the JVM's java.sql.DriverManager through the py4j gateway -- this is
    # the part that breaks on "Shared Compute", because _sc is not available there.
    driver_manager = spark_session._sc._gateway.jvm.java.sql.DriverManager
    jdbc_url = "MY DB URL"
    connection = driver_manager.getConnection(jdbc_url)
    return connection

# Execute SQL query with parameters and read back the generated identity value
connection = get_sqlserver_jdbc_connection(spark_session=spark_session, server_name=server_name, database_name=database_name)
statement = connection.prepareStatement(sql)
statement.setInt(1, int(self.application_id))
statement.setString(2, current_date_time)
statement.setString(3, current_date_time)
if self.application_execution_context is None:
    statement.setNull(4, JAVA_SQL_TYPE_STRING)
else:
    statement.setString(4, self.application_execution_context[0:100])
resultset = statement.executeQuery()
resultset.next()
application_execution_id = resultset.getInt(1)
connection.close()

# Execute a stored procedure
connection = get_sqlserver_jdbc_connection(spark_session=spark_session, server_name=server_name, database_name=database_name)
sql = f"exec {stored_procedure_name}"
statement = connection.prepareStatement(sql)
statement.execute()
connection.close()