<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: sparkContext in Runtime 15.3 in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/110591#M43614</link>
    <description>&lt;P&gt;Have you found any solutions to this problem?&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114576"&gt;@rushi29&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122601"&gt;@jayct&lt;/a&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 19 Feb 2025 11:57:55 GMT</pubDate>
    <dc:creator>Gangster</dc:creator>
    <dc:date>2025-02-19T11:57:55Z</dc:date>
    <item>
      <title>sparkContext in Runtime 15.3</title>
      <link>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/81646#M36375</link>
      <description>&lt;DIV&gt;Hello All,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Our Azure Databricks cluster runs under the "Legacy Shared Compute" policy with the 15.3 runtime. One of our Python notebooks connects to an Azure SQL database to read/insert data. The following snippet of code runs queries to insert/update data in the Azure SQL database and to execute stored procedures. All of this works without any issues. However, we have now upgraded our environment to Unity Catalog and want to start using Unity Catalog instead of the hive_metastore. To write data to a Unity Catalog catalog, the cluster must run under the "Shared Compute" policy rather than "Legacy Shared Compute". Unfortunately, running the cluster in this mode is a problem because, per the documentation, sparkContext is not supported for this cluster type on runtime 14.0 and above.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/compute/access-mode-limitations" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/compute/access-mode-limitations&lt;/A&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;So, the following line of code errors out, since _sc is not available:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;driver_manager = spark_session._sc._gateway.jvm.java.sql.DriverManager&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I have looked through the documentation but haven't seen anything that replaces this code so it can run on a UC-enabled cluster. I could use a DataFrame to insert the data into Azure SQL, but that becomes tricky when I want to return something back, e.g. the newly inserted identity value from that operation.
There are also additional concerns with the DataFrame approach, such as code structure and difficulty of reuse.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I wanted to know if there is a different approach that achieves the functionality below on UC-enabled clusters running runtime 15.3 or above. Also, are there any plans to support sparkContext in future runtime versions? If so, I can simply wait for that runtime to be released; if not, I will need to find an alternative.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I appreciate any help in this matter.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;def get_sqlserver_jdbc_connection(spark_session, server_name, database_name):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; driver_manager = spark_session._sc._gateway.jvm.java.sql.DriverManager&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; jdbc_url = "MY DB URL"&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; connection = driver_manager.getConnection(jdbc_url)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; return connection&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;# Execute SQL Query with parameters&lt;/DIV&gt;&lt;DIV&gt;connection = get_sqlserver_jdbc_connection(spark_session = spark_session, server_name = server_name, database_name = database_name)&lt;/DIV&gt;&lt;DIV&gt;statement = connection.prepareStatement(sql)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;statement.setInt(1, int(self.application_id))&lt;/DIV&gt;&lt;DIV&gt;statement.setString(2, current_date_time)&lt;/DIV&gt;&lt;DIV&gt;statement.setString(3, current_date_time)&lt;/DIV&gt;&lt;DIV&gt;if self.application_execution_context is None:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;statement.setNull(4, JAVA_SQL_TYPE_STRING)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;else:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;statement.setString(4, 
self.application_execution_context[0:100])&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;resultset = statement.executeQuery()&lt;/DIV&gt;&lt;DIV&gt;resultset.next()&lt;/DIV&gt;&lt;DIV&gt;application_execution_id = resultset.getInt(1)&lt;/DIV&gt;&lt;DIV&gt;connection.close()&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;# Execute stored procedure&lt;/DIV&gt;&lt;DIV&gt;connection = get_sqlserver_jdbc_connection(spark_session = spark_session, server_name = server_name, database_name = database_name)&lt;/DIV&gt;&lt;DIV&gt;sql = f"exec {stored_procedure_name}"&lt;/DIV&gt;&lt;DIV&gt;statement = connection.prepareStatement(sql)&lt;/DIV&gt;&lt;DIV&gt;statement.execute()&lt;/DIV&gt;&lt;DIV&gt;connection.close()&lt;/DIV&gt;</description>
      <pubDate>Fri, 02 Aug 2024 12:56:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/81646#M36375</guid>
      <dc:creator>rushi29</dc:creator>
      <dc:date>2024-08-02T12:56:22Z</dc:date>
    </item>
    <item>
      <title>Re: sparkContext in Runtime 15.3</title>
      <link>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/82263#M36587</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;for your response. Since I also need to call stored procedures in the Azure SQL databases from Azure Databricks, I don't think the DataFrames solution would work. When using py4j, how would I create a connection object in Azure Databricks? I tried various code samples online, but none of them worked. I was trying something similar to the code below. Am I missing anything?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Rushi&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;from&lt;/SPAN&gt;&lt;SPAN&gt; pyspark.sql &lt;/SPAN&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; SparkSession&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;from&lt;/SPAN&gt;&lt;SPAN&gt; py4j.java_gateway &lt;/SPAN&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; java_import, JavaGateway, GatewayParameters&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;spark &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; SparkSession.builder.&lt;/SPAN&gt;&lt;SPAN&gt;appName&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"AzureSQLStoredProcedure"&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;SPAN&gt;getOrCreate&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;gateway &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;JavaGateway&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt; (gateway.jvm.DriverManager)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;jdbc_url &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;"jdbc:sqlserver://xxx.database.windows.net:1433;databaseName=xxx;authentication=ActiveDirectoryMSI"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;connection &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; 
gateway.jvm.DriverManager.&lt;/SPAN&gt;&lt;SPAN&gt;getConnection&lt;/SPAN&gt;&lt;SPAN&gt;(jdbc_url)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;print&lt;/SPAN&gt;&lt;SPAN&gt; (connection)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;This code gives an error&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Py4JNetworkError: &lt;/SPAN&gt;An error occurred while trying to connect to the Java server (127.0.0.1:25333)&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:982&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;GatewayClient._get_connection&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self)&lt;/SPAN&gt; &lt;SPAN&gt;981&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: &lt;SPAN class=""&gt;--&amp;gt; 982&lt;/SPAN&gt; connection &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;deque&lt;SPAN&gt;.&lt;/SPAN&gt;pop() &lt;SPAN&gt;983&lt;/SPAN&gt; &lt;SPAN class=""&gt;except&lt;/SPAN&gt; &lt;SPAN class=""&gt;IndexError&lt;/SPAN&gt;:&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;HR /&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1177&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;GatewayConnection.start&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self)&lt;/SPAN&gt; &lt;SPAN&gt;1174&lt;/SPAN&gt; msg &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;An error occurred while trying to connect to the Java &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;\ &lt;SPAN&gt;1175&lt;/SPAN&gt; &lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;server (&lt;/SPAN&gt;&lt;SPAN class=""&gt;{0}&lt;/SPAN&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;SPAN class=""&gt;{1}&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;format(&lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;address, 
&lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;port) &lt;SPAN&gt;1176&lt;/SPAN&gt; logger&lt;SPAN&gt;.&lt;/SPAN&gt;exception(msg) &lt;SPAN class=""&gt;-&amp;gt; 1177&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; Py4JNetworkError(msg, e)&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2024 16:17:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/82263#M36587</guid>
      <dc:creator>rushi29</dc:creator>
      <dc:date>2024-08-07T16:17:40Z</dc:date>
    </item>
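The Py4JNetworkError in the reply above is expected: calling `JavaGateway()` with no arguments tries to connect to a standalone py4j server on 127.0.0.1:25333, which Databricks never starts. On clusters where sparkContext is still exposed (single-user or no-isolation access modes), the usual workaround is to reuse the gateway Spark itself started rather than creating a new one. A minimal sketch, relying on the private `_gateway` attribute, which may change between runtimes and is deliberately unavailable on Unity Catalog shared access mode:

```python
def get_jdbc_connection(spark, jdbc_url):
    """Open a java.sql.Connection through the py4j gateway Spark already started.

    Works only where sparkContext is exposed (single-user / no-isolation
    clusters); on Unity Catalog shared access mode, _gateway is removed.
    """
    # Reuse Spark's own gateway instead of JavaGateway(), which expects a
    # standalone py4j server on 127.0.0.1:25333 that Databricks does not run.
    gateway = spark.sparkContext._gateway
    driver_manager = gateway.jvm.java.sql.DriverManager
    return driver_manager.getConnection(jdbc_url)
```

This is the same mechanism as the original `spark_session._sc._gateway` code, so it does not lift the shared-compute restriction; it only explains why a freshly constructed `JavaGateway()` cannot connect.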
    <item>
      <title>Re: sparkContext in Runtime 15.3</title>
      <link>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/91528#M38189</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114576"&gt;@rushi29&lt;/a&gt;&amp;nbsp;, did you ever get a solution to this?&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;there never was a response to the issue there&lt;/P&gt;</description>
      <pubDate>Tue, 24 Sep 2024 07:43:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/91528#M38189</guid>
      <dc:creator>jayct</dc:creator>
      <dc:date>2024-09-24T07:43:57Z</dc:date>
    </item>
    <item>
      <title>Re: sparkContext in Runtime 15.3</title>
      <link>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/110591#M43614</link>
      <description>&lt;P&gt;Have you found any solutions to this problem?&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114576"&gt;@rushi29&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122601"&gt;@jayct&lt;/a&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Feb 2025 11:57:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/110591#M43614</guid>
      <dc:creator>Gangster</dc:creator>
      <dc:date>2025-02-19T11:57:55Z</dc:date>
    </item>
    <item>
      <title>Re: sparkContext in Runtime 15.3</title>
      <link>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/110595#M43615</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122601"&gt;@jayct&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149891"&gt;@Gangster&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Per the docs, "Shared Compute" in Unity Catalog does not support sparkContext, and this is still the case with the latest version. We ended up using Personal Compute clusters for each of our developers, and we run the jobs under "Job Compute" in production.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Feb 2025 12:26:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/110595#M43615</guid>
      <dc:creator>rushi29</dc:creator>
      <dc:date>2025-02-19T12:26:06Z</dc:date>
    </item>
    <item>
      <title>Re: sparkContext in Runtime 15.3</title>
      <link>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/110604#M43620</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114576"&gt;@rushi29&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149891"&gt;@Gangster&lt;/a&gt;&lt;/P&gt;&lt;P&gt;I ended up implementing pyodbc with the mssql driver, installed via init scripts.&lt;/P&gt;&lt;P&gt;Spark context is no longer usable on shared compute, so that was the only approach we could take.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Feb 2025 14:03:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sparkcontext-in-runtime-15-3/m-p/110604#M43620</guid>
      <dc:creator>jayct</dc:creator>
      <dc:date>2025-02-19T14:03:54Z</dc:date>
    </item>
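The pyodbc route described in the last reply can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual code: the "ODBC Driver 18 for SQL Server" name assumes msodbcsql18 was installed through a cluster init script, the managed-identity authentication mirrors the JDBC URL quoted earlier in the thread, and the table and column names are hypothetical placeholders. Unlike the DataFrame JDBC writer, this path can return the newly inserted identity value and can execute stored procedures:

```python
def build_conn_str(server, database, driver="ODBC Driver 18 for SQL Server"):
    # Connection string for Azure SQL with managed-identity auth. The driver
    # name assumes msodbcsql18 was installed cluster-wide via an init script.
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={server},1433;"
        f"DATABASE={database};"
        "Authentication=ActiveDirectoryMsi;"
        "Encrypt=yes;"
    )

def get_connection(server, database):
    import pyodbc  # available once the init script installs the driver
    return pyodbc.connect(build_conn_str(server, database))

def insert_and_get_identity(conn, application_id, context):
    # Hypothetical table/column names, for illustration only. OUTPUT INSERTED
    # returns the new identity value in the same round trip, which covers the
    # identity-value concern from the original post.
    sql = (
        "INSERT INTO dbo.application_execution (application_id, execution_context) "
        "OUTPUT INSERTED.application_execution_id VALUES (?, ?)"
    )
    cur = conn.cursor()
    cur.execute(sql, application_id, None if context is None else context[:100])
    return cur.fetchone()[0]

def run_stored_procedure(conn, name):
    # Stored procedures work the same way as on the old JDBC path.
    conn.cursor().execute(f"EXEC {name}")
    conn.commit()
```

Since pyodbc runs entirely on the driver node and never touches sparkContext, it works on shared access mode, which is presumably why it was the approach that remained available.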
  </channel>
</rss>

