<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Migrating Job Orchestration to Shared Compute and avoiding(?) refactoring in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/migrating-job-orchestration-to-shared-compute-and-avoiding/m-p/61159#M31724</link>
    <description>&lt;P&gt;As part of migrating our data objects to Unity Catalog, we must migrate our job orchestration to Shared Compute in order to interact with the three-level namespace hierarchy.&lt;/P&gt;&lt;P&gt;Some of our code relies on functionality that is not supported on Shared Compute, since the majority of our pipelines were built under an Unrestricted policy.&lt;/P&gt;&lt;P&gt;Examples include Spark DataFrame methods such as "toJSON()" and "toDF()", as well as RDD operations used to reach specific elements within a data structure.&lt;/P&gt;&lt;P&gt;Running these commands returns the following error:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;Py4JError: An error occurred while calling o464.toJson. Trace: py4j.security.Py4JSecurityException: Method public java.lang.String com.databricks.backend.common.rpc.CommandContext.toJson() is not whitelisted on class class com.databricks.backend.common.rpc.CommandContext …&lt;/LI-CODE&gt;&lt;P&gt;My question is two-pronged:&lt;/P&gt;&lt;P&gt;Are there workarounds for whitelisting certain methods so that all instances of these and other unsupported references can avoid refactoring? While some older online references say yes, everything from roughly October 2023 onward says this is not possible on Shared Compute. Adding the following to the compute's Spark configuration results in a failure to save, and other documentation and similar questions confirm this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.databricks.pyspark.enablePy4JSecurity false&lt;/LI-CODE&gt;&lt;P&gt;If not, how do people suggest getting at this information? In the future state there are presumably best practices to adopt team-wide, but for now we need to identify these snippets of code.&lt;/P&gt;&lt;P&gt;The current proposal is:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Traverse our source control for references to these commonly used methods/code snippets&lt;/LI&gt;&lt;LI&gt;Test and address those changes&lt;/LI&gt;&lt;LI&gt;Run the associated pipelines in lower environments&lt;/LI&gt;&lt;LI&gt;Continue addressing unsupported code until everything has been refactored&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;My concern with this is that there are an unknown number of references to an unknown number of unsupported functions. Can anyone think of a way to quantify this, so we can scope out the level of refactoring required? Is there definitive documentation of what counts as "unsupported"?&lt;/P&gt;</description>
    <pubDate>Mon, 19 Feb 2024 18:03:30 GMT</pubDate>
    <dc:creator>Alex_O</dc:creator>
    <dc:date>2024-02-19T18:03:30Z</dc:date>
    <item>
      <title>Migrating Job Orchestration to Shared Compute and avoiding(?) refactoring</title>
      <link>https://community.databricks.com/t5/data-engineering/migrating-job-orchestration-to-shared-compute-and-avoiding/m-p/61159#M31724</link>
      <description>&lt;P&gt;As part of migrating our data objects to Unity Catalog, we must migrate our job orchestration to Shared Compute in order to interact with the three-level namespace hierarchy.&lt;/P&gt;&lt;P&gt;Some of our code relies on functionality that is not supported on Shared Compute, since the majority of our pipelines were built under an Unrestricted policy.&lt;/P&gt;&lt;P&gt;Examples include Spark DataFrame methods such as "toJSON()" and "toDF()", as well as RDD operations used to reach specific elements within a data structure.&lt;/P&gt;&lt;P&gt;Running these commands returns the following error:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;Py4JError: An error occurred while calling o464.toJson. Trace: py4j.security.Py4JSecurityException: Method public java.lang.String com.databricks.backend.common.rpc.CommandContext.toJson() is not whitelisted on class class com.databricks.backend.common.rpc.CommandContext …&lt;/LI-CODE&gt;&lt;P&gt;My question is two-pronged:&lt;/P&gt;&lt;P&gt;Are there workarounds for whitelisting certain methods so that all instances of these and other unsupported references can avoid refactoring? While some older online references say yes, everything from roughly October 2023 onward says this is not possible on Shared Compute. Adding the following to the compute's Spark configuration results in a failure to save, and other documentation and similar questions confirm this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.databricks.pyspark.enablePy4JSecurity false&lt;/LI-CODE&gt;&lt;P&gt;If not, how do people suggest getting at this information? In the future state there are presumably best practices to adopt team-wide, but for now we need to identify these snippets of code.&lt;/P&gt;&lt;P&gt;The current proposal is:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Traverse our source control for references to these commonly used methods/code snippets&lt;/LI&gt;&lt;LI&gt;Test and address those changes&lt;/LI&gt;&lt;LI&gt;Run the associated pipelines in lower environments&lt;/LI&gt;&lt;LI&gt;Continue addressing unsupported code until everything has been refactored&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;My concern with this is that there are an unknown number of references to an unknown number of unsupported functions. Can anyone think of a way to quantify this, so we can scope out the level of refactoring required? Is there definitive documentation of what counts as "unsupported"?&lt;/P&gt;</description>
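      One way to put a rough number on step 1 of the proposal above is a static scan of the repositories for the method calls that commonly trigger `Py4JSecurityException` on Shared Compute. This is a minimal sketch: the pattern list below is an assumption seeded from the errors described in the post (`toJSON`, `toDF`, direct RDD access), not an official Databricks list, and it would need to grow as new failures are found in testing.

      ```python
      """Rough scan of a codebase for Spark API calls that often fail on
      Shared (Unity Catalog) compute, to help size a refactoring effort."""
      import re
      from collections import Counter
      from pathlib import Path

      # Assumed starting list, seeded from observed Py4JSecurityException
      # errors; extend it from your own job logs.
      SUSPECT_PATTERNS = {
          "rdd_access": re.compile(r"\.rdd\b"),
          "toJSON": re.compile(r"\.toJSON\s*\("),
          "toDF": re.compile(r"\.toDF\s*\("),
          "sparkContext": re.compile(r"\bsparkContext\b"),
      }

      def scan(repo_root: str) -> Counter:
          """Count suspect-pattern hits, keyed by (pattern name, file path)."""
          hits = Counter()
          for path in Path(repo_root).rglob("*.py"):
              try:
                  text = path.read_text(errors="ignore")
              except OSError:
                  continue  # unreadable file; skip rather than abort the scan
              for name, pattern in SUSPECT_PATTERNS.items():
                  n = len(pattern.findall(text))
                  if n:
                      hits[(name, str(path))] += n
          return hits

      if __name__ == "__main__":
          import sys
          for (name, path), n in sorted(scan(sys.argv[1]).items()):
              print(f"{name}\t{n}\t{path}")
      ```

      A regex scan will over-count (e.g. `.toDF()` on a DataFrame is fine in some contexts) and cannot catch dynamic calls, so treat the output as a scoping estimate per repository, not a definitive refactor list.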
      <pubDate>Mon, 19 Feb 2024 18:03:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/migrating-job-orchestration-to-shared-compute-and-avoiding/m-p/61159#M31724</guid>
      <dc:creator>Alex_O</dc:creator>
      <dc:date>2024-02-19T18:03:30Z</dc:date>
    </item>
    <item>
      <title>Re: Migrating Job Orchestration to Shared Compute and avoiding(?) refactoring</title>
      <link>https://community.databricks.com/t5/data-engineering/migrating-job-orchestration-to-shared-compute-and-avoiding/m-p/61267#M31749</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Okay, that makes sense, thank you.&lt;/P&gt;&lt;P&gt;What about the approach to identifying these unsupported methods? Is there any documentation of what is unsupported on Shared Compute relative to Unrestricted?&lt;/P&gt;</description>
      <pubDate>Tue, 20 Feb 2024 14:12:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/migrating-job-orchestration-to-shared-compute-and-avoiding/m-p/61267#M31749</guid>
      <dc:creator>Alex_O</dc:creator>
      <dc:date>2024-02-20T14:12:25Z</dc:date>
    </item>
  </channel>
</rss>