<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unity Catalog : RDD Issue in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/unity-catalog-rdd-issue/m-p/105539#M4736</link>
    <description>&lt;P&gt;Our existing notebooks rely on RDDs. However, after the upgrade to Unity Catalog, RDDs will no longer be supported. We need alternative approaches or tools to replace them. Could you suggest best practices or migration strategies for this transition?&lt;/P&gt;</description>
    <pubDate>Tue, 14 Jan 2025 09:24:51 GMT</pubDate>
    <dc:creator>shwetamagar</dc:creator>
    <dc:date>2025-01-14T09:24:51Z</dc:date>
    <item>
      <title>Unity Catalog : RDD Issue</title>
      <link>https://community.databricks.com/t5/get-started-discussions/unity-catalog-rdd-issue/m-p/105539#M4736</link>
      <description>&lt;P&gt;Our existing notebooks rely on RDDs. However, after the upgrade to Unity Catalog, RDDs will no longer be supported. We need alternative approaches or tools to replace them. Could you suggest best practices or migration strategies for this transition?&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2025 09:24:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/unity-catalog-rdd-issue/m-p/105539#M4736</guid>
      <dc:creator>shwetamagar</dc:creator>
      <dc:date>2025-01-14T09:24:51Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog : RDD Issue</title>
      <link>https://community.databricks.com/t5/get-started-discussions/unity-catalog-rdd-issue/m-p/105547#M4737</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;To transition from using RDDs (Resilient Distributed Datasets) to alternative approaches supported by Unity Catalog, you can follow these best practices and migration strategies:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Use DataFrame API&lt;/STRONG&gt;: The DataFrame API is the recommended alternative to RDDs. It provides a higher-level abstraction for data processing and is optimized for performance. You can convert your existing RDD-based code to use DataFrames, which are supported in Unity Catalog.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Replace RDD Operations&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;For operations like &lt;CODE&gt;sc.parallelize&lt;/CODE&gt;, use &lt;CODE&gt;spark.createDataFrame&lt;/CODE&gt; with a list of dictionaries or Row objects.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;For creating empty DataFrames, use &lt;CODE&gt;spark.createDataFrame&lt;/CODE&gt; with an empty list and a defined schema.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;For &lt;CODE&gt;mapPartitions&lt;/CODE&gt;, rewrite the logic using DataFrame transformations and PySpark's Arrow-based APIs, such as pandas UDFs or &lt;CODE&gt;mapInPandas&lt;/CODE&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Avoid Spark Context and SQL Context&lt;/STRONG&gt;: Unity Catalog does not support direct access to Spark Context (&lt;CODE&gt;sc&lt;/CODE&gt;) and SQL Context (&lt;CODE&gt;sqlContext&lt;/CODE&gt;). Use the &lt;CODE&gt;spark&lt;/CODE&gt; variable to interact with the SparkSession instance instead.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Use Volumes for File Access&lt;/STRONG&gt;: Instead of using DBFS (Databricks File System) mount points, use Unity Catalog Volumes for file storage and access. This ensures that your data access is governed and secure.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Update Cluster Configurations&lt;/STRONG&gt;: Ensure that your clusters are running Databricks Runtime 13.3 or higher, and configure them to use shared or single-user access modes as appropriate for your workloads.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Migrate Streaming Jobs&lt;/STRONG&gt;: If you have streaming jobs that use RDDs, refactor them to use the Structured Streaming API. Ensure that checkpoint directories are moved to Volumes.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Handle UDFs and Libraries&lt;/STRONG&gt;: For user-defined functions (UDFs) and custom libraries, ensure they are compatible with the DataFrame API and Unity Catalog. Use cluster policies to manage library installations.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Use the SYNC Command&lt;/STRONG&gt;: For migrating tables from Hive/Glue to Unity Catalog, use the SYNC command to synchronize schema and table metadata.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Upgrade Managed and External Tables&lt;/STRONG&gt;: Use the upgrade wizard in Data Explorer to upgrade managed and external tables to Unity Catalog. For managed tables, consider using DEEP CLONE for Delta tables to preserve the delta log.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Refactor Jobs and Notebooks&lt;/STRONG&gt;: Evaluate and refactor your jobs and notebooks to ensure compatibility with Unity Catalog. This includes updating references to tables, paths, and configurations.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Tue, 14 Jan 2025 10:05:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/unity-catalog-rdd-issue/m-p/105547#M4737</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-14T10:05:47Z</dc:date>
    </item>
  </channel>
</rss>

