<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unity Catalog Volume as spark checkpoint location in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/53755#M29872</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I tried to set the Spark checkpoint location in a notebook to a folder in a Unity Catalog volume, with the following command:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;sc.&lt;/SPAN&gt;&lt;SPAN&gt;setCheckpointDir&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"/Volumes/catalog_name/schema_name/volume_name/folder_name"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Unfortunately, I receive the following error: "&lt;SPAN class=""&gt;Py4JJavaError&lt;/SPAN&gt;: An error occurred while calling o356.setCheckpointDir. : java.io.IOException: Operation not permitted".&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My user has all privileges granted on the volume.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Has anyone faced the same issue? Is it possible to use Databricks volumes as a storage location for checkpoints?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 24 Nov 2023 14:05:22 GMT</pubDate>
    <dc:creator>Balazs</dc:creator>
    <dc:date>2023-11-24T14:05:22Z</dc:date>
    <item>
      <title>Unity Catalog Volume as spark checkpoint location</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/53755#M29872</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I tried to set the Spark checkpoint location in a notebook to a folder in a Unity Catalog volume, with the following command:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;sc.&lt;/SPAN&gt;&lt;SPAN&gt;setCheckpointDir&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"/Volumes/catalog_name/schema_name/volume_name/folder_name"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Unfortunately, I receive the following error: "&lt;SPAN class=""&gt;Py4JJavaError&lt;/SPAN&gt;: An error occurred while calling o356.setCheckpointDir. : java.io.IOException: Operation not permitted".&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My user has all privileges granted on the volume.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Has anyone faced the same issue? Is it possible to use Databricks volumes as a storage location for checkpoints?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Nov 2023 14:05:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/53755#M29872</guid>
      <dc:creator>Balazs</dc:creator>
      <dc:date>2023-11-24T14:05:22Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Volume as spark checkpoint location</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/83509#M36949</link>
      <description>&lt;P&gt;I am facing the same issue on DBR 14.3 and the beta of 15.4.&lt;/P&gt;&lt;P&gt;My cluster uses the "Unrestricted" policy and "Single user" access mode, set to a user who has permission to read and write to the volume. I tested the permissions by writing a small DataFrame directly to my desired checkpoint folder (with .write instead of .setCheckpointDir followed by .checkpoint) and did not get the error. The exception is raised only when setting the volume as Spark's checkpoint directory.&lt;/P&gt;&lt;P&gt;Here is a bit more of the stack trace when calling .setCheckpointDir on a Unity Catalog volume.&lt;/P&gt;&lt;LI-CODE lang="java"&gt;java.io.IOException: Operation not permitted
	at java.io.UnixFileSystem.canonicalize0(Native Method)
	at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:177)
	at java.io.File.getCanonicalPath(File.java:626)
	at java.io.File.getCanonicalFile(File.java:651)
	at org.apache.spark.util.SparkFileUtils.resolveURI(SparkFileUtils.scala:49)
	at org.apache.spark.util.SparkFileUtils.resolveURI$(SparkFileUtils.scala:33)
	at org.apache.spark.util.Utils$.resolveURI(Utils.scala:105)
	...&lt;/LI-CODE&gt;&lt;P&gt;What storage solution is recommended for setting the cluster checkpoint directory in Databricks, if not Unity volumes?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2024 23:22:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/83509#M36949</guid>
      <dc:creator>Erp12</dc:creator>
      <dc:date>2024-08-19T23:22:43Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Volume as spark checkpoint location</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/110677#M43640</link>
      <description>&lt;P&gt;Further to this, it also seems that it is not possible to set a checkpoint directory on an external location, even when the principal has write permission to that location.&lt;/P&gt;&lt;P&gt;When we try:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.sparkContext.setCheckpointDir("s3://bucket/path")&lt;/LI-CODE&gt;&lt;P&gt;we see:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied;&lt;/LI-CODE&gt;&lt;P&gt;(I know it's not a permissions issue, because I can read and write DataFrames to the same path on the same UC cluster.)&lt;/P&gt;&lt;P&gt;We've also tried setting the checkpoint directory through the Spark configs like this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.conf.set("spark.checkpoint.dir", "s3://bucket/path")&lt;/LI-CODE&gt;&lt;P&gt;But we get:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.checkpoint.dir".
See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'. SQLSTATE: 46110
File &amp;lt;command-5849427671817506&amp;gt;, line 1&lt;/LI-CODE&gt;&lt;P&gt;Both were attempted on DBR 15.4, on a dedicated cluster.&lt;/P&gt;&lt;P&gt;I am shocked. Is it not possible to use checkpoints on UC? There must be something I am overlooking.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Feb 2025 03:35:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/110677#M43640</guid>
      <dc:creator>DBC__not_17496</dc:creator>
      <dc:date>2025-02-20T03:35:13Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Volume as spark checkpoint location</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/111757#M43990</link>
      <description>&lt;P&gt;Did you find a solution to the above issue? I am trying the same on DBR 15.4 with a Standard cluster. I am able to set the checkpoint directory using the commands below.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.conf.set("pyspark.sql.DataFrame.checkpoint", "/Volumes/path/")
spark.conf.set("spark.sql.checkpoint.dir","/Volumes/path/")
spark.conf.set("spark.sql.checkpointLocation","/Volumes/path/")&lt;/LI-CODE&gt;&lt;P&gt;But it fails when checkpointing a DataFrame.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df.checkpoint(True)
--&amp;gt;Checkpoint directory has not been set in the SparkContext
File &amp;lt;command-8168308127814448&amp;gt;, line 1
----&amp;gt; 1 df.checkpoint(True)
File /databricks/spark/python/pyspark/sql/connect/client/core.py:2149, in SparkConnectClient._handle_rpc_error(self, rpc_error)
   2134                 raise Exception(
   2135                     "Python versions in the Spark Connect client and server are different. "
   2136                     "To execute user-defined functions, client and server should have the "&lt;/LI-CODE&gt;&lt;P&gt;In this case, can we use localCheckpoint as an alternative? I know localCheckpoint is not reliable, as &lt;SPAN class=""&gt;local checkpoints&lt;/SPAN&gt; &lt;SPAN class=""&gt;are stored in the executors using the caching subsystem.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;Is it really not possible to use checkpoints on a UC-enabled cluster with DBR 15.4, or is there a new way to checkpoint a DataFrame?&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Mar 2025 19:33:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/111757#M43990</guid>
      <dc:creator>satya1206</dc:creator>
      <dc:date>2025-03-04T19:33:23Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Volume as spark checkpoint location</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/144450#M52321</link>
      <description>&lt;P&gt;Any progress on this?&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/notebooks/source/graphframes-user-guide-py.html" target="_blank"&gt;https://docs.databricks.com/aws/en/notebooks/source/graphframes-user-guide-py.html&lt;/A&gt;&lt;BR /&gt;This is not working for either checkpointing or standard graph algorithms.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Jan 2026 16:17:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-volume-as-spark-checkpoint-location/m-p/144450#M52321</guid>
      <dc:creator>aaonurdemir</dc:creator>
      <dc:date>2026-01-19T16:17:30Z</dc:date>
    </item>
  </channel>
</rss>

