Good morning,
I am having difficulty checkpointing a PySpark DataFrame.
The DataFrame is not part of a DLT pipeline, so I am using df.checkpoint(eager=True) to truncate the logical plan of df and materialize it as files in a Unity Catalog volume directory.
However, after some searching, it seems that the checkpoint location (set via sparkContext.setCheckpointDir) needs to be an HDFS-mounted directory.
I think such mounts are deprecated with Unity Catalog, and an alternative would be to write df to the UC volume directory and then immediately read it back.
Does anyone know whether HDFS mounts are indeed deprecated in Unity Catalog, and whether the alternative mentioned above is a valid one?
Thank you.