Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

dlt Streaming Checkpoint Not Found

ggsmith
New Contributor III

I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage destination.

Since I am reading JSON messages and many files are being created, I eventually want to run a cleanup process that deletes old files which have already been written to the streaming table. I thought I could do this by looking at the checkpoint file, but I am unable to find where the checkpoints are being written or how I can access them. When I try to manually set a checkpoint directory, nothing gets created when the pipeline runs.

 

import dlt
from pyspark.sql.functions import current_timestamp, to_timestamp

# Assumes `schema` and `sink_dir` are defined elsewhere in the pipeline.
@dlt.table(
    name="newdata_raw",
    table_properties={"quality": "bronze"},
    temporary=False,
)
def create_table():
    query = (
        spark.readStream.format("cloudFiles")
        .schema(schema)
        .option("cloudFiles.format", "json")
        .load(sink_dir + "partition=*/")
        .selectExpr("newRecord.*")
        .withColumn("LOAD_DT", to_timestamp(current_timestamp()))
    )
    return query

 

 

5 Replies

szymon_dybczak
Contributor III

Hi @ggsmith,

If you use Delta Live Tables, checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>.
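Assuming the layout described above (storage_location/checkpoints/<dlt_table_name>), a small helper can build the expected path for a given table; in a notebook you could then pass the result to dbutils.fs.ls to inspect the directory. The helper name is hypothetical, and the layout itself may differ between pipeline configurations, so treat this as a sketch.

```python
def dlt_checkpoint_dir(storage_location: str, table_name: str) -> str:
    """Build the expected checkpoint path for one DLT table,
    assuming the <storage_location>/checkpoints/<table> layout."""
    return f"{storage_location.rstrip('/')}/checkpoints/{table_name}"

# In a notebook (hypothetical storage path):
# dbutils.fs.ls(dlt_checkpoint_dir("dbfs:/pipelines/my_storage", "newdata_raw"))
```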


 

PushkarDeole

@szymon_dybczak How can I access the checkpoint? Is there a way to delete the checkpoints stored in the storage location? The reason I want to clean up the checkpoints is that a change to spark.sql.shuffle.partitions is not taking effect, and according to some discussions on the community, a change to that parameter only takes effect after the existing checkpoints are deleted, since its value is saved there.

Hi @PushkarDeole ,

You can just go to that location and delete it manually, or you can use dbutils, whichever you prefer.
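A minimal sketch of the dbutils route, again assuming the checkpoints layout from earlier in the thread. The function and its `fs` parameter are hypothetical names: in a notebook you would pass `dbutils.fs`, which exists only inside Databricks. Note that deleting a checkpoint makes the stream reprocess the source from scratch on the next run, so use it deliberately.

```python
def remove_checkpoint(fs, storage_location: str, table_name: str) -> str:
    """Delete the checkpoint directory for one DLT table.

    `fs` is anything with dbutils.fs's rm(dir, recurse) signature;
    in a notebook, pass `dbutils.fs`. Returns the path removed.
    """
    path = f"{storage_location.rstrip('/')}/checkpoints/{table_name}"
    fs.rm(path, True)  # recurse=True: delete the directory and its contents
    return path

# In a notebook: remove_checkpoint(dbutils.fs, "dbfs:/pipelines/my_storage", "newdata_raw")
```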

Thanks for the quick response, @szymon_dybczak, I appreciate it. I am probably missing something. I will look into the dbutils approach to access the location.

However, on your first point, I am not sure how I can go directly to the location and delete it manually. That is really my main question: how can I access that location directly, without using any utility?

We are using Unity Catalog, so I don't see this storage location option. Just the catalog & target schema. 
