I am using Delta Live Tables and have defined my pipeline with the code below. My understanding is that Delta Live Tables sets a streaming checkpoint automatically. The pipeline's storage destination is configured through the Unity Catalog and schema settings in the pipeline configuration.
Since I am reading JSON messages and many source files accumulate, I eventually want to run a cleanup process that deletes old files that have already been ingested into the streaming table. I thought I could determine which files are safe to delete by inspecting the checkpoint, but I cannot find where the checkpoints are written or how to access them. When I try to set a checkpoint directory manually (see the rough sketch after the pipeline code), nothing gets created at that location when the pipeline runs.
import dlt
from pyspark.sql.functions import current_timestamp, to_timestamp

# schema and sink_dir are defined earlier in the pipeline notebook

@dlt.table(
    name="newdata_raw",
    table_properties={"quality": "bronze"},
    temporary=False,
)
def create_table():
    # Incrementally ingest JSON files with Auto Loader and add a load timestamp
    query = (
        spark.readStream.format("cloudFiles")
        .schema(schema)
        .option("cloudFiles.format", "json")
        .load(sink_dir + "partition=*/")
        .selectExpr("newRecord.*")
        .withColumn("LOAD_DT", to_timestamp(current_timestamp()))
    )
    return query
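
For reference, here is roughly what my manual checkpoint attempt looked like. The path is just a placeholder, and I may well be setting the option in the wrong place, but nothing ever appears at that location:

# Inside create_table(), I added an explicit checkpointLocation option
# (the path below is only a placeholder I made up for illustration)
query = (
    spark.readStream.format("cloudFiles")
    .schema(schema)
    .option("cloudFiles.format", "json")
    .option("checkpointLocation", "/Volumes/my_catalog/my_schema/checkpoints/newdata_raw")
    .load(sink_dir + "partition=*/")
)

So my questions are: where does Delta Live Tables actually store its checkpoints when the pipeline writes to Unity Catalog, and is there a supported way to read them so I can identify which source files have already been processed?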