dlt Streaming Checkpoint Not Found
08-31-2024 12:02 PM
I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage destination.
Since I am reading JSON messages and many files are being created, I want to eventually run a cleanup process to delete the old files that have already been written to the streaming table. I thought I could do this by looking at the checkpoint files, but I am unable to find where the checkpoints are being written or how I can access them. When I try to manually set a checkpoint directory, nothing gets created when the pipeline runs.
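import dlt
from pyspark.sql.functions import current_timestamp, to_timestamp

# `schema` and `sink_dir` are defined elsewhere in the pipeline notebook.
@dlt.table(
    name="newdata_raw",
    table_properties={"quality": "bronze"},
    temporary=False,
)
def create_table():
    # Auto Loader (cloudFiles) incrementally picks up new JSON files.
    query = (
        spark.readStream.format("cloudFiles")
        .schema(schema)
        .option("cloudFiles.format", "json")
        .load(sink_dir + "partition=*/")
        .selectExpr("newRecord.*")
        .withColumn("LOAD_DT", to_timestamp(current_timestamp()))
    )
    return query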
Labels: Delta Lake, Spark, Workflows
09-01-2024 12:26 PM
Hi @ggsmith ,
If you use Delta Live Tables, checkpoints are stored under the storage location specified in the DLT pipeline settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>.
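To make that layout concrete, here is a minimal sketch that builds the per-table checkpoint path. The helper name `checkpoint_dir` and the example storage location are hypothetical, and the sketch assumes the pipeline was created with an explicit storage location. `dbutils.fs.ls` only exists inside a Databricks notebook, so that call is left commented out.

```python
def checkpoint_dir(storage_location: str, table_name: str) -> str:
    """Build the per-table checkpoint path: <storage_location>/checkpoints/<table_name>."""
    return f"{storage_location.rstrip('/')}/checkpoints/{table_name}"


# Hypothetical storage location; use the value from your own pipeline settings.
path = checkpoint_dir("dbfs:/pipelines/example-pipeline/", "newdata_raw")
print(path)

# Inside a Databricks notebook, the directory could then be listed with:
# for entry in dbutils.fs.ls(path):
#     print(entry.path)
```

If the listing fails with a "path not found" style error, the pipeline is probably not writing to the location you assumed.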
10-04-2024 12:32 AM
@szymon_dybczak How can I access the checkpoint? Is there any way I can delete the checkpoints stored in the storage location? The reason I want to clean up the checkpoint is that a change to spark.sql.shuffle.partitions is not taking effect. According to some discussions on the community, a change to this parameter only takes effect after the existing checkpoints are cleaned up, since its value is saved there.
10-04-2024 12:57 AM
Hi @PushkarDeole ,
You can just go to that location and delete it manually, or you can use dbutils, whichever you prefer.
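As a rough illustration of the dbutils route, the sketch below wraps the delete in a small helper with a dry-run default. The helper name `delete_checkpoint` and its safety check are my own assumptions, not part of any Databricks API. The only real call is `fs.rm(path, True)`, which is how `dbutils.fs.rm` performs a recursive delete. Stop the pipeline before doing this, and keep in mind that a stream restarted without its checkpoint starts processing from scratch.

```python
def delete_checkpoint(fs, checkpoint_path: str, dry_run: bool = True) -> bool:
    """Recursively delete a checkpoint directory using a dbutils.fs-like object.

    `fs` is expected to expose rm(path, recurse), as dbutils.fs does.
    Returns True only if a delete was actually issued and reported success.
    """
    # Hypothetical guard: refuse paths that do not look like a checkpoint directory.
    if "/checkpoints/" not in checkpoint_path:
        raise ValueError(f"Not a checkpoint path: {checkpoint_path}")
    if dry_run:
        print(f"Would delete {checkpoint_path}")
        return False
    return bool(fs.rm(checkpoint_path, True))


# Inside a Databricks notebook (the path is a placeholder):
# delete_checkpoint(dbutils.fs, "<storage_location>/checkpoints/newdata_raw", dry_run=False)
```

The dry-run default is there so that a mistyped path prints what would be removed instead of removing it.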
10-04-2024 04:49 AM
Thanks for the quick response @szymon_dybczak, I appreciate it. I am probably missing something. I will check the dbutils approach to access the location.
However, on your first point, I am not sure how I can go directly to the location and delete it manually. I think the main question I have is: how can I access that location directly without using any utility?
10-04-2024 09:10 AM
We are using Unity Catalog, so I don't see this storage location option. Just the catalog & target schema.

