
Why are there many offsets in the checkpoint?

Brad
Contributor

Hi team, 

I'm using trigger=availableNow to read a Delta table daily. The Delta table itself is loaded by Structured Streaming from Kinesis. I noticed there are many offsets under the checkpoint, and when the job starts running to get data from the Delta table, I can see this in the log:

 

BatchIds found from listing: 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286

 

These batch IDs match the offset files. Isn't the job supposed to read only the last offset from the checkpoint when reading from the Delta table, or am I misunderstanding something here?

Thanks

1 REPLY

Rishabh-Pandey
Esteemed Contributor

@Brad The batch IDs listed in the logs (e.g., 186, 187, 188, ...) correspond to the batches of data that have been processed. Each batch ID identifies one micro-batch, i.e. one step of the streaming process in which data was ingested, transformed, and written to the Delta table.

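Each offset file in your checkpoint records the state of the streaming job for one batch ID. The checkpointing mechanism ensures that your streaming job can resume from the last successful batch after a failure or restart, so the job continues from the latest offset; the older offset files are retained history, and seeing them listed in the log does not mean they are reprocessed.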
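To make the resume behavior concrete, here is a rough sketch in plain Python of how the resume point can be derived from a checkpoint's contents. It assumes the usual Structured Streaming checkpoint layout, with an `offsets` and a `commits` subdirectory whose files are named after batch IDs, and it assumes the checkpoint is readable as a local path (on Databricks it normally lives in cloud storage, so the listing would need a different API). The helper names `list_batch_ids` and `resume_point` are made up for illustration and are not part of Spark. Spark also retains a configurable number of recent batches (`spark.sql.streaming.minBatchesToRetain`, default 100 as far as I know), which would be consistent with the roughly 100 IDs (186 to 286) in the log above.

```python
from pathlib import Path
from typing import List


def list_batch_ids(log_dir: Path) -> List[int]:
    """Return batch IDs found in a metadata log directory, sorted ascending.

    Only files whose whole name is a number count as batch files, so
    hidden helper files such as ".286.crc" are ignored.
    """
    if not log_dir.is_dir():
        return []
    return sorted(
        int(p.name)
        for p in log_dir.iterdir()
        if p.is_file() and p.name.isascii() and p.name.isdigit()
    )


def resume_point(checkpoint_dir: str) -> dict:
    """Summarise where a query would pick up again (illustrative only)."""
    root = Path(checkpoint_dir)
    offsets = list_batch_ids(root / "offsets")  # batches that were planned
    commits = list_batch_ids(root / "commits")  # batches that finished

    latest_offset = offsets[-1] if offsets else None
    latest_commit = commits[-1] if commits else None

    if latest_offset is None:
        next_batch = 0                  # fresh checkpoint, start at batch 0
    elif latest_commit == latest_offset:
        next_batch = latest_offset + 1  # last planned batch was committed
    else:
        next_batch = latest_offset      # planned but not committed: rerun it

    return {
        "retained_batches": len(offsets),
        "latest_offset": latest_offset,
        "latest_commit": latest_commit,
        "next_batch": next_batch,
    }
```

Calling `resume_point("/path/to/checkpoint")` on a local copy of the checkpoint shows that every retained offset file is listed, but only the newest offset (together with the newest commit) decides where the next run starts.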

Rishabh Pandey
