Autoloader: How to identify the backlog in RocksDB

User16869510359
Esteemed Contributor

With S3-SQS it was easy to identify the backlog (the messages fetched from SQS but not yet consumed by the streaming job).

How can I find the same information with Auto Loader?

1 ACCEPTED SOLUTION

Accepted Solutions

User16869510359
Esteemed Contributor

For DBR 8.2 or later, the backlog details are captured in the streaming metrics.

E.g.:

[Screenshot: streaming query progress showing the Auto Loader backlog metrics]
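On DBR 8.2+ the backlog surfaces in the query progress: the cloudFiles source reports metrics such as `numFilesOutstanding` and `numBytesOutstanding` in the source's `metrics` map, available via `StreamingQuery.lastProgress`. A minimal sketch of reading them, using a hard-coded sample map in place of a live query's progress (the values are made up for illustration):

```scala
// Sample of the source-level metrics map that a running Auto Loader query
// exposes via StreamingQuery.lastProgress; the values here are illustrative.
val sourceMetrics: Map[String, String] = Map(
  "numFilesOutstanding" -> "42",      // files discovered but not yet processed
  "numBytesOutstanding" -> "1048576"  // bytes discovered but not yet processed
)

// Backlog = files discovered by Auto Loader but not yet consumed by the stream.
val filesOutstanding =
  sourceMetrics.get("numFilesOutstanding").map(_.toLong).getOrElse(0L)
val bytesOutstanding =
  sourceMetrics.get("numBytesOutstanding").map(_.toLong).getOrElse(0L)

println(s"Backlog: $filesOutstanding files, $bytesOutstanding bytes")
```

Against a live query, the same map comes from `query.lastProgress.sources.head.metrics`.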


2 REPLIES

User16869510359
Esteemed Contributor

For DBR versions prior to 8.2, use the code snippet below:

import org.apache.hadoop.fs.Path
import org.rocksdb.Options
import com.databricks.sql.rocksdb.CloudRocksDB

// RocksDB directory under the stream's checkpoint location
val rocksdbPath: String =
  new Path("/tmp/hari/streaming/auto_loader2/sources/0/", "rocksdb").toString

val rocksDBOptions = new Options()
rocksDBOptions
  .setCreateIfMissing(true)
  .setMaxTotalWalSize(Long.MaxValue)
  .setWalTtlSeconds(Long.MaxValue)
  .setWalSizeLimitMB(Long.MaxValue)

val rocksDB = CloudRocksDB.open(
  rocksdbPath,
  hadoopConf = spark.sessionState.newHadoopConf(),
  dbOptions = rocksDBOptions,
  opTypePrefix = "autoIngest")

println("Latest offset in RocksDB: " + rocksDB.latestDurableSequenceNumber)
// latestSeqFromQueryProgression is the latest sequence number the streaming
// query has processed (taken from the query's progress); the difference is
// the number of files still waiting to be processed.
println("Number of files to be processed: " +
  (rocksDB.latestDurableSequenceNumber - latestSeqFromQueryProgression))

