Hi team,

I have a Delta table src, and I want to replicate it to another table tgt using Change Data Feed (CDF), along the lines of:

spark
  .readStream
  .format("delta")
  .option("readChangeFeed", "true")
  .table("src")
  .writeStream
  .format("delta")
  ...
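A fuller sketch of the replication I have in mind (the key column `id`, the target table name `tgt`, and the checkpoint path are placeholders; it applies the latest change per key via a MERGE inside foreachBatch):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window
from delta.tables import DeltaTable

# CDF metadata columns we don't want to write into the target.
META = {"_change_type", "_commit_version", "_commit_timestamp"}

def apply_changes(batch_df, batch_id):
    # Keep only the latest change per key within this micro-batch,
    # ignoring the pre-image rows CDF emits for updates.
    w = Window.partitionBy("id").orderBy(F.col("_commit_version").desc())
    latest = (batch_df
              .filter(F.col("_change_type") != "update_preimage")
              .withColumn("rn", F.row_number().over(w))
              .filter("rn = 1")
              .drop("rn"))
    data_cols = [c for c in latest.columns if c not in META]
    cols = {c: f"s.{c}" for c in data_cols}
    (DeltaTable.forName(batch_df.sparkSession, "tgt").alias("t")
        .merge(latest.alias("s"), "t.id = s.id")
        .whenMatchedDelete(condition="s._change_type = 'delete'")
        .whenMatchedUpdate(condition="s._change_type != 'delete'", set=cols)
        .whenNotMatchedInsert(condition="s._change_type != 'delete'", values=cols)
        .execute())

(spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .table("src")
    .writeStream
    .foreachBatch(apply_changes)
    .option("checkpointLocation", "/tmp/checkpoints/tgt")  # placeholder path
    .start())
```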
Hi,

If I use dropDuplicates inside foreachBatch, dropDuplicates becomes stateless: it only drops duplicates within the current micro-batch, so I don't have to specify a watermark. Is this true?

Thanks
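For context, the pattern I mean is roughly this (the source/target table names, the key column `id`, and the checkpoint path are placeholders):

```python
def dedupe_and_append(batch_df, batch_id):
    # Inside foreachBatch this is a plain batch DataFrame, so
    # dropDuplicates deduplicates only within this micro-batch and
    # keeps no state across batches.
    (batch_df.dropDuplicates(["id"])
        .write.format("delta")
        .mode("append")
        .saveAsTable("tgt"))  # placeholder table name

(spark.readStream.table("src")
    .writeStream
    .foreachBatch(dedupe_and_append)
    .option("checkpointLocation", "/tmp/checkpoints/dedupe")  # placeholder
    .start())
```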
Hi,

I'm using runtime 15.4 LTS or 14.3 LTS. When loading a Delta Lake table from Kinesis, I found the delta log checkpoints are in mixed formats, e.g.:

7616 00000000000003291896.checkpoint.b1c24725-....json
7616 00000000000003291906.checkpoint.873e1b3e-....
Hi team,

Kinesis -> delta table raw -> job with trigger=availableNow -> delta table target. The Kinesis -> raw stream runs continuously. The job runs daily with trigger=availableNow; it reads from raw, does some transformations, and runs a MER...
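Roughly, the daily job looks like this (table names and the checkpoint path are placeholders, and the transformation step is elided):

```python
(spark.readStream
    .format("delta")
    .table("raw")                # fed continuously from Kinesis
    .writeStream
    .trigger(availableNow=True)  # drain all available data, then stop
    .option("checkpointLocation", "/tmp/checkpoints/daily")  # placeholder
    .toTable("target"))
```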
Hi team,

I'm using trigger=availableNow to read a delta table daily. The delta table itself is loaded by Structured Streaming from Kinesis. I noticed there are many offsets under the checkpoint, and when the job starts to run to get data from the delta table...
Thanks. If the replicated table can have _commit_version in strict sequence, I can treat it as a global, ever-incrementing column and consume its delta (e.g. in a batch way) with select * from replicated_tgt where _commit_version > (
select la...
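Spelled out, the batch-consumption pattern would be something like this, assuming a hypothetical bookkeeping table last_consumed that stores the high-water mark:

```python
# Read the last version we consumed (hypothetical bookkeeping table).
last = spark.sql("SELECT max(version) AS v FROM last_consumed").first().v

# Fetch only rows committed after that version.
delta_rows = spark.table("replicated_tgt").where(f"_commit_version > {last}")

# After processing, advance the high-water mark.
spark.sql(
    "INSERT INTO last_consumed "
    "SELECT max(_commit_version) FROM replicated_tgt")
```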
Thanks. I traced it with logging but cannot figure out which part makes applying the 18000 versions slow. It is the same with CDF if I feed a big range to the table_changes function. Any idea on this?
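For reference, the CDF batch read I was comparing against looks like this (the version bounds are just example values spanning about 18000 versions):

```python
# table_changes(table, startVersion, endVersion) reads CDF as a batch query.
changes = spark.sql("SELECT * FROM table_changes('src', 3273000, 3291000)")
```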
Appreciate the input, thanks.

We try to use the delta table as a streaming sink, so we don't want to throttle the update frequency of the raw table, and the target should load it asap. The default checkpointInterval is actually 10. I tried to change it to bigg...
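For the record, the way I changed the interval was via a table property (the value 100 is just an example):

```python
# delta.checkpointInterval controls how many commits pass between
# delta log checkpoints on this table.
spark.sql(
    "ALTER TABLE raw SET TBLPROPERTIES ('delta.checkpointInterval' = '100')")
```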