Hi team,

I have a Delta table src, and I want to replicate it to another table tgt using Change Data Feed (CDF), along the lines of:

spark
  .readStream
  .format("delta")
  .option("readChangeFeed", "true")
  .table("src")
  .writeStream
  .format("delta")
  ...
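A fuller sketch of the replication I have in mind (the key column `id`, the target table name `tgt`, and the checkpoint path are placeholders; it applies the latest change per key via a MERGE inside foreachBatch):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window
from delta.tables import DeltaTable

# CDF metadata columns we don't want to write into the target.
META = {"_change_type", "_commit_version", "_commit_timestamp"}

def apply_changes(batch_df, batch_id):
    # Keep only the latest change per key within this micro-batch,
    # ignoring the pre-image rows CDF emits for updates.
    w = Window.partitionBy("id").orderBy(F.col("_commit_version").desc())
    latest = (batch_df
              .filter(F.col("_change_type") != "update_preimage")
              .withColumn("rn", F.row_number().over(w))
              .filter("rn = 1")
              .drop("rn"))
    data_cols = [c for c in latest.columns if c not in META]
    cols = {c: f"s.{c}" for c in data_cols}
    (DeltaTable.forName(batch_df.sparkSession, "tgt").alias("t")
        .merge(latest.alias("s"), "t.id = s.id")
        .whenMatchedDelete(condition="s._change_type = 'delete'")
        .whenMatchedUpdate(condition="s._change_type != 'delete'", set=cols)
        .whenNotMatchedInsert(condition="s._change_type != 'delete'", values=cols)
        .execute())

(spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .table("src")
    .writeStream
    .foreachBatch(apply_changes)
    .option("checkpointLocation", "/tmp/checkpoints/tgt")  # placeholder path
    .start())
```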
Hi,

If I use dropDuplicates inside foreachBatch, dropDuplicates becomes stateless: it only drops duplicates within the current micro-batch, so I don't have to specify a watermark. Is this true?

Thanks
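For context, the pattern I mean is roughly this (the source/target table names, the key column `id`, and the checkpoint path are placeholders):

```python
def dedupe_and_append(batch_df, batch_id):
    # Inside foreachBatch this is a plain batch DataFrame, so
    # dropDuplicates deduplicates only within this micro-batch and
    # keeps no state across batches.
    (batch_df.dropDuplicates(["id"])
        .write.format("delta")
        .mode("append")
        .saveAsTable("tgt"))  # placeholder table name

(spark.readStream.table("src")
    .writeStream
    .foreachBatch(dedupe_and_append)
    .option("checkpointLocation", "/tmp/checkpoints/dedupe")  # placeholder
    .start())
```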
Hi,

I'm using runtime 15.4 LTS or 14.3 LTS. When loading a Delta Lake table from Kinesis, I found the delta log checkpoints are in mixed formats, e.g.:

7616 00000000000003291896.checkpoint.b1c24725-....json
7616 00000000000003291906.checkpoint.873e1b3e-....
Hi team,

Kinesis -> delta table raw -> job with trigger=availableNow -> delta table target. The Kinesis -> raw stream runs continuously. The job runs daily with trigger=availableNow; it reads from raw, does some transformations, and runs a MER...
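Roughly, the daily job looks like this (table names and the checkpoint path are placeholders, and the transformation step is elided):

```python
(spark.readStream
    .format("delta")
    .table("raw")                # fed continuously from Kinesis
    .writeStream
    .trigger(availableNow=True)  # drain all available data, then stop
    .option("checkpointLocation", "/tmp/checkpoints/daily")  # placeholder
    .toTable("target"))
```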
Hi team,

I'm using trigger=availableNow to read a delta table daily. The delta table itself is loaded by Structured Streaming from Kinesis. I noticed there are many offsets under the checkpoint, and when the job starts to run to get data from the delta table...
Thanks. If the replicated table can have _commit_version in strict sequence, I can treat it as a global, ever-incrementing column and consume its delta (e.g. in a batch way) with select * from replicated_tgt where _commit_version > (
select la...
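Spelled out, the batch-consumption pattern would be something like this, assuming a hypothetical bookkeeping table last_consumed that stores the high-water mark:

```python
# Read the last version we consumed (hypothetical bookkeeping table).
last = spark.sql("SELECT max(version) AS v FROM last_consumed").first().v

# Fetch only rows committed after that version.
delta_rows = spark.table("replicated_tgt").where(f"_commit_version > {last}")

# After processing, advance the high-water mark.
spark.sql(
    "INSERT INTO last_consumed "
    "SELECT max(_commit_version) FROM replicated_tgt")
```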
Thanks. I traced it with logging but cannot figure out which part makes applying the 18000 versions slow. It is the same with CDF if I feed a big range to the table_changes function. Any idea on this?
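For reference, the CDF batch read I was comparing against looks like this (the version bounds are just example values spanning about 18000 versions):

```python
# table_changes(table, startVersion, endVersion) reads CDF as a batch query.
changes = spark.sql("SELECT * FROM table_changes('src', 3273000, 3291000)")
```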
Appreciate the input, thanks.

We try to use the delta table as a streaming sink, so we don't want to throttle the update frequency of the raw table, and the target should load it asap. The default checkpointInterval is actually 10. I tried to change it to bigg...
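For the record, the way I changed the interval was via a table property (the value 100 is just an example):

```python
# delta.checkpointInterval controls how many commits pass between
# delta log checkpoints on this table.
spark.sql(
    "ALTER TABLE raw SET TBLPROPERTIES ('delta.checkpointInterval' = '100')")
```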