Hi team, I want to add some shared libs that might be used by many repos, e.g. some util functions that any repo might use.
1. What is the recommended way to add those libs? E.g. create a separate repo and reference it from another repo?
2. How ...
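One common pattern (a sketch, not something stated in the thread) is to keep the util functions in their own repo and add the checked-out repo to sys.path from the consuming notebook or job; the path, module, and function names below are all hypothetical:

    import sys

    # Assumption: the shared-utils repo is checked out in the workspace at this
    # hypothetical path; point this at wherever the repo actually lives.
    sys.path.append("/Workspace/Repos/shared/shared-utils")

    # Hypothetical module and function exposed by the shared repo.
    from shared_utils.text import normalize_name

    print(normalize_name("  Some Value "))

Packaging the shared repo as a wheel and installing it on the cluster is the other usual route when more repos start depending on it.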
Hi team, I have a Delta table src, and I want to replicate it to another table tgt with CDF, sort of like:
(spark
    .readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .table("src")
    .writeStream
    .format("delta")
    ...
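For reference, a minimal end-to-end sketch of that pipeline could look like the following; the merge key "id", the table names, the checkpoint path, and the choice to apply changes with a MERGE inside foreachBatch are all assumptions, not something from the original post:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    def apply_changes(batch_df, batch_id):
        # Keep only the latest change per key within this micro-batch;
        # "id" as the merge key is an assumption.
        w = Window.partitionBy("id").orderBy(F.col("_commit_version").desc())
        latest = (batch_df
                  .filter("_change_type != 'update_preimage'")
                  .withColumn("rn", F.row_number().over(w))
                  .filter("rn = 1")
                  .drop("rn"))
        (DeltaTable.forName(spark, "tgt").alias("t")
            .merge(latest.alias("s"), "t.id = s.id")
            .whenMatchedDelete("s._change_type = 'delete'")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll("s._change_type != 'delete'")
            .execute())

    (spark.readStream
        .format("delta")
        .option("readChangeFeed", "true")
        .table("src")
        .writeStream
        .foreachBatch(apply_changes)
        .option("checkpointLocation", "/tmp/checkpoints/src_to_tgt")  # hypothetical
        .start())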
Hi, if I use dropDuplicates inside foreachBatch, dropDuplicates becomes stateless and keeps no state: it just drops duplicates within the current micro-batch, so I don't have to specify a watermark. Is this true? Thanks
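That is the expected behavior: inside foreachBatch the function receives a plain batch DataFrame, so dropDuplicates sees only that micro-batch and no watermark is required. A small sketch (source, sink, and key column names are made up):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def dedup_and_append(batch_df, batch_id):
        # batch_df is a static DataFrame here, so this dedup is scoped to the
        # current micro-batch only: no streaming state, no watermark.
        (batch_df
            .dropDuplicates(["event_id"])  # hypothetical key column
            .write
            .format("delta")
            .mode("append")
            .saveAsTable("deduped_events"))  # hypothetical sink table

    (spark.readStream
        .table("events")  # hypothetical source table
        .writeStream
        .foreachBatch(dedup_and_append)
        .option("checkpointLocation", "/tmp/checkpoints/dedup")  # hypothetical
        .start())

The flip side is that a duplicate arriving in a later micro-batch is not caught, since no state is carried across batches.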
Hi, I'm using runtime 15.4 LTS or 14.3 LTS. When loading a Delta Lake table from Kinesis, I found the delta log checkpoints are written in mixed formats, like:
7616 00000000000003291896.checkpoint.b1c24725-....json
7616 00000000000003291906.checkpoint.873e1b3e-....
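For what it's worth, UUID-suffixed checkpoint file names like these usually come from V2 checkpoints, so mixed names can appear when both the classic and V2 policies have been in effect on the table. One thing worth checking is the table's checkpoint policy; delta.checkpointPolicy is the Delta table property for this, but verify the property and its allowed values against your runtime's docs before changing anything (the table name below is hypothetical):

    # Inspect which checkpoint-related properties/features the table carries.
    spark.sql("SHOW TBLPROPERTIES my_raw_table").show(truncate=False)

    # Assumption: pinning the classic checkpoint format via delta.checkpointPolicy;
    # confirm this property and the value 'classic' are supported on your runtime.
    spark.sql(
        "ALTER TABLE my_raw_table SET TBLPROPERTIES ('delta.checkpointPolicy' = 'classic')"
    )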
Hi team, Kinesis -> delta table raw -> job with trigger=availableNow -> delta table target. The Kinesis -> raw stream runs continuously; the job is daily with trigger=availableNow. The job reads from raw, does some transformation, and runs a MERGE ...
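A skeleton of the daily half of that pipeline, assuming the MERGE happens inside foreachBatch; the table names, the merge key "id", and the no-op transform are placeholders:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    def transform_and_merge(batch_df, batch_id):
        transformed = batch_df  # placeholder for the real transformation
        (DeltaTable.forName(spark, "target").alias("t")
            .merge(transformed.alias("s"), "t.id = s.id")  # placeholder key
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())

    (spark.readStream
        .table("raw")
        .writeStream
        .foreachBatch(transform_and_merge)
        .option("checkpointLocation", "/tmp/checkpoints/raw_to_target")  # hypothetical
        .trigger(availableNow=True)  # drain everything new since the last run, then stop
        .start())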
Thanks. If the replicated table can have _commit_version in strict sequence, I can take it as a global, ever-incrementing column and consume the delta of it (e.g. in a batch way) with select * from replicated_tgt where _commit_version > (
select la...
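A sketch of that high-watermark pattern, assuming the consumer persists the last version it has processed; the bookmark table, the sink, and the version range are all made up:

    # Read the high-watermark from a hypothetical single-column bookmark table.
    last = spark.sql("SELECT max(last_version) FROM consumer_bookmark").first()[0] or 0

    # Fetch only rows committed after the bookmark.
    new_rows = spark.sql(
        f"SELECT * FROM replicated_tgt WHERE _commit_version > {last}"
    )
    new_rows.write.format("delta").mode("append").saveAsTable("consumed")  # placeholder sink

    # Advance the bookmark to the newest version just consumed.
    new_max = new_rows.agg({"_commit_version": "max"}).first()[0]
    if new_max is not None:
        spark.sql(f"INSERT INTO consumer_bookmark VALUES ({new_max})")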
Thanks. I traced it with logging but cannot figure out which part makes applying the 18000 versions slow. It's the same with CDF if I feed a big range to the table_changes function. Any idea on this?
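No certain answer, but one diagnostic (and possible workaround) worth trying is to split the big range into smaller version chunks so each table_changes call stays bounded; the range, chunk size, and sink below are placeholders, and note that table_changes bounds are inclusive on both ends:

    start, end, step = 0, 18000, 1000  # placeholder version range and chunk size
    for lo in range(start, end + 1, step):
        hi = min(lo + step - 1, end)
        chunk = spark.sql(f"SELECT * FROM table_changes('src', {lo}, {hi})")
        chunk.write.format("delta").mode("append").saveAsTable("cdf_chunks")  # placeholder sink

If a particular chunk is disproportionately slow, that narrows down which versions (e.g. a huge commit or a compaction) are responsible.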