It looks like _source_cdc_time is the timestamp for when the CDC transaction occurred in your source system. This would be a good choice for a timestamp column for your watermark, since you would be deduping values according to the time the transacti...
Does your merge_stream function contain any stateful operations, such as aggregation or deduplication logic? If so, your streaming job may be accumulating state in memory over time, which will eventually result in an out-of-memory (OOM) error. If this is the case, you ...
If you haven't already, sign up to receive Databricks marketing emails. Occasionally, Databricks will offer a 50% off certification voucher for attending a webinar. If your company is a Databricks partner, you can submit a request for a 50% voucher. If...
Here's another idea: configure a Personal Compute policy and restrict inexperienced users from attaching to the shared cluster. Then, grant unrestricted cluster creation permissions only to trusted users. You can override the default personal comp...
From the information you provided, your issue might be resolved by setting a watermark on the streaming dataframe. The purpose of watermarks is to set a maximum time for records to be retained in state. Without a watermark, records in your state will...
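To make the idea of a watermark bounding state more concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's implementation: the class `WatermarkedDeduplicator`, its `delay` parameter, and the `(key, event_time)` record shape are all hypothetical names chosen for illustration, with `event_time` standing in for a column such as `_source_cdc_time`. In Structured Streaming itself you would typically call `withWatermark` on the streaming dataframe before the stateful operation rather than write anything like this by hand.

```python
from datetime import datetime, timedelta


class WatermarkedDeduplicator:
    """Toy model of streaming deduplication with a watermark (hypothetical, not Spark code)."""

    def __init__(self, delay: timedelta):
        self.delay = delay            # how far the watermark trails the newest event time
        self.max_event_time = None    # newest event time seen in completed batches
        self.state = {}               # key -> event time of the record kept for that key

    @property
    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay

    def process_batch(self, records):
        """Process one micro-batch of (key, event_time) pairs and return the emitted records."""
        emitted = []
        for key, event_time in records:
            # Records older than the current watermark are treated as too late and dropped.
            if self.watermark is not None and event_time < self.watermark:
                continue
            # A key already held in state is a duplicate.
            if key in self.state:
                continue
            self.state[key] = event_time
            emitted.append((key, event_time))

        # Advance the watermark only after the batch has been processed.
        batch_max = max((t for _, t in records), default=None)
        if batch_max is not None and (
            self.max_event_time is None or batch_max > self.max_event_time
        ):
            self.max_event_time = batch_max

        # Evict state that has fallen behind the watermark, so state stays bounded.
        if self.watermark is not None:
            self.state = {k: t for k, t in self.state.items() if t >= self.watermark}
        return emitted


t0 = datetime(2024, 1, 1, 12, 0)
dedup = WatermarkedDeduplicator(delay=timedelta(minutes=10))

out1 = dedup.process_batch(
    [("a", t0), ("a", t0 + timedelta(minutes=1)), ("b", t0 + timedelta(minutes=2))]
)
out2 = dedup.process_batch(
    [("c", t0 + timedelta(minutes=30)), ("a", t0 + timedelta(minutes=5))]
)
out3 = dedup.process_batch([("a", t0 + timedelta(minutes=6))])
```

In this sketch, the second batch pushes the watermark forward far enough that the entries for keys "a" and "b" are evicted, which is the behavior that keeps state from growing without limit. The trade-off is visible in the third batch: a record that arrives behind the watermark is dropped rather than compared against state, so the delay should be chosen to cover how late your data can realistically arrive.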