Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Streaming Kafka data without duplication

mddheeraj
New Contributor

Hello,

We are building an application that reads data from a Kafka topic sent by a source. After we receive the data, we apply some transformations and write the results to another Kafka topic. In this process, the source may send the same data twice.

Our questions are:

1. How can we prevent duplicates and send only updated data to the target Kafka topic?

2. Where, and in what format, should we store the data in Databricks to check for duplicates?

Thank You,

Dheeraj

1 REPLY

Kaniz_Fatma
Community Manager

Hi @mddheeraj,

  1. To control duplicates and ensure only updated data is sent to the target Kafka topic, enable idempotence in your Kafka producer so that retried sends of the same message are not written more than once. This is done by setting enable.idempotence=true in the producer configuration.
  2. You can also implement deduplication logic on the consumer side. Assign a unique identifier (e.g., a UUID) to each message and store these identifiers in persistent storage, such as a database. Before processing a message, check whether its identifier already exists in the store: if it does, skip it; otherwise, process the message and record the identifier.
  3. You can also use Kafka Streams to process and de-duplicate messages. Kafka Streams can maintain state stores to keep track of processed messages and ensure that duplicates are filtered out before they reach the target topic.
  4. If you store data in Databricks to check for duplicates, consider using Delta Lake, an open-source storage layer that brings ACID transactions to Apache Spark and big-data workloads. It helps manage duplicates through features like upserts (MERGE) and time travel: store your data in Delta format and use these capabilities to detect and handle duplicates.
  5. More generally, store the data in a structured, columnar format such as Parquet or Delta rather than raw files, so that duplicate checks are efficient.
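As a sketch of item 4, assuming the incoming batch is staged into a table named `updates` with a business key `event_id` and a timestamp `updated_at` (hypothetical names for illustration), a Delta Lake MERGE can upsert only new or changed rows:

```sql
MERGE INTO target AS t
USING updates AS u
  ON t.event_id = u.event_id
-- Only overwrite when the incoming row is actually newer,
-- so exact re-sends of old data are ignored.
WHEN MATCHED AND u.updated_at > t.updated_at THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *
```

Because the MERGE is transactional, re-running it with the same input is safe: duplicate rows match on `event_id` and fail the `updated_at` condition, so they are simply dropped.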
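For item 1, here is a minimal producer-configuration sketch in Python. The keys follow the confluent-kafka/librdkafka naming convention, and the broker address is a placeholder, not a real endpoint:

```python
# Producer settings for idempotent delivery: the broker de-duplicates
# retried sends of the same batch, so transient retries cannot create
# duplicates on the target topic.
producer_config = {
    "bootstrap.servers": "broker:9092",  # placeholder broker address
    "enable.idempotence": True,          # broker rejects duplicate retries
    "acks": "all",                       # required when idempotence is on
}

# In a real environment you would pass this to the producer, e.g.:
# from confluent_kafka import Producer
# producer = Producer(producer_config)
```

Note that idempotence only protects against producer-side retries; it does not help if the upstream source genuinely sends the same record twice, which is where items 2-4 come in.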
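Item 2 can be sketched in plain Python. Here SQLite stands in for the persistent identifier store; in production this might be a database or a Delta table, and the message ID would come from the Kafka record:

```python
import sqlite3

def make_store(path=":memory:"):
    # Store of message IDs already processed. ":memory:" is for
    # illustration only; use a durable path or database in production.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (id TEXT PRIMARY KEY)")
    return conn

def process_once(conn, message_id, payload, handler):
    """Run handler(payload) only if message_id has not been seen before."""
    try:
        # The PRIMARY KEY constraint makes the duplicate check atomic:
        # a second insert of the same id raises IntegrityError.
        conn.execute("INSERT INTO seen (id) VALUES (?)", (message_id,))
        conn.commit()
    except sqlite3.IntegrityError:
        return False  # duplicate: skip processing
    handler(payload)
    return True
```

Processing the same message ID twice invokes the handler only once; the second call returns False and is skipped.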

If you have any more questions or need further assistance, feel free to ask! 
