Databricks Community

User16826994223 · ‎06-17-2021

A Kafka topic has 300 partitions, and I see that two clusters are running with the same group ID.

Will the data be duplicated in my Delta bronze layer?

sajith_appukutt · ‎06-22-2021

By default, each streaming query generates a unique group ID for reading data (ensuring it has its own consumer group). In scenarios where you need to specify it yourself (authorization, etc.), it is not recommended to have two streaming applications specify the same group ID. Spark keeps track of Kafka offsets internally and doesn't commit any offsets.
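To illustrate the point about group IDs, here is a minimal sketch of how the Kafka source options could be assembled in PySpark. The helper names, broker address, and topic name are hypothetical; `kafka.group.id` is the Structured Streaming option for pinning a group ID (available in Spark 3.0 and later, as far as I know), and leaving it out lets Spark generate a unique one per query.

```python
def kafka_source_options(bootstrap_servers, topic, group_id=None):
    """Build the option map for a Structured Streaming Kafka source.

    Hypothetical helper: when group_id is None, no group ID is set and
    Spark generates a unique consumer group for the query.
    """
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
    }
    if group_id is not None:
        # Only pin the group ID when something external requires it,
        # e.g. group-based authorization on the Kafka side.
        options["kafka.group.id"] = group_id
    return options


def read_kafka_stream(spark, options):
    """Create the streaming DataFrame; needs an active SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Default case: no explicit group ID.
default_options = kafka_source_options("broker:9092", "events")

# Pinned case: avoid reusing this value in a second streaming application.
pinned_options = kafka_source_options("broker:9092", "events", "bronze-ingest")
```

If a group ID has to be pinned, giving each streaming application its own value keeps the two from sharing a consumer group.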

In any case, for sources that don't support exactly-once behaviour, you can achieve idempotent writes with Delta via MERGE.
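A rough sketch of the MERGE approach, under some assumptions: the target table name `bronze_events` and the temp view name `updates` are made up, and (topic, partition, offset) is assumed to be a suitable unique key because the Kafka source exposes those columns for every record. `batch_df.sparkSession` is assumed to be available (PySpark 3.3 or later).

```python
# Assumed unique key for a Kafka record: these are columns of the Kafka source.
KAFKA_KEY = ("topic", "partition", "offset")


def build_merge_sql(target_table, source_view, key_columns):
    """Build an insert-only Delta MERGE that skips rows already in the target."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Backticks guard column names that may be SQL keywords (e.g. offset).
    condition = " AND ".join(f"t.`{c}` = s.`{c}`" for c in key_columns)
    return (
        f"MERGE INTO {target_table} t "
        f"USING {source_view} s "
        f"ON {condition} "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert_batch(batch_df, batch_id, target_table="bronze_events"):
    """foreachBatch handler: writing the same micro-batch twice adds no rows."""
    # Drop duplicates inside the batch so MERGE sees each key once.
    deduped = batch_df.dropDuplicates(list(KAFKA_KEY))
    deduped.createOrReplaceTempView("updates")
    deduped.sparkSession.sql(build_merge_sql(target_table, "updates", KAFKA_KEY))


merge_sql = build_merge_sql("bronze_events", "updates", KAFKA_KEY)
```

The handler would be attached with `writeStream.foreachBatch(upsert_batch)` on the streaming DataFrame. Because the MERGE only inserts unmatched keys, a record delivered twice (for example by two queries reading the same topic) lands in the table once.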


Streaming with Kafka with the same groupid
