Streaming with Kafka with the same groupid

User16826994223
Honored Contributor III

A Kafka topic has 300 partitions, and I see that two clusters are running with the same group ID.

Will the data be duplicated in my Delta bronze layer?

1 REPLY

sajith_appukutt
Honored Contributor II

By default, each streaming query generates a unique group ID for reading data (ensuring it has its own consumer group). In scenarios where you need to specify the group ID yourself (for authorization, for example), it is not recommended to have two streaming applications specify the same group ID. Spark keeps track of Kafka offsets internally and doesn't commit any offsets back to Kafka.
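
To make the two cases concrete, here is a minimal PySpark sketch rather than a definitive setup. The helper names `kafka_source_options` and `read_kafka_stream`, the broker address, the topic name and the group ID are placeholders I made up. `kafka.bootstrap.servers`, `subscribe`, `startingOffsets` and `kafka.group.id` are options of the Structured Streaming Kafka source (`kafka.group.id` assumes Spark 3.0 or later).

```python
def kafka_source_options(bootstrap_servers, topic, group_id=None):
    """Build reader options for the Structured Streaming Kafka source.

    With group_id=None, Spark generates a unique consumer group per query.
    Pass group_id only when something external (e.g. ACLs) requires a fixed one.
    """
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }
    if group_id is not None:
        # Assumption: Spark 3.0+, where the source accepts kafka.group.id.
        options["kafka.group.id"] = group_id
    return options


def read_kafka_stream(spark, options):
    """Create the streaming DataFrame; `spark` is an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Hypothetical usage on a cluster:
# default_opts = kafka_source_options("broker:9092", "events")
# pinned_opts = kafka_source_options("broker:9092", "events", group_id="bronze-ingest")
# stream = read_kafka_stream(spark, default_opts)
```

Since the offsets live in the query's checkpoint and not in Kafka, each query should also write with its own `checkpointLocation`.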

In any case, for sources that don't support exactly-once behaviour, with Delta you can achieve idempotency via MERGE.
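
As a rough sketch of the MERGE approach, assuming the bronze table keeps the Kafka source's `topic`, `partition` and `offset` columns so that they can serve as a unique record key: the helpers `merge_condition` and `make_upsert` and the table name `bronze.events` are hypothetical, and the code assumes the `delta-spark` package is available on the cluster.

```python
KAFKA_KEY_COLS = ["topic", "partition", "offset"]


def merge_condition(key_cols, target="t", source="s"):
    """Join condition matching target and source rows on every key column."""
    return " AND ".join(
        f"{target}.`{col}` = {source}.`{col}`" for col in key_cols
    )


def make_upsert(spark, target_table, key_cols=KAFKA_KEY_COLS):
    """Return a foreachBatch function doing an insert-only MERGE into Delta."""

    def upsert_batch(batch_df, batch_id):
        # Imported lazily so the helpers above work without delta-spark installed.
        from delta.tables import DeltaTable

        # Remove repeats inside the micro-batch; an insert-only MERGE
        # would otherwise insert each repeated source row.
        deduped = batch_df.dropDuplicates(key_cols)
        (
            DeltaTable.forName(spark, target_table)
            .alias("t")
            .merge(deduped.alias("s"), merge_condition(key_cols))
            .whenNotMatchedInsertAll()
            .execute()
        )

    return upsert_batch


# Hypothetical usage on a cluster:
# (stream.writeStream
#     .foreachBatch(make_upsert(spark, "bronze.events"))
#     .option("checkpointLocation", "/path/to/checkpoint")
#     .start())
```

With this shape, a record that is delivered twice matches an existing row on its key and is skipped, so replays don't create duplicates in the target table.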
