
Consume 2 Kafka topics with different schemas on 1 Databricks cluster

Joe1912
New Contributor III

Hi everyone,

Is there any way to read streams from 2 different Kafka topics with 2 different schemas in 1 job or on the same cluster, or do we need to create 2 separate jobs? (The job will need to run continuously.)

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @Joe1912 , Certainly! Handling multiple Kafka topics with different schemas in a single job can be achieved using various approaches.

Let's explore some strategies and resources to guide you:

  1. Schema Management:

  2. Spring Kafka:

  3. Custom Mapping:

    • Create a mapping table that associates each topic with its corresponding schema.
    • In your job, dynamically select the schema based on the topic being processed.
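The mapping idea in the two bullets above can be sketched in PySpark roughly as follows. This is a minimal sketch, not a definitive implementation: the topic names `orders` and `customers`, their fields, and the helper names `schema_for` and `parse_topic` are hypothetical, and it assumes the Kafka message values are JSON.

```python
# Hypothetical mapping table: topic name -> schema as a DDL string.
# The topics and fields are made up for illustration.
TOPIC_SCHEMAS = {
    "orders": "order_id STRING, amount DOUBLE, ts TIMESTAMP",
    "customers": "customer_id STRING, name STRING, email STRING",
}


def schema_for(topic: str) -> str:
    """Look up the DDL schema string registered for a topic."""
    try:
        return TOPIC_SCHEMAS[topic]
    except KeyError:
        raise ValueError(f"No schema registered for topic {topic!r}") from None


def parse_topic(raw_df, topic: str):
    """Keep only rows from `topic` and parse the Kafka value with its schema.

    `raw_df` is a DataFrame read with format("kafka"), which exposes the
    `topic` and binary `value` columns. Assumes JSON payloads.
    """
    # Imported here so the mapping can be inspected without a Spark install.
    from pyspark.sql.functions import col, from_json

    return (
        raw_df.filter(col("topic") == topic)
        .select(from_json(col("value").cast("string"), schema_for(topic)).alias("data"))
        .select("data.*")
    )
```

With a single Kafka source subscribed to both topics, `parse_topic(raw_df, "orders")` and `parse_topic(raw_df, "customers")` would each yield a DataFrame with its own columns, which can then be written to separate sinks.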

Resources:

Remember to adapt these approaches to your specific use case and project requirements. Happy Kafka-ing!


3 REPLIES

Kaniz
Community Manager

Hi @Joe1912 ,

It's certainly reasonable to run a number of concurrent streams per driver node.

Each .start() consumes a certain amount of driver resources in Spark. Your limiting factor will be the load on the driver node and its available resources. Hundreds of topics running continuously at a high rate would need to be spread across multiple driver nodes (in Databricks there is one driver per cluster, so that means multiple clusters). The advantage of Spark is, as you mention, multiple sinks and also unified batch and streaming APIs for transformations.

The other issue will be dealing with the small writes you may end up making to S3 and file consistency. Take a look at delta.io to handle consistent & reliable writes to S3.
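As a rough illustration of the points above, here is a sketch of one job that starts one streaming query per topic, each writing to its own Delta table with its own checkpoint. The topic names, schemas, paths, broker address, and helper names are all hypothetical, and JSON payloads are assumed.

```python
# Hypothetical topics and DDL schemas, for illustration only.
STREAMS = {
    "orders": "order_id STRING, amount DOUBLE",
    "customers": "customer_id STRING, name STRING",
}


def kafka_options(bootstrap_servers: str, topic: str) -> dict:
    """Options for a Kafka source that subscribes to a single topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",  # the default for streaming queries
    }


def checkpoint_for(checkpoint_root: str, topic: str) -> str:
    """Each query needs its own checkpoint location."""
    return f"{checkpoint_root.rstrip('/')}/{topic}"


def start_topic_stream(spark, bootstrap_servers, topic, schema_ddl,
                       target_root, checkpoint_root):
    """Read one topic, parse it with its schema, and start a Delta sink."""
    from pyspark.sql.functions import col, from_json

    parsed = (
        spark.readStream.format("kafka")
        .options(**kafka_options(bootstrap_servers, topic))
        .load()
        .select(from_json(col("value").cast("string"), schema_ddl).alias("data"))
        .select("data.*")
    )
    return (
        parsed.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_for(checkpoint_root, topic))
        .start(f"{target_root.rstrip('/')}/{topic}")
    )


def run_all(spark, bootstrap_servers, target_root, checkpoint_root):
    """Start every query on the same cluster, then block until one stops."""
    queries = [
        start_topic_stream(spark, bootstrap_servers, topic, ddl,
                           target_root, checkpoint_root)
        for topic, ddl in STREAMS.items()
    ]
    spark.streams.awaitAnyTermination()
    return queries
```

In a Databricks notebook or job this might be called as `run_all(spark, "broker:9092", "/mnt/tables", "/mnt/checkpoints")`, where the broker address and paths are placeholders. Each call to `.start()` adds load on the single driver, so this pattern suits a handful of topics rather than hundreds.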

Joe1912
New Contributor III

Hi, do you have any tutorial or setup for this case? I'm a little confused about how to set up a single job run that consumes multiple topics with multiple schemas.

