Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Consume 2 Kafka topics with different schemas on 1 Databricks cluster

Joe1912
New Contributor III

Hi everyone,

I have a question: is there any way to read streams from 2 different Kafka topics with 2 different schemas in 1 job or on the same cluster, or do we need to create 2 separate jobs for it? (The job will need to process continuously.)

3 REPLIES

Kaniz
Community Manager

Hi @Joe1912 ,

It's certainly reasonable to run a number of concurrent streams per driver node.

Each .start() consumes a certain amount of driver resources in Spark. Your limiting factor will be the load on the driver node and its available resources. Hundreds of topics running continuously at a high rate would need to be spread across multiple driver nodes (in Databricks there is one driver per cluster). The advantage of Spark is, as you mention, multiple sinks, as well as a unified batch and streaming API for transformations.

The other issue will be dealing with the small writes you may end up making to S3 and with file consistency. Take a look at delta.io to handle consistent and reliable writes to S3.
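
To make this concrete, here is a minimal PySpark sketch (not from the original reply) of two independent streaming queries started in one Databricks job, each reading its own topic and writing to Delta so the frequent small writes to cloud storage stay consistent. The broker address, topic names, and paths are placeholders for illustration.

```python
# A minimal sketch: two independent streaming queries in one job.
# Broker address, topic names, and paths below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def start_topic_stream(topic: str, output_path: str, checkpoint_path: str):
    """Read one Kafka topic and continuously append it to a Delta path."""
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", topic)
           .option("startingOffsets", "latest")
           .load())
    return (raw.selectExpr("CAST(key AS STRING) AS key",
                           "CAST(value AS STRING) AS value",
                           "topic", "timestamp")
            .writeStream
            .format("delta")  # Delta keeps many small cloud-storage writes consistent
            .option("checkpointLocation", checkpoint_path)
            .start(output_path))  # each .start() adds some load on the single driver

# Two queries started in the same job / on the same cluster.
q1 = start_topic_stream("topic_a", "/mnt/bronze/topic_a", "/mnt/checkpoints/topic_a")
q2 = start_topic_stream("topic_b", "/mnt/bronze/topic_b", "/mnt/checkpoints/topic_b")

spark.streams.awaitAnyTermination()
```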

Joe1912
New Contributor III

Hi, do you have any tutorial or setup for this case? I'm a little confused about how we can set this up to consume multiple topics with multiple schemas in only 1 job run.

Kaniz
Community Manager

Hi @Joe1912, Certainly! Handling multiple Kafka topics with different schemas in a single job can be achieved using various approaches.

Let's explore some strategies and resources to guide you:

  1. Schema Management:

  2. Spring Kafka:

  3. Custom Mapping:

    • Create a mapping table that associates each topic with its corresponding schema.
    • In your job, dynamically select the schema based on the topic being processed (see the sketch below).

Resources:

Remember to adapt these approaches to your specific use case and project requirements. Happy Kafka-ing! 🚀🏃
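
One way to picture the custom-mapping approach is a single Kafka source subscribed to both topics, with a small topic-to-schema map used to parse each topic with its own schema and route it to its own sink. This is a sketch under assumed topic names, field layouts, broker address, and paths, not a prescription for your setup.

```python
# A sketch of the topic -> schema mapping idea within one job.
# Topic names, schemas, broker address, and paths are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Mapping "table": each topic is associated with its schema and a target path.
topic_config = {
    "orders_topic": {
        "schema": StructType([StructField("order_id", StringType()),
                              StructField("quantity", IntegerType())]),
        "path": "/mnt/bronze/orders",
    },
    "payments_topic": {
        "schema": StructType([StructField("payment_id", StringType()),
                              StructField("amount", DoubleType())]),
        "path": "/mnt/bronze/payments",
    },
}

# One source stream subscribed to both topics.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", ",".join(topic_config))
       .load()
       .selectExpr("topic", "CAST(value AS STRING) AS json_value"))

# Dynamically select the schema based on the topic, then write each topic to its own sink.
queries = []
for topic, cfg in topic_config.items():
    parsed = (raw.filter(col("topic") == topic)
              .select(from_json(col("json_value"), cfg["schema"]).alias("data"))
              .select("data.*"))
    queries.append(parsed.writeStream
                   .format("delta")
                   .option("checkpointLocation", f"/mnt/checkpoints/{topic}")
                   .start(cfg["path"]))

spark.streams.awaitAnyTermination()
```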
