Databricks Community

noorbasha534 · ‎10-10-2024

Hi all,

I was wondering whether anyone has used Databricks as a "pure" data streaming platform in place of Confluent, Flink, Kafka, etc.

The reference architectures I see mostly place Databricks on the data processing side, after the data has been made available by Confluent, Flink, or Kafka.

I'd appreciate it if you could share your insights.

szymon_dybczak · ‎10-10-2024

Hi @noorbasha534,

It depends on what you're asking. Kafka is primarily a messaging system, optimized for high-throughput, distributed message logs. Databricks can read from Kafka as a data source, but it doesn't replace Kafka's role in message distribution.
But if you're comparing Kafka Streams (Kafka's offering for stream processing) with Apache Spark Structured Streaming (which Databricks uses for stream processing), then yes: I think Databricks' streaming capabilities are top-notch, and you can use it instead of Kafka Streams and you'll be happy.
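
To make the split concrete, here is a minimal sketch of Structured Streaming acting as the processing layer on top of Kafka: Kafka still stores and distributes the messages, and Spark only consumes and transforms them. The broker address, topic name, checkpoint path, and table name are hypothetical placeholders, and `spark` is assumed to be an existing SparkSession (as provided in a Databricks notebook).

```python
from typing import Any, Dict


def kafka_source_options(bootstrap_servers: str, topic: str,
                         starting_offsets: str = "latest") -> Dict[str, str]:
    """Build the options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # Kafka remains the message broker
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_kafka_to_delta(spark: Any, options: Dict[str, str],
                         checkpoint_path: str, target_table: str) -> Any:
    """Start a streaming query that copies Kafka records into a Delta table.

    `spark` is assumed to be an existing SparkSession.
    """
    raw = spark.readStream.format("kafka").options(**options).load()
    # The Kafka source delivers key and value as binary; cast them to strings.
    events = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "timestamp",
    )
    return (
        events.writeStream
        .format("delta")
        .outputMode("append")
        # The checkpoint lets the query resume from its last committed offsets.
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )


# Hypothetical broker and topic, for illustration only.
opts = kafka_source_options("broker-1:9092", "events")
```

In a notebook this could be started with something like `start_kafka_to_delta(spark, opts, "/tmp/checkpoints/events", "bronze_events")`, where both the path and the table name are again placeholders.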

As for Apache Flink, it is known for low-latency, stateful, and complex event processing. If your streaming use case involves complex operations, Flink may be a better choice. But with the intensive development of Spark Structured Streaming, this boundary is blurring.

Databricks as a "pure" data streaming software like Confluent