Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks as a "pure" data streaming software like Confluent

noorbasha534
Valued Contributor

Dears

I was wondering whether anyone has used Databricks as a "pure" data streaming platform in place of Confluent, Flink, Kafka, etc.

The reference architectures I have seen mostly place Databricks on the data processing side, after the data has been made available by Confluent, Flink, or Kafka.

I'd appreciate it if you could share your insights.

1 REPLY

szymon_dybczak
Esteemed Contributor III

Hi @noorbasha534 ,

It depends on what you're asking for. Kafka is primarily a messaging system, optimized for high-throughput, distributed message logs. Databricks can read from Kafka as a data source, but it doesn't replace Kafka's role in message distribution.
But if you're comparing Kafka Streams (Kafka's offering for stream processing) with Apache Spark Structured Streaming (which Databricks uses for stream processing), then yes: I think Databricks' streaming capabilities are top-notch, and you can use it instead of Kafka Streams and be happy with it.
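To make the "Databricks reads from Kafka" side concrete, here is a minimal sketch of a Structured Streaming job that consumes a Kafka topic and writes it to a Delta table. It is only an illustration under assumptions: the helper names `kafka_source_options` and `start_kafka_to_delta`, the broker address, topic, table name, and checkpoint path are all hypothetical, and `spark` is assumed to be the active `SparkSession` (predefined in Databricks notebooks). Note that Kafka is still the message broker here; Spark only does the processing.

```python
def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Build options for Spark's built-in Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only read messages that arrive after the query starts.
        "startingOffsets": "latest",
    }


def start_kafka_to_delta(spark, bootstrap_servers, topic, target_table, checkpoint_path):
    """Start a streaming query: Kafka topic -> Delta table (hypothetical helper).

    `spark` is assumed to be an active SparkSession.
    """
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers key and value as binary; cast them to strings for downstream use.
    decoded = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "timestamp",
    )
    return (
        decoded.writeStream.format("delta")
        # The checkpoint stores consumed offsets so the query can resume after a restart.
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )
```

In a notebook this might be started with something like `start_kafka_to_delta(spark, "broker:9092", "events", "bronze_events", "/tmp/checkpoints/events")`, where every argument is a placeholder to replace with your own values.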

As for Apache Flink, it is known for low-latency, stateful, and complex event processing. If your streaming use case involves complex operations, Flink may be the better choice. But with the intensive development of Spark Structured Streaming, this boundary is blurring.