I’m exploring how to handle real-time data for an application, and Databricks keeps coming up as a strong option, especially given its support for streaming pipelines, Delta Live Tables, and integrations with various event sources. That said, I’m still trying to understand how practical and efficient it is for real-time use cases compared to other solutions.
For anyone who has used Databricks for real-time or near–real-time app data:
- How well does Databricks handle real-time ingestion from sources like Kafka, Kinesis, Event Hubs, or webhooks?
- Is it reliable enough for genuinely low-latency processing, or is it better treated as a micro-batch engine? (My understanding is that Spark Structured Streaming runs micro-batches by default.)
- What architecture or components do you typically use (Spark Structured Streaming, Delta Live Tables, Auto Loader, Unity Catalog, etc.)? I’ve put a rough sketch of what I’m picturing right after this list.
- Are there any performance-tuning tips for keeping streaming jobs stable when traffic spikes?
- How do you manage schema changes, late-arriving data, and error handling in production pipelines? (The second sketch below shows the watermark/dead-letter setup I’d guess at.)
- If you’ve used other platforms (Flink, Snowflake, Redpanda, etc.), how does Databricks compare for real-time applications?
- Any cost-control strategies? I’ve heard that always-on clusters can get expensive fast. (The third sketch below covers the one pattern I’ve read about.)
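
To make the architecture question concrete, here’s a minimal sketch of the kind of pipeline I’m picturing: Kafka into Spark Structured Streaming, parsed against a schema, written to a Delta table. The broker address, topic name, schema, and paths are all placeholders I made up, so correct me if this isn’t how people actually structure it:

```python
# Minimal sketch: Kafka -> Structured Streaming -> Delta.
# Broker, topic, schema, and paths are placeholders, not a real setup.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

# Hypothetical event shape for illustration.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "app-events")                 # placeholder topic
    .option("startingOffsets", "latest")
    .load())

# Kafka values arrive as bytes; cast to string and parse as JSON.
events = (raw
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*"))

# Append to a Delta table on a micro-batch cadence.
query = (events.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/app-events")  # placeholder
    .outputMode("append")
    .trigger(processingTime="10 seconds")
    .start("/tmp/delta/app_events"))  # placeholder table path
```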
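For the late-data and error-handling question, this continues the sketch above (same `raw` stream and `event_schema`): watermark the event time so late rows are bounded, and dead-letter anything that fails to parse. Again, all names and thresholds are guesses on my part:

```python
from pyspark.sql.functions import col, from_json, window, count

# Keep the raw payload alongside the parsed struct so we can dead-letter failures.
parsed = raw.select(
    col("value").cast("string").alias("payload"),
    from_json(col("value").cast("string"), event_schema).alias("e"),
)

# from_json yields null when a record doesn't match the schema -> dead-letter it.
bad = parsed.where(col("e").isNull()).select("payload")
(bad.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/dead_letter")  # placeholder
    .start("/tmp/delta/dead_letter"))

# Tolerate events up to 10 minutes late, then finalize per-minute counts.
counts = (parsed
    .where(col("e").isNotNull())
    .select("e.*")
    .withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "1 minute"), col("event_type"))
    .agg(count("*").alias("n")))

(counts.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/counts")  # placeholder
    .start("/tmp/delta/event_counts"))
```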
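And on cost, the one concrete pattern I’ve seen mentioned is Structured Streaming’s `availableNow` trigger: run the same query on a schedule so it drains whatever has arrived and then stops, instead of keeping a cluster always on. Reusing `events` from the first sketch:

```python
# Scheduled incremental run instead of an always-on stream: process the
# backlog, then shut down. Same checkpoint, so it resumes where it left off.
(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/app-events")  # placeholder
    .trigger(availableNow=True)
    .start("/tmp/delta/app_events"))
```

Does that actually work in practice for near-real-time features, or does the scheduling gap between runs defeat the purpose?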
I’m mainly trying to understand whether Databricks is a good fit for powering real-time features in apps — like analytics dashboards, event tracking, personalization, alerts, or recommendation engines — and what I should watch out for if I go down this path.