Hi all - Matt Jones here, I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s Current 2022 data streamin...

mattjones — Mon, 17 Oct 2022 20:00:53 GMT

Hi all - Matt Jones here. I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s Current 2022 data streaming event (formerly Kafka Summit) in Austin.

By far the most common question we got at the booth was how/why customers would use Kafka/Confluent and Databricks together. A popular use case is to aggregate streaming events through a Kafka-based collector system, then send that event stream into a Databricks streaming pipeline (or roll your own with Spark Structured Streaming, if you prefer). Frank Munz’s blog post on this topic is an excellent overview.
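For anyone who wants a starting point for the "roll your own" route, here is a minimal PySpark sketch of that pattern: read an event stream from a Kafka topic with Structured Streaming and append it to a Delta table. The broker address, topic name, checkpoint path, table name, and both helper function names are placeholders I made up for illustration, and the sketch assumes a `spark` session with the Kafka and Delta connectors available (as on a Databricks cluster). Treat it as a sketch, not a production pipeline.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build the option map for Spark's built-in Kafka source.

    The argument values are placeholders; substitute your own brokers and topic.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_event_stream(spark, options, checkpoint_path, target_table):
    """Start a streaming query that copies Kafka events into a Delta table.

    `spark` is an existing SparkSession; the path and table name are hypothetical.
    """
    events = (
        spark.readStream.format("kafka")
        .options(**options)
        .load()
        # Kafka delivers key and value as binary, so cast them to strings.
        .selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp",
        )
    )
    return (
        events.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .toTable(target_table)
    )


# Example wiring (all values hypothetical):
# opts = kafka_source_options("broker-1:9092", "clickstream")
# query = start_event_stream(spark, opts, "/tmp/checkpoints/clickstream", "raw_clickstream")
```

The checkpoint location is what lets the query resume from its last committed Kafka offsets after a restart, so it should point at durable storage.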

In addition to a few of the sessions we had at the event, our head of streaming Karthik Ramasamy hosted a meetup that delved into the details of Project Lightspeed, our next-gen Structured Streaming work. As you may know, the meetup format is a great way to get into more conversational depth than a breakout session affords - for example, one of Karthik’s former students at UC Berkeley got into the details of how we handle async state checkpointing for low-latency pipelines.

I also had some productive conversations about what Databricks users want from streaming - low latency is obviously desirable, but it must be balanced against cost and accuracy (given windowing considerations, late-arriving data, etc.). Then of course there are scale/throughput considerations. I’d love to hear how your organizations/teams approach this tradeoff.
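To make the accuracy side of that tradeoff concrete, here is a small plain-Python sketch of tumbling-window counts with a watermark. It is a simplified model that I wrote for illustration, not how Structured Streaming is implemented (Spark advances the watermark per micro-batch, while this toy version advances it per event). The function name and parameters are hypothetical.

```python
from collections import defaultdict


def windowed_counts(event_times, window_seconds, allowed_lateness_seconds):
    """Count events per tumbling window, dropping events that arrive too late.

    event_times: integer event-time timestamps (seconds), in arrival order.
    Returns (counts keyed by window start, number of dropped events).
    """
    counts = defaultdict(int)
    dropped = 0
    max_event_time = float("-inf")
    for event_time in event_times:
        max_event_time = max(max_event_time, event_time)
        # Simplified watermark: newest event time seen minus the allowed lateness.
        watermark = max_event_time - allowed_lateness_seconds
        window_start = (event_time // window_seconds) * window_seconds
        window_end = window_start + window_seconds
        if window_end <= watermark:
            # The window is already considered final, so the late event is lost.
            dropped += 1
            continue
        counts[window_start] += 1
    return dict(counts), dropped


arrivals = [1, 12, 3, 25, 4]  # events 3 and 4 arrive out of order
print(windowed_counts(arrivals, 10, 0))   # strict: windows close immediately
print(windowed_counts(arrivals, 10, 30))  # lenient: windows stay open longer
```

A shorter allowed lateness lets windows finalize sooner and keeps less state around, at the price of dropping more late events. A longer one captures more of them but delays final results and holds more state.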

The ubiquity of streaming use cases was my big takeaway from Current 2022. Performant streaming architecture is no longer a cutting-edge capability reserved for high tech; it’s becoming a democratized practice for everyone from grocery stores to the public sector.

If you were at Current, what was the most impactful/interesting thing you got from the event? If you weren’t able to join us this year, please do add your voice - what’s on your data streaming wish list for the next year?
