Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

mattjones
New Contributor II

Hi all - Matt Jones here, I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s Current 2022 data streaming event (formerly Kafka Summit) in Austin.

By far the most common question we got at the booth was how/why customers would use Kafka/Confluent and Databricks together. A popular use case is to aggregate streaming events through a Kafka-based collector system, then send that event stream into a Databricks streaming pipeline (or roll your own with Spark Structured Streaming, if you prefer). Frank Munz’s blog post on this topic is an excellent overview.

In addition to a few of the sessions we had at the event, our head of streaming Karthik Ramasamy hosted a meetup that delved into the details of Project Lightspeed, our next-gen Structured Streaming work. As you may know, the meetup format is a great way to get into more conversational depth than a breakout session affords - for example, one of Karthik’s former students at UC Berkeley got into the details of how we handle async state checkpointing for low-latency pipelines.

I also had some productive dialogue around what Databricks users want from streaming - low latency is obviously desirable, but it must be balanced against cost and accuracy (given windowing considerations, late-arriving data, etc.). Then of course there are scale/throughput considerations. I’d love to hear how your organizations/teams approach this tradeoff.
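To make the latency vs. accuracy tradeoff a bit more concrete, here is a minimal sketch in plain Python (not Spark, and not how Structured Streaming is implemented internally). It counts events per tumbling window and drops any event whose window has already been closed by a watermark, where the watermark is simply the largest event time seen so far minus an allowed lateness. The function name `tumbling_counts`, its parameters, and the sample timestamps are all made up for illustration.

```python
from collections import defaultdict


def tumbling_counts(event_times, window_size, allowed_lateness):
    """Count events per tumbling window, dropping events that arrive too late.

    event_times: event timestamps (e.g. seconds), listed in ARRIVAL order.
    window_size: width of each tumbling window, in the same unit.
    allowed_lateness: how far the watermark trails the max event time seen.

    Returns (counts, dropped), where counts maps window start -> event count.
    """
    counts = defaultdict(int)
    dropped = 0
    max_seen = None

    for ts in event_times:
        # The watermark only moves forward: it trails the newest event time.
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - allowed_lateness

        window_start = (ts // window_size) * window_size
        window_end = window_start + window_size

        # Once the watermark passes a window's end, that window is treated
        # as final, so a late event for it is dropped instead of counted.
        if window_end <= watermark:
            dropped += 1
            continue

        counts[window_start] += 1

    return dict(counts), dropped


# Hypothetical arrival order: the events at t=8 and t=9 show up late.
arrivals = [1, 4, 12, 8, 25, 9]

for lateness in (0, 5, 20):
    counts, dropped = tumbling_counts(arrivals, window_size=10,
                                      allowed_lateness=lateness)
    print(f"lateness={lateness}: counts={counts}, dropped={dropped}")
```

With no allowed lateness, both late events are dropped and the first window is undercounted; with a lateness of 20 nothing is dropped, but a window cannot be treated as final until the watermark passes it, so results arrive later and more state is held in the meantime. That is the knob teams end up tuning.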

The ubiquity of streaming use cases was my big takeaway from Current 2022. Performant streaming architecture isn’t a cutting-edge capability reserved for high tech anymore; it’s really becoming a democratized practice for everyone from grocery stores to the public sector.

If you were at Current, what was the most impactful/interesting thing you got from the event? If you weren’t able to join us this year, please do add your voice - what’s on your data streaming wish list for the next year?
