<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi all - Matt Jones here, I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s Current 2022 data streamin... in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/hi-all-matt-jones-here-i-m-on-the-data-streaming-team-at/m-p/26904#M18897</link>
    <description>&lt;P&gt;Hi all - Matt Jones here, I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s &lt;A href="https://2022.currentevent.io/website/39543/welcome" alt="https://2022.currentevent.io/website/39543/welcome" target="_blank"&gt;&lt;U&gt;Current 2022 data streaming event&lt;/U&gt;&lt;/A&gt; (formerly Kafka Summit) in Austin.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Current 2022 Banner Image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1352iAD53726C7EC68373/image-size/large?v=v2&amp;amp;px=999" role="button" title="Current 2022 Banner Image" alt="Current 2022 Banner Image" /&gt;&lt;/span&gt;By far the most common question we got at the booth was how/why customers would use Kafka/Confluent and Databricks together. A popular use case is to aggregate streaming events through a Kafka-based collector system, then send that event stream into a Databricks streaming pipeline (or roll your own with Spark Structured Streaming, if you prefer). 
&lt;A href="https://www.databricks.com/blog/2022/08/09/low-latency-streaming-data-pipelines-with-delta-live-tables-and-apache-kafka.html" alt="https://www.databricks.com/blog/2022/08/09/low-latency-streaming-data-pipelines-with-delta-live-tables-and-apache-kafka.html" target="_blank"&gt;&lt;U&gt;Frank Munz’s blog post on this topic&lt;/U&gt;&lt;/A&gt; is an excellent overview.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In addition to a few &lt;A href="https://www.databricks.com/blog/2022/09/28/databricks-current-2022.html" alt="https://www.databricks.com/blog/2022/09/28/databricks-current-2022.html" target="_blank"&gt;&lt;U&gt;sessions&lt;/U&gt;&lt;/A&gt; we had at the event, our head of streaming, Karthik Ramasamy, hosted a meetup that delved into the details of &lt;A href="https://www.databricks.com/blog/2022/06/28/project-lightspeed-faster-and-simpler-stream-processing-with-apache-spark.html" alt="https://www.databricks.com/blog/2022/06/28/project-lightspeed-faster-and-simpler-stream-processing-with-apache-spark.html" target="_blank"&gt;&lt;U&gt;Project Lightspeed&lt;/U&gt;&lt;/A&gt;, our next-generation Structured Streaming work. The meetup format allows more conversational depth than a breakout session affords. For example, one of Karthik’s former students at UC Berkeley dug into the details of how we handle &lt;A href="https://docs.databricks.com/structured-streaming/async-checkpointing.html" alt="https://docs.databricks.com/structured-streaming/async-checkpointing.html" target="_blank"&gt;&lt;U&gt;async state checkpointing&lt;/U&gt;&lt;/A&gt; for low-latency pipelines.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I also had some productive conversations about what Databricks users want from streaming. Low latency is obviously desirable, but it must be balanced against cost and accuracy (given windowing considerations, late-arriving data, etc.). Then of course there are scale/throughput considerations. 
I’d love to hear how your organizations/teams approach this tradeoff.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The ubiquity of streaming use cases was my big takeaway from Current 2022. Performant streaming architecture isn’t a cutting-edge capability reserved for high tech; it’s becoming a democratized practice for everyone from grocery stores to the public sector.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you were at Current, what was the most impactful/interesting thing you took away from the event? If you weren’t able to join us this year, please do add your voice - what’s on your data streaming wish list for the next year?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 17 Oct 2022 20:00:53 GMT</pubDate>
    <dc:creator>mattjones</dc:creator>
    <dc:date>2022-10-17T20:00:53Z</dc:date>
    <item>
      <title>Hi all - Matt Jones here, I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s Current 2022 data streamin...</title>
      <link>https://community.databricks.com/t5/data-engineering/hi-all-matt-jones-here-i-m-on-the-data-streaming-team-at/m-p/26904#M18897</link>
      <description>&lt;P&gt;Hi all - Matt Jones here, I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s &lt;A href="https://2022.currentevent.io/website/39543/welcome" alt="https://2022.currentevent.io/website/39543/welcome" target="_blank"&gt;&lt;U&gt;Current 2022 data streaming event&lt;/U&gt;&lt;/A&gt; (formerly Kafka Summit) in Austin.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Current 2022 Banner Image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1352iAD53726C7EC68373/image-size/large?v=v2&amp;amp;px=999" role="button" title="Current 2022 Banner Image" alt="Current 2022 Banner Image" /&gt;&lt;/span&gt;By far the most common question we got at the booth was how/why customers would use Kafka/Confluent and Databricks together. A popular use case is to aggregate streaming events through a Kafka-based collector system, then send that event stream into a Databricks streaming pipeline (or roll your own with Spark Structured Streaming, if you prefer). 
&lt;A href="https://www.databricks.com/blog/2022/08/09/low-latency-streaming-data-pipelines-with-delta-live-tables-and-apache-kafka.html" alt="https://www.databricks.com/blog/2022/08/09/low-latency-streaming-data-pipelines-with-delta-live-tables-and-apache-kafka.html" target="_blank"&gt;&lt;U&gt;Frank Munz’s blog post on this topic&lt;/U&gt;&lt;/A&gt; is an excellent overview.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In addition to a few &lt;A href="https://www.databricks.com/blog/2022/09/28/databricks-current-2022.html" alt="https://www.databricks.com/blog/2022/09/28/databricks-current-2022.html" target="_blank"&gt;&lt;U&gt;sessions&lt;/U&gt;&lt;/A&gt; we had at the event, our head of streaming, Karthik Ramasamy, hosted a meetup that delved into the details of &lt;A href="https://www.databricks.com/blog/2022/06/28/project-lightspeed-faster-and-simpler-stream-processing-with-apache-spark.html" alt="https://www.databricks.com/blog/2022/06/28/project-lightspeed-faster-and-simpler-stream-processing-with-apache-spark.html" target="_blank"&gt;&lt;U&gt;Project Lightspeed&lt;/U&gt;&lt;/A&gt;, our next-generation Structured Streaming work. The meetup format allows more conversational depth than a breakout session affords. For example, one of Karthik’s former students at UC Berkeley dug into the details of how we handle &lt;A href="https://docs.databricks.com/structured-streaming/async-checkpointing.html" alt="https://docs.databricks.com/structured-streaming/async-checkpointing.html" target="_blank"&gt;&lt;U&gt;async state checkpointing&lt;/U&gt;&lt;/A&gt; for low-latency pipelines.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I also had some productive conversations about what Databricks users want from streaming. Low latency is obviously desirable, but it must be balanced against cost and accuracy (given windowing considerations, late-arriving data, etc.). Then of course there are scale/throughput considerations. 
I’d love to hear how your organizations/teams approach this tradeoff.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The ubiquity of streaming use cases was my big takeaway from Current 2022. Performant streaming architecture isn’t a cutting-edge capability reserved for high tech; it’s becoming a democratized practice for everyone from grocery stores to the public sector.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you were at Current, what was the most impactful/interesting thing you took away from the event? If you weren’t able to join us this year, please do add your voice - what’s on your data streaming wish list for the next year?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2022 20:00:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/hi-all-matt-jones-here-i-m-on-the-data-streaming-team-at/m-p/26904#M18897</guid>
      <dc:creator>mattjones</dc:creator>
      <dc:date>2022-10-17T20:00:53Z</dc:date>
    </item>
  </channel>
</rss>

