
Structured Streaming and Delta Live Tables

Salmiakki
New Contributor


Structured Streaming: Structured Streaming is a stream processing engine built on Apache Spark that provides high-level, declarative APIs for processing and analyzing continuous data streams. It lets developers treat a live data stream as a continuously growing, unbounded table, enabling seamless integration with batch processing and traditional SQL queries. Structured Streaming provides fault-tolerant, exactly-once processing semantics, ensuring data reliability and consistency, and it supports a variety of sources and sinks, including files, Kafka, and more. With Structured Streaming, developers write continuous queries whose results update as new data arrives, enabling real-time analytics and insights.
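
To make that concrete, here is a minimal PySpark sketch of one such continuous query: it reads a stream of IoT JSON events, computes windowed per-device averages, and writes the results to a Delta sink with checkpointing for fault tolerance. The paths, schema, and column names are hypothetical, chosen only for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Read a stream of JSON events (hypothetical path and schema).
events = (
    spark.readStream
    .format("json")
    .schema("device_id STRING, temperature DOUBLE, event_time TIMESTAMP")
    .load("/data/iot/events/")
)

# Continuous query: per-device average temperature over 5-minute windows.
# The watermark bounds how late data may arrive and still be counted.
averages = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("device_id"))
    .avg("temperature")
)

# Write results to a Delta sink; the checkpoint gives fault tolerance
# and exactly-once output on restart.
query = (
    averages.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/chk/device_averages")
    .start("/data/iot/device_averages")
)
```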

Delta Live Tables: Delta Live Tables (DLT) is a declarative framework on Databricks for building reliable data pipelines on top of Delta Lake, the open-source storage layer that brings ACID transactions to data lakes on Apache Spark. Instead of hand-wiring jobs, developers declare the tables a pipeline should produce and the transformations that define them; DLT infers the dependencies between those tables, orchestrates their execution, and tracks data quality through expectations. It supports both streaming and batch data, so real-time and batch processing can be mixed in one pipeline, and because the resulting tables are Delta tables they inherit transactional writes, schema evolution, time travel, and data versioning for reliable, scalable data management.
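
As a sketch of what that declarative API looks like, here is a minimal table definition in Python. The `dlt` module and `@dlt.table` decorator are the real DLT Python API, but the table name and landing path are assumptions, and the code only runs inside a Delta Live Tables pipeline (where `spark` is provided by the runtime).

```python
import dlt

@dlt.table(comment="Raw IoT events ingested incrementally from cloud storage.")
def raw_events():
    # Auto Loader ("cloudFiles") picks up new files as they land, so this
    # one declaration handles continuous, incremental ingestion.
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/data/iot/landing/")  # hypothetical landing path
    )
```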

I think Delta Live Tables is a unique and strong fit when people want to start a project from scratch that involves a lot of dependent pipelines; in that situation it is a really important feature to consider.

The actual use case: when IoT data is flowing into the organization and the datasets are constantly changing, and you want to combine them with multiple other datasets, you end up with dependent pipelines where the upstream workloads have to finish before the downstream systems can start. And because IoT means a constant, evolving feed of data, you need streaming handled in an incremental, batch-like fashion, which is exactly where Delta Live Tables comes into the picture (see the sketch below).
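
A sketch of that dependent-pipeline pattern, continuing the hypothetical raw_events table above: each downstream table declares what it reads, and DLT infers the dependency graph and execution order. The expectation rule and table names are assumptions for illustration.

```python
import dlt

@dlt.table(comment="Cleaned events; readings without a temperature are dropped.")
@dlt.expect_or_drop("valid_temperature", "temperature IS NOT NULL")
def clean_events():
    # Reading the upstream table via dlt.read_stream declares the dependency,
    # so DLT always runs raw_events before this step.
    return dlt.read_stream("raw_events").select(
        "device_id", "temperature", "event_time"
    )

@dlt.table(comment="Per-device averages: batch-style logic over the stream.")
def device_averages():
    # dlt.read does a complete (batch) read, mixing batch processing into
    # the same pipeline as the streaming tables above.
    return dlt.read("clean_events").groupBy("device_id").avg("temperature")
```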

