Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Move on DLT Pipelines or CDF Delta Tables?

jeremy98
Contributor

Hello Community,

I have a basic question that I've been thinking about lately. Is it better to use DLT Pipelines or CDF Delta Tables for handling a medallion architecture?

I understand that DLT Pipelines offer some shortcuts, but are they a good choice when queries from the layers in the medallion architecture are sometimes complex?


Accepted Solutions

Alberto_Umana
Databricks Employee

Hi @jeremy98,

When deciding between Delta Live Tables (DLT) pipelines and Delta tables with Change Data Feed (CDF) for a medallion architecture, there are several factors to consider.

DLT Pipelines:

  1. Automation and Management: DLT Pipelines offer a declarative ETL framework that simplifies the creation and management of data pipelines. They automatically handle task orchestration, cluster management, monitoring, data quality, and error handling. This can significantly reduce the operational complexity and allow you to focus on delivering high-quality data.
  2. Streaming and Batch Processing: DLT Pipelines support both streaming and batch data processing, making it easier to build and maintain pipelines that need to handle real-time data ingestion and processing.
  3. Medallion Architecture: DLT Pipelines are designed to work seamlessly with the medallion architecture, allowing you to define transformations for Bronze, Silver, and Gold layers with just a few lines of code. This can streamline the implementation of the architecture and ensure that data quality and structure improve as data flows through each layer.
  4. Complex Queries: While DLT Pipelines can handle complex queries, they are optimized for scenarios where the pipeline logic can be expressed declaratively. If your queries are highly complex and require extensive custom logic, you might need to evaluate whether DLT Pipelines can accommodate those needs effectively.
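
To make the Bronze, Silver, and Gold flow described above concrete, here is a minimal sketch in plain Python. It is not the DLT API: the sample rows, the column names, and the `silver` and `gold` functions are all hypothetical, and the quality rules are written by hand. In an actual pipeline each layer would typically be a function decorated with `@dlt.table`, with rules declared through expectations such as `@dlt.expect_or_drop`.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in a Bronze layer (assumed data).
BRONZE = [
    {"order_id": 1, "customer": "a", "amount": "10.50"},
    {"order_id": 2, "customer": "b", "amount": "-3.00"},    # violates amount > 0
    {"order_id": 3, "customer": "a", "amount": "4.50"},
    {"order_id": None, "customer": "c", "amount": "7.00"},  # violates order_id not null
]


def silver(bronze_rows):
    """Silver layer: cast types and drop rows that break the quality rules."""
    cleaned = []
    for row in bronze_rows:
        if row["order_id"] is None:
            continue  # rule: order_id must be present
        amount = float(row["amount"])
        if amount <= 0:
            continue  # rule: amount must be positive
        cleaned.append({**row, "amount": amount})
    return cleaned


def gold(silver_rows):
    """Gold layer: aggregate cleaned rows into a total per customer."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


customer_totals = gold(silver(BRONZE))
print(customer_totals)
```

Each layer here depends only on the output of the layer before it, which is the shape a declarative framework relies on. Logic that does not fit this shape is where the trade-off in point 4 starts to matter.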

CDF Delta Tables:

  1. Change Data Capture (CDC): With Change Data Feed enabled, a Delta table records row-level changes (inserts, updates, and deletes) between table versions. Downstream layers can then read and apply only those changes incrementally instead of reprocessing the full table, which helps keep data up to date across the medallion architecture.
  2. Flexibility: Using CDF Delta Tables gives you more control over the implementation of your data processing logic. This can be advantageous if your queries are complex and require custom handling that might be difficult to express in a declarative framework like DLT.
  3. Integration with Existing Workflows: If you already have established workflows and processes that rely on Delta Tables, integrating CDF might be more straightforward and require fewer changes to your existing setup.
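
As an illustration of the incremental approach described above, the sketch below applies CDF-style change rows to a keyed target in plain Python. The `_change_type` values (`insert`, `update_preimage`, `update_postimage`, `delete`) and the `_commit_version` column follow what the change feed exposes; the `apply_changes` helper, the `id` key, and the sample rows are assumptions made for this example. On Databricks the change rows would come from a read such as `spark.read.option("readChangeFeed", "true").option("startingVersion", ...)` on a table with `delta.enableChangeDataFeed` set to `true`.

```python
def apply_changes(target, changes, key="id"):
    """Apply CDF-style change rows to a dict keyed by `key`, in commit order.

    Hypothetical helper: it assumes at most one change per key per commit.
    """
    # sorted() is stable, so rows within the same commit keep their order.
    for change in sorted(changes, key=lambda c: c["_commit_version"]):
        kind = change["_change_type"]
        # Strip the metadata columns to recover the data row itself.
        row = {k: v for k, v in change.items() if not k.startswith("_")}
        if kind in ("insert", "update_postimage"):
            target[row[key]] = row
        elif kind == "delete":
            target.pop(row[key], None)
        # "update_preimage" rows carry the old values and need no action here.
    return target


# Assumed sample change rows spanning three commits.
changes = [
    {"id": 1, "status": "new", "_change_type": "insert", "_commit_version": 1},
    {"id": 2, "status": "new", "_change_type": "insert", "_commit_version": 1},
    {"id": 1, "status": "new", "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "status": "shipped", "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 2, "status": "new", "_change_type": "delete", "_commit_version": 3},
]

silver_table = apply_changes({}, changes)
print(silver_table)
```

This is the kind of hand-written merge logic that gives you the extra control mentioned in point 2, at the cost of owning ordering and error handling yourself.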

In summary, if you prioritize automation, ease of management, and the ability to handle both streaming and batch data efficiently, DLT Pipelines might be the better choice. However, if your queries are highly complex or you need fine-grained control over change data capture, CDF Delta Tables could be more suitable. Consider your specific requirements and the complexity of your queries when making the decision.


