Delta Live Tables (DLT), an innovation from Databricks, is a hot topic in the data field. Delta Live Tables is a declarative ETL framework. There are two types of ETL framework:
1) Procedural ETL 2) Declarative ETL
1) Procedural ETL: it involves writing code that explicitly outlines the steps to transform data from source to target. It is a more hands-on approach that requires developers to define each step of the ETL process. Examples: Informatica, Talend, SSIS.
2) Declarative ETL: a more abstract approach that focuses on defining the desired outcome of the ETL process. The developer defines the desired end state, and the ETL tool automatically generates the code to transform the data into that end state. Examples: ADF, AWS Glue, DLT.
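To make the difference concrete, here is a minimal plain-Python sketch (this is not DLT code; the data, table names, and the `run` helper are invented for illustration). The procedural version spells out every step and its order. The declarative version only states what each table is and what it depends on, and a small helper works out the build order, which is roughly the job a framework like DLT takes over.

```python
# Toy source data (made up for illustration).
RAW_ORDERS = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": -5.0},   # invalid row
    {"id": 3, "amount": 80.0},
]


# Procedural style: the developer writes each step and its order explicitly.
def procedural_etl(rows):
    cleaned = []
    for row in rows:              # step 1: filter out bad rows
        if row["amount"] > 0:
            cleaned.append(row)
    total = 0.0
    for row in cleaned:           # step 2: aggregate
        total += row["amount"]
    return {"total_amount": total}


# Declarative style: each table declares its inputs and its definition.
# Note the tables are listed in "wrong" order on purpose; the build order
# is derived from the declared dependencies, not from the listing order.
TABLES = {
    "orders_total": (
        ["orders_clean"],
        lambda clean: {"total_amount": sum(r["amount"] for r in clean)},
    ),
    "orders_clean": (
        ["orders_raw"],
        lambda raw: [r for r in raw if r["amount"] > 0],
    ),
    "orders_raw": ([], lambda: RAW_ORDERS),
}


def run(target, tables, cache=None):
    """Build `target` by first building whatever it depends on."""
    cache = {} if cache is None else cache
    if target not in cache:
        deps, definition = tables[target]
        inputs = [run(dep, tables, cache) for dep in deps]
        cache[target] = definition(*inputs)
    return cache[target]


print(procedural_etl(RAW_ORDERS))   # {'total_amount': 200.0}
print(run("orders_total", TABLES))  # {'total_amount': 200.0}
```

Both versions produce the same result; what changes is who is responsible for ordering and wiring the steps.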
Main advantages of DLT:
1) Version control.
2) Deployment.
3) Data quality checks.
4) Governance.
5) The Delta engine automatically handles the complex tasks of data ingestion, data merging, and schema evolution.
6) Auto Loader and streaming tables can be used to incrementally land data into the Bronze layer for DLT pipelines or Databricks SQL queries.
DLT supports two environment modes: 1) Development 2) Production
DLT pipeline execution modes: 1) Continuous 2) Triggered
If the pipeline uses the triggered execution mode, the system stops processing after successfully refreshing all tables or selected tables in the pipeline once, ensuring each table that is part of the update is updated based on the data available when the update started.
If the pipeline uses continuous execution, Delta Live Tables processes new data as it arrives in data sources to keep tables throughout the pipeline fresh.
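Both choices end up as flags in the pipeline settings. The sketch below builds such a settings document as a Python dict and prints it as JSON. The `development` and `continuous` fields follow the pipeline settings JSON as I understand it; the helper function, pipeline names, and notebook path are hypothetical, so verify the exact fields against the settings view in your own workspace.

```python
import json


def pipeline_settings(name, notebook_path, production=False, continuous=False):
    """Return a DLT-style pipeline settings dict (illustrative helper)."""
    return {
        "name": name,
        # Development vs. production environment mode.
        "development": not production,
        # False = triggered (process available data once, then stop);
        # True  = continuous (keep processing new data as it arrives).
        "continuous": continuous,
        "libraries": [{"notebook": {"path": notebook_path}}],
    }


# Hypothetical pipeline name and notebook path.
dev = pipeline_settings("sales_dlt_demo", "/Repos/demo/dlt_medallion")
prod = pipeline_settings(
    "sales_dlt", "/Repos/demo/dlt_medallion", production=True, continuous=True
)

print(json.dumps(dev, indent=2))
print(json.dumps(prod, indent=2))
```

A common pairing is development plus triggered while building the pipeline, and production with whichever execution mode matches the freshness the tables need.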
The sample DLT pipeline notebook created for this example follows the medallion architecture:
1) Ingest the data into the Bronze layer (CSV and JSON file ingestion using Auto Loader).
2) Load the data from the Bronze layer into the Silver layer, applying constraints to check data quality, along with data cleaning and transformation.
3) Prepare the Gold layer tables with more refined data and share them with the BI and ML teams.
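A minimal sketch of what such a notebook could look like in Python is shown here. The landing path, table names, and column names (`order_id`, `customer_id`, `amount`) are assumptions made for illustration, not taken from a real dataset. The `dlt` module and the `spark` session are provided by Databricks only inside a DLT pipeline, so the fallback at the top is just a do-nothing stand-in that lets the file be imported elsewhere; the tables themselves are only built when the notebook runs as part of a pipeline.

```python
try:
    import dlt  # provided by Databricks inside a DLT pipeline
    dlt.expect_or_drop  # an unrelated PyPI package is also named "dlt"
except (ImportError, AttributeError):
    # Assumption for illustration: stand-in decorators that return the
    # function unchanged, so this file can be imported outside Databricks.
    from types import SimpleNamespace

    def _passthrough(*args, **kwargs):
        return lambda fn: fn

    dlt = SimpleNamespace(
        table=_passthrough, expect=_passthrough, expect_or_drop=_passthrough
    )

LANDING_PATH = "/mnt/landing"  # hypothetical landing location


# Bronze: land raw files incrementally with Auto Loader (cloudFiles).
@dlt.table(name="orders_bronze", comment="Raw order CSV files.")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")  # `spark` comes from Databricks
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .option("cloudFiles.inferColumnTypes", "true")
        .load(f"{LANDING_PATH}/orders/")
    )


@dlt.table(name="customers_bronze", comment="Raw customer JSON files.")
def customers_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(f"{LANDING_PATH}/customers/")
    )


# Silver: data quality checks plus light cleaning.
@dlt.table(name="orders_silver", comment="Validated and cleaned orders.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop bad rows
@dlt.expect("positive_amount", "amount > 0")  # keep rows, record violations
def orders_silver():
    return dlt.read_stream("orders_bronze").selectExpr(
        "order_id",
        "TRIM(customer_id) AS customer_id",
        "CAST(amount AS DOUBLE) AS amount",
    )


# Gold: aggregated table for the BI and ML teams.
@dlt.table(name="customer_sales_gold", comment="Total order amount per customer.")
def customer_sales_gold():
    return (
        dlt.read("orders_silver")
        .groupBy("customer_id")
        .agg({"amount": "sum"})
        .withColumnRenamed("sum(amount)", "total_amount")
    )
```

Notice that nothing in the notebook calls these functions or states their order. DLT reads the `dlt.read` and `dlt.read_stream` references, works out that Silver depends on Bronze and Gold on Silver, and runs them accordingly.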
Disadvantage: at the time of writing, DLT publishes all tables of a pipeline to one schema/database. If you want to create the Bronze, Silver, and Gold layer tables in different schemas/databases, you cannot do this with DLT; all Bronze, Silver, and Gold layer tables have to be created in one database/schema.
Sourav Das