05-18-2023 12:22 AM
Please explain with some use cases which show the difference between DLT and dbt.
05-18-2023 01:10 AM
05-18-2023 01:03 AM
Data build tool (dbt) is a transformation tool that aims to simplify the work of the analytics engineer in the data pipeline workflow. It implements only the Transformation step of the ETL process.
Delta Live Tables (DLT), by contrast, is a framework that makes it easier to design data pipelines and control data quality. It covers the whole ETL process and is integrated into Databricks.
05-18-2023 01:05 AM
Use cases:
Data lineage graph
A data lineage graph makes data teams more efficient because problems in the data can be found more easily and quickly. It also makes it simpler for new team members, data analysts or other colleagues to understand the data pipeline.
dbt
The data lineage graph includes the source table in the data warehouse, the tables after the different transformations, and the dashboard in which the business value of the tables is displayed. It is, however, not possible to hover over or click on the tables to see more information.
Delta Live Tables
The data lineage graph shows the tables that load data from the data lake and the different tables after transformation. More information about each table can be obtained by clicking on it.
Comparison
Both show the source tables and the transformations, but only dbt also shows the end point of the data, whether that is a dashboard, an application or a data science pipeline.
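To make the idea of a lineage graph concrete, here is a minimal sketch in plain Python. It is not how dbt or Delta Live Tables build their graphs; the table names and the `LINEAGE` dictionary are made up for illustration. It only shows why a lineage graph helps when tracing a problem upstream or checking what is affected downstream.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables it reads from.
LINEAGE = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders_enriched": ["stg_orders", "stg_customers"],
    "sales_dashboard": ["orders_enriched"],
}


def upstream(table, lineage):
    """Return every table that `table` depends on, directly or indirectly."""
    seen = set()
    queue = deque(lineage.get(table, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(lineage.get(parent, []))
    return seen


def downstream(table, lineage):
    """Return every table or end point affected by a problem in `table`."""
    return {name for name in lineage if table in upstream(name, lineage)}
```

If `sales_dashboard` shows wrong numbers, `upstream("sales_dashboard", LINEAGE)` lists the tables worth checking; if `raw_orders` is broken, `downstream("raw_orders", LINEAGE)` lists everything that will be affected.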
05-18-2023 01:06 AM
Incremental tables
In big data, the amount of data is so vast that it is impractical to reload all of it every time a few rows are added; the delays would be enormous. To solve this, incremental tables load only the new rows. A transformation that first took a few hours can then drop to a few seconds.
dbt
An incremental model creates the whole table the first time it is run and then adapts the SQL code on subsequent runs so that only the new data is transformed.
Delta Live Tables
Both incremental transformation and incremental loading of data from a data lake into the data warehouse are possible.
Comparison
Both can incrementally transform data, but Delta Live Tables can also incrementally load data.
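The incremental idea itself can be sketched in a few lines of plain Python. This is only a conceptual illustration, not the mechanism of either tool: the `updated_at` column, the `incremental_run` helper and the sample rows are all assumptions made up for this example.

```python
from datetime import datetime


def incremental_run(source_rows, target_rows, transform):
    """Transform only source rows newer than the newest row already in the target."""
    # High-watermark: latest timestamp already loaded (None on the first run).
    watermark = max((row["updated_at"] for row in target_rows), default=None)
    new_rows = [
        row for row in source_rows
        if watermark is None or row["updated_at"] > watermark
    ]
    target_rows.extend(transform(row) for row in new_rows)
    return len(new_rows)


def add_total(row):
    """Example transformation: keep the row and add a computed column."""
    return {**row, "total": row["price"] * row["quantity"]}


source = [
    {"id": 1, "price": 10, "quantity": 2, "updated_at": datetime(2023, 5, 1)},
    {"id": 2, "price": 5, "quantity": 1, "updated_at": datetime(2023, 5, 2)},
]
target = []

# First run: the target is empty, so the whole table is built.
first_run_count = incremental_run(source, target, add_total)

# A new row arrives; the second run only processes that row.
source.append({"id": 3, "price": 4, "quantity": 3, "updated_at": datetime(2023, 5, 3)})
second_run_count = incremental_run(source, target, add_total)
```

The first run processes both rows and the second run processes only the newly added row, which is where the time saving comes from on large tables.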
05-18-2023 01:08 AM
Detailed logging
For the engineers who maintain the different infrastructures and connections, logging is very important to pinpoint what the error was and, more importantly, where it took place. With good logs, one can determine exactly when something went wrong and act on it immediately, so that a production pipeline experiences little or no downtime.
dbt
The logs are generated by running a dbt command in its command line interface and are stored in a folder named logs in the project folder.
Delta Live Tables
Comparison
Both have an extensive amount of logs, but only Delta Live Tables has real-time updates and a visual interface to follow the process.
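As a rough illustration of using logs to find when and where something failed, here is a small Python sketch. The log line layout (`<timestamp> <LEVEL> <step>: <message>`) and the sample lines are hypothetical; this is not the actual log format of dbt or Delta Live Tables.

```python
from datetime import datetime


def first_error(lines):
    """Return (timestamp, step, message) for the first ERROR line, or None."""
    for line in lines:
        # Assumed layout: "<ISO timestamp> <LEVEL> <step>: <message>"
        parts = line.strip().split(" ", 2)
        if len(parts) < 3 or parts[1] != "ERROR":
            continue
        step, _, message = parts[2].partition(": ")
        return datetime.fromisoformat(parts[0]), step, message
    return None


# Hypothetical log lines, made up for this example.
SAMPLE_LOG = [
    "2023-05-18T01:00:00 INFO stg_orders: started",
    "2023-05-18T01:00:04 INFO stg_orders: finished",
    "2023-05-18T01:00:05 ERROR orders_enriched: column customer_id not found",
    "2023-05-18T01:00:06 INFO sales_dashboard: skipped",
]
```

Calling `first_error(SAMPLE_LOG)` gives the time of the failure, the step where it happened and the message, which is the information an engineer needs to act quickly.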
05-18-2023 01:09 AM
Programming language
In a team, not everyone knows the same programming languages, so the more languages a tool supports, the better: the team can pick the language that most of its members already know. This speeds up development, since a smaller group needs to learn a new language.
dbt
The only language supported in dbt is SQL. Some Jinja knowledge is needed to create templates.
Delta Live Tables
The notebooks in which the tables are defined can be written in SQL or Python. A single notebook cannot mix SQL and Python, but different notebooks in the same project can use different languages.
Comparison
Both support SQL, but only Delta Live Tables also supports Python.
Data warehouses
Not every company has every kind of data warehouse at its disposal. Setting up a new data warehouse specifically for a certain problem or tool when one already exists is quite excessive. It is worthwhile to look for a tool that works with the data warehouse you already have, unless a tool tied to another data warehouse offers substantial benefits.
dbt
Various data warehouses are officially supported by dbt, and more are supported by the community. For Databricks, for example, dbt gets its tables from the Databricks Hive metastore, but these tables can also link to various sources.
Delta Live Tables
It is directly integrated into Databricks, so any source that can be loaded into the Databricks Hive metastore can be used.
Comparison
Both can make use of different data sources such as a data lake, but only dbt can be used in combination with, and run against, other data warehouses.
05-18-2023 01:10 AM
@Prachi Sankhala These are some of the use cases.
05-23-2023 01:38 AM
Hi @Prachi Sankhala
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!