Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Could jobs do everything Delta Live Tables do?

xiangzhu
Contributor III

Hello,

I've read the posts:

Jobs - Delta Live tables difference (databricks.com)

and

Difference between Delta Live Tables and Multitask Jobs (databricks.com)

My understanding is that Delta Live Tables are more like a DSL that simplifies the workflow definition (JSON instead of code).

Could you please confirm that jobs can do everything that Delta Live Tables do, but not vice versa?

3 REPLIES

LandanG
Databricks Employee

Hi @Xiang ZHU,

DLT is a declarative way (either SQL or Python) to build data pipelines in Databricks that uses Delta tables for each stage in the pipeline and has many features and benefits that running ETL pipelines in a notebook might not have. Jobs are a way to orchestrate tasks in Databricks that may include DLT pipelines and much more.

So while you can use jobs to schedule a DLT pipeline, they don't replace each other. Jobs won't be able to do what DLT does and DLT won't be able to do what Jobs does.

Jobs docs: https://docs.databricks.com/workflows/jobs/jobs.html

DLT docs: https://docs.databricks.com/workflows/delta-live-tables/index.html
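For example, a job that simply schedules an existing DLT pipeline could be created like this. This is a minimal sketch, assuming the databricks-sdk Python package and workspace credentials configured in the environment; the job name, cron expression, and pipeline ID are placeholders:

```python
# Minimal sketch: a job whose only task runs an existing DLT pipeline every night.
# Assumes the databricks-sdk package and workspace credentials in the environment.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="nightly-dlt-refresh",                # hypothetical job name
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # run at 02:00 every day
        timezone_id="UTC",
    ),
    tasks=[
        jobs.Task(
            task_key="run_dlt_pipeline",
            pipeline_task=jobs.PipelineTask(
                pipeline_id="<your-dlt-pipeline-id>"  # placeholder pipeline ID
            ),
        )
    ],
)
print(job.job_id)
```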

xiangzhu
Contributor III

@Landan George

"Jobs won't be able to do what DLT does",

I read some blogs and watched some videos too, but I still cannot figure out the difference between Jobs and DLT. Does it mean that without DLT, Databricks Jobs cannot handle Delta tables?

Could you please point out concretely what DLT can do that Jobs can't? Just a few examples would be enough.

LandanG
Databricks Employee

@Xiang ZHU

From the docs above:

Delta Live Tables is a framework for building reliable, maintainable, and testable data processing pipelines. You define the transformations to perform on your data, and Delta Live Tables manages task orchestration, cluster management, monitoring, data quality, and error handling.

Instead of defining your data pipelines using a series of separate Apache Spark tasks, Delta Live Tables manages how your data is transformed based on a target schema you define for each processing step. You can also enforce data quality with Delta Live Tables expectations. Expectations allow you to define expected data quality and specify how to handle records that fail those expectations.
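To make that concrete, here is a minimal Python sketch of a DLT pipeline (the table names, source path, and quality rule are made up for illustration). You only declare the tables; DLT works out that clean_events depends on raw_events, manages the cluster and orchestration, and enforces the expectation:

```python
# Minimal DLT sketch: two declaratively defined tables plus a data quality expectation.
# This code only runs inside a DLT pipeline; `spark` is the ambient SparkSession
# provided by the Databricks runtime. Names and paths below are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested incrementally from cloud storage")
def raw_events():
    # Auto Loader ("cloudFiles") picks up new files incrementally.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/events")
    )

@dlt.table(comment="Cleaned events")
@dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")  # drop rows that fail the rule
def clean_events():
    return dlt.read_stream("raw_events").withColumn("processed_at", F.current_timestamp())
```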

A job is a way to run non-interactive code in a Databricks cluster. For example, you can run an extract, transform, and load (ETL) workload interactively or on a schedule. You can also run jobs interactively in the notebook UI.

Your job can consist of a single task or can be a large, multi-task workflow with complex dependencies. Databricks manages the task orchestration, cluster management, monitoring, and error reporting for all of your jobs. You can run your jobs immediately or periodically through an easy-to-use scheduling system.
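For comparison, here is a multi-task job sketched with the databricks-sdk Python package (notebook paths and the cluster ID are placeholders). With a job, you wire up the tasks and their dependencies yourself, and the tasks can run whatever code you like, including DLT pipeline updates:

```python
# Minimal sketch: a two-task job where "transform" runs only after "ingest" succeeds.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="example-etl-job",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/ingest"),     # placeholder
            existing_cluster_id="1234-567890-abcde123",                              # placeholder
        ),
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/transform"),  # placeholder
            existing_cluster_id="1234-567890-abcde123",
        ),
    ],
)
```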
