Could Jobs do everything Delta Live Tables do?

xiangzhu
Contributor

Hello,

I've read the posts:

Jobs - Delta Live tables difference (databricks.com)

and

Difference between Delta Live Tables and Multitask Jobs (databricks.com)

My understanding is that Delta Live Tables are more like a DSL that simplifies the workflow definition (JSON instead of code).

Could you please confirm that Jobs can do everything that Delta Live Tables do, but not vice versa?

3 REPLIES

LandanG
Honored Contributor

Hi @Xiang ZHU​ ,

DLT is a declarative way (either SQL or Python) to build data pipelines in Databricks that uses Delta tables for each stage in the pipeline and has many features and benefits that running ETL pipelines in a notebook might not have. Jobs are a way to orchestrate tasks in Databricks that may include DLT pipelines and much more.
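To make "declarative" concrete, here is a minimal sketch of a DLT pipeline notebook in Python. The table names and the source path are made up for illustration, and the code assumes it runs inside a DLT pipeline, where spark and the dlt module are available:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw events ingested incrementally from cloud storage")
def raw_events():
    # Auto Loader picks up new JSON files as they arrive
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events")  # illustrative path
    )

@dlt.table(comment="Events with a valid event_type")
def clean_events():
    # dlt.read_stream declares the dependency on raw_events;
    # DLT works out the execution order and the infrastructure for you
    return dlt.read_stream("raw_events").where(col("event_type").isNotNull())
```

You only state what each table should contain; DLT handles the ordering, clusters, and monitoring around it.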

So while you can use Jobs to schedule a DLT pipeline, they don't replace each other: Jobs won't be able to do what DLT does, and DLT won't be able to do what Jobs does.

Jobs docs: https://docs.databricks.com/workflows/jobs/jobs.html

DLT docs: https://docs.databricks.com/workflows/delta-live-tables/index.html

xiangzhu
Contributor

@Landan George​ 

"Jobs won't be able to do what DLT does",

I've read some blogs and watched some videos too, but I still cannot figure out the difference between Jobs and DLT. Does it mean that without DLT, Databricks Jobs cannot handle Delta tables?

Could you please spotlight concretely what DLT can do that Jobs can't? Just a few examples would be enough.

LandanG
Honored Contributor

@Xiang ZHU​ 

From the docs above:

Delta Live Tables is a framework for building reliable, maintainable, and testable data processing pipelines. You define the transformations to perform on your data, and Delta Live Tables manages task orchestration, cluster management, monitoring, data quality, and error handling.

Instead of defining your data pipelines using a series of separate Apache Spark tasks, Delta Live Tables manages how your data is transformed based on a target schema you define for each processing step. You can also enforce data quality with Delta Live Tables expectations. Expectations allow you to define expected data quality and specify how to handle records that fail those expectations.
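As a rough example of an expectation (this builds on the made-up clean_events table from the sketch in my earlier reply; the rule name and predicate are illustrative):

```python
import dlt

@dlt.table(comment="Events that passed the data quality rule")
@dlt.expect_or_drop("valid_timestamp", "event_ts IS NOT NULL")
def validated_events():
    # Rows failing the expectation are dropped and counted in the
    # pipeline's data quality metrics instead of silently flowing through.
    return dlt.read("clean_events")
```

This kind of built-in quality tracking and lineage is what a plain notebook run by a job doesn't give you out of the box.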

A job is a way to run non-interactive code in a Databricks cluster. For example, you can run an extract, transform, and load (ETL) workload interactively or on a schedule. You can also run jobs interactively in the notebook UI.

Your job can consist of a single task or can be a large, multi-task workflow with complex dependencies. Databricks manages the task orchestration, cluster management, monitoring, and error reporting for all of your jobs. You can run your jobs immediately or periodically through an easy-to-use scheduling system.
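For contrast, here is a sketch of creating a multi-task job through the Jobs 2.1 REST API, where a notebook task runs first and a dependent task triggers a DLT pipeline. The notebook path, cluster ID, pipeline ID, and schedule are placeholders, and the snippet assumes a workspace URL and personal access token in environment variables:

```python
import os
import requests

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
            "existing_cluster_id": "1234-567890-abcde123",
        },
        {
            "task_key": "dlt_pipeline",
            "depends_on": [{"task_key": "ingest"}],
            # A job task can trigger a DLT pipeline, which is how the two
            # features compose rather than replace each other.
            "pipeline_task": {"pipeline_id": "<your-dlt-pipeline-id>"},
        },
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{os.environ['DATABRICKS_HOST']}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=job_spec,
)
print(resp.json())  # returns the new job_id on success
```

The job handles scheduling and sequencing of tasks; the DLT pipeline it triggers owns the table-level orchestration, data quality, and error handling.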
