
Creating notebooks that work in both normal Databricks jobs and DLT pipelines

ashraf1395
Valued Contributor

We are working on automating our Databricks ingestion.
We want to write our Python scripts or notebooks so that they work in both Databricks jobs and DLT pipelines.
When I say Databricks jobs, I mean a normal run without a DLT pipeline.

How should we approach this? Any resources or ideas?
Some of my thoughts:
- Regarding tables, DLT has only two table types, streaming tables and materialized views, so if we make sure to create the same kinds of tables in normal runs, it could work out.
- Similarly, with some if-else blocks we could use a single script or notebook that handles both contexts dynamically.

Or should we create separate notebooks for both?

How should we run the Databricks jobs:
- as separate DLT pipeline runs, or
- inside Databricks jobs by providing the DLT pipeline ID?




1 ACCEPTED SOLUTION


Alberto_Umana
Databricks Employee

Hello @ashraf1395,

To address your goal of creating Python scripts or notebooks that work both in Databricks Jobs and Delta Live Tables (DLT) pipelines, here are some ideas:

  1. Unified Script Approach:
    • Table Creation: As you mentioned, DLT supports two types of tables: streaming tables and materialized views. Ensure that your scripts create tables in a way that is compatible with both DLT and regular Databricks Jobs. This might involve using conditional logic to handle the differences in table creation.
    • Conditional Logic: Use if-else blocks to dynamically handle the execution context. For example, you can check whether the script is running within a DLT pipeline or a regular job and adjust the logic accordingly (see the first sketch after this list).
  2. Separate Notebooks:
    • If the logic becomes too complex or if there are significant differences in how the scripts should behave in DLT versus regular jobs, consider maintaining separate notebooks for each use case. This can help keep the code clean and maintainable.
  3. Running Databricks Jobs:
    • Separate DLT Pipeline Runs: You can run DLT pipelines separately from regular Databricks jobs. This approach allows you to leverage the specific features and optimizations of DLT for data ingestion and transformation.
    • Inside Databricks Jobs: Alternatively, you can trigger DLT pipelines from within Databricks jobs by providing the DLT pipeline ID. This can be done using the Databricks REST API or CLI to start the pipeline as part of your job workflow (see the second sketch below).
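As a minimal sketch of the conditional-logic idea, a single notebook could define the same table either through DLT or as a regular Delta table depending on where it runs. The check on the pipelines.id Spark conf is a heuristic assumption rather than a documented contract, and the source path and table name below are placeholders:

# Sketch: one notebook, two execution contexts (DLT pipeline vs. regular job run).
# Assumptions: the "pipelines.id" Spark conf is only set during a DLT update,
# and the source path and table name are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def running_in_dlt() -> bool:
    # Best-effort check for a DLT execution context.
    try:
        return spark.conf.get("pipelines.id", None) is not None
    except Exception:
        return False

def load_raw_events():
    # Shared ingestion/transformation logic used by both execution paths.
    return spark.read.format("json").load("/Volumes/main/raw/events")  # placeholder path

if running_in_dlt():
    import dlt  # only available when the notebook runs inside a DLT pipeline

    @dlt.table(name="raw_events", comment="Raw events ingested via DLT")
    def raw_events():
        return load_raw_events()
else:
    # Regular job run: write the same data as a managed Delta table.
    load_raw_events().write.format("delta").mode("overwrite").saveAsTable("raw_events")

The same pattern extends to the rest of the pipeline logic, as long as the branch that runs outside DLT never touches the dlt module.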

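For triggering a pipeline from within a job programmatically, here is a sketch using the Databricks SDK for Python (databricks-sdk); the pipeline ID is a placeholder, and the same call corresponds to the REST endpoint POST /api/2.0/pipelines/{pipeline_id}/updates:

# Sketch: start a DLT pipeline update from a job task via the Databricks SDK.
# Assumptions: databricks-sdk is available in the job's environment,
# and the pipeline ID below is a placeholder.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # resolves credentials from the job's execution context

PIPELINE_ID = "<your-dlt-pipeline-id>"

# Start an incremental update; pass full_refresh=True to recompute all tables.
update = w.pipelines.start_update(pipeline_id=PIPELINE_ID, full_refresh=False)
print(f"Started pipeline update: {update.update_id}")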

