Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Creating notebooks which work in both normal Databricks Jobs and DLT pipelines

ashraf1395
Contributor II

We are working on automating our Databricks ingestion.
We want to write our Python scripts or notebooks so that they work in both Databricks Jobs and DLT pipelines.
When I say Databricks Jobs, I mean a normal run without a DLT pipeline.

How should we approach this? Any resources or ideas?
Some of my thoughts:
- Regarding tables: DLT has only two types of tables, streaming tables and materialized views, so if we make sure the normal runs create the same kinds of tables, it could work out.
- Similarly, with some if-else blocks we could use a single script or notebook with dynamic handling.

Or should we create separate notebooks for both?

How should we run the Databricks jobs?
- As separate DLT pipeline runs, or
- Inside Databricks jobs by providing the DLT pipeline ID?




1 REPLY

Alberto_Umana
Databricks Employee

Hello @ashraf1395,

To address your goal of creating Python scripts or notebooks that work both in Databricks Jobs and Delta Live Tables (DLT) pipelines, here are some ideas:

  1. Unified Script Approach:
    • Table Creation: As you mentioned, DLT supports two types of tables: streaming and materialized views. Ensure that your scripts create tables in a way that is compatible with both DLT and regular Databricks Jobs. This might involve using conditional logic to handle the differences in table creation.
    • Conditional Logic: Use if-else blocks to dynamically handle the execution context. For example, you can check whether the script is running inside a DLT pipeline or a regular job and adjust the logic accordingly (see the first sketch after this list).
  2. Separate Notebooks:
    • If the logic becomes too complex or if there are significant differences in how the scripts should behave in DLT versus regular jobs, consider maintaining separate notebooks for each use case. This can help keep the code clean and maintainable.
  3. Running Databricks Jobs:
    • Separate DLT Pipeline Runs: You can run DLT pipelines separately from regular Databricks jobs. This approach allows you to leverage the specific features and optimizations of DLT for data ingestion and transformation.
    • Inside Databricks Jobs: Alternatively, you can trigger DLT pipelines from within Databricks jobs by providing the DLT pipeline ID. This can be done using the Databricks REST API, CLI, or SDK to start the pipeline as part of your job workflow (see the second sketch below).
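To make the unified-script approach concrete, here is a minimal sketch of the context-detection pattern. It assumes a notebook where spark is the built-in session; the table names, source path, and the try-import-dlt heuristic are illustrative, so verify which detection signal is reliable on your runtime version.

```python
# Minimal sketch: one notebook that defines the same table for both a DLT
# pipeline and a plain Databricks Job. All names and paths are illustrative.
from pyspark.sql import DataFrame


def running_in_dlt() -> bool:
    """Best-effort heuristic: the dlt module is only meant to be available
    inside a DLT pipeline. Depending on your runtime, a Spark conf check
    may be a more reliable signal."""
    try:
        import dlt  # noqa: F401
        return True
    except ImportError:
        return False


def ingest_raw_orders() -> DataFrame:
    # Shared transformation logic, independent of the execution context.
    # `spark` is the notebook's built-in SparkSession.
    return (
        spark.read.format("json")
        .load("/Volumes/main/raw/orders")  # illustrative source path
        .withColumnRenamed("ts", "event_ts")
    )


if running_in_dlt():
    import dlt

    # Register the shared logic as a DLT table.
    @dlt.table(name="raw_orders", comment="Orders ingested via DLT")
    def raw_orders():
        return ingest_raw_orders()
else:
    # Plain Databricks Job: write the same result as a regular Delta table.
    (
        ingest_raw_orders()
        .write.mode("overwrite")
        .saveAsTable("main.bronze.raw_orders")  # illustrative target table
    )
```

The key design choice is keeping the transformation in a plain function that returns a DataFrame, so that only the thin registration layer differs between the two execution contexts.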

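For triggering the pipeline from a job, the simplest option is often a job task of type "Pipeline" that points at the pipeline ID, with no code at all. If you prefer to start it programmatically, here is a minimal sketch using the Databricks SDK for Python (databricks-sdk); the pipeline ID is a placeholder, and you should confirm the call signature against the SDK version you have installed.

```python
# Minimal sketch: start an update of an existing DLT pipeline from a job task.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up credentials from the notebook/job context

PIPELINE_ID = "<your-dlt-pipeline-id>"  # placeholder

# Kick off an incremental update; set full_refresh=True for a full refresh.
update = w.pipelines.start_update(pipeline_id=PIPELINE_ID, full_refresh=False)
print(f"Started pipeline update: {update.update_id}")
```

The same can be achieved with the Pipelines REST API or the Databricks CLI if you would rather not add the SDK as a dependency.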