<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Live Table Pipeline with Multiple Notebooks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23335#M16086</link>
    <description>&lt;P&gt;Hi @Dave Wilson, neither %run nor dbutils.notebook.run is supported in DLT. This is intentionally disabled because DLT is declarative and cannot perform data movement on its own.&lt;/P&gt;&lt;P&gt;To answer your first question: unfortunately, there is no option to make the utils notebook run first. The only workaround today is to combine the utils notebook and your main notebook into one. This does not address reusability in DLT, so we have raised a feature request with the product team, and engineering is working on it internally. A feature addressing this use case should be available soon.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Tue, 08 Nov 2022 19:54:52 GMT</pubDate>
    <dc:creator>Vivian_Wilfred</dc:creator>
    <dc:date>2022-11-08T19:54:52Z</dc:date>
    <item>
      <title>Delta Live Table Pipeline with Multiple Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23333#M16084</link>
      <description>&lt;P&gt;I have two notebooks created for my Delta Live Table pipeline. The first is a utils notebook with functions I will be reusing for other pipelines. The second contains my actual creation of the delta live tables. I added both notebooks to the pipeline settings.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1232i411BD1B10CBA8CB3/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;But the pipeline fails with the error 'Failed to execute python command for notebook', pointing to a function defined in my utils notebook. I also attempted to use a %run magic command to force the utils notebook to run first, but it did not work: I was warned that magic commands are not supported. Is there any way to force the Delta Live Table pipeline to load my utils notebook first so that its functions can be referenced while building the pipeline?&lt;/P&gt;</description>
      <pubDate>Mon, 07 Nov 2022 21:53:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23333#M16084</guid>
      <dc:creator>Dave_Nithio</dc:creator>
      <dc:date>2022-11-07T21:53:24Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table Pipeline with Multiple Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23334#M16085</link>
      <description>&lt;P&gt;Per &lt;A href="https://community.databricks.com/s/question/0D58Y00009J5slUSAR/can-you-use-run-or-dbutilsnotebookrun-in-a-delta-live-table-pipeline" alt="https://community.databricks.com/s/question/0D58Y00009J5slUSAR/can-you-use-run-or-dbutilsnotebookrun-in-a-delta-live-table-pipeline" target="_blank"&gt;another question&lt;/A&gt;, we are unable to use either magic commands or dbutils.notebook.run with a pro-level Databricks account or in Delta Live Tables. Are there any other ways to reuse generic functions from other notebooks within a Delta Live Table pipeline?&lt;/P&gt;</description>
      <pubDate>Tue, 08 Nov 2022 14:43:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23334#M16085</guid>
      <dc:creator>Dave_Nithio</dc:creator>
      <dc:date>2022-11-08T14:43:24Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table Pipeline with Multiple Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23335#M16086</link>
      <description>&lt;P&gt;Hi @Dave Wilson, neither %run nor dbutils.notebook.run is supported in DLT. This is intentionally disabled because DLT is declarative and cannot perform data movement on its own.&lt;/P&gt;&lt;P&gt;To answer your first question: unfortunately, there is no option to make the utils notebook run first. The only workaround today is to combine the utils notebook and your main notebook into one. This does not address reusability in DLT, so we have raised a feature request with the product team, and engineering is working on it internally. A feature addressing this use case should be available soon.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Nov 2022 19:54:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23335#M16086</guid>
      <dc:creator>Vivian_Wilfred</dc:creator>
      <dc:date>2022-11-08T19:54:52Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table Pipeline with Multiple Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23336#M16087</link>
      <description>&lt;P&gt;Hi @Vivian Wilfred and @Dave Wilson, we solved our code reusability with Repos, pointing the pipeline code at our main code:&lt;/P&gt;&lt;P&gt;sys.path.append(os.path.abspath('/Workspace/Repos/[your repo]/[folder with the python scripts]'))&lt;/P&gt;&lt;P&gt;from your_class import *&lt;/P&gt;&lt;P&gt;Note that this only works if your reusable code is plain Python. Also, depending on what you want to do: we noticed that the DLT definitions are always executed as the last piece of code, no matter where they appear in the script.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jan 2023 14:36:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23336#M16087</guid>
      <dc:creator>fecavalc08</dc:creator>
      <dc:date>2023-01-23T14:36:18Z</dc:date>
    </item>
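The Repos-based pattern in the post above can be sketched as follows. Since a real /Workspace/Repos checkout is not available here, the sketch builds a stand-in shared-code folder so it is self-contained; the folder, module, and function names (my_helpers, add_prefix) are hypothetical placeholders, not part of the original thread.

```python
# Self-contained sketch of the sys.path-based reuse pattern described above.
# A temp directory stands in for "/Workspace/Repos/[your repo]/[folder with
# the python scripts]"; my_helpers and add_prefix are placeholder names.
import os
import sys
import tempfile

# Stand-in for the shared-code folder inside your Repos checkout.
shared_dir = tempfile.mkdtemp()
with open(os.path.join(shared_dir, "my_helpers.py"), "w") as f:
    f.write("def add_prefix(name):\n    return 'tbl_' + name\n")

# Step 1: make the shared folder importable.
if shared_dir not in sys.path:
    sys.path.append(shared_dir)

# Step 2: import and use the shared helper, just as a DLT notebook would.
from my_helpers import add_prefix

print(add_prefix("orders"))  # -> tbl_orders
```

In a real pipeline, only the sys.path.append and the import lines are needed; the shared .py file already lives in the repo.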
    <item>
      <title>Re: Delta Live Table Pipeline with Multiple Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23337#M16088</link>
      <description>&lt;P&gt;Could you explain this in detail?&lt;/P&gt;&lt;P&gt;Let's say I have a reusable notebook abc, and pqr is the one I reference in the DLT pipeline settings.&lt;/P&gt;&lt;P&gt;How do I call functions from notebook abc in pqr?&lt;/P&gt;</description>
      <pubDate>Tue, 02 May 2023 14:43:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/23337#M16088</guid>
      <dc:creator>ssudhakarpatil</dc:creator>
      <dc:date>2023-05-02T14:43:21Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table Pipeline with Multiple Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/74266#M34735</link>
      <description>&lt;P&gt;I replaced the magic commands in my notebooks using the sys and os libraries. The code runs correctly on a regular cluster, but when it runs from the Delta Live Tables pipeline it does not work; the current working directory is different from what I expect:&lt;/P&gt;&lt;P&gt;print(os.getcwd()) -&amp;gt; /databricks/driver&lt;/P&gt;&lt;P&gt;What additional configuration do I need?&lt;/P&gt;</description>
      <pubDate>Sat, 15 Jun 2024 17:35:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/74266#M34735</guid>
      <dc:creator>alexgv12</dc:creator>
      <dc:date>2024-06-15T17:35:42Z</dc:date>
    </item>
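The behaviour reported above (the working directory resolving to /databricks/driver inside a DLT run) is why relative paths that work on an interactive cluster can break in a pipeline. A minimal illustration of the pitfall; the /Workspace path below is an illustrative placeholder:

```python
# Relative paths resolve against os.getcwd(), which differs between an
# interactive cluster and a DLT pipeline run (/databricks/driver in the
# latter, per the post above). Absolute paths resolve the same everywhere.
import os

relative = "utils_folder"                           # depends on os.getcwd()
absolute = "/Workspace/Repos/my_repo/utils_folder"  # placeholder, but stable

print(os.path.isabs(relative))  # -> False
print(os.path.isabs(absolute))  # -> True
```

Appending an absolute /Workspace path to sys.path sidesteps the changing working directory.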
    <item>
      <title>Re: Delta Live Table Pipeline with Multiple Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/84245#M37171</link>
      <description>&lt;P&gt;Hi Dave,&lt;/P&gt;&lt;P&gt;You can solve this by putting your utils into a Python file and referencing that .py file from the DLT notebook. A template for the Python file is below.&lt;/P&gt;&lt;P&gt;STEP 1: Create my_utils.py:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# imports
from pyspark.sql import SparkSession
import IPython

# Recover the notebook-scoped dbutils handle inside a plain .py file
dbutils = IPython.get_ipython().user_ns["dbutils"]
spark = SparkSession.builder.getOrCreate()

def myfunc1():
    test = 1
    return test&lt;/LI-CODE&gt;&lt;P&gt;STEP 2: Create an __init__.py file in the same directory where your my_utils.py file lives.&lt;/P&gt;&lt;P&gt;STEP 3: In your DLT notebook, append your sys path and then import your utils file as a module:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# set path
import sys
sys.path.append("/Workspace/utils_folder")

# import libraries
import dlt
import my_utils&lt;/LI-CODE&gt;&lt;P&gt;I suggest avoiding file names that clash with existing packages (e.g. pandas). I also suggest putting your utils file in a separate path from all your other files; this makes appending the path less risky.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Aug 2024 20:29:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-pipeline-with-multiple-notebooks/m-p/84245#M37171</guid>
      <dc:creator>JackyL</dc:creator>
      <dc:date>2024-08-26T20:29:48Z</dc:date>
    </item>
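The three steps in the post above can be exercised end to end. Since /Workspace/utils_folder is not available here, the sketch creates a stand-in folder containing __init__.py and a utils module; my_utils_pkg and qualify are hypothetical names used only for illustration.

```python
# Self-contained sketch of STEP 1-3 above: a folder containing __init__.py
# and a utils module, made importable via sys.path.append.
# my_utils_pkg / qualify are placeholder names; a temp dir stands in for
# "/Workspace/utils_folder".
import os
import sys
import tempfile

utils_folder = tempfile.mkdtemp()
pkg_dir = os.path.join(utils_folder, "my_utils_pkg")
os.makedirs(pkg_dir)

# STEP 2: an (empty) __init__.py marks the directory as a package.
open(os.path.join(pkg_dir, "__init__.py"), "w").close()

# STEP 1: the utils module itself (spark/dbutils setup omitted off-Databricks).
with open(os.path.join(pkg_dir, "my_utils.py"), "w") as f:
    f.write("def qualify(table):\n    return 'bronze.' + table\n")

# STEP 3: append the parent folder and import, as the DLT notebook would.
sys.path.append(utils_folder)
from my_utils_pkg import my_utils

print(my_utils.qualify("sales"))  # -> bronze.sales
```

In the actual DLT notebook, `import dlt` and the @dlt.table definitions would follow the import and call helpers such as my_utils.qualify.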
  </channel>
</rss>

