
Delta Live Table Pipeline with Multiple Notebooks

Dave_Nithio
Contributor

I have two notebooks created for my Delta Live Table pipeline. The first is a utils notebook with functions I will be reusing for other pipelines. The second contains the actual creation of the Delta Live Tables. I added both notebooks to the pipeline settings.

But the pipeline fails with the error 'Failed to execute python command for notebook' pointing to the function I created in my utils notebook. Alternatively, I attempted to use a %run magic command to force the utils notebook to run first, but it did not work. I was given the warning that magic commands are not supported. Is there any way to force the Delta Live Table pipeline to load my utils notebook first so that its functions can be referenced while building the pipeline?

1 ACCEPTED SOLUTION

Vivian_Wilfred
Databricks Employee

Hi @Dave Wilson, %run and dbutils are not supported in DLT. They are intentionally disabled because DLT is declarative and we cannot perform data movement on our own.

To answer your first query, there is unfortunately no option to make the utils notebook run first. The only option is to combine the utils and main notebooks. This does not address the reusability aspect in DLT, and we have raised this feature request with the product team. The engineering team is working internally to address this issue; we can soon expect a feature that addresses this use case.

Thanks.


6 REPLIES

Dave_Nithio
Contributor

Per another question, we are unable to use either magic commands or dbutils.notebook.run with the Pro-level Databricks account or Delta Live Tables. Are there any other solutions for using generic functions from other notebooks within a Delta Live Table pipeline?


fecavalc08
New Contributor III

Hi @Vivian Wilfred and @Dave Wilson, we solved our code reusability with Repos, pointing the code to our main repo:

import os
import sys

sys.path.append(os.path.abspath('/Workspace/Repos/[your repo]/[folder with the python scripts]'))

from your_class import *

This only works if your reusable code is in Python. Also, depending on what you want to do, we noticed that DLT is always executed as the last piece of code, no matter where it sits in the script.
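For illustration, a minimal sketch of what such a reusable Python module might look like; the file name shared_transforms.py and the helper add_ingest_metadata are hypothetical, not from the original post:

# shared_transforms.py -- a hypothetical module inside the repo folder added to sys.path above
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def add_ingest_metadata(df: DataFrame) -> DataFrame:
    # example reusable transformation: stamp each row with a load timestamp
    return df.withColumn("ingest_ts", F.current_timestamp())

With the sys.path.append above in place, the DLT notebook can then run from shared_transforms import add_ingest_metadata and call the helper inside its table definitions.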

Could you help me by explaining this in detail?

Let's say I have notebook abc, which is reusable, and pqr is the one I will be referencing in the DLT pipeline.

How do I call functions from notebook abc in pqr?

I replaced the magic commands in my notebooks using the sys and os libraries. When I run the code on a cluster it works correctly, but when I run it from the Delta Live Table pipeline it does not work; the current directory is something different. What additional configuration should I do?

print(os.getcwd()) -> /databricks/driver
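For illustration only: in a DLT update the driver's working directory is /databricks/driver, so relative paths will not resolve; appending an absolute workspace path, as suggested in the replies above, avoids this. The path below is a placeholder.

import os
import sys

print(os.getcwd())  # inside a DLT update this prints /databricks/driver, not your notebook folder
# append an absolute workspace path rather than a relative one (placeholder path)
sys.path.append("/Workspace/Repos/my_repo/shared")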

JackyL
New Contributor II

Hi Dave,

You can solve this by putting your utils into a Python file and referencing that .py file in the DLT notebook. I have provided a template for the Python file below:

STEP 1:

 

# imports
from pyspark.sql import SparkSession
import IPython

# retrieve the dbutils handle from the notebook's IPython namespace and get/create a SparkSession,
# so this file also works when imported as a module rather than run as a notebook
dbutils = IPython.get_ipython().user_ns["dbutils"]
spark = SparkSession.builder.getOrCreate()

def myfunc1():
    # placeholder body; put your reusable logic here
    test = 1
    return test

 

STEP 2: You will need to create an __init__.py file in the same directory where your utils.py file lives (see the layout sketch below).
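A possible layout, with illustrative folder and file names:

# /Workspace/utils_folder/
#   __init__.py   (can be empty; it marks the folder as a Python package)
#   my_utils.py   (the utils file from STEP 1)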

STEP 3:

In your DLT notebook, you'll need to append your utils folder to sys.path and then import your utils file as a module.

 

# set path
import sys
sys.path.append("/Workspace/utils_folder")

# import libraries
import dlt
import my_utils

 

I suggest avoiding file names that clash with existing packages, e.g. pandas. I also suggest putting your utils file in a separate path from all your other files; this makes appending your path less risky.
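To tie the steps together, here is a hedged sketch of how a function from my_utils might be called inside a DLT table definition; the table name, source path, and helper function are illustrative assumptions, not part of the template above:

# set path
import sys
sys.path.append("/Workspace/utils_folder")

# import libraries
import dlt
import my_utils

@dlt.table(name="bronze_events")
def bronze_events():
    # example source path (a public Databricks sample dataset); replace with your own
    df = spark.read.json("/databricks-datasets/structured-streaming/events/")
    # hypothetical helper that would live in my_utils.py
    return my_utils.add_ingest_metadata(df)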
