Package in Python Wheel not Importing When Running on Serverless Compute

AP52
New Contributor II

Hi All, 

I am using a Python wheel to run ingestions with Databricks Workflows, with a separate entry point in the wheel for each workflow. The .whl also includes a module named functions.py containing several functions that are imported for use across the different ingestion scripts. An import in an ingestion script looks like this:

 

from apps.functions import some_function
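For context, the wheel's entry points are defined roughly like the sketch below; the package name and entry point names are illustrative placeholders, aside from apps.functions.

# setup.py (simplified sketch)
from setuptools import setup, find_packages

setup(
    name="ingestion_apps",        # illustrative package name
    version="0.1.0",
    packages=find_packages(),     # picks up the apps package, including apps/functions.py
    entry_points={
        "console_scripts": [
            # one entry point per ingestion workflow (names illustrative)
            "ingest_orders=apps.ingest_orders:main",
            "ingest_customers=apps.ingest_customers:main",
        ],
    },
)

Each workflow task then calls one of these entry points.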

 

The functions import and work correctly when I use a custom cluster for compute. However, when the workflow runs on serverless compute, the functions don't seem to take effect at all. For example, I have a function that appends a load date to a DataFrame: when the data is loaded on a compute cluster the load date is appended correctly, but when the same job runs on serverless compute no load date is appended.
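As a simplified sketch, that helper looks something like this (column name and signature are illustrative):

from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def add_load_date(df: DataFrame, column_name: str = "load_date") -> DataFrame:
    # Append the current date as an audit column before writing.
    return df.withColumn(column_name, F.current_date())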

 

What am I missing here to make sure the functions file gets imported across the different ingestion entry points?


4 REPLIES

Alberto_Umana
Databricks Employee

Hi @AP52,

How are you importing the wheel package? Is it specified in the workflow configuration?

AP52
New Contributor II

Hi @Alberto_Umana,

We have a wheel dependency in the environment configuration pointing to a volume where the .whl file is stored. This is running on version 2 of serverless, and the run-as user has permissions to access the volume.

AP52
New Contributor II

Hi @Alberto_Umana, just wanted to bump this thread to see if you had any thoughts. Thanks!

AP52
New Contributor II

To close out this thread: we found the issue we were having with serverless wasn't with our import, but with the isinstance checks we use in the if statements of different functions. In short, serverless runs through Spark Connect, so it uses a different DataFrame type behind the scenes than a classic compute cluster, and isinstance checks against the classic pyspark.sql.dataframe.DataFrame don't match. The issue is discussed in the article below:

pyspark.sql.connect.dataframe.DataFrame vs pyspark... - Databricks Community - 71055
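For anyone who runs into the same thing: on serverless the objects are pyspark.sql.connect.dataframe.DataFrame rather than pyspark.sql.dataframe.DataFrame, so an isinstance check against the classic type quietly returns False. A sketch of one way to make the check cover both compute types (helper name is illustrative):

from pyspark.sql import DataFrame as ClassicDataFrame

try:
    # Available when PySpark ships Spark Connect (e.g. serverless compute)
    from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
    SPARK_DATAFRAME_TYPES = (ClassicDataFrame, ConnectDataFrame)
except ImportError:
    SPARK_DATAFRAME_TYPES = (ClassicDataFrame,)

def is_spark_dataframe(obj) -> bool:
    # True for both classic and Spark Connect DataFrames
    return isinstance(obj, SPARK_DATAFRAME_TYPES)

Checking against both types keeps the same wheel working on classic clusters and on serverless.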
