Data Engineering

Installing Custom Packages on Serverless Compute via Databricks Connect

ganesh_raskar
New Contributor II

I have a custom Python package that provides a PySpark DataSource implementation. I'm using Databricks Connect (16.4.10) and need to understand package installation options for serverless compute.

Works: Traditional Compute Cluster

```python
# Custom package pre-installed on the cluster
spark = DatabricksSession.builder.clusterId("my-cluster-id").getOrCreate()
spark.dataSource.register(MyCustomDataSource)
df = spark.read.format("my_format").load()
```

This works perfectly.

Doesn't Work: Serverless Compute

```python
# Custom package is not available on the serverless session
spark = DatabricksSession.builder.serverless().getOrCreate()
spark.dataSource.register(MyCustomDataSource)
df = spark.read.format("my_format").load()
```

This fails with an error because the package is not installed on the serverless compute.

What I've Tried

I attempted to use DatabricksEnv().withDependencies():

```python
env = DatabricksEnv().withDependencies(["my-custom-package==0.4.0"])
spark = DatabricksSession.builder.serverless().withEnvironment(env).getOrCreate()
```

However, based on the documentation, withDependencies() appears to only work for Python UDFs, not for packages that need to be available at the driver or session level for custom DataSource registration.
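For context, my reading of the documented pattern is that those dependencies are scoped to UDF execution, roughly like the sketch below (the import path for DatabricksEnv and the module name are assumptions on my part; the package name is the placeholder from above). The package becomes importable inside the UDF body, but the client process that calls spark.dataSource.register() never sees it:

```python
from databricks.connect import DatabricksSession, DatabricksEnv  # assumed import path
from pyspark.sql.functions import udf

# Dependencies declared this way are installed into the environment that runs UDFs,
# not into the client process that performs DataSource registration.
env = DatabricksEnv().withDependencies(["my-custom-package==0.4.0"])
spark = DatabricksSession.builder.serverless().withEnvironment(env).getOrCreate()

@udf("string")
def uses_package(value):
    import my_custom_package  # hypothetical module name for the placeholder package
    return str(type(my_custom_package))

spark.range(1).select(uses_package("id")).show()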

Questions

  1. Is there a way to install custom packages on serverless compute when using Databricks Connect?

  2. Is support for custom package installation on serverless compute (similar to cluster libraries) on the roadmap?

  3. Are there any workarounds to make custom DataSources work with serverless compute?

Environment

Databricks Connect: 16.4.10
Python: 3.12
Custom package: Installed locally via pip, provides PySpark DataSource V2 API implementation

Additional Context
The custom package works perfectly with serverless environment in a notebook.
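For comparison, the notebook flow that works is roughly the following (a sketch; the package, module, and class names are the placeholders used above):

```python
# Cell 1: install the package into the notebook's serverless environment
%pip install my-custom-package==0.4.0
dbutils.library.restartPython()

# Cell 2: register and use the custom DataSource after the Python restart
from my_custom_package import MyCustomDataSource  # hypothetical module path
spark.dataSource.register(MyCustomDataSource)
df = spark.read.format("my_format").load()
```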

Links
https://docs.databricks.com/aws/en/dev-tools/databricks-connect/cluster-config#remote-meth
https://docs.databricks.com/aws/en/dev-tools/databricks-connect/python/udf#base-env

5 REPLIES

Hubert-Dudek
Databricks MVP

Just put the wheel on a Volume and add it to the environment?


My blog: https://databrickster.medium.com/

ganesh_raskar
New Contributor II
Thank you for your comment. I did try this as well, but no luck.

 

I uploaded the wheel package to a Volume (read access for all users), let's say at this path:
/Volumes/main/my@my.com/test/dice-4.0.0-py3-none-any.whl

 

```python
# Point the serverless environment at the wheel stored on the Volume
volumes_deps = ["dbfs:/Volumes/main/my@my.com/test/dice-4.0.0-py3-none-any.whl"]
env = DatabricksEnv().withDependencies(volumes_deps)
spark = DatabricksSession.builder.serverless().withEnvironment(env).getOrCreate()
```
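One way to check where the dependency actually lands (a quick sketch, assuming the wheel's import name is dice, which may not be accurate): run a trivial UDF that imports the package. If the UDF import succeeds while spark.dataSource.register(...) still fails on the client, the wheel is reaching the UDF environment but not the session/driver side that DataSource registration needs.

```python
from pyspark.sql.functions import udf

@udf("string")
def probe(_):
    try:
        import dice  # hypothetical import name for the dice-4.0.0 wheel above
        return getattr(dice, "__version__", "installed")
    except ImportError as exc:
        return f"missing: {exc}"

# Uses the serverless session created above; "installed" here plus a failing
# registration on the client would point to the dependency being UDF-scoped.
spark.range(1).select(probe("id")).show(truncate=False)
```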
Also, I noticed something odd in the DatabricksEnv source code: it only seems to verify libraries installed in the local virtual environment. I may be misinterpreting the source, though.

By the way, I have been following your blog for some time now and thoroughly enjoy reading the articles.

Sanjeeb2024
Contributor II

Hi @ganesh_raskar - Can you try doing a pip install in a notebook shell first and then using the library? Also, which package do you want to install? Please provide that detail and I will give it a try.

Regards - San

Sanjeeb Mohapatra

ganesh_raskar
New Contributor II

@Sanjeeb2024 It works perfectly fine in a notebook, either by installing it with !pip install or by pre-installing it on the serverless environment the notebook is attached to.

It's just that with Spark Connect on serverless compute, I don't see an option to install it.

I also tried configuring the default workspace serverless environment, but that only applies to notebooks and jobs; it does not apply to Spark Connect sessions.

Sanjeeb2024
Contributor II

Hi @ganesh_raskar - If you can share which custom package you are using, along with the exact code and the error, I can try to replicate it at my end and explore a suitable option.

Sanjeeb Mohapatra
