Installing Custom Packages on Serverless Compute via Databricks Connect

ganesh_raskar
New Contributor

I have a custom Python package that provides a PySpark DataSource implementation. I'm using Databricks Connect (16.4.10) and need to understand package installation options for serverless compute.
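For context, the package exposes a data source roughly along these lines, using the Python DataSource API in pyspark.sql.datasource (class, module, and schema below are illustrative stand-ins for the real package):

from pyspark.sql.datasource import DataSource, DataSourceReader
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

class MyCustomDataSourceReader(DataSourceReader):
    def __init__(self, schema, options):
        self.schema = schema
        self.options = options

    def read(self, partition):
        # A real reader would pull rows from the external system here;
        # each yielded tuple must match the declared schema.
        yield ("example", 1)

class MyCustomDataSource(DataSource):
    @classmethod
    def name(cls):
        # Short name used with spark.read.format("my_format")
        return "my_format"

    def schema(self):
        return StructType([
            StructField("value", StringType()),
            StructField("count", IntegerType()),
        ])

    def reader(self, schema):
        return MyCustomDataSourceReader(schema, self.options)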

Works: Traditional Compute Cluster

from databricks.connect import DatabricksSession
from my_custom_package import MyCustomDataSource  # module name illustrative

# Custom package pre-installed on the cluster as a library
spark = DatabricksSession.builder.clusterId("my-cluster-id").getOrCreate()
spark.dataSource.register(MyCustomDataSource)
df = spark.read.format("my_format").load()
# Works perfectly

Doesn't Work: Serverless Compute

from databricks.connect import DatabricksSession
from my_custom_package import MyCustomDataSource  # module name illustrative

# Custom package is not available on the serverless side
spark = DatabricksSession.builder.serverless().getOrCreate()
spark.dataSource.register(MyCustomDataSource)
df = spark.read.format("my_format").load()
# Error

What I've Tried

I attempted to use DatabricksEnv().withDependencies():

from databricks.connect import DatabricksEnv, DatabricksSession

env = DatabricksEnv().withDependencies(["my-custom-package==0.4.0"])
spark = DatabricksSession.builder.serverless().withEnvironment(env).getOrCreate()

However, based on the documentation, withDependencies() appears to apply only to dependencies used inside serverless Python UDFs; it does not make the package importable at the driver or session level, which is what custom DataSource registration requires.
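To illustrate the distinction, here is a minimal sketch of what withDependencies does appear to cover, based on the UDF docs linked below: the dependency resolves inside a serverless Python UDF, while the session level still cannot see it (package and function names are illustrative):

from databricks.connect import DatabricksEnv, DatabricksSession
from pyspark.sql.functions import udf

env = DatabricksEnv().withDependencies(["my-custom-package==0.4.0"])
spark = DatabricksSession.builder.serverless().withEnvironment(env).getOrCreate()

@udf("string")
def package_version(x):
    # Inside the UDF sandbox the declared dependency is importable.
    import my_custom_package  # illustrative module name
    return my_custom_package.__version__

# ...but the package is still not importable at the driver/session level,
# so registering and reading the custom DataSource fails as shown above.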

Questions

  1. Is there a way to install custom packages on serverless compute when using Databricks Connect?

  2. Is support for custom package installation on serverless compute (similar to cluster libraries) on the roadmap?

  3. Are there any workarounds to make custom DataSources work with serverless compute?

Environment

Databricks Connect: 16.4.10
Python: 3.12
Custom package: Installed locally via pip, provides a PySpark Python Data Source API implementation

Additional Context
The custom package works perfectly in a notebook attached to a serverless environment.
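For reference, the working notebook pattern on serverless looks like this (package name illustrative; the package can also be added through the notebook's environment side panel):

%pip install my-custom-package==0.4.0

# In a later cell, registration and reads work on serverless:
from my_custom_package import MyCustomDataSource
spark.dataSource.register(MyCustomDataSource)
df = spark.read.format("my_format").load()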

Links
https://docs.databricks.com/aws/en/dev-tools/databricks-connect/cluster-config#remote-meth
https://docs.databricks.com/aws/en/dev-tools/databricks-connect/python/udf#base-env
