For Databricks serverless compute jobs using Asset Bundles, custom dependencies (such as Python packages or wheel files) cannot be pre-installed once and shared across job tasks the way they can on a traditional job cluster. Instead, dependencies must be installed within the notebook itself, typically using the %pip magic command or through the notebook's "Environment" panel. This install step is repeated for each serverless notebook task run, which can increase runtime when many parallel tasks are launched.
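As a minimal sketch of the notebook-scoped route (the package name and version are illustrative placeholders), the install typically sits in the first cells of the notebook:

```python
# First notebook cell: notebook-scoped install on serverless compute.
# "my-internal-package" is a placeholder for your actual dependency.
%pip install my-internal-package==1.2.3
```

```python
# Next cell: restart the Python process so the newly installed package
# is importable in the cells that follow (recommended after %pip installs).
dbutils.library.restartPython()
```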
Installing Dependencies: Serverless Compute
- You can specify dependencies in your notebook (for example, %pip install /Workspace/path/to/wheel.whl), with the wheel stored in workspace files, a Unity Catalog volume, or cloud storage; see the sketch after this list.
- With Databricks Asset Bundles, you can declare library dependencies in your bundle's configuration, but for serverless notebook tasks you must fall back to notebook-scoped installation; specifying an environment or libraries block in the bundle for notebook tasks on serverless compute is not currently supported.
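A hedged sketch of installing a custom wheel from workspace files or a Unity Catalog volume (the paths and file names are illustrative placeholders):

```python
%pip install /Workspace/Shared/libs/my_package-1.2.3-py3-none-any.whl
# Placeholder path: the wheel could equally be staged in a Unity Catalog
# volume, e.g. /Volumes/main/default/libs/my_package-1.2.3-py3-none-any.whl
```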
Avoiding Repeated Installs
- On serverless compute, each notebook task runs in a dedicated, ephemeral environment, so dependencies are not persisted between runs or across tasks. This means every serverless notebook task will execute the install, and there is no way to ensure dependencies are installed just once per job, unlike a classic or shared job cluster, where libraries persist for all tasks.
- You can reduce the cost of repeated installs by making your custom libraries available via workspace files, Unity Catalog volumes, or cloud storage, which speeds up each local installation and avoids network delays; see the sketch after this list.
- If minimizing install time is critical, consider using a "shared job cluster" instead of serverless compute, since libraries installed on a cluster persist for the duration of the job and are shared among all tasks.
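As an example of the staging step (a one-time sketch, assuming the wheel has already been built and uploaded to workspace files; every path below is an illustrative placeholder), you could copy it into a Unity Catalog volume so that later installs read from nearby storage:

```python
import shutil

# Copy a built wheel from workspace files into a Unity Catalog volume.
# Both paths are placeholders; /Workspace and /Volumes are exposed as
# local (FUSE) paths on Databricks compute with Unity Catalog enabled.
shutil.copy(
    "/Workspace/Shared/dist/my_package-1.2.3-py3-none-any.whl",
    "/Volumes/main/default/libs/my_package-1.2.3-py3-none-any.whl",
)
```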
Using Workspace or Volume Libraries
- You can reference wheel/package files stored in workspace files or Unity Catalog volumes to avoid downloading from the internet on each install, which is usually faster and more stable; a sketch follows this list.
- Still, the install step runs once per notebook task per serverless job run, since environments are isolated and do not persist between job executions.
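For example (a sketch, assuming all required wheels, including transitive dependencies, have been staged in a volume directory; the volume path and package name are placeholders), you can point pip at the volume and skip the external index entirely:

```python
%pip install --no-index --find-links=/Volumes/main/default/libs my-internal-package
# --no-index skips PyPI, so any transitive dependencies must also be staged
# as wheels in the /Volumes/main/default/libs directory (placeholder path).
```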
Summary Table
| Environment Type | Dependency Installation Persistence | Recommended Approach |
|---|---|---|
| Serverless compute | Not persisted between tasks/runs | %pip install in each notebook |
| Shared job cluster | Persists for job/cluster duration | Attach libraries to the cluster/job |
In summary, with Databricks serverless compute there is currently no supported mechanism to avoid per-task dependency installation for notebook tasks; dependencies must be installed at runtime, and a classic (shared job) cluster is required if you need persistent, shared library installs.