For Databricks serverless compute jobs using Asset Bundles, custom dependencies (such as Python packages or wheel files) cannot be pre-installed once and shared across job tasks the way they can on a traditional job cluster. Instead, dependencies must be installed within the notebook itself, typically using the %pip magic command or through the notebook's "Environment" panel. This install step is repeated for each serverless notebook task run, which can increase runtime when many parallel tasks are launched.
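As a minimal sketch of the notebook-scoped route (the package name and version are illustrative placeholders), the install typically sits in the first cells of the notebook:

```python
# First notebook cell: notebook-scoped install on serverless compute.
# "my-internal-package" is a placeholder for your actual dependency.
%pip install my-internal-package==1.2.3
```

```python
# Next cell: restart the Python process so the newly installed package
# is importable in the cells that follow (recommended after %pip installs).
dbutils.library.restartPython()
```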
Installing Dependencies: Serverless Compute
- You can specify dependencies in your notebook (for example, %pip install /Workspace/path/to/wheel.whl), with the wheel stored in workspace files, a Unity Catalog volume, or cloud storage; see the sketch after this list.
- With Databricks Asset Bundles, you can declare library dependencies in your bundle's configuration, but for serverless notebook tasks you must fall back to notebook-scoped installation; specifying an environment or libraries block in the bundle for notebook tasks on serverless compute is not currently supported.
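A hedged sketch of installing a custom wheel from workspace files or a Unity Catalog volume (the paths and file names are illustrative placeholders):

```python
%pip install /Workspace/Shared/libs/my_package-1.2.3-py3-none-any.whl
# Placeholder path: the wheel could equally be staged in a Unity Catalog
# volume, e.g. /Volumes/main/default/libs/my_package-1.2.3-py3-none-any.whl
```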
Avoiding Repeated Installs
- On serverless compute, each notebook task runs in a dedicated, ephemeral environment, so dependencies are not persisted between runs or across tasks. This means every serverless notebook task will execute the install, and there is no way to ensure dependencies are installed just once per job, unlike a classic or shared job cluster, where libraries persist for all tasks.
- You can reduce the cost of repeated installs by making your custom libraries available via workspace files, Unity Catalog volumes, or cloud storage, which speeds up each local installation and avoids network delays; see the sketch after this list.
- If minimizing install time is critical, consider using a "shared job cluster" instead of serverless compute, since libraries installed on a cluster persist for the duration of the job and are shared among all tasks.
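As an example of the staging step (a one-time sketch, assuming the wheel has already been built and uploaded to workspace files; every path below is an illustrative placeholder), you could copy it into a Unity Catalog volume so that later installs read from nearby storage:

```python
import shutil

# Copy a built wheel from workspace files into a Unity Catalog volume.
# Both paths are placeholders; /Workspace and /Volumes are exposed as
# local (FUSE) paths on Databricks compute with Unity Catalog enabled.
shutil.copy(
    "/Workspace/Shared/dist/my_package-1.2.3-py3-none-any.whl",
    "/Volumes/main/default/libs/my_package-1.2.3-py3-none-any.whl",
)
```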
Using Workspace or Volume Libraries
- You can reference wheel/package files stored in workspace files or Unity Catalog volumes to avoid downloading from the internet on each install, which is usually faster and more stable; a sketch follows this list.
- Still, the install step runs once per notebook task per serverless job run, since environments are isolated and do not persist between job executions.
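For example (a sketch, assuming all required wheels, including transitive dependencies, have been staged in a volume directory; the volume path and package name are placeholders), you can point pip at the volume and skip the external index entirely:

```python
%pip install --no-index --find-links=/Volumes/main/default/libs my-internal-package
# --no-index skips PyPI, so any transitive dependencies must also be staged
# as wheels in the /Volumes/main/default/libs directory (placeholder path).
```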
Summary Table
| Environment Type | Dependency Installation Persistence | Recommended Approach |
|---|---|---|
| Serverless compute | Not persisted between tasks/runs | %pip install in each notebook |
| Shared job cluster | Persists for job/cluster duration | Attach libraries to the cluster/job |
In summary, with Databricks serverless compute there is currently no supported mechanism to avoid per-task dependency installation for notebook tasks; dependencies must be installed at runtime, and a classic (shared job) cluster is required if you need persistent, shared library installs.