Serverless Custom Environment Imaging

AlexM
New Contributor

Hi,

I'm looking at moving from job clusters to serverless environments, ideally to reduce cost and improve startup time.
I can see that it is now possible to specify a custom environment .yaml file listing the Python packages to be installed.

Is there any mechanism to 'pre-install' packages or to use a container image for serverless? These packages can take a while to install. I read that there is some caching going on, but the details are a bit opaque.

Would appreciate any advice,

Thanks,
Alex


Ashwin_DSA
Databricks Employee

Hi @AlexM

There isn't currently a way to bring a pre-built container image into serverless notebooks/jobs. Serverless supports custom environment YAML files and dependency installation/caching, but Databricks Container Services isn't supported on serverless compute.

So if the goal is to reduce startup time and avoid repeated installs, the best-supported path today is usually a workspace base environment: a reusable YAML spec that defines the serverless environment version plus your additional Python packages. Those base environments are pre-built and cached, which helps notebooks and jobs start faster.
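
As a minimal sketch of what such a spec contains, the Python below renders environment YAML with an `environment_version` key and a `dependencies` list of pip requirement strings, using only the standard library. The key names reflect my understanding of the serverless environment format and are an assumption to check against the docs for your workspace; `render_environment_yaml` and the package pins are hypothetical placeholders.

```python
# Minimal sketch: render a serverless environment spec as YAML text using only
# the standard library. Assumption: the spec has an `environment_version` key and
# a `dependencies` list of pip requirement strings (verify against the docs).


def render_environment_yaml(environment_version: str, dependencies: list[str]) -> str:
    """Return a YAML document describing a serverless environment."""
    # YAML escapes a single quote inside a single-quoted scalar by doubling it.
    version = environment_version.replace("'", "''")
    lines = [f"environment_version: '{version}'", "dependencies:"]
    for dep in dependencies:
        # Single-quote each requirement so characters like ':' or '#' stay literal.
        escaped = dep.replace("'", "''")
        lines.append(f"  - '{escaped}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical pins; replace with the packages your notebooks/jobs need.
    print(render_environment_yaml("2", ["pandas==2.2.2", "pyarrow>=15"]), end="")
```

Running it prints a four-line spec (version line, `dependencies:` header, two entries) that could be saved as the environment .yaml file. Hand-writing the YAML works just as well; the script only makes the structure explicit.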

Also note that init scripts and compute policies aren't available on serverless, so environment customisation has to go through the Environment panel / YAML route rather than cluster bootstrap logic. Databricks says it automatically caches the notebook's virtual environment, so reopening an existing notebook usually doesn't require reinstalling everything, even after a period of inactivity. The same caching also helps jobs when tasks in the same run share the same dependency set.
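
To illustrate why a shared dependency set matters, here is a toy model (not how Databricks implements its cache) that groups tasks by an order- and duplicate-insensitive fingerprint of their dependencies. Tasks that land in the same group could, in principle, reuse one prepared environment. The function names and task names are hypothetical.

```python
# Toy model (not Databricks internals): group tasks by a fingerprint of their
# normalized dependency set, so tasks with identical dependencies could reuse
# one prepared environment instead of installing the same packages again.
import hashlib
from collections import defaultdict


def dependency_fingerprint(dependencies: list[str]) -> str:
    """Order- and duplicate-insensitive SHA-256 of a dependency list."""
    # Simplified normalization: strip whitespace and lowercase each entry.
    normalized = sorted({d.strip().lower() for d in dependencies})
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def group_tasks_by_environment(tasks: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each fingerprint to the task names that could share that environment."""
    groups: dict[str, list[str]] = defaultdict(list)
    for task_name, deps in tasks.items():
        groups[dependency_fingerprint(deps)].append(task_name)
    return dict(groups)


if __name__ == "__main__":
    tasks = {
        "ingest": ["pandas==2.2.2", "pyarrow>=15"],
        "transform": ["pyarrow>=15", "pandas==2.2.2"],  # same set, different order
        "report": ["matplotlib"],
    }
    groups = group_tasks_by_environment(tasks)
    # ingest and transform share one environment; report needs its own.
    print(len(groups))  # prints 2
```

The practical takeaway is the same with or without the code: keeping tasks on identical dependency lists (ideally the same base environment) gives any caching the best chance to help.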

For serverless jobs using custom base environments, only the dependencies required for the task are installed at runtime. If your use case needs fully baked images or OS/system-level packages, I'd probably look at standard or dedicated compute with Databricks Container Services instead of serverless.
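
As a rough illustration of "only the dependencies required for the task are installed", the toy function below (a hypothetical helper, not the actual installer) compares a task's requirements against those already in a base environment and returns what would still need installing. It compares normalized requirement strings only; a real resolver would compare package names and version specifiers.

```python
# Toy model (an assumption, not the actual installer): given the requirements
# already baked into a base environment, work out which of a task's
# requirements would still need installing when the task starts.


def requirements_to_install(base_env: list[str], task_deps: list[str]) -> list[str]:
    """Return task requirements not already present in the base environment.

    Simplification: entries are compared as stripped, lowercased strings, so
    "pandas==2.2.2" and "pandas>=2" count as different requirements.
    """
    present = {d.strip().lower() for d in base_env}
    missing: list[str] = []
    for dep in task_deps:
        key = dep.strip().lower()
        if key not in present:
            missing.append(dep.strip())
            present.add(key)  # avoid listing a duplicate requirement twice
    return missing


if __name__ == "__main__":
    base = ["pandas==2.2.2", "pyarrow>=15"]
    task = ["pandas==2.2.2", "scikit-learn==1.5.0"]
    print(requirements_to_install(base, task))  # prints ['scikit-learn==1.5.0']
```

The smaller that difference is, the less work happens at task startup, which is the argument for putting shared packages into the base environment.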

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***
