
Databricks Container Services now available for Standard Compute - custom Docker images in shared co

szymon_dybczak
Esteemed Contributor III

Databricks has a new/updated feature in Beta: Databricks Container Services for standard compute.

Docs: https://docs.databricks.com/aws/en/compute/custom-containers-standard

With this feature, you can specify a Docker image when creating standard compute, which means custom workload environments can now be used in shared compute scenarios too.

A few things worth mentioning:

  1. Requires Standard access mode + DBR 18.3+

  2. Databricks provides a base image - the recommended approach is to extend it:
     FROM databricksruntime/environment:v5-standard

  3. Python dependencies should go into /databricks/python3. Example:
     RUN /databricks/python3/bin/python -m pip install simplejson
     This seems important because notebooks, Python wheel jobs, and Python script jobs read from this environment.

  4. Some Dockerfile instructions are ignored - USER, CMD, ENTRYPOINT, EXPOSE, HEALTHCHECK, SHELL, and STOPSIGNAL have no effect because of how workloads are launched.

  5. Init scripts no longer modify the workload Python environment - this is a big migration point. If your old setup used init scripts to install Python packages, those dependencies now need to move into the Docker image.

  6. Not everything is supported yet - current limitations include no compute-scoped libraries, no private package repositories, and no Databricks Runtime for Machine Learning support.

  7. ECR support uses instance profiles - for Amazon ECR images, authentication is handled through an instance profile with permission to pull the image.
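
Points 2 to 4 can be put together in one small Dockerfile. This is only a minimal sketch based on the lines quoted above, not something I have verified against the Beta; simplejson is just the example package and stands in for whatever your workloads actually need.

```dockerfile
# Minimal sketch: extend the Databricks-provided base image for standard compute.
FROM databricksruntime/environment:v5-standard

# Install Python dependencies into the /databricks/python3 environment,
# which notebooks, Python wheel jobs, and Python script jobs read from.
# simplejson is a placeholder package; replace it with your own dependencies.
RUN /databricks/python3/bin/python -m pip install simplejson

# Deliberately no USER, CMD, ENTRYPOINT, EXPOSE, HEALTHCHECK, SHELL, or
# STOPSIGNAL: these instructions are ignored on standard compute.
```

Packages that an old init script used to install would move into additional RUN lines like the pip one above, since init scripts no longer modify the workload Python environment.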

This seems like a pretty important step toward reproducible Databricks environments, especially for teams that want parity between local/dev/prod environments without relying heavily on init scripts or ad-hoc cluster libraries.
