yesterday
Hello,
I'm trying to set up a DAB job that runs an ML job. For this it would be useful to use a serverless ML environment, like the one I can select in notebooks. However, I cannot find a meaningful way to define the base environment as ML.
I do not want to supply a requirements-ML.txt, as I think that would increase the start-up time. I could not find any useful documentation on this.
I tried something like
    environments:
      - environment_key: default
        spec:
          environment_version: "5"
          base_environment: "ML"
          dependencies:
            - ... light dependencies
but it expects a YAML file for the base environment.
Does anybody have a tip?
Thank you,
Daniele
yesterday
Hi @Daniele-T,
ML is the UI label for a Databricks-managed serverless base environment, but in Jobs/DAB you generally need to use the job environment model rather than the notebook picker directly. Databricks documents that the serverless base environment options include Standard, ML, AI, previous versions, Custom (YAML), and workspace environments, and that job tasks are configured through the job environment settings.
If this is a notebook task, the simplest option may be to let the task use the notebook's own environment, because notebook tasks default to Notebook Environment unless you override them with a job-level environment. See Configure the serverless environment.
If you do want to define the environment in DAB, base_environment: "ML" is not the right value. For managed base environments, Databricks uses versioned identifiers, and Databricks-provided ML environments are versioned like databricks_ml_v5. The public environment APIs also describe Databricks-provided ML base environments as workspace-base-environments/databricks_ml_..., for example workspace-base-environments/databricks_ml_v5.
So the configuration should look more like this:
    environments:
      - environment_key: default
        spec:
          base_environment: databricks_ml_v5
          dependencies:
            - ...
In that case, do not also set environment_version in the same spec.
A second thing to check is workspace support. Databricks notes that selecting a managed base environment for jobs is in beta, and that if the workspace does not have that feature enabled, the job configuration shows an Environment version drop-down instead of Base environment. In those workspaces, the "Custom" option expects a YAML file, which matches what you are seeing. See Configure the serverless environment.
Hope this helps.
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.
yesterday
Hello @Ashwin_DSA ,
Thanks for your quick reply. My aim is to use it in a scheduled job (Python script) deployed via the CLI.
When I tried what you suggested:
    environments:
      - environment_key: default
        spec:
          base_environment: databricks_ml_v5
with base_environment either quoted or unquoted, I get this exception:
    Error: cannot update job: Invalid base environment for 'default'. Only custom base environments (Workspace or Volume absolute paths ending with '.yaml' or '.yml') are currently supported.
I'm currently deploying via databricks cli=1.2.1
yesterday
Hi @Daniele-T,
Thanks for sharing the exact error. It confirms this is not a quoting or syntax issue on your end.
This is more of a platform limitation, as far as I can tell. In the Jobs/DAB deployment path, base_environment is only accepted when it points to a custom environment spec file stored in Workspace files or a Unity Catalog Volume, for example, /Workspace/Shared/envs/ml-env.yml or /Volumes/my-catalog/my-schema/envs/ml-env.yml. Managed identifiers such as databricks_ml_v5 are rejected at the API layer regardless of the CLI version. I tested this in my own sandbox and reproduced the exact same error you hit, so this is consistent behaviour and not something specific to your setup. The serverless environment docs also note that if the base-environment preview is not enabled in a workspace, jobs expose environment_version rather than base_environment, and the Custom option expects a YAML file path.
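To avoid a failed deploy, the rule described above can be approximated with a local check before running the CLI. This is only a minimal sketch, not Databricks code: `is_custom_base_environment` is a hypothetical helper, the `/Workspace/` and `/Volumes/` prefixes are assumed from the example paths in this thread, and the suffix rule is taken from the wording of the error message. The server-side validation may well be stricter.

```python
from pathlib import PurePosixPath

# Assumption: prefixes mirror the example paths quoted in this thread.
ALLOWED_PREFIXES = ("/Workspace/", "/Volumes/")
# Taken from the error message: paths must end with '.yaml' or '.yml'.
ALLOWED_SUFFIXES = (".yaml", ".yml")


def is_custom_base_environment(value: str) -> bool:
    """Return True if value looks like an absolute path to an environment spec file."""
    has_allowed_prefix = value.startswith(ALLOWED_PREFIXES)
    has_allowed_suffix = PurePosixPath(value).suffix in ALLOWED_SUFFIXES
    return has_allowed_prefix and has_allowed_suffix


for candidate in (
    "/Workspace/Shared/envs/ml-env.yml",
    "/Volumes/my-catalog/my-schema/envs/ml-env.yml",
    "databricks_ml_v5",
    "ML",
):
    verdict = "passes" if is_custom_base_environment(candidate) else "fails"
    print(f"{candidate}: {verdict} the local check")
```

A value that passes this check can still be rejected by the service (for example, if the file does not exist), so treat it as an early warning rather than a guarantee.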
So, for a scheduled Python script job deployed via Bundles and the CLI, you have two supported paths. The simpler one is to drop base_environment entirely and use environment_version with a small dependency list directly in your databricks.yml:
    environments:
      - environment_key: default
        spec:
          environment_version: "5"
          dependencies:
            - pandas==2.2.2
            - scikit-learn==1.5.1
The second option, if you prefer to keep the environment spec separate from your bundle config, is to put the same spec in a YAML file, upload it to Workspace files or a UC Volume, and reference it via an absolute path:
    environments:
      - environment_key: default
        spec:
          base_environment: /Workspace/Users/your-user@company.com/envs/ml-env.yml
One important caveat on the second path: it is not equivalent to getting the Databricks ML runtime. The YAML file resolves to the same environment_version + dependencies mechanism, so you still need to list your packages explicitly. There is currently no way to reference databricks_ml_v5 through the YAML path either.
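To make the caveat concrete, the spec file carries nothing more than a version and a package list. The sketch below renders such a file from Python. It assumes the file layout is just the two keys named above (`environment_version` and `dependencies`); `render_environment_spec` is a hypothetical helper, and uploading the result to Workspace files or a UC Volume is left out.

```python
def render_environment_spec(environment_version: str, dependencies: list[str]) -> str:
    """Render a minimal environment spec as YAML text (assumed two-key layout)."""
    lines = [f'environment_version: "{environment_version}"', "dependencies:"]
    # Every package still has to be listed explicitly, one entry per line.
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


spec_text = render_environment_spec("5", ["pandas==2.2.2", "scikit-learn==1.5.1"])
print(spec_text)
```

The rendered text contains the same information as the inline `spec` in the first option, which is why neither path gives you the ML stack without naming the packages.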
On the startup time concern, it is worth knowing that serverless environments are cached, so after the first cold start, subsequent runs sharing the same dependency fingerprint will not reinstall. For a light dependency set, the overhead is smaller than it might seem.
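As a purely illustrative picture of what "same dependency fingerprint" could mean, the sketch below hashes the environment version together with the dependency list. This is not how Databricks computes its cache key, which is not described in this thread; `dependency_fingerprint` is a hypothetical function, and ignoring the order of entries is a choice made for this sketch only.

```python
import hashlib


def dependency_fingerprint(environment_version: str, dependencies: list[str]) -> str:
    """Hypothetical fingerprint: SHA-256 over the version and the sorted dependency list."""
    # Sketch-only choice: normalize and sort so ordering and stray whitespace do not matter.
    normalized = sorted(dependency.strip().lower() for dependency in dependencies)
    payload = "\n".join([environment_version, *normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = dependency_fingerprint("5", ["pandas==2.2.2", "scikit-learn==1.5.1"])
reordered = dependency_fingerprint("5", ["scikit-learn==1.5.1", "pandas==2.2.2"])
bumped = dependency_fingerprint("5", ["pandas==2.2.3", "scikit-learn==1.5.1"])

print(first == reordered)  # True: same packages and versions
print(first == bumped)     # False: one version changed
```

The practical takeaway is independent of the exact mechanism: pinning versions keeps the dependency set stable between runs, which is what lets a cached environment be reused.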
If your intention is specifically to get the full Databricks ML runtime pre-loaded (MLflow, Delta, the full ML stack) without listing packages, that is not supported in the DAB/CLI deployment path as of now. You would need to add those packages explicitly to your dependencies list.
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.
yesterday
You can create the environment in the DAB:
    environments:
      - environment_key: default
        spec:
          environment_version: "5"
          dependencies:
            - xgboost
            # Add libraries
You can also create a file with packages, for example /Workspace/mlp/env.yaml:
    environment_version: '5'
    dependencies:
      - xgboost>=2.0.0
      # Add libraries
and use it as the base environment in the DAB:
    environments:
      - environment_key: default
        spec:
          base_environment: /Workspace/mlp/env.yaml