Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DAB Job - Serverless Cluster using configured base environment

slangenborg
New Contributor

I have configured a base serverless environment for my workspace that includes libraries from a private repository:

[Screenshot: base serverless environment configured with libraries from the private repository]

This base environment has been set as the default, and it behaves as expected when running notebooks manually in the workspace on serverless clusters.

Objective:

  • I would like to define and deploy job pipelines with DAB that use serverless clusters
  • I would like those clusters to use the default serverless base environment configured for the workspace

Issue:

  • Currently, when I define a serverless job in DAB, the runtime environment does not include the libraries from the private repository that should be present via the default base environment (see the sketch below).
  • Is this intended behavior, a misconfiguration of the job in the DAB YAML, or a feature that has not yet been developed?
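For context, a minimal sketch of the kind of job definition I mean (bundle, job, and notebook names are placeholders; on a serverless-enabled workspace, a task with no cluster configuration runs on serverless compute):

```yaml
# databricks.yml (sketch; all names are placeholders)
bundle:
  name: my_bundle

resources:
  jobs:
    my_serverless_job:
      name: my_serverless_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb
          # No new_cluster or job_cluster_key, so the task runs on
          # serverless compute. Nothing here references the workspace's
          # default base environment, and its libraries are absent at run time.
```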

davidmorton
Databricks Employee

The default serverless base environment is really only for notebooks, and doesn't apply to pipelines and jobs.

For jobs, you'll have to specify the dependencies in the job's DAB YAML.

https://docs.databricks.com/aws/en/dev-tools/bundles/library-dependencies
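Roughly like this, following the pattern on that page (a sketch; the package names and paths are placeholders):

```yaml
resources:
  jobs:
    my_serverless_job:
      name: my_serverless_job
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              # pip-style requirements; replace with your private packages
              - my-private-package==1.2.0
              - /Volumes/main/default/libs/my_pkg-1.2.0-py3-none-any.whl
      tasks:
        - task_key: main
          environment_key: default   # binds the task to the environment above
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb
```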

slangenborg
New Contributor

When I create a job in the UI (using notebook tasks; I understand the base environment is restricted to those) and set it to use the 'Notebook environment' option, it succeeds and correctly uses the base environment as the default.

Given that, is there no way to replicate that configuration in DAB YAML? This would seem to be a limitation of DAB rather than a conceptual reason why it shouldn't work.

mukul1409
New Contributor II

Hi @slangenborg 

 

According to the official Databricks Jobs REST API documentation, notebook tasks use the notebook environment only implicitly, when no environment_key is provided. For serverless compute, the API lets you explicitly configure environments only via an environments block and an environment_key, and that is exactly what DAB YAML supports.

Because the UI choice called 'Notebook environment' is not part of the public REST API specification, DAB cannot replicate that setting. DAB can only expose fields that exist in the Jobs REST API schema.

See the Jobs API documentation here:
https://docs.databricks.com/api/workspace/jobs/create#environments-spec-environment_version
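In DAB YAML terms, the two cases look like this (a sketch; the package name is a placeholder, and environment_version is the spec field from the API page above):

```yaml
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: implicit
          notebook_task:
            notebook_path: ../src/nb.ipynb
          # No environment_key: a notebook task implicitly uses the
          # notebook's own environment.
        - task_key: explicit
          environment_key: default   # must match an entry under `environments`
          notebook_task:
            notebook_path: ../src/nb.ipynb
      environments:
        - environment_key: default
          spec:
            environment_version: "2"
            dependencies:
              - my-private-package==1.2.0
      # There is no field in this schema meaning "use the workspace's
      # default base environment".
```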

 

Mukul Chauhan