Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Creating Cluster configuration with library dependency using DABS

sparklez
New Contributor III

I am trying to create a cluster configuration using DABS and defining library dependencies.

My yaml file looks like this:

 

resources:
  clusters:
    project_Job_Cluster:
      cluster_name: "Project Cluster"
      spark_version: "16.3.x-cpu-ml-scala2.12"
      node_type_id: "Standard_DS3_v2"
      # num_workers: 0
      autotermination_minutes: 20
      spark_conf:
        "spark.databricks.delta.preview.enabled": "true"
        "spark.databricks.cluster.profile": "singleNode"
        "spark.master": "local[*, 4]"
      custom_tags:
        "Project": "Project"

      libraries:
        - pypi:
            package: "databricks-sql-connector"
        - pypi:
            package: "python-docx"

The cluster configuration is recognised and the cluster is created, but without the specified libraries, and I get a warning:

"Warning: unknown field: libraries at resources.clusters.project_Job_Cluster in resources/project job cluster.yml:16:7"

This leads me to conclude that I cannot use "libraries" in a cluster config. What is the right way to do this?

 

1 ACCEPTED SOLUTION

lingareddy_Alva
Honored Contributor II

Hi @sparklez 

You're encountering this issue because the libraries field is not valid in the cluster configuration.
Libraries need to be specified at the job level, not the cluster level.

Option 1: Job-Level Libraries (Recommended)
Move the libraries section to your job configuration:

(screenshot attachment: LRALVA_0-1748985240647.png)
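The screenshot did not survive this export, so here is a minimal sketch of what the job-level configuration could look like. The job name, task key, and notebook path are illustrative assumptions, not taken from the original post:

```yaml
# Hypothetical DABS job definition; names and paths are placeholders.
resources:
  jobs:
    project_job:
      name: "Project Job"
      job_clusters:
        - job_cluster_key: project_job_cluster
          new_cluster:
            spark_version: "16.3.x-cpu-ml-scala2.12"
            node_type_id: "Standard_DS3_v2"
            num_workers: 0
            spark_conf:
              "spark.databricks.cluster.profile": "singleNode"
              "spark.master": "local[*, 4]"
            custom_tags:
              "Project": "Project"
      tasks:
        - task_key: main
          job_cluster_key: project_job_cluster
          notebook_task:
            notebook_path: ../src/notebook.ipynb
          # Libraries are declared on the task, not on the cluster.
          libraries:
            - pypi:
                package: "databricks-sql-connector"
            - pypi:
                package: "python-docx"
```

Note that the libraries live under the task, while the cluster shape moves into job_clusters; this is why the standalone clusters resource rejects a libraries key.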

 


Option 2: All-Purpose Cluster with Libraries
If you need an all-purpose cluster with libraries, create it separately:

(screenshot attachment: LRALVA_1-1748985305389.png)
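This screenshot was also lost in the export; a hedged sketch of the separate-cluster approach (the job name, task key, and notebook path are placeholders) might look like:

```yaml
# Hypothetical sketch: a standalone all-purpose cluster (no libraries key),
# referenced from a job that declares the libraries on its task.
resources:
  clusters:
    project_cluster:
      cluster_name: "Project Cluster"
      spark_version: "16.3.x-cpu-ml-scala2.12"
      node_type_id: "Standard_DS3_v2"
      autotermination_minutes: 20

  jobs:
    project_job:
      name: "Project Job"
      tasks:
        - task_key: main
          # Reference the bundle-managed cluster by its resolved ID.
          existing_cluster_id: ${resources.clusters.project_cluster.id}
          notebook_task:
            notebook_path: ../src/notebook.ipynb
          libraries:
            - pypi:
                package: "databricks-sql-connector"
            - pypi:
                package: "python-docx"
```

Here the all-purpose cluster stays library-free in the bundle, and the libraries are installed onto it when a job that references it runs.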

 


Key Points:
1. Job clusters: Libraries are specified in the job definition
2. All-purpose clusters: Libraries are installed after cluster creation, typically through job definitions that reference the cluster
3. The libraries field is not supported in standalone cluster configurations in Databricks Asset Bundles
The job-level approach (Option 1) is generally preferred as it ensures libraries are installed
when the job runs and provides better isolation.

LR


3 REPLIES


Thank you, this was very clear and informative!

lingareddy_Alva
Honored Contributor II

Welcome @sparklez . Happy it is useful. 

LR
