06-03-2025 09:50 AM
I am trying to create a cluster configuration using DABs and define library dependencies.
My YAML file looks like this:
resources:
  clusters:
    project_Job_Cluster:
      cluster_name: "Project Cluster"
      spark_version: "16.3.x-cpu-ml-scala2.12"
      node_type_id: "Standard_DS3_v2"
      #num_workers: 0
      autotermination_minutes: 20
      spark_conf:
        "spark.databricks.delta.preview.enabled": "true"
        "spark.databricks.cluster.profile": "singleNode"
        "spark.master": "local[*, 4]"
      custom_tags:
        "Project": "Project"
      libraries:
        - pypi:
            package: "databricks-sql-connector"
        - pypi:
            package: "python-docx"
The cluster configuration is recognised and created, but without the specified libraries, and I get a warning:
"Warning: unknown field: libraries at resources.clusters.project_Job_Cluster in resources/project job cluster.yml:16:7"
This leads me to conclude that I cannot use "libraries" in the cluster config? What is the right way?
06-03-2025 02:11 PM - edited 06-03-2025 02:15 PM
Hi @sparklez
You're encountering this issue because the libraries field is not valid in the cluster configuration.
Libraries need to be specified at the job level, not the cluster level.
Option 1: Job-Level Libraries (Recommended)
Move the libraries section to your job configuration:
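Here is a minimal sketch of what that could look like, reusing your cluster settings as a job cluster. The job name, task key, and notebook path below are placeholders for illustration, not values from your bundle:

resources:
  jobs:
    project_job:
      name: "Project Job"
      job_clusters:
        - job_cluster_key: project_job_cluster
          new_cluster:
            spark_version: "16.3.x-cpu-ml-scala2.12"
            node_type_id: "Standard_DS3_v2"
            num_workers: 0
            spark_conf:
              "spark.databricks.cluster.profile": "singleNode"
              "spark.master": "local[*, 4]"
            custom_tags:
              "ResourceClass": "SingleNode"  # required for single-node clusters created via the API
      tasks:
        - task_key: main_task
          job_cluster_key: project_job_cluster
          notebook_task:
            notebook_path: ../src/main_notebook.ipynb  # placeholder path
          libraries:
            - pypi:
                package: "databricks-sql-connector"
            - pypi:
                package: "python-docx"

Note that libraries sit on the task, not on the cluster: they are installed on the job cluster when that task runs.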
Option 2: All-Purpose Cluster with Libraries
If you need an all-purpose cluster with libraries, create it separately:
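A rough sketch under the same assumptions (hypothetical job name, task key, and notebook path): define the cluster without the libraries field, then have a job reference it, so the libraries get installed on the cluster whenever the job runs:

resources:
  clusters:
    project_cluster:
      cluster_name: "Project Cluster"
      spark_version: "16.3.x-cpu-ml-scala2.12"
      node_type_id: "Standard_DS3_v2"
      autotermination_minutes: 20

  jobs:
    project_job:
      name: "Project Job"
      tasks:
        - task_key: main_task
          existing_cluster_id: ${resources.clusters.project_cluster.id}
          notebook_task:
            notebook_path: ../src/main_notebook.ipynb  # placeholder path
          libraries:
            - pypi:
                package: "databricks-sql-connector"
            - pypi:
                package: "python-docx"

The ${resources.clusters.project_cluster.id} reference lets the task run on the bundle-managed all-purpose cluster rather than a fresh job cluster.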
Key Points:
1. Job clusters: Libraries are specified in the job definition
2. All-purpose clusters: Libraries are installed after cluster creation, typically through job definitions that reference the cluster
3. The libraries field is not supported in standalone cluster configurations in Databricks Asset Bundles
The job-level approach (Option 1) is generally preferred as it ensures libraries are installed
when the job runs and provides better isolation.
06-06-2025 07:16 AM
Thank you, this was very clear and informative!
06-06-2025 08:27 AM
You're welcome, @sparklez. Happy it is useful.