Hello everyone,
Let me give you some context: I am trying to deploy a Delta Live Tables (DLT) pipeline using Databricks Asset Bundles, and the pipeline requires a private library hosted in Azure DevOps.
As far as I understand, this can be handled in three ways:
- Installing the library directly in the notebook with %pip install (the approach recommended by Databricks).
- Using init scripts.
- Using cluster policies with init scripts.
The first option works fine. Now, I am testing options 2 and 3, but I am encountering errors in both.
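For reference, option 1 works for me with the %pip magic pointed at the Azure DevOps Artifacts pip endpoint; the org, project, feed, and PAT below are placeholders, not my real values:

```
%pip install my-private-lib --index-url=https://<PAT>@pkgs.dev.azure.com/<org>/<project>/_packaging/<feed>/pypi/simple/
```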
Focusing on option 2 (init scripts), here is what I have done:
I created an init script in a Unity Catalog volume.
I added it to the allowlist to avoid permission errors.
To test its execution, I simplified the script to the following content:
#!/bin/bash
echo "The initialization script is running correctly."
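For completeness, the eventual version of the script would install the private library rather than just echo. A sketch, where the package name is a placeholder and PIP_INDEX_URL is expected to come from spark_env_vars on the cluster:

```shell
#!/bin/bash
# Sketch of the full init script: install the private library from the
# Azure DevOps feed configured via PIP_INDEX_URL.
set -euo pipefail

echo "The initialization script is running correctly."

PACKAGE="my-private-lib"  # placeholder: the actual private library

if [ -x /databricks/python/bin/pip ]; then
  # On a Databricks node, target the cluster's Python environment.
  /databricks/python/bin/pip install "$PACKAGE"
else
  # Outside Databricks (e.g. testing the script locally), just report.
  echo "Not on a Databricks node; skipping install of $PACKAGE"
fi
```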
I then referenced the script in the pipeline's cluster configuration in the bundle:

clusters:
  - label: default
    node_type_id: Standard_DS3_v2
    autoscale:
      min_workers: 1
      max_workers: 1
      mode: ENHANCED
    spark_env_vars:
      PIP_INDEX_URL: xxx
    init_scripts:
      - volumes:
          destination: /Volumes/system_metadata/init_scripts/cluster_dependencies.sh
libraries:
  ...
The deployment completes successfully, but when executing, I get the following error:
com.databricks.pipelines.common.errors.deployment.DeploymentException: Failed to launch pipeline cluster xxx: Init scripts failed. instance_id: xxx, databricks_error_message: Cluster scoped init script /Volumes/system_metadata/init_scripts/cluster_dependencies... This error is likely due to a misconfiguration in the pipeline. Check the pipeline cluster configuration and associated cluster policy.
The pipeline does not have any cluster policy associated with it. However, several policies do exist in the workspace's Compute section.
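My understanding is that a policy only applies to a pipeline cluster when it is referenced explicitly via policy_id in the cluster spec, along these lines (the ID is hypothetical):

```
clusters:
  - label: default
    policy_id: "ABC123DEF456"  # hypothetical: ID of a policy from the workspace
```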
For reference, here is the full pipeline definition from the bundle:

resources:
  pipelines:
    pipeline_dev_x:
      name: "x pipeline"
      clusters:
        - label: default
          node_type_id: Standard_DS3_v2
          spark_env_vars:
            PIP_INDEX_URL: xxx
          init_scripts:
            - volumes:
                destination: /Volumes/system_metadata/init_scripts/cluster_dependencies.sh
          autoscale:
            min_workers: 1
            max_workers: 1
            mode: ENHANCED
      libraries:
        - notebook:
            path: xxx
      schema: xxx
      development: true
      channel: PREVIEW
      catalog: xxx
      deployment:
        kind: BUNDLE
        metadata_file_path: xxx
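For option 3, my understanding is that the policy definition itself would pin the init script, something like the following sketch (assuming cluster policy attribute paths of the form init_scripts.N.volumes.destination; the path is the same volume as above):

```
{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "/Volumes/system_metadata/init_scripts/cluster_dependencies.sh"
  }
}
```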
My questions:
- Can these cluster policies be applied to the pipeline automatically, without being explicitly referenced?
- I have reviewed the policies and don't see any that could interfere, but has anyone faced this error before?
- Has anyone tackled this issue using init scripts or cluster policies (i.e., attaching the init script through a policy)?