Databricks Community

SashankKotta · ‎06-16-2024

This guide is for anyone who wants to install libraries on a cluster through a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. Many users rely on init scripts for library installation, but Custom Compute Policies are the recommended approach: they give better control over library installations and can be configured to enforce organizational standards.

Follow these steps to achieve this:

Create a Custom Compute Policy:
- Navigate to the Policies tab: Compute > Policies.
- Create a policy, then open its Libraries section and add the libraries you need.
- Note that you can install libraries from a JAR, a wheel, or Maven coordinates.
- For details, see the Databricks documentation page "Create and manage compute policies".
- After creating the policy, note its policy ID. You will need it when running the notebook from Azure Data Factory (ADF).
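If you would rather script the policy than click through the UI, here is a minimal sketch of the request body. As far as I know, the Cluster Policies API (`POST /api/2.0/policies/clusters/create`) takes a `name`, a `definition` passed as a JSON string, and a `libraries` list. The policy name, the library paths and coordinates, and the `spark_version` rule below are placeholders I made up for illustration. The code only builds and prints the payload; it does not call the API.

```python
import json


def build_policy_payload(name, definition, libraries):
    """Build a request body for creating a compute policy.

    `definition` is serialized to a JSON string, which is the shape the
    Cluster Policies API is assumed to expect; `libraries` stays a list.
    """
    return {
        "name": name,
        "definition": json.dumps(definition),
        "libraries": libraries,
    }


# Hypothetical rule: default new clusters to the latest LTS runtime.
definition = {
    "spark_version": {"type": "unlimited", "defaultValue": "auto:latest-lts"},
}

# Hypothetical libraries, one per install method mentioned above.
libraries = [
    {"jar": "/Volumes/main/default/libs/example.jar"},
    {"whl": "/Volumes/main/default/libs/example-0.1-py3-none-any.whl"},
    {"maven": {"coordinates": "com.example:example-lib:1.0.0"}},
]

payload = build_policy_payload("adf-library-policy", definition, libraries)
print(json.dumps(payload, indent=2))
```

The response to the create call should include the policy ID, which is the value ADF needs in the next step.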
Trigger a Databricks Notebook from ADF:
- Use the Custom Compute Policy created above to run your Databricks notebook from Azure Data Factory (ADF).
- The ADF linked service can trigger a notebook with the custom policy attached to the job cluster or all-purpose cluster. In the linked service, open the Advanced options and set the policy there.
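For those who manage linked services as JSON rather than through the ADF UI, the sketch below shows roughly where the policy ID goes. The property names (`domain`, `newClusterVersion`, `newClusterNodeType`, `newClusterNumOfWorker`, `policyId`) are the ones I recall from the Azure Databricks linked service definition, so treat them as assumptions and check them against your own exported linked service. The workspace URL, policy ID, runtime version, and node type are placeholders, and authentication settings are left out on purpose.

```python
import json


def build_linked_service(name, workspace_url, policy_id,
                         cluster_version, node_type, num_workers):
    """Build an ADF linked-service definition for a new job cluster
    that is created under the given compute policy."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": workspace_url,
                "newClusterVersion": cluster_version,
                "newClusterNodeType": node_type,
                # ADF stores the worker count as a string.
                "newClusterNumOfWorker": str(num_workers),
                # The policy ID noted when the policy was created.
                "policyId": policy_id,
                # Authentication (token or managed identity) omitted here.
            },
        },
    }


# All values below are hypothetical placeholders.
linked_service = build_linked_service(
    name="DatabricksWithPolicy",
    workspace_url="https://adb-0000000000000000.0.azuredatabricks.net",
    policy_id="0000000000000000",
    cluster_version="14.3.x-scala2.12",
    node_type="Standard_DS3_v2",
    num_workers=2,
)
print(json.dumps(linked_service, indent=2))
```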

Summary:

This guide explains how to install libraries on a Databricks cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. It details creating a Custom Compute Policy, noting the policy ID, and using this policy ID to execute notebooks via ADF. This method is recommended over using init scripts for library installation.

Sashank Kotta

Sujitha · ‎06-25-2024

@SashankKotta Thank you for sharing!

hassan2 · ‎08-07-2024

The policyId doesn't seem to work when using a Databricks instance pool. I am getting the following error:

Cluster validation error: Validation failed for instance_pool_id, the value cannot be present (is {pool_id}); Validation failed for driver_instance_pool_id, the value cannot be present (is {pool_id}); Validation failed for azure_attributes.spot_bid_max_price from pool, the value must be present

SashankKotta · ‎08-07-2024

That is because your compute policy has the following issues:
1. instance_pool_id may be set as forbidden in the policy. Change that so the policy passes the instance pool ID.
2. Likewise, pass the driver instance pool ID (driver_instance_pool_id) in the policy.
3. Set a valid value for spot_bid_max_price in the policy.

For instance pools, I would suggest creating a separate compute policy and using that.
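As a rough sketch of what a separate pool policy could look like, the snippet below builds a definition that pins both pool attributes with the `fixed` policy type. The pool ID is a placeholder, and defaulting the driver to the worker pool is my own assumption, not something the error message requires.

```python
import json


def pool_policy_definition(pool_id, driver_pool_id=None):
    """Return a policy definition that pins worker and driver pools.

    If no driver pool is given, the worker pool is reused (an assumption
    made for this sketch).
    """
    driver_pool_id = driver_pool_id or pool_id
    return {
        "instance_pool_id": {
            "type": "fixed",
            "value": pool_id,
            "hidden": True,
        },
        "driver_instance_pool_id": {
            "type": "fixed",
            "value": driver_pool_id,
            "hidden": True,
        },
    }


# Hypothetical pool ID for illustration only.
pool_definition = pool_policy_definition("0000-000000-pool-example")
print(json.dumps(pool_definition, indent=2))
```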

Sashank Kotta

hassan2 · ‎08-15-2024

Thanks, @SashankKotta, for pointing this out. I was able to resolve the instance pool ID issue, but spot_bid_max_price is still failing. I am getting the following error:

Cluster validation error: Validation failed for azure_attributes.spot_bid_max_price from pool, the value must be present

Here is the relevant definition in the cluster policy:

"azure_attributes.spot_bid_max_price": {
    "type": "unlimited",
    "defaultValue": 100,
    "hidden": false
},

SashankKotta · ‎08-21-2024

It seems to expect a "value" parameter in your policy, something like this:

"azure_attributes.spot_bid_max_price": {
    "type": "unlimited",
    "value": 100,
    "defaultValue": 100,
    "hidden": false
},

If it still fails, try removing the defaultValue parameter from your policy.

Sashank Kotta

Wojciech_BUK · ‎10-14-2024

Hi @hassan2
I had the same issue and found a solution.
I had created the pool as On-demand (not Spot), and the policy only worked once I removed the entire "azure_attributes.spot_bid_max_price" section from it.
It looks like "azure_attributes.spot_bid_max_price" only works when the pool is created with Spot instances.
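If you keep one policy definition in source control and reuse it for both kinds of pool, a small helper along these lines could apply that workaround. This is only a sketch of the idea: the function name and the `pool_uses_spot` flag are hypothetical, and the example definition reuses the attribute posted earlier in the thread plus a placeholder pool ID.

```python
import json

SPOT_KEY = "azure_attributes.spot_bid_max_price"


def adapt_policy_for_pool(definition, pool_uses_spot):
    """Return a copy of the policy definition suited to the pool type.

    For on-demand pools the spot bid attribute is dropped, since it
    appears to be accepted only for pools built from Spot instances.
    """
    adapted = dict(definition)
    if not pool_uses_spot:
        adapted.pop(SPOT_KEY, None)
    return adapted


base_definition = {
    "instance_pool_id": {"type": "fixed", "value": "0000-000000-pool-example"},
    SPOT_KEY: {"type": "unlimited", "defaultValue": 100, "hidden": False},
}

on_demand_definition = adapt_policy_for_pool(base_definition, pool_uses_spot=False)
spot_definition = adapt_policy_for_pool(base_definition, pool_uses_spot=True)
print(json.dumps(on_demand_definition, indent=2))
```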

Library Management via Custom Compute Policies and ADF Job Triggering
