06-16-2024 12:35 AM
This guide is for anyone who wants to install libraries on a cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. Many users rely on init scripts for library installation, but Custom Compute Policies are the recommended approach: they give better control over which libraries are installed and can be configured to enforce organizational standards.
Follow these steps to achieve this:
1. Create a Custom Compute Policy and note its policy ID.
2. Use the Custom Compute Policy created in Step 1 to initiate your Databricks notebook via Azure Data Factory (ADF).
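The two steps above can be sketched as request payloads. This is a minimal sketch, not the exact configuration from the original post. It assumes the Cluster Policies REST API create request accepts `name`, `definition` (as a JSON string) and `libraries`, and that the ADF Azure Databricks linked service accepts `policyId` under `typeProperties`. The helper names, policy name, library, workspace URL, runtime version, node type, and policy ID are all placeholders, and authentication settings are left out.

```python
import json


def build_policy_request(name, definition, libraries):
    """Build the body for a policy-create call (assumed field names).

    The policy definition is serialized to a JSON string, which is how
    the create request is assumed to expect it.
    """
    return {
        "name": name,
        "definition": json.dumps(definition),
        "libraries": libraries,
    }


def build_adf_linked_service(name, domain, policy_id, spark_version, node_type, workers):
    """Build an ADF linked service body that references the policy ID.

    Authentication properties are intentionally omitted from this sketch.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": domain,
                "policyId": policy_id,
                "newClusterVersion": spark_version,
                "newClusterNodeType": node_type,
                "newClusterNumOfWorker": workers,
            },
        },
    }


# Step 1: a policy that installs one PyPI library (all values are placeholders).
definition = {"spark_version": {"type": "unlimited", "defaultValue": "auto:latest-lts"}}
policy_request = build_policy_request(
    name="adf-library-policy",
    definition=definition,
    libraries=[{"pypi": {"package": "simplejson"}}],
)

# Step 2: a linked service that uses the policy ID noted in Step 1.
linked_service = build_adf_linked_service(
    name="AzureDatabricksWithPolicy",
    domain="https://<workspace-url>",
    policy_id="<policy-id>",
    spark_version="<runtime-version>",
    node_type="<node-type>",
    workers="1",
)

print(json.dumps(policy_request, indent=2))
print(json.dumps(linked_service, indent=2))
```

Keeping the library list in the policy rather than in an init script means every cluster ADF creates under that policy ID is expected to get the same libraries.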
Summary:
This guide explains how to install libraries on a Databricks cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. It details creating a Custom Compute Policy, noting the policy ID, and using this policy ID to execute notebooks via ADF. This method is recommended over using init scripts for library installation.
06-25-2024 02:59 AM
@SashankKotta Thank you for sharing!
08-07-2024 07:46 AM
The policyId doesn't seem to work when using a Databricks instance pool. I am getting the following error:
Cluster validation error: Validation failed for instance_pool_id, the value cannot be present (is {pool_id}); Validation failed for driver_instance_pool_id, the value cannot be present (is {pool_id}); Validation failed for azure_attributes.spot_bid_max_price from pool, the value must be present
08-07-2024 09:39 AM
That is because your compute policy has the following issues:
1. instance_pool_id may have been set as forbidden. Change that: you need to pass the instance pool ID in the compute policy you are using.
2. For driver_instance_pool_id, pass the driver instance pool ID in your compute policy.
3. Pass the correct value for spot_bid_max_price in your compute policy.
For instance pools, I would suggest creating a separate compute policy and using that.
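A separate pool policy along these lines could look like the sketch below. It assumes the `fixed` policy type with a `value` attribute is the way to pin the pool IDs. The helper name and the pool ID are placeholders, and the driver is assumed to use the same pool unless a different one is given.

```python
import json


def build_pool_policy_definition(pool_id, driver_pool_id=None):
    """Return a policy definition that pins worker and driver pools.

    If no driver pool is given, the worker pool ID is reused for the driver.
    """
    driver_pool_id = driver_pool_id or pool_id
    return {
        "instance_pool_id": {"type": "fixed", "value": pool_id, "hidden": True},
        "driver_instance_pool_id": {
            "type": "fixed",
            "value": driver_pool_id,
            "hidden": True,
        },
    }


# "<pool-id>" is a placeholder for the real instance pool ID.
pool_policy = build_pool_policy_definition("<pool-id>")
print(json.dumps(pool_policy, indent=2))
```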
08-15-2024 09:56 AM
Thanks, @SashankKotta, for pointing this out. I was able to resolve the instance pool ID issue, but spot_bid_max_price is still failing. I am getting the following error:
Cluster validation error: Validation failed for azure_attributes.spot_bid_max_price from pool, the value must be present
Here is the definition in the cluster policy:
"azure_attributes.spot_bid_max_price": {
  "type": "unlimited",
  "defaultValue": 100,
  "hidden": false
},
08-21-2024 03:17 AM
It seems to be expecting a "value" parameter in your policy, something like this:
"azure_attributes.spot_bid_max_price": {
  "type": "unlimited",
  "value": 100,
  "defaultValue": 100,
  "hidden": false
},
If it still fails, try removing the defaultValue parameter from your policy.
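A related variant, offered only as an untested sketch: the policy type that is documented as carrying a `value` attribute is, as far as I know, `fixed`, so one option is to pin the attribute with that type and drop `defaultValue`. Whether this clears the pool validation error has not been confirmed in this thread. The helper name `pin_attribute` is hypothetical.

```python
import copy
import json


def pin_attribute(definition, attribute, value, hidden=False):
    """Return a copy of the policy definition with one attribute pinned.

    The attribute is rewritten to the 'fixed' type with the given value,
    and the input definition is left unmodified.
    """
    updated = copy.deepcopy(definition)
    updated[attribute] = {"type": "fixed", "value": value, "hidden": hidden}
    return updated


# The definition posted earlier in the thread.
original = {
    "azure_attributes.spot_bid_max_price": {
        "type": "unlimited",
        "defaultValue": 100,
        "hidden": False,
    }
}

pinned = pin_attribute(original, "azure_attributes.spot_bid_max_price", 100)
print(json.dumps(pinned, indent=2))
```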