
Library Management via Custom Compute Policies and ADF Job Triggering

SashankKotta
Contributor

This guide is intended for those looking to install libraries on a cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. While many users rely on init scripts for library installation, it is recommended to use Custom Compute Policies for this purpose. Custom Compute Policies provide better control and management over library installations and can be configured to ensure compliance with organizational standards.

Follow these steps to achieve this:

  1. Create a Custom Compute Policy:

    • Navigate to the Policies tab: Compute > Policies.
    • Create a policy, then open its Libraries tab and add the libraries you need.
    • Note that libraries can be installed from JAR files, Python wheels, or Maven coordinates.
    • For more details, see the documentation: Create and manage compute policies.
    • After creating the custom compute policy, note its policy ID; you will need it when executing the notebook via Azure Data Factory (ADF). (A scripted alternative to this step is sketched after the summary below.)

       

  2. Trigger a Databricks Notebook from ADF:
    • Use the Custom Compute Policy created in Step 1 to initiate your Databricks notebook via Azure Data Factory (ADF).

    • Using the ADF linked service, we can trigger a notebook with our custom policy attached to the job or all-purpose cluster. Navigate to the Advanced options in the linked service and set the policy ID as shown below.
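As a rough sketch of what that configuration amounts to, an Azure Databricks linked service that creates a new job cluster under the policy might look like the JSON below. The workspace URL, resource IDs, cluster settings, and policy ID are placeholders to replace with your own values, and managed identity (MSI) authentication is assumed for the example.

{
  "name": "AzureDatabricksLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "authentication": "MSI",
      "workspaceResourceId": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/workspaces/<workspace-name>",
      "newClusterVersion": "14.3.x-scala2.12",
      "newClusterNodeType": "Standard_DS3_v2",
      "newClusterNumOfWorker": "2",
      "policyId": "<policy-id-from-step-1>"
    }
  }
}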

Summary:

This guide explains how to install libraries on a Databricks cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. It details creating a Custom Compute Policy, noting the policy ID, and using this policy ID to execute notebooks via ADF. This method is recommended over using init scripts for library installation.
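If you prefer to script Step 1 rather than click through the UI, a policy with attached libraries can also be created through the Databricks Cluster Policies API (POST /api/2.0/policies/clusters/create), which accepts a libraries list alongside the policy definition. The payload below is only a sketch; the policy name, node type, Spark version, and library coordinates are placeholder values, not ones from this guide.

{
  "name": "lib-install-policy",
  "definition": "{\"spark_version\": {\"type\": \"unlimited\", \"defaultValue\": \"14.3.x-scala2.12\"}, \"node_type_id\": {\"type\": \"unlimited\", \"defaultValue\": \"Standard_DS3_v2\"}}",
  "libraries": [
    { "maven": { "coordinates": "com.example:example-lib:1.0.0" } },
    { "whl": "/Volumes/main/default/libs/example-0.1.0-py3-none-any.whl" }
  ]
}

The policy_id returned in the response is the value you then reference from the ADF linked service.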

 

Sashank Kotta
5 REPLIES

Sujitha
Community Manager

@SashankKotta Thank you for sharing!

hassan2
New Contributor II

The policyId doesn't seem to work when using a Databricks instance pool. I am getting the following error:

Cluster validation error: Validation failed for instance_pool_id, the value cannot be present (is {pool_id}); Validation failed for driver_instance_pool_id, the value cannot be present (is {pool_id}); Validation failed for azure_attributes.spot_bid_max_price from pool, the value must be present

That is because your compute policy has the following issues:
1. instance_pool_id may have been set to forbidden; please change that and pass the instance pool ID in the compute policy you are using.
2. Likewise, pass the driver instance pool ID (driver_instance_pool_id) in your compute policy.
3. Pass a valid value for spot_bid_max_price in your compute policy.

For instance pools, I would suggest creating a separate compute policy and using it (see the sketch below).
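For example, a separate pool-oriented policy definition along these lines might work; the pool IDs are placeholders, and fixing the values is just one way to satisfy the validation:

{
  "instance_pool_id": {
    "type": "fixed",
    "value": "<your-instance-pool-id>"
  },
  "driver_instance_pool_id": {
    "type": "fixed",
    "value": "<your-driver-instance-pool-id>"
  },
  "azure_attributes.spot_bid_max_price": {
    "type": "fixed",
    "value": 100
  }
}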

Sashank Kotta

hassan2
New Contributor II

Thanks, @SashankKotta, for pointing this out. I was able to resolve the instance pool ID issue, but spot_bid_max_price is still failing. I am getting the following error:

Cluster validation error: Validation failed for azure_attributes.spot_bid_max_price from pool, the value must be present

Here is the relevant definition in my cluster policy:

"azure_attributes.spot_bid_max_price": {
    "type": "unlimited",
    "defaultValue": 100,
    "hidden": false
  },

 

It seems it is expecting a "value" parameter in your policy, something like this:

"azure_attributes.spot_bid_max_price": {
    "type": "unlimited","value": 100,"defaultValue": 100,"hidden": false
  },

If it still fails, try removing the defaultValue parameter from your policy.

Sashank Kotta
