Library Management via Custom Compute Policies and ADF Job Triggering

SashankKotta
Databricks Employee

This guide is intended for those looking to install libraries on a cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. While many users rely on init scripts for library installation, Custom Compute Policies are the recommended approach: they provide better control and management over library installations and can be configured to enforce organizational standards.

Follow these steps to achieve this:

  1. Create a Custom Compute Policy:

    • Navigate to the Policies tab: Compute > Policies.
    • Create a policy, then open the Libraries section and add the libraries you need.
    • Note that libraries can be installed as JAR files, Python wheels, or Maven coordinates.
    • For more details, see the documentation: Create and manage compute policies.
    • After creating the custom compute policy, note its policy ID; you will need it when executing the notebook via Azure Data Factory (ADF).
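A policy with bundled libraries can also be created through the Cluster Policies REST API. The sketch below is a hypothetical request payload: the policy name, library paths, and Maven coordinates are placeholders, not values from a real workspace, and the `definition` field shown is a minimal example rather than a complete policy.

```json
{
  "name": "lib-install-policy",
  "definition": "{\"spark_version\": {\"type\": \"unlimited\", \"defaultValue\": \"auto:latest-lts\"}}",
  "libraries": [
    { "jar": "dbfs:/FileStore/jars/my_library.jar" },
    { "whl": "/Volumes/main/default/libs/my_package-1.0-py3-none-any.whl" },
    { "maven": { "coordinates": "com.example:my-lib:1.0.0" } }
  ]
}
```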

       

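If you prefer to look up the policy ID programmatically rather than copying it from the UI, the sketch below queries the Cluster Policies REST API. The workspace URL, token environment variables, and the policy name `my-library-policy` are all assumptions for illustration.

```python
# Sketch: find a compute policy's ID by name via the Databricks REST API.
# DATABRICKS_HOST and DATABRICKS_TOKEN are hypothetical environment
# variables holding your workspace URL and a personal access token.
import json
import os
import urllib.request


def find_policy_id(policies, name):
    """Return the policy_id of the first policy whose name matches, else None."""
    for policy in policies:
        if policy.get("name") == name:
            return policy.get("policy_id")
    return None


def list_policies(host, token):
    """Call the Cluster Policies list endpoint and return the policies array."""
    req = urllib.request.Request(
        f"{host}/api/2.0/policies/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("policies", [])


if __name__ == "__main__" and "DATABRICKS_HOST" in os.environ:
    host = os.environ["DATABRICKS_HOST"]
    token = os.environ["DATABRICKS_TOKEN"]
    print(find_policy_id(list_policies(host, token), "my-library-policy"))
```

The returned `policy_id` is the value you supply to ADF in the next step.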
  2. Trigger a Databricks Notebook from ADF:
    • Use the Custom Compute Policy created in Step 1 to initiate your Databricks notebook via Azure Data Factory (ADF).

    • Using the ADF linked service, we can trigger a notebook with our custom policy attached to the job or all-purpose cluster. Navigate to the Advanced options in the linked service and set the policy ID there.
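For reference, the policy ID surfaces in the linked service's JSON definition as the `policyId` property. The fragment below is a hedged sketch: the workspace domain, resource IDs, cluster settings, and policy ID are all placeholders you would replace with your own values.

```json
{
  "name": "AzureDatabricksLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "authentication": "MSI",
      "workspaceResourceId": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/workspaces/<workspace-name>",
      "newClusterVersion": "13.3.x-scala2.12",
      "newClusterNodeType": "Standard_DS3_v2",
      "newClusterNumOfWorker": "2",
      "policyId": "<your-policy-id>"
    }
  }
}
```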

Summary:

This guide explains how to install libraries on a Databricks cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. It details creating a Custom Compute Policy, noting the policy ID, and using this policy ID to execute notebooks via ADF. This method is recommended over using init scripts for library installation.

 

Sashank Kotta