Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Use init script for Databricks job cluster via Azure Data Factory

sachamourier
New Contributor III

Hello,

I would like to install some libraries (both public and private) on a job cluster. I use Azure Data Factory to run my Databricks notebooks, so I want these jobs to run on job clusters.

I have passed my init script to the job cluster, but the package installs only work some of the time, with no clear pattern. The workspace paths to my packages do exist and are set up correctly.

What's wrong? Is there anything I should check? Is there another, more robust way to install these libraries so that they are reliably present on the job cluster every time?

I have attached the configuration I am using in Azure Data Factory to pass my init script, and a screenshot of my init script.

[Image: ADF config for init script]
[Image: init script]

Thank you very much in advance for the help,

Sacha
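For reference, one alternative to installing packages from an init script is to declare them on the ADF Databricks Notebook activity itself, in its `libraries` property, so that Databricks installs them as cluster libraries when the job cluster starts. The sketch below only builds and prints such an activity definition as a Python dict. The activity name, linked-service name, notebook path, package pin, and wheel path are hypothetical placeholders, and which path schemes (DBFS, workspace files, volumes) ADF accepts for `whl` entries should be confirmed in the ADF documentation.

```python
import json


def build_notebook_activity(notebook_path, pypi_packages, wheel_paths):
    """Build an ADF DatabricksNotebook activity that attaches cluster libraries.

    All names used below are hypothetical placeholders, not real resources.
    """
    # Public packages: installed from PyPI by name (optionally pinned).
    libraries = [{"pypi": {"package": pkg}} for pkg in pypi_packages]
    # Private packages: referenced as pre-built wheel files.
    libraries += [{"whl": path} for path in wheel_paths]
    return {
        "name": "RunMyNotebook",  # hypothetical activity name
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": "MyDatabricksLinkedService",  # hypothetical
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            "libraries": libraries,
        },
    }


activity = build_notebook_activity(
    notebook_path="/Shared/my_notebook",  # hypothetical
    pypi_packages=["requests==2.31.0"],
    wheel_paths=["dbfs:/libs/my_private_pkg-0.1.0-py3-none-any.whl"],  # hypothetical
)
print(json.dumps(activity, indent=2))
```

The design choice here is that the library list travels with the activity definition, so every job cluster ADF creates for that activity gets the same libraries without depending on the init script's exit status.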

4 REPLIES

Alberto_Umana
Databricks Employee

Hi @sachamourier,

What error do you get when the init script fails?

Hi @Alberto_Umana, I don’t get any error. When my notebook runs on the newly created job cluster, my package imports fail because the packages have not been installed on the cluster.

As you can see in the attached images, the init script does appear to be found and run, though. Is there another way to do this?

[Image: init script finished JSON]
[Image: imports issue]
[Image: job cluster event log]
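Because the problem only surfaces later as an import error, one way to make it explicit is to check for the required packages in the notebook's first cell and stop immediately with a clear message. This is a minimal sketch with hypothetical helper names. It only handles top-level import names (for example `pandas`, not `pandas.io`), and those import names can differ from the pip package names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def assert_modules_installed(names):
    """Raise early, with a readable message, if any required module is missing."""
    missing = missing_modules(names)
    if missing:
        raise RuntimeError(f"Packages missing on this cluster: {missing}")
```

For example, calling `assert_modules_installed(["my_private_pkg"])` (a hypothetical package name) at the top of the notebook makes the ADF run fail with the list of missing packages instead of failing later on an import.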

sachamourier
New Contributor III

Hi @Alberto_Umana, do you have a solution for this issue?
Thanks a lot for your help,
Sacha

Alberto_Umana
Databricks Employee

Hi @sachamourier,

Have you considered using cluster libraries instead? The behavior you are observing needs more debugging, because the init script itself completes successfully. Can you enable cluster log delivery and look through the logs? https://docs.databricks.com/aws/en/compute/configure#compute-log-delivery
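Once log delivery is configured, init script output is typically delivered alongside the other cluster logs (in an `init_scripts` folder for each cluster), split into stdout and stderr files. The sketch below walks a delivered log directory and collects every non-empty `*.stderr.log` file. The file suffix is an assumption based on that layout, and a non-empty stderr file is a lead rather than proof of failure, since pip also writes warnings there.

```python
import os


def find_init_script_errors(log_root):
    """Return {path: content} for every non-empty *.stderr.log under log_root."""
    errors = {}
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for filename in filenames:
            # Assumed naming: init script stderr logs end in ".stderr.log".
            if not filename.endswith(".stderr.log"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as f:
                content = f.read().strip()
            if content:
                errors[path] = content
    return errors
```

For example, `find_init_script_errors("/dbfs/cluster-logs/<cluster-id>")` would scan one cluster's delivered logs; the path is a hypothetical placeholder for whatever destination you configure.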

Also, as a test, can you run the init script from a notebook to make sure it works?
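One way to run that test is to execute the script file from a notebook cell with `bash -x`, which prints each command before running it so the failing step shows up in stderr. This is a minimal sketch; the helper name is hypothetical. A notebook runs the script only on the driver and as the notebook user, whereas a cluster-scoped init script runs on every node during startup, so a success here does not rule out every cause. It is also worth checking whether the script starts with `set -e`: without it, bash exits with the status of the last command, so a failed `pip install` earlier in the script can still leave the script reported as finished successfully.

```python
import subprocess


def run_init_script(script_path):
    """Run a shell script with bash tracing; return (exit code, stdout, stderr)."""
    # "bash -x" writes each command to stderr, prefixed with "+", before running it.
    result = subprocess.run(
        ["bash", "-x", script_path],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

For example, `run_init_script("/Workspace/Users/<you>/init.sh")` with your own script path (a placeholder here, assuming workspace files are readable under `/Workspace` on your runtime) returns the exit code plus the traced output to inspect.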
