
Is it possible to have a cluster with pre-installed dependencies?

joao_albuquerqu
New Contributor II

I run some jobs in the Databricks environment where some resources need authentication. I handle this (and I have to) through the vault-cli in the init script.

However, the init script has to install vault-cli and other libraries every time it runs. Is there any way to have them pre-installed? I would like to avoid this installation every time I run a job.

2 REPLIES

Anonymous
Not applicable

@João Victor Albuquerque:

Yes, there are a few ways to pre-install libraries and tools in the Databricks environment:

  1. Cluster-scoped init scripts: You can specify a shell script that runs whenever a cluster is created or restarted. The script can install libraries and tools with package managers such as pip or apt-get. Note that the installation still runs on every cluster start; the packages are not baked into the cluster.
  2. Databricks environments: You can create a Databricks environment that includes the required libraries and tools. An environment is a versioned set of libraries, and you can specify the environment to use when creating or starting a cluster. This way, every time a cluster starts, it will have the required environment pre-installed.
  3. Custom container images: You can create a custom Docker container image with the required libraries and tools pre-installed. You can then use this container image as the base image for your Databricks clusters. This way, every time a cluster starts, it will use the custom container image with the required packages pre-installed.

You can choose the approach that best fits your needs and preferences.
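As a rough illustration of how the init-script and custom-container options differ at cluster-creation time, here is a minimal Python sketch that builds a request body for the Databricks Clusters API. The `docker_image` and `init_scripts` fields are part of that API, but the helper name `build_cluster_spec`, the cluster name, the image URL, the script path, the runtime version, and the node type are all placeholders chosen for this example. It only builds and prints the JSON body and does not call the API. Custom images also require Databricks Container Services to be enabled in the workspace.

```python
import json


def build_cluster_spec(image_url=None, init_script_path=None):
    """Build a request body for the Clusters API create endpoint.

    All concrete values below are placeholders for illustration only.
    """
    spec = {
        "cluster_name": "vault-preinstalled",  # hypothetical name
        "spark_version": "13.3.x-scala2.12",   # placeholder runtime version
        "node_type_id": "i3.xlarge",           # placeholder node type
        "num_workers": 1,
    }
    if image_url:
        # Option 3: the tools are already baked into the image,
        # so nothing is installed when the cluster starts.
        spec["docker_image"] = {"url": image_url}
    if init_script_path:
        # Option 1: the script runs (and installs) on every cluster start.
        spec["init_scripts"] = [
            {"workspace": {"destination": init_script_path}}
        ]
    return spec


# Hypothetical image that already contains vault-cli and the other libraries.
spec = build_cluster_spec(
    image_url="my-registry.example.com/databricks-vault:1.0"
)
print(json.dumps(spec, indent=2))
```

With the image approach, the init script only needs to perform the authentication step itself, because the install step has moved into the image build.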

I currently use the first option (init scripts). But my goal is to avoid installing everything each time a cluster starts. I want a cluster that already has the libraries installed in its environment. It seems to me that the 2nd and 3rd options would provide that. Is there any documentation for them, especially the second option? I didn't find anything about it.
