Running a JAR on a Databricks shared cluster using Airflow

ayush19
New Contributor III

Hello,

I need to run a JAR that is already installed on a Databricks cluster, and the run has to be orchestrated with Apache Airflow.

I followed the documentation for the operator that supports this: https://airflow.apache.org/docs/apache-airflow-providers-databricks/1.0.0/operators.html

The issue is that every time I run this DAG, the cluster restarts and the JAR is installed on the cluster again. The file is already stored in a Volume and installed on the cluster, yet the cluster still restarts and reinstalls the JAR.

How can I avoid this?


Alberto_Umana
Databricks Employee

Hello @ayush19,

Here are some suggestions, though I would need to check how the parameters are configured.

Use an Existing Cluster: Instead of creating a new cluster each time, configure the DatabricksSubmitRunOperator to use an existing cluster. This can be done by specifying the existing_cluster_id parameter in the operator. This way, the cluster will not restart, and the jar file will not be reinstalled.
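
To make the existing-cluster setup concrete, here is a minimal sketch of the single-task request body that can be passed to `DatabricksSubmitRunOperator` through its `json` parameter. It assumes the JAR is already attached to the cluster as a cluster library, so `libraries` is left out unless a path is given explicitly; one thing worth checking in a DAG like this is whether it still passes `libraries` on every run. Also note that if the all-purpose cluster has auto-terminated, a run that targets it by `existing_cluster_id` will start it again. The helper name `build_submit_run_payload`, the run name, the cluster ID, and the main class are hypothetical placeholders.

```python
import json


def build_submit_run_payload(cluster_id, main_class, params=None, jar_path=None):
    """Build a single-task run body for DatabricksSubmitRunOperator(json=...).

    Hypothetical helper. jar_path is optional: when the JAR is already a
    cluster library, it is omitted so the request does not list it again.
    """
    payload = {
        "run_name": "airflow-jar-run",  # hypothetical run name
        "existing_cluster_id": cluster_id,  # reuse the all-purpose cluster
        "spark_jar_task": {
            "main_class_name": main_class,
            "parameters": list(params or []),
        },
    }
    # Only request a library install when a JAR path is explicitly given.
    if jar_path is not None:
        payload["libraries"] = [{"jar": jar_path}]
    return payload


if __name__ == "__main__":
    body = build_submit_run_payload(
        cluster_id="0123-456789-abcdefgh",  # hypothetical cluster ID
        main_class="com.example.Main",  # hypothetical main class
        params=["--date", "2024-01-01"],
    )
    print(json.dumps(body, indent=2))
```

In a DAG, the resulting dictionary would be handed to the operator as `DatabricksSubmitRunOperator(task_id="run_jar", json=body)`; the operator also accepts `existing_cluster_id`, `spark_jar_task`, and `libraries` as named arguments if you prefer not to build the dictionary yourself.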

Cluster Configuration: Ensure that the cluster configuration does not force instance replacement upon restart. One way to achieve this is to disable multi-AZ (Availability Zone) selection in the cluster configuration, which can help the cluster reuse the same instances rather than create new ones.

Hi Alberto, 

I am already using an existing cluster, not creating a new one. It is an all-purpose cluster used by multiple people in different regions, so I'm not sure I can disable multi-AZ. Is there a way to reuse the existing cluster instance as it is?
Also, could you please explain why exactly it is restarting? The JAR file is already installed on the cluster, so why does it need to be installed again?
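
One way to confirm that the JAR really is already installed (and therefore does not need to be listed again in the run request) is to look at the cluster's library status. The sketch below assumes the decoded JSON response of the Libraries API call `GET /api/2.0/libraries/cluster-status?cluster_id=...`, with a `library_statuses` list whose entries carry `library` and `status` fields; the helper name `jar_installed`, the cluster ID, and the Volume path in the sample are hypothetical, and the sample response is made up for illustration.

```python
def jar_installed(cluster_status, jar_path):
    """Return True if jar_path is listed as INSTALLED in a cluster-status response.

    Hypothetical helper; cluster_status is the decoded JSON body of the
    Libraries API cluster-status call for one cluster.
    """
    for entry in cluster_status.get("library_statuses", []):
        library = entry.get("library", {})
        # Match on the JAR path and require a fully installed status.
        if library.get("jar") == jar_path and entry.get("status") == "INSTALLED":
            return True
    return False


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample = {
        "cluster_id": "0123-456789-abcdefgh",
        "library_statuses": [
            {"library": {"jar": "/Volumes/main/default/libs/app.jar"}, "status": "INSTALLED"},
        ],
    }
    print(jar_installed(sample, "/Volumes/main/default/libs/app.jar"))  # True
```

If this returns `True` for the JAR the DAG uses, leaving `libraries` out of the submitted run is a reasonable first thing to try before touching the cluster's availability-zone settings.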
