Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Running jar on Databricks cluster from Airflow

ayush19
New Contributor III

Hello,

I have a JAR file that is installed on a cluster. I need to run this JAR from Airflow using the DatabricksSubmitRunOperator. I followed the standard instructions from the Airflow docs:

https://airflow.apache.org/docs/apache-airflow-providers-databricks/1.0.0/operators.html

The operator takes a "libraries" parameter that is supposed to contain the path to the JAR file. Since the JAR is already installed on the cluster, I don't want to provide any specific path to it. I tried a few things, but everything failed:

1. Did not include the libraries parameter - failed with an error that it is required

2. Added the libraries parameter but left it empty - failed with an error that it needs a value

3. Added the path to the JAR file where it is stored - failed because it tried to install the JAR on the cluster, and the user does not have 'manage' permission to do so

4. Passed the 'jar' key but with an empty value - got the error "Library installation failed for library due to user error. Error messages:\nJava JARs must be stored in UC Volumes, dbfs, s3, adls, gs or as a workspace file/local file. Make sure the URI begins with 'dbfs:', 'file:', 's3:', 'abfss:', 'gs:', 'wasbs:', '/Volumes', or '/Workspace' but the URI is ''"

What should I do so that I can run the JAR that is already installed on the cluster? Is there a dummy value I can use instead of a JAR file path?

Again, the JAR I want to use is already installed on the cluster, and I do not want to install anything else. I have run the main class of this JAR from a notebook and it worked fine.
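For context, the submission described above can be sketched as the JSON payload that DatabricksSubmitRunOperator sends to the Databricks Runs Submit API via its `json` argument. All identifiers below (cluster ID, main class, JAR path) are placeholders, not real values:

```python
# Illustrative Runs Submit payload for DatabricksSubmitRunOperator;
# every identifier below is a placeholder, not a real value.
payload = {
    "existing_cluster_id": "1234-567890-abcde123",   # cluster where the JAR is installed
    "spark_jar_task": {
        "main_class_name": "com.example.Main",       # main class to run
        "parameters": [],
    },
    # Attempt 3 above: pointing "libraries" at the JAR's storage path makes
    # Databricks try to (re)install it, which requires 'manage' permission.
    "libraries": [{"jar": "dbfs:/FileStore/jars/app.jar"}],
}

# With the Airflow Databricks provider installed, the payload would be passed as:
# DatabricksSubmitRunOperator(task_id="run_jar",
#                             databricks_conn_id="databricks_default",
#                             json=payload)
```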

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @ayush19, to run a JAR file via the DatabricksSubmitRunOperator in Airflow, you must provide the libraries parameter even if the JAR is already installed on the cluster. Unfortunately, there is no way to bypass this requirement directly.
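Since the libraries entry is required, its value must use one of the URI schemes listed in the error from attempt 4 above. A minimal sanity check (an illustrative helper, not part of any Databricks or Airflow API) for a candidate JAR path:

```python
# URI prefixes Databricks accepts for JAR libraries, taken from the error
# message quoted in the question. Illustrative helper, not a Databricks API.
ALLOWED_PREFIXES = (
    "dbfs:", "file:", "s3:", "abfss:", "gs:", "wasbs:", "/Volumes", "/Workspace",
)

def is_valid_jar_uri(uri: str) -> bool:
    """Return True if `uri` starts with a scheme Databricks accepts for JAR libraries."""
    return uri.startswith(ALLOWED_PREFIXES)
```

For example, a Unity Catalog volume path such as `/Volumes/main/default/jars/app.jar` passes, while the empty string from attempt 4 is rejected.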

ayush19
New Contributor III

Thank you for the reply. Is there any other way to run the main class of a JAR file from Airflow, perhaps using a particular API?
