
Databricks Asset Bundles library dependencies - JAR file

theanhdo
New Contributor III

Hi there,

I use Databricks Asset Bundles (DAB) to deploy workflows. For each job, I create a job cluster and install external libraries by listing them under libraries in each task, for example:

- task_key: my-task
  job_cluster_key: my-cluster
  notebook_task:
    notebook_path: ../notebooks/my_notebook.ipynb
  libraries:
    - whl: /Workspace${workspace.file_path}/libraries/PyYAML-6.0.whl
    - jar: /Workspace${workspace.file_path}/libraries/mongo-spark-connector_2.12-10.1.1-all.jar

For the Python wheel file (whl), I can see that the library PyYAML-6.0.whl is installed. However, the JAR library mongo-spark-connector_2.12-10.1.1-all.jar fails to install. I know that I can install a JAR from Unity Catalog volumes, but I want to install all libraries from the workspace.

The documentation says: "To add a JAR file to a job task, in libraries specify a jar mapping for each library to be installed. You can install a JAR from workspace files, Unity Catalog volumes, cloud object storage, or a local file path."
https://docs.databricks.com/en/dev-tools/bundles/library-dependencies.html
However, installing a JAR file from the workspace does not work for me.

Even when I create a cluster and try to install the JAR file manually with the following steps, it does not work:
1. Create a cluster using Databricks Runtime Version 14.3 LTS
2. Go to the Libraries tab and click the Install new button
3. In the popup, select Workspace and navigate to the workspace libraries folder
The folder contains 2 libraries, PyYAML-6.0.whl and mongo-spark-connector_2.12-10.1.1-all.jar. I can only select the whl library; the JAR library is not selectable.

Is there any way to install JAR files from the workspace?

Accepted Solutions

Kaniz_Fatma
Community Manager

Hi @theanhdo, installing JAR files directly from the Databricks workspace is not currently supported.

Limitations of Installing JAR Files from Workspace

The key points are:

  1. Workspace Libraries Deprecation: The documentation states that "Workspace libraries have been deprecated and should not be used." Instead, Databricks recommends storing libraries as workspace files or using Unity Catalog volumes.
  2. DBFS Deprecation: The documentation also mentions that storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. This further limits the ability to install JAR files directly from the workspace.
  3. Workspace Availability in Init Scripts: There are indications that the workspace may not be available during the execution of init scripts, which is a common way to install libraries on a cluster. This suggests the workspace may not be a reliable source for installing libraries.
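
As an illustration of the Unity Catalog volumes route mentioned in point 1, the task from the question could be adjusted roughly as in the sketch below. This is only a sketch under assumptions: the volume path /Volumes/main/default/libraries is a hypothetical placeholder (substitute your own catalog, schema, and volume), and the JAR is assumed to have been uploaded to that volume beforehand. Everything else mirrors the original task definition.

```yaml
- task_key: my-task
  job_cluster_key: my-cluster
  notebook_task:
    notebook_path: ../notebooks/my_notebook.ipynb
  libraries:
    # The wheel stays in workspace files, exactly as in the question.
    - whl: /Workspace${workspace.file_path}/libraries/PyYAML-6.0.whl
    # Assumption: the JAR was uploaded to a Unity Catalog volume in advance.
    # "main", "default", and "libraries" are placeholder catalog, schema, and volume names.
    - jar: /Volumes/main/default/libraries/mongo-spark-connector_2.12-10.1.1-all.jar
```

Whether the job cluster can read the volume depends on its access mode and on the permissions granted on the volume, so treat this as a starting point, not a verified configuration.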


theanhdo
New Contributor III

Thanks very much @Kaniz_Fatma for your thorough answer.


