
How to access a jar file stored in a Databricks Workspace?

ranged_coop
Valued Contributor II

Hi All,

We have a couple of jars stored in a workspace folder.

We are using init scripts to copy the jars in the workspace to the /databricks/jars path.

The init scripts do not seem to be able to find the files.

The scripts are failing saying the files could not be found.

```bash
#!/bin/bash
cp /Workspace/jars/file_name.jar /databricks/jars/
cp /Workspace/jars/file_name.jar /databricks/databricks-hive/
```

Can you please let me know if this is even possible?

What is the correct path for a file in a Workspace folder?

Thank you...

Edit:

  1. I have tried the file path with and without /Workspace. Both fail, saying the file is not available.
  2. I have also tried sleeping for up to 2 minutes, and the files are still not available.

It would be nice if someone from Databricks could confirm whether binary files in the Workspace, such as jars, are accessible from init scripts. If yes, what would their path look like?

1 ACCEPTED SOLUTION


ranged_coop
Valued Contributor II

Thank you for sharing the link. It was useful.

It is a little sad to see that it is not possible, having spent so much time analyzing and trying out various options.

I hope this is from a valid source. If so, I hope Databricks will consider adding this option, seeing that many (well, at least 2 :)) are expecting this feature. Using the CLI and API would just complicate things and is not that practical.


21 REPLIES

-werners-
Esteemed Contributor III

That seems to be ok. Probably the file system is not yet mounted when you do the copy operations.

What is your use case of copying jars in an init script? There might be alternatives.

ranged_coop
Valued Contributor II

Thank you for your response...

We just have an inbuilt jar file that we have in DBFS and move to above mentioned paths as part of the init scripts...

As part of the process to move the init scripts to a workspace location, we are just trying to see if we can have the jars organized in the Workspace and use a similar init script to move them...We also planned on having a few other files copied similarly...

Follow up Questions:

  1. Should the workspace file path begin with `/Workspace/jars/file_name.jar` or just `/jars/file_name.jar`?
  2. What would be the best way to work around the file system delay? Is it possible to check whether it is up, or should we use a sleep command?
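
For the second question, a polling loop is one alternative to a fixed sleep. The following is only a minimal sketch: `wait_for_path` is a hypothetical helper name, and the 120-second default simply mirrors the 2-minute sleep mentioned above. It helps only if the cause really is a mounting delay. If the path is never exposed to init scripts at all, it will just time out.

```shell
#!/bin/bash
# Hypothetical helper (not part of Databricks): poll until a path exists,
# giving up after a timeout instead of sleeping for a fixed period.
wait_for_path() {
  local path="$1"
  local timeout_secs="${2:-120}"   # assumed default, mirrors the 2-minute sleep
  local waited=0

  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout_secs" ]; then
      echo "Timed out after ${waited}s waiting for ${path}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Found ${path} after ${waited}s"
}

# Possible use inside an init script (paths taken from the question):
#   if wait_for_path /Workspace/jars/file_name.jar 120; then
#     cp /Workspace/jars/file_name.jar /databricks/jars/
#   fi
```

Returning a non-zero status on timeout lets the init script decide whether to fail the cluster start or continue without the jar.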

-werners-
Esteemed Contributor III
  1. /Workspace is a directory (you can check by using the %sh magic command in a notebook)
  2. Well, the way I use my jars is by installing them on a cluster as a library. By then everything is mounted and the jars can be found. My jars reside in the /FileStore/jars directory and are neither moved nor copied.

ranged_coop
Valued Contributor II

Thank you for your response...

  1. The reason for the question is that when I specify an init script as part of the cluster config, the /Workspace prefix is not used there. The %sh path, on the other hand, does use it.
  2. Currently we also have the jars placed similarly, but when I need to show the files/jars to someone it is a pain. Showing them from the Workspace is much easier. Also, some arbitrary files can be pushed to git easily from the Workspace.

-werners-
Esteemed Contributor III

I see, but isn't it easier to share the source code in git instead of a jar?

ranged_coop
Valued Contributor II

The code is in git only. We have black-box jars provided by different teams that are used as part of the code. These are the ones we currently have in DBFS and plan to move to the Workspace.

-werners-
Esteemed Contributor III

I still struggle to understand why you copy them, sorry.

A jar is an artifact. You import them and use them in your program. For that you do not need a copy.

If you need to know what is in the jar, go to git and look at the class code.

Probably I am missing something.

ranged_coop
Valued Contributor II

No issues...I will try to explain...

We have several teams...One of them produces a jar file whose logic is black box to us. We only use it. We are not aware of its contents. It is a carry forward from old legacy code.

To use the jar file, we copy it to the drive so that our code can reference the classes inside it. To do that, we copy the jar file to the path where Databricks copies all the jars. You can see that path in some environment variable, I do not remember which...

-werners-
Esteemed Contributor III

Ok, I get it.

Is the databricks-cli an option? My guess is that the Workspace is not available during the init script (you do use a cluster-scoped init script, right?).

Or you can put the jars to be copied in a /databricks subfolder like /zzjars, as /databricks should exist.

ranged_coop
Valued Contributor II

Are you saying the Workspace path will not be available even if I sleep for a minute or so? Can you please confirm? This will save me a lot of effort if it is never going to load. Ideally that should not be the case, right? Databricks is proposing that init scripts be maintained in the Workspace, which means the Workspace should logically be available before the init scripts execute.

Any idea on the order of availability?

  1. Driver Node
  2. Mount DBFS
  3. Mount Workspace
  4. Init Scripts
  5. Other Mounts?

-werners-
Esteemed Contributor III

Hm, you are right. If you need to put init scripts in the workspace, it should be available already.

So it seems that you need to get the path correct.

Can you try with the /Workspace dir in the path (as you mentioned the script does not use it)?

ranged_coop
Valued Contributor II

I have tried it both ways, with and without /Workspace...the files do not seem to be available...I have also tried sleeping for up to 2 minutes to see if it is a delay in mounting...I just think binary files such as jars are not supported. It would be nice if someone from Databricks could confirm...

-werners-
Esteemed Contributor III

Can you write the directory structure to a file in the init script? That way you can check whether you are missing a part of the path.

ranged_coop
Valued Contributor II

I did...A very simple init script

Tried with and without Workspace....

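```bash
#!/bin/bash
# List the workspace folder; 2>&1 also captures any error message.
list_of_files=$(ls /Workspace/jars/ 2>&1)
# Pass the listing as an argument so it is never treated as a format string.
printf '%s\n' "$list_of_files" > /dbfs/FileStore/temp_files/init_script_error.txt
```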

In both cases I got a similar error message...

```
ls: cannot access '/Workspace/jars/': Invalid argument
```
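
To compare several candidate paths in one cluster start, a diagnostic along these lines could log the result for each. This is only a sketch: `probe_paths` is a hypothetical helper name, and the log location in the commented usage simply reuses the `/dbfs/FileStore/temp_files/` folder from the script above.

```shell
#!/bin/bash
# Hypothetical helper: record, for each candidate path, either its
# `ls -ld` entry or the error message that ls printed.
probe_paths() {
  local log_file="$1"
  shift
  local path
  for path in "$@"; do
    # stdout and stderr both go to the log, so the exact error is kept
    if ls -ld "$path" >> "$log_file" 2>&1; then
      echo "OK: $path" >> "$log_file"
    else
      echo "FAILED: $path" >> "$log_file"
    fi
  done
}

# Possible use inside an init script (paths taken from this thread):
#   probe_paths /dbfs/FileStore/temp_files/init_script_probe.txt \
#     /Workspace /Workspace/jars /jars /databricks/jars
```

Probing the parent `/Workspace` directory as well as the `jars` subfolder shows whether the whole mount is missing or only that folder.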
