06-07-2023 03:52 AM
Hi All,
We have a couple of jars stored in a workspace folder.
We are using init scripts to copy the jars from the workspace to the /databricks/jars path.
The init scripts fail, saying the files cannot be found.
```bash
#!/bin/bash
cp /Workspace/jars/file_name.jar /databricks/jars/
cp /Workspace/jars/file_name.jar /databricks/databricks-hive/
```
Can you please let me know if this is even possible?
What is the correct path for a file in a Workspace folder?
Thank you...
Edit:
It would be nice if someone from Databricks could confirm whether binary files in the Workspace, such as jars, are accessible from init scripts, and if so, what their path would look like.
06-14-2023 01:09 AM
Thank you for sharing the link. It was useful.
It is a little sad to see that it is not possible after having spent so much time analyzing and trying out various options.
I hope this is from a valid source. If so, I hope Databricks will consider adding this option, seeing that many (well, at least 2 :)) are expecting this feature. Using the CLI or API would just complicate things and is not that practical.
06-07-2023 04:08 AM
That seems to be OK. Probably the file system is not yet mounted when you do the copy operations.
What is your use case of copying jars in an init script? There might be alternatives.
06-07-2023 04:16 AM
Thank you for your response.
We have an in-built jar file in DBFS that we move to the above-mentioned paths as part of the init scripts.
As part of moving the init scripts to a workspace location, we are trying to see if we can keep the jars organized in the Workspace and use a similar init script to move them. We also planned on copying a few other files the same way.
Follow up Questions:
06-07-2023 04:19 AM
06-07-2023 04:25 AM
Thank you for your response...
06-07-2023 04:28 AM
I see, but isn't it easier to share the source code in git instead of a jar?
06-07-2023 04:31 AM
Code is in git only, we have blackbox jars provided by different teams and used as part of the code. These are the ones we currently have in DBFS and plan to move to workspace.
06-07-2023 04:35 AM
I still struggle to understand why you copy them, sorry.
A jar is an artifact. You import them and use them in your program. For that you do not need a copy.
If you need to know what is in the jar, go to git and look at the class code.
Probably I am missing something.
06-07-2023 04:43 AM
No issues, I will try to explain.
We have several teams. One of them produces a jar file whose logic is a black box to us; we only use it and are not aware of its contents. It is carried forward from old legacy code.
To use the jar file, we copy it to the drive so that our code can reference the classes inside it. For that, we copy the jar file to the path where Databricks keeps all the jars - you can see it in some environment variable, I do not remember which.
06-07-2023 04:55 AM
OK, I get it.
Is the databricks-cli an option? Because my guess is that the Workspace is not available during the init script (you are using a cluster-scoped init script, right?).
Or you can put the jars to be copied in a subfolder of /databricks, like /databricks/zzjars, as /databricks should exist.
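As a rough sketch of the DBFS-based alternative (the source folder below is a placeholder; adjust it to wherever the jars actually live - unlike /Workspace in this thread, /dbfs is typically available to cluster-scoped init scripts):

```shell
#!/bin/bash
# Sketch: copy jars from a DBFS-backed folder instead of /Workspace.
# copy_jars copies every *.jar from the source dir to the destination
# dir and prints how many files it copied.

copy_jars() {
  local src_dir="$1"
  local dest_dir="$2"
  mkdir -p "$dest_dir"
  local copied=0
  for jar in "$src_dir"/*.jar; do
    [ -e "$jar" ] || continue   # no matches: the glob stays literal
    cp "$jar" "$dest_dir"/ && copied=$((copied + 1))
  done
  echo "$copied"
}

# On a cluster this might look like (placeholder paths):
# copy_jars /dbfs/FileStore/jars /databricks/jars
```

The existence check inside the loop avoids copying the literal `*.jar` string when the source folder is empty.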
06-07-2023 05:10 AM
Are you saying the Workspace path will not be available even if I sleep for a minute or so? Can you please confirm? This will save me a lot of effort if it is never going to load. Ideally that should not be the case, right? Since Databricks proposes that init scripts be maintained in the Workspace, the Workspace should logically be available before the init scripts execute.
Any idea on the order of availability?
Driver Node
Mount DBFS
Mount Workspace
Init Scripts
Other Mounts ?
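If mount ordering were the issue, a bounded wait like this would tell you whether the path ever shows up (a sketch only; the timeout value is arbitrary):

```shell
#!/bin/bash
# Sketch: poll until a path exists, up to a timeout.
# Returns 0 if the path appeared in time, 1 if the timeout expired.

wait_for_path() {
  local path="$1"
  local timeout_secs="${2:-120}"
  local waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout_secs" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# On a cluster this might look like:
# wait_for_path /Workspace/jars 120 || echo "never mounted" >&2
```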
06-07-2023 05:35 AM
Hm, you are right. If you can put init scripts in the workspace, it should be available already.
So it seems that you need to get the path correct.
Can you try with the /Workspace dir in the path (if the script does not already use it)?
06-07-2023 06:14 AM
I have tried it both ways, with and without /Workspace; the files do not seem to be available. I have also tried sleeping for up to 2 minutes to rule out a delayed-mount issue. I just think binary files such as jars are not supported; it would be nice if someone from Databricks could confirm.
06-07-2023 06:17 AM
Can you write the directory structure to a file in the init script? That way you can check whether you are missing a part of the path.
06-07-2023 06:35 AM
I did. A very simple init script, tried with and without /Workspace:
```bash
#!/bin/bash
list_of_files=$(ls /Workspace/jars/)
printf '%s\n' "$list_of_files" > /dbfs/FileStore/temp_files/init_script_error.txt
```
In both cases I got a similar error message:
```
ls: cannot access '/Workspace/jars/': Invalid argument
```
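For what it's worth, a small probe like this in the init script would distinguish "the path does not exist" from "it exists but cannot be read" (the `Invalid argument` case above). Everything here besides /Workspace itself is a placeholder:

```shell
#!/bin/bash
# Sketch: probe a directory and report what the filesystem says,
# instead of assuming ls will succeed.

probe_dir() {
  local dir="$1"
  if [ ! -e "$dir" ]; then
    echo "MISSING: $dir"
  elif [ ! -d "$dir" ]; then
    echo "NOT A DIRECTORY: $dir"
  elif ! ls "$dir" > /dev/null 2>&1; then
    echo "EXISTS BUT UNREADABLE: $dir"   # e.g. the 'Invalid argument' case
  else
    echo "OK: $dir ($(ls "$dir" | wc -l) entries)"
  fi
}

# On a cluster this might look like (placeholder report path):
# probe_dir /Workspace/jars >> /dbfs/FileStore/temp_files/probe.txt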