Archive file support in a JAR-type application

Abhay_1002
New Contributor

In my Spark application, I use a set of Python libraries. I am submitting the application as a JAR task, but I cannot find any option to provide archive files.

So, to handle the Python dependencies, I am using the following approach:

  • Create an archive file of a Python virtual environment containing the required libraries (<environment_name.tar.gz>)
  • Upload the archive to DBFS
  • In code, add the archive using sparkSession.sparkContext.addArchive(<dbfs:/environment_name.tar.gz>) (see the sketch after this list)
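
Roughly, the relevant code looks like this (a minimal sketch; the archive name, DBFS path, and object name are placeholders, not my real values):

import org.apache.spark.sql.SparkSession

object JarTaskWithPythonEnv {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()

    // Distribute the packed virtual environment to every node of the cluster.
    spark.sparkContext.addArchive("dbfs:/environment_name.tar.gz")

    // Per the Spark docs, the unpacked archive should then be reachable
    // through a relative path from the working directory:
    val pythonBin = "./environment_name/bin/python"
    // ...pythonBin is what the job then uses to run the Python pieces
  }
}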

As the Spark documentation suggests, the archive should then be accessible and usable through the relative path ./<environment_name>/bin/python.

But when I run the application on a Databricks cluster, I get the following error:
Path not found : ./environment/bin/python

I checked: the archive file is present, but in different directories on the driver and executor nodes. There is no such directory relative to the working directory (or on the classpath), so I cannot use the relative path.

These are the directories where the archive file lands on the different nodes:
driver : /local_disk0/spark-xxxxxx-xxxx-xxxx/userFiles-xxxxx-xxxxx-xxxxx/
executor : /local_disk0/spark-xxxxx-xxxxx-xxxxx/executor-xxxxxx-xxxxxx-xxxxxxxx/spark-xxxxx-xxxx-xxxxx-xxxx/
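
For reference, the Spark API that resolves these node-local download locations is org.apache.spark.SparkFiles; below is a minimal sketch of what I would expect to work (the archive name is a placeholder, and I have not confirmed this behaves the same for a JAR task on Databricks):

import org.apache.spark.SparkFiles

// SparkFiles.get maps a name registered via addFile/addArchive to the
// node-local path it was downloaded/unpacked to (the userFiles-* and
// executor-* directories listed above), on driver and executors alike.
val envDir = SparkFiles.get("environment_name.tar.gz")
val pythonBin = s"$envDir/bin/python"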

Please suggest how I can use archive files in a Spark application submitted as a JAR task.

Version: Databricks Runtime 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)
