Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to install a Python package on a Spark cluster

kidexp
New Contributor II

Hi,

How can I install Python packages on a Spark cluster? Locally, I can use pip install.

I want to use some external packages which are not installed on the Spark cluster.

Thanks for any suggestions.

1 ACCEPTED SOLUTION

Accepted Solutions

arsalan1
Contributor

@kidexp

From the workspace dropdown, select New Library; you can then upload a Python egg or specify a PyPI package to install. Please see the attached screenshots.



7 REPLIES


kidexp
New Contributor II

Thanks very much @Arsalan Tavakoli-Shiraji

@Arsalan Tavakoli-Shiraji, how do we attach it to a specific cluster programmatically (and not just all clusters by checking that box)?

You can use the Databricks Libraries API to programmatically attach libraries to specific clusters. For more information: https://docs.databricks.com/api/latest/libraries.html#install
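
A minimal sketch of that call, based on the linked Libraries API docs (the workspace host, token, cluster ID, and package name below are placeholders):

    curl -X POST https://<databricks-instance>/api/2.0/libraries/install \
      -H "Authorization: Bearer <personal-access-token>" \
      -d '{
            "cluster_id": "1234-567890-abc123",
            "libraries": [ { "pypi": { "package": "simplejson" } } ]
          }'

The same endpoint also accepts jar, egg, and whl library specs; see the docs for the full request schema.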

Anonymous
Not applicable

Install a Python package on a Spark cluster:

Make a virtualenv just for your Spark nodes.

Each time you run a Spark job, run a fresh pip install of all your own in-house Python libraries. ...

Zip up the site-packages dir of the virtualenv. ...

Pass the single .zip file, containing your libraries and their dependencies, as an argument to --py-files (see the sketch below).
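
A shell sketch of those steps (the environment, library, and script names are placeholders; note this approach only works for pure-Python libraries, since packages with compiled extensions generally cannot be imported from a zip):

    # 1. Make a virtualenv just for the Spark job and pip install into it
    python3 -m venv spark_env
    source spark_env/bin/activate
    pip install your-library
    # 2. Zip up the site-packages dir of the virtualenv
    cd spark_env/lib/python3.*/site-packages
    zip -r "$OLDPWD/deps.zip" .
    cd "$OLDPWD"
    # 3. Pass the single .zip to --py-files
    spark-submit --py-files deps.zip your_script.py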


Mikejerere
New Contributor

Use --py-files with Spark Submit: Zip the package and add it using --py-files when you run spark-submit. For example:

spark-submit --py-files path/to/your_package.zip your_script.py
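
To make that concrete, a rough sketch with placeholder names: the zip needs importable Python modules at its root, which Spark distributes and adds to each executor's Python path (again, pure-Python packages only).

    # Assumed layout: your_package/__init__.py, your_package/utils.py
    zip -r your_package.zip your_package/
    # your_script.py can then do: from your_package import utils
    spark-submit --py-files your_package.zip your_script.py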

Mikejerere
New Contributor

If --py-files doesn’t work, try this shorter method:

  1. Create a Conda Environment: Install your packages.

    conda create -n myenv python=3.x
    conda activate myenv
    pip install your-package

  2. Package and Submit: Use conda-pack and spark-submit with --archives.

    conda pack -n myenv -o myenv.tar.gz
    spark-submit --archives myenv.tar.gz#myenv --conf spark.pyspark.python=myenv/bin/python your_script.py

    This runs your Spark job with the required packages.
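
For YARN cluster mode specifically, the conda-pack documentation suggests also pointing the driver at the packed interpreter via environment variables (a sketch; the myenv names carry over from the steps above):

    PYSPARK_PYTHON=./myenv/bin/python \
    spark-submit \
      --master yarn --deploy-mode cluster \
      --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./myenv/bin/python \
      --archives myenv.tar.gz#myenv \
      your_script.py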

