Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to install Python packages on a Spark cluster

kidexp
New Contributor II

Hi,

How can I install Python packages on a Spark cluster? Locally, I can use pip install.

I want to use some external packages that are not installed on the Spark cluster.

Thanks for any suggestions.

1 ACCEPTED SOLUTION

Accepted Solutions

arsalan1
Contributor

@kidexp

From the workspace dropdown, you can select New Library, and then upload Python eggs or specify particular PyPI packages. Please see the attached screenshots.

[screenshots attached]


7 REPLIES


kidexp
New Contributor II

Thanks very much @Arsalan Tavakoli-Shiraji

@Arsalan Tavakoli-Shiraji, how do we attach it to a specific cluster programmatically (and not just to all clusters by checking that box)?

You can use the Databricks Libraries API to programmatically attach libraries to specific clusters. For more information: https://docs.databricks.com/api/latest/libraries.html#install
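
For example, a minimal sketch against the Libraries API install endpoint; the workspace URL, token, cluster ID, and package name below are placeholders, not values from this thread:

    curl -X POST https://<databricks-instance>/api/2.0/libraries/install \
      -H "Authorization: Bearer $DATABRICKS_TOKEN" \
      -d '{
            "cluster_id": "1234-567890-abcde123",
            "libraries": [ { "pypi": { "package": "simplejson" } } ]
          }'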

Anonymous
Not applicable

Installing a Python bundle on a Spark cluster:

Create a virtualenv just for your Spark nodes.

Each time you run a Spark job, run a fresh pip install of all your in-house Python libraries. ...

Zip up the site-packages dir of the virtualenv. ...

Pass the single .zip file, containing your libraries and their dependencies, as an argument to --py-files (see the sketch below).
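
A minimal shell sketch of those steps; the environment name, Python version path, and package name are assumptions for illustration:

    # create an isolated virtualenv and install your in-house libraries into it
    python -m venv spark_env
    source spark_env/bin/activate
    pip install your-internal-lib   # hypothetical in-house package

    # zip up the virtualenv's site-packages directory (adjust python3.x to your version)
    cd spark_env/lib/python3.x/site-packages
    zip -r ~/deps.zip .
    cd -

    # pass the single zip file to --py-files
    spark-submit --py-files ~/deps.zip your_script.py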


Mikejerere
New Contributor II

Use --py-files with Spark Submit: Zip the package and add it using --py-files when you run spark-submit. For example:

spark-submit --py-files path/to/your_package.zip your_script.py
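
If it helps, the zip itself can be produced like this, assuming your package lives in a directory named your_package/:

    # build the archive that --py-files distributes to the executors
    zip -r your_package.zip your_package/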

Mikejerere
New Contributor II

If --py-files doesn't work, try this shorter method:

  1. Create a Conda environment and install your packages:

    conda create -n myenv python=3.x
    conda activate myenv
    pip install your-package

  2. Package and submit: use conda-pack and spark-submit with --archives:

    conda pack -n myenv -o myenv.tar.gz
    spark-submit --archives myenv.tar.gz#myenv --conf spark.pyspark.python=./myenv/bin/python your_script.py

This runs your Spark job with the required packages.

