Data Engineering

What's the difference between the magic commands %pip and %sh pip?

tj-cycyota
Databricks Employee

In Databricks you can do either

%pip

or

%sh pip

What's the difference? Is there a recommended approach?

2 REPLIES

sean_owen
Databricks Employee

%sh pip just runs the pip command in a shell on the driver machine. It works like pip on any other command line and installs packages from PyPI, but it affects only the driver. By itself, it does not set up a virtualenv, so other users of the cluster can see the installed package too.

%pip uses the same syntax to install packages, but it is a 'magic' command that installs the package across all machines in the cluster. It sets up a virtualenv specific to each notebook execution, which isolates the installation from other jobs and users.
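If you want to see the difference for yourself, here is a rough sketch of a check you could run in a notebook cell after installing a package either way. It reports the hostname, the Python interpreter path, whether that interpreter is inside a virtual environment, and whether the package is importable there. The helper names (is_importable, describe_env, describe_executors) are made up for this example, and the executor check assumes a SparkContext such as the sc variable is available in your notebook, which may not hold on every cluster configuration.

```python
import importlib.util
import socket
import sys


def is_importable(package: str) -> bool:
    """Return True if this interpreter can find the top-level module `package`."""
    return importlib.util.find_spec(package) is not None


def describe_env(package: str) -> dict:
    """Report where this code is running and whether `package` is visible there."""
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        # True when the interpreter is running inside a virtual environment.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "importable": is_importable(package),
    }


def describe_executors(sc, package: str, tasks: int = 8) -> list:
    """Run describe_env inside Spark tasks and collect one report per task.

    Assumption: `sc` is a SparkContext. Spark decides where tasks run, so the
    reports may not cover every worker, and several may come from the same host.
    """
    return (
        sc.parallelize(range(tasks), tasks)
        .map(lambda _: describe_env(package))
        .collect()
    )


# On the driver:
print(describe_env("json"))
# On the executors (uncomment in a notebook where `sc` exists):
# for report in describe_executors(sc, "json"):
#     print(report)
```

Replace "json" with the module name of the package you installed. Going by the explanation above, after %sh pip install you would expect the driver report to show the package and the executor reports not to, while after %pip install both should show it. Treat that as something to verify on your own cluster rather than a guarantee.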

stefnhuy
New Contributor III

Hey there, User16776431030.

Great question about those magic commands in Databricks! Let me shed some light on this mystical matter.

The %pip and %sh pip commands may seem similar on the surface, but they're quite distinct in their powers. %sh pip is like a local magician; it performs pip wizardry solely on the driver machine. It's handy for installing packages, but beware, it won't conjure a virtual environment, meaning other cluster users might see your magic tricks.

Now, %pip, on the other hand, is the grand sorcerer of package installation. It uses the same pip syntax but operates cluster-wide. It crafts a unique virtual environment for each notebook execution, keeping your magic spells hidden from prying eyes.

In my experience, I've dabbled in both magics, and %pip's enchantment has often saved the day in collaborative clusters. Andersen, a provider of cutting-edge solutions in this field, also recommends using %pip for its cluster-wide benefits.

To solve your dilemma, the choice depends on your needs. If you desire isolation and don't want to reveal your magical secrets to others, %pip is your spell of choice. But if you're a benevolent wizard sharing your powers, %sh pip works fine.
