Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Using a Virtual environment

hvsk
New Contributor

Hi All,

We are working on training NHiTS/TFT models (a PyTorch Forecasting implementation) for time series forecasting. However, we are having some issues with package dependency conflicts.

Is there a way to consistently use a virtual environment across cells in a Databricks notebook? What is the recommended approach here?

Here is a screenshot of a minimal example where we create a virtual environment in one cell and are unable to access it in the next cell.

Thank You

2 REPLIES

Anonymous
Not applicable

@Harsh Kalra:

There are a few ways to manage package dependencies in Databricks notebooks:

  1. Use Databricks' built-in package management: Databricks lets you install packages through a built-in package manager. You can do this through the UI by going to the "Libraries" tab of your cluster and adding the packages you need. Alternatively, you can install notebook-scoped libraries programmatically with the `%pip install` magic command (the older `dbutils.library` utilities are deprecated in recent runtimes).
  2. Use a virtual environment: You can create a virtual environment using conda or pipenv and install all the packages you need there. The catch in a notebook is that shell state, including an activated environment, does not persist between cells, so you need to make sure all the required packages are installed in the environment and that you re-activate it (or call its interpreter explicitly) in every cell that uses it.
  3. Install packages in each cell: You can install the necessary packages with pip in each cell where you need them. However, this is time-consuming and clutters your notebook code.
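On option 2, the reason activation does not carry over is that each cell's shell commands run in a fresh shell. A workaround is to create the environment once and then invoke its interpreter explicitly from each cell instead of relying on `activate`. A minimal sketch (the environment path here is illustrative; on a Databricks driver you might place it under `/tmp`, and the `bin/python` layout assumes a Linux node):

```python
import os
import subprocess
import sys
import tempfile

# Create the environment once (illustrative path; on Databricks you
# might put it under /tmp on the driver instead).
env_dir = os.path.join(tempfile.mkdtemp(), "forecast-env")
subprocess.run([sys.executable, "-m", "venv", "--without-pip", env_dir], check=True)

# Instead of relying on `source .../activate` surviving across cells,
# call the environment's own interpreter explicitly in each cell.
venv_python = os.path.join(env_dir, "bin", "python")
result = subprocess.run(
    [venv_python, "-c", "import sys; print(sys.prefix)"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # the venv's prefix, not the driver Python's
```

Because the environment lives at a fixed path, any later cell can reuse `venv_python` the same way without re-activating anything.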

The recommended approach may depend on your specific use case and the complexity of your package dependencies. If you're having issues with package dependency conflicts, it may be worth trying the built-in package management approach first, as this will automatically handle dependency resolution.
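Whichever route you take, a quick sanity check after installing can confirm which versions were actually resolved, which helps when diagnosing dependency conflicts. A sketch using the standard library (the package names in the loop are placeholders; substitute the ones you actually pinned, e.g. `torch` and `pytorch-forecasting`):

```python
from importlib import metadata

# Placeholder package names -- substitute the ones you actually care
# about, e.g. "torch" and "pytorch-forecasting".
for pkg in ["pip", "wheel"]:
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "is not installed")
```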

Anonymous
Not applicable

Hi @Harsh Kalra,

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
