06-28-2023 05:02 AM
Hello,
What is the correct way to install packages from a requirements.txt file within a Databricks repo? Do I need to add some utility notebooks with additional scripts to my repo and run them before any of the scripts that need the packages? I suppose adding a pip install to every file is a bit extreme, so what is the correct approach?
I would greatly appreciate any help, since this is my first time working with Databricks through a repo.
06-28-2023 06:12 AM
Define the environment in a requirements.txt file in the repo, then run pip install -r requirements.txt from a notebook to install the packages and set up the environment for that notebook.
Working in Repos is practically the same as working in the Workspace, except that a repo is linked to Git (so you need to commit/push/pull) and the paths are different.
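For what it's worth, the install step can be sketched in plain Python (the requirements.txt path and the commented-out subprocess call are assumptions; inside a Databricks notebook the %pip install -r requirements.txt magic is the usual form):

```python
# Build the pip command that installs a requirements file with the notebook's
# own interpreter; uncomment the subprocess call to actually run it.
import subprocess  # used by the commented-out call below
import sys

req_path = "requirements.txt"  # assumed to sit at the repo root
cmd = [sys.executable, "-m", "pip", "install", "-r", req_path]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # performs the actual install
```

Using sys.executable (rather than a bare "pip") ensures the packages land in the same environment the notebook is running in.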
06-28-2023 07:11 AM
Appreciate your response,
pip install -r requirements.txt worked when I created a new notebook and ran some code there, but not for the files in the repo. When I try to run a .py file from my repo through the Databricks run command, I get a ModuleNotFoundError. Maybe I am just misunderstanding the concept here: perhaps you are not supposed to run those files directly in Databricks, and if you do, it is better to have them as notebooks rather than .py files.
As a side note, I was reading about global init scripts and wondering whether that would be a way to run my files within Databricks.
Could someone point me to some information (docs, videos, or anything else) about working in a Databricks repo that goes beyond the Git integration?
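One common cause of a ModuleNotFoundError when running a .py file directly is that the repo root is not on sys.path, so sibling modules cannot be resolved. A hedged sketch of the usual workaround (the repo path shown is hypothetical):

```python
# If direct runs of a .py file raise ModuleNotFoundError for modules that live
# in the same repo, appending the repo root to sys.path lets them resolve.
import sys

repo_root = "/Workspace/Repos/me@example.com/my-repo"  # hypothetical path
if repo_root not in sys.path:
    sys.path.append(repo_root)
print(repo_root in sys.path)  # -> True
```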
06-28-2023 07:16 AM
The .py files are very handy when you create classes and similar reusable code. They contain modules that you can import into a notebook with the import statement; they are not meant to be run directly.
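To illustrate the import-only pattern, here is a minimal self-contained sketch (the module name repo_utils.py and the Greeter class are made up; in a real repo the file would already exist in the checkout, and in Databricks Repos the repo root is normally on sys.path already):

```python
# A .py file in the repo acts as a module: it defines classes/functions and is
# imported from a notebook rather than executed. A temp dir stands in for the
# repo checkout so the sketch is self-contained.
import os
import sys
import tempfile
import textwrap

repo_root = tempfile.mkdtemp()  # stand-in for the repo checkout path
with open(os.path.join(repo_root, "repo_utils.py"), "w") as f:
    f.write(textwrap.dedent("""
        class Greeter:
            def greet(self, name):
                return f"Hello, {name}"
    """))

sys.path.append(repo_root)  # make the "repo" importable
from repo_utils import Greeter

print(Greeter().greet("Databricks"))  # -> Hello, Databricks
```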