Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Using Init Script to execute a Python notebook at the all-purpose cluster level

prashant151
New Contributor II

Hi,

We have a setup.py notebook in our Databricks workspace.

It is executed from our other transformation notebooks using

%run /Workspace/Common/setup.py

which consumes a lot of time.

 

This setup.py internally calls other utility notebooks using %run:

%run /Workspace/Common/01_Utilities.py
%run /Workspace/Common/02_Utilities.py

We are trying to run setup.py at the cluster level, but currently only shell files are allowed in init scripts.

Please advise how we can execute this setup.py at the cluster level, so that we can remove its execution from the rest of the notebooks.

@Advika 

2 ACCEPTED SOLUTIONS


Raman_Unifeye
Contributor III

@prashant151 - Unlike legacy (pre-UC) clusters, you cannot directly run a Databricks notebook (like setup.py) from a cluster init script, because init scripts only support shell commands, not %run or notebook execution.

You will need to refactor your setup logic into a Python module and install it via the init script.

I would do the following instead:

  • Refactor setup.py into a Python package (.whl)
  • Store your .whl in a UC Volume
  • Install it via a cluster init script (attach the init script to your cluster under Advanced Options → Init Scripts)
  • Now you can import it in your notebooks instead of using %run
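As a sketch, the init script for the install step can be a one-liner. The UC Volume path and wheel filename below are hypothetical placeholders; substitute wherever you uploaded your wheel:

```shell
#!/bin/bash
# Cluster init script: runs on every node at cluster start.
# /Volumes/main/common/libs/... is a placeholder UC Volume path.
set -euo pipefail

# Use the cluster's own pip so the package lands in the Python
# environment that notebooks attached to this cluster actually use.
/databricks/python/bin/pip install \
    /Volumes/main/common/libs/setup_pkg-0.1.0-py3-none-any.whl
```

Installing with `/databricks/python/bin/pip` (rather than a bare `pip`) is what makes the package visible to notebook sessions on that cluster.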

 


RG #Driving Business Outcomes with Data Intelligence


iyashk-DB
Databricks Employee

You can’t “%run a notebook” from a cluster init script: init scripts are shell-only and meant for environment setup (installing libraries, setting env vars), not for executing notebooks or sharing Python state across sessions. +1 to what @Raman_Unifeye said.

Convert your common code into a Python module or wheel and import it in notebooks instead of %run.

Option 1: Workspace modules (no build step)

Move Common/01_Utilities.py and 02_Utilities.py into Workspace Files as .py modules (add an __init__.py). Then import them directly (DBR 11.3+ automatically adds the current working directory to PYTHONPATH).

Example:

from Common.utilities import init_env, foo
init_env()
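For illustration, a minimal Common/utilities.py behind that import might look like this; the function names follow the example above, and the settings values are placeholders:

```python
# Common/utilities.py -- shared helpers kept as a plain workspace module.
# An empty Common/__init__.py next to this file makes Common importable
# as a package.

def init_env(app_name: str = "common") -> dict:
    """Return shared settings rather than mutating notebook globals."""
    return {
        "app_name": app_name,
        "base_path": "/Volumes/main/common",  # placeholder volume path
    }

def foo() -> int:
    """Stand-in utility matching the import example above."""
    return 42
```

Having init_env return a settings dict (instead of relying on %run side effects) keeps each notebook's state explicit and easy to test.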

Option 2: Build a wheel and install per cluster

Package your code (setup.py or pyproject.toml), build a wheel, then install it as a cluster library or via %pip install /Workspace/.../yourpkg.whl (or from a UC volume).
Example:

%pip install /Workspace/Common/dist/yourpkg-0.1.0-py3-none-any.whl
import yourpkg
yourpkg.init_env()
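The packaging metadata for such a wheel can be minimal. A sketch of a pyproject.toml, with name and version chosen only to match the example filename above:

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "yourpkg"
version = "0.1.0"
requires-python = ">=3.9"
```

Running `python -m build --wheel` against this produces dist/yourpkg-0.1.0-py3-none-any.whl, ready to upload to Workspace Files or a UC Volume.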


