3 weeks ago
Looking for ways to convert a Databricks notebook into a Python library. Some context:
Thanks.
3 weeks ago
The most practical way to share code from a Databricks notebook as a reusable module while hiding implementation details from users, without distributing wheels or granting direct notebook execution permissions, is to convert the notebook into a Python module, store it in the Databricks workspace, and have consumers import it from the workspace path. Custom Python modules can be imported directly from workspace files, so users can call your functions without browsing the source, provided permissions are set appropriately.
Python source files (.py) can be uploaded and stored alongside notebooks in the Databricks workspace.
These modules can be organized in folders, and notebooks can import their functions and classes with Python's standard import syntax once the module's folder is reachable on sys.path (e.g., from my_module import func; see the sketch below).
With workspace permissions set on the file or folder, users are limited to reading and using the module's interface; the implementation stays out of sight as long as access to the source is restricted, while the exposed APIs remain callable from user notebooks.
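As a rough sketch, assuming a module has been saved at /Workspace/Shared/shared_libs/my_module.py (a hypothetical path) and the cluster runtime supports workspace files, a consumer notebook could import it like this:
import sys
sys.path.append("/Workspace/Shared/shared_libs")  # hypothetical shared folder; adjust to your workspace layout
from my_module import run_pipeline                # run_pipeline is an illustrative function name
df = run_pipeline(spark)                          # consumers call the exposed API without opening the source file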
Python wheels are commonly used for sharing and deploying modules, but strict environment policies can limit their use.
Library uploads (including wheels) are not always feasible due to administrative restrictions.
Notebook-scoped libraries are also possible but may not meet your code privacy requirements—users might still see source.
Code obfuscation/minification: Not ideal for Python, as bytecode isn’t very secure and users might still find ways to read code if they have access.
Docker containers: You can deploy Spark code in containers, hiding source, but this requires cluster admin and more setup.
Unity Catalog or secret management: These help protect sensitive data, but cannot fully hide code logic.
For your needs, Python modules stored as workspace files with carefully set permissions are the most suitable way to share code without exposing internals, provided your environment supports workspace files. Wheels and other packaged approaches remain the most robust option but may be restricted by policy. UDFs and notebook-scoped libraries do not fully solve the visibility and Spark-referencing problems.
3 weeks ago
Thanks for the great information. Our team has decided to go with a wheel. Can a notebook be created that pushes new versions of the code without having to go through the manual process of creating a whl and the other configuration files? In other words, can I create a notebook that will set up, configure, and install the wheel?
3 weeks ago
A Databricks notebook can automate most of the wheel (.whl) packaging and installation process, although the wheel artifact itself still has to be built somewhere. You can, however, create a notebook (or workflow) that covers every step, from building the package to deploying and installing the wheel on your workspace or cluster, so that manual intervention is minimal.
Building the wheel (.whl) automatically: using tools such as setuptools, Poetry, or uv, a notebook can run shell commands (via the %sh magic) to build the wheel directly from code stored in the Databricks workspace or fetched from a repository.
Uploading and installing the wheel: The notebook can upload the generated .whl to DBFS or a workspace path, then install it using %pip install /dbfs/path/to/your_package.whl or a similar command.
Automated configuration: Any additional setup, such as installing dependencies from a requirements.txt, can also be scripted within the same notebook.
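For example, a dependency cell might look like the following; the requirements.txt path is illustrative and would point at wherever your team keeps it:
%pip install -r /Workspace/Shared/shared_libs/requirements.txt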
You must still follow the basic structure of Python packaging: a setup.py or pyproject.toml and the usual package metadata are required, because the packaging tools need them to build a wheel.
The initial setup (creating setup.py, organizing code, and writing build commands) happens once. Afterward, updating the wheel and deploying new versions can be fully automated in a notebook workflow.
Place your source code and setup files (e.g., setup.py, pyproject.toml) in a workspace or accessible location.
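A minimal setup.py might look like the sketch below; the package name, version, and dependency list are placeholders to adapt to your project:
from setuptools import setup, find_packages

setup(
    name="your_package",          # placeholder distribution name
    version="0.1.0",              # bump on each release
    packages=find_packages(),     # picks up your_package/ and its subpackages
    install_requires=[],          # list runtime dependencies here
)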
Use a notebook cell to run the wheel build process:
%sh
# builds the wheel into ./dist/ (run from the directory that contains setup.py)
python setup.py bdist_wheel
Use another cell to upload and install the newly built wheel:
%pip install /dbfs/path/to/dist/your_package.whl
Optionally, automate the copying/upload of the wheel with the Databricks CLI or REST API.
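As a sketch of that last step, a Python cell could copy the freshly built wheel from the driver's local dist/ folder to DBFS and, optionally, ask the Libraries API to install it cluster-wide; the paths, and the host, token, and cluster-ID variables, are assumptions you would supply yourself:
import glob, os

# find the wheel produced by the build step on the driver's local disk
wheel_path = os.path.abspath(sorted(glob.glob("dist/*.whl"))[-1])
dbfs_target = "dbfs:/FileStore/wheels/" + os.path.basename(wheel_path)

# copy it to DBFS so other notebooks and clusters can install it
dbutils.fs.cp("file:" + wheel_path, dbfs_target)

# optional: install it on a running cluster via the Libraries API
# DATABRICKS_HOST, DATABRICKS_TOKEN, and CLUSTER_ID are assumed to be defined elsewhere
import requests
requests.post(
    f"{DATABRICKS_HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json={"cluster_id": CLUSTER_ID, "libraries": [{"whl": dbfs_target}]},
).raise_for_status()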
This approach largely replaces manual building and uploading with a repeatable, notebook-driven process, streamlining your team's workflow.
In summary, while a notebook can't avoid the need for wheel-building prerequisites (setup files, code structure), it can effectively automate package creation, configuration, and installation to the point where manual intervention is minimal and repeatable updates become much easier.