The best way to share code from a Databricks notebook as a reusable module while hiding implementation details, without using wheels or granting direct notebook execution permissions, is to convert the notebook into a Python module (.py file), store it in the Databricks workspace, and import it using workspace paths. Notebooks can then import the module like any other Python package, and if workspace permissions are set appropriately, users can call its functions without browsing the source.
Workspace Modules Approach
- Python source files (.py) can be uploaded and stored alongside notebooks in the Databricks workspace.
- These modules can be organized in folders, and notebooks can import functions and classes using Python's standard import syntax and relative paths (e.g., `from workspace.module import func`).
- By assigning workspace permissions, users can be limited to reading and calling the module's interface; the code itself can be hidden if access to the file or folder is restricted.
- With permissions managed this way, the implementation stays hidden while the module's API remains callable from user notebooks.
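A minimal sketch of the import pattern described above. The folder and module names (`helpers`, `transforms.py`, `clean_names`) are hypothetical, and a temporary directory stands in for a workspace folder so the example runs anywhere; on Databricks you would append the real workspace path instead.

```python
import pathlib
import sys
import tempfile

# Simulate a workspace folder holding a shared module; in a real workspace
# this would be a path like /Workspace/Shared/helpers (hypothetical).
shared = pathlib.Path(tempfile.mkdtemp()) / "helpers"
shared.mkdir()
(shared / "transforms.py").write_text(
    'def clean_names(names):\n'
    '    """Strip whitespace and lowercase each name."""\n'
    '    return [n.strip().lower() for n in names]\n'
)

# In a Databricks notebook you would append the workspace folder instead,
# e.g. sys.path.append("/Workspace/Shared/helpers")
sys.path.append(str(shared))
from transforms import clean_names

print(clean_names(["  Alice ", "BOB"]))  # → ['alice', 'bob']
```

User notebooks only need the `sys.path.append` line and the import; they never have to open the module file itself.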
Limitations of Wheel Files and Library Installs
- Python wheels are the standard way to package and distribute modules, but strict environment policies can limit their use.
- Library uploads (including wheels) are not always feasible due to administrative restrictions.
- Notebook-scoped libraries (e.g., `%pip install`) are also possible but may not meet code-privacy requirements, since installed pure-Python packages still ship readable source.
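To illustrate the privacy limitation above: pure-Python packages installed from wheels keep their .py files on disk, so any user who can attach to the cluster can read them. The standard-library `json` module stands in here for any installed pure-Python package.

```python
import inspect
import json  # stands in for any pure-Python package installed from a wheel

# Wheels ship .py files by default, so anyone with an attached notebook
# can recover the source of an imported module:
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```

Installing a wheel therefore distributes the code, but it does not hide it.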
Additional Strategies
- Code obfuscation/minification: not ideal for Python, since bytecode is easily decompiled and users with file access can usually recover the logic.
- Docker containers: custom container images can hide source from notebook users, but they require cluster-admin privileges and more setup.
- Unity Catalog and secret management: these protect sensitive data and credentials, but they cannot hide code logic.
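A short sketch of why bytecode-only distribution is weak protection, as noted in the first bullet. The module name `secret` and its contents are hypothetical; the source is compiled to a sourceless .pyc, which still imports normally but can be decompiled with freely available tools.

```python
import pathlib
import py_compile
import sys
import tempfile

tmp = pathlib.Path(tempfile.mkdtemp())
src = tmp / "secret.py"
src.write_text("def answer():\n    return 42\n")

# Compile to a sourceless .pyc and delete the .py; the module still imports,
# but bytecode is trivially decompiled, so this is obscurity, not security.
py_compile.compile(str(src), cfile=str(tmp / "secret.pyc"))
src.unlink()

sys.path.append(str(tmp))
import secret

print(secret.answer())  # → 42
```

This hides the source file itself, but anyone who can copy the .pyc can reconstruct near-original code from it.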
For these requirements, Python modules stored as workspace files with carefully set permissions are the most suitable way to share code without exposing internals, provided the environment supports importing from workspace files. Wheels remain the cleanest packaging mechanism where they are allowed, but they may be restricted, and neither UDFs nor notebook-scoped libraries fully solve the code-visibility and Spark-referencing problems.