02-17-2026 07:00 AM
Python modules run with %sh lose access to Spark. How do I regain the Spark session and access the Databricks tables?
02-17-2026 09:25 AM
%sh runs a shell command on the driver node's OS, not inside the notebook's Python/Spark runtime. It basically opens a separate Linux process on the driver machine.
The Spark session, on the other hand, is attached to the notebook runtime. So when you use normal Python cells, you're inside the Spark-enabled environment.
May I know what you are running using %sh?
02-17-2026 10:38 PM
@soloengine To run existing Python notebooks. We are trying to make them support Spark as well.
02-18-2026 12:23 AM
The Spark session never disappears; %sh simply runs outside it.
Do this:
from mymodule import myfunc
myfunc(spark)

Don't do this:
%sh
python my_script.py
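To see why passing spark explicitly works, here is a minimal sketch you can run anywhere. FakeSpark and myfunc are hypothetical stand-ins, not real Databricks APIs; the point is that a function taking the session as a parameter uses whatever session the caller provides:

```python
# Hypothetical sketch: a module function that accepts the SparkSession as a
# parameter can run against any object exposing the same interface.
class FakeSpark:
    """Stand-in for a SparkSession, for illustration only."""
    def table(self, name):
        return f"df:{name}"

def myfunc(spark):
    # In a real module this would return a DataFrame from spark.table(...)
    return spark.table("my_catalog.my_schema.my_table")

print(myfunc(FakeSpark()))  # prints "df:my_catalog.my_schema.my_table"
```

Inside a notebook cell, the global spark variable is the real SparkSession, so myfunc(spark) gets full table access.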
2 weeks ago
Hi @ajay_wavicle,
Thanks for the detailed writeup. The reason you lose access to Spark when using %sh is that it launches a completely separate Linux process on the driver node. That process runs outside the notebook runtime, so it has no connection to the SparkSession, dbutils, or any of the variables defined in your notebook cells.
There are several approaches to run your existing Python code while keeping full Spark access. Here is a rundown from simplest to most flexible.
OPTION 1: USE %run TO EXECUTE NOTEBOOKS INLINE
If your existing Python code is already in other Databricks notebooks, the easiest approach is %run. This executes the target notebook in the same Spark session, so all Spark APIs, tables, and dbutils are fully available.
%run /path/to/your_notebook
Any functions and variables defined in the called notebook become available in the calling notebook. One constraint: %run must be the only content in the cell.
Documentation: https://docs.databricks.com/en/notebooks/notebook-workflows.html
OPTION 2: IMPORT PYTHON FILES AS MODULES (RECOMMENDED)
Starting with Databricks Runtime 11.3 LTS and above, you can store .py files directly in the workspace alongside your notebooks and import them as regular Python modules. This is the cleanest approach for reusing existing Python code with Spark.
1. Upload or create your Python files in the workspace (for example, alongside your notebook or in a subfolder).
2. If needed, add the directory to your Python path:
import sys
import os
sys.path.append(os.path.abspath('/Workspace/path/to/your/modules'))
3. Import and call your functions, passing the spark session explicitly:
from my_module import my_function
result = my_function(spark)
Inside my_module.py, your function receives the active SparkSession:
def my_function(spark):
    df = spark.table("my_catalog.my_schema.my_table")
    # do your processing
    return df
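As a sanity check of the import mechanics, the same sys.path pattern can be exercised locally without a cluster. The directory and module below are created on the fly for the demo (they stand in for /Workspace/path/to/your/modules), and DummySpark is a made-up placeholder:

```python
import os
import sys
import tempfile

# Create a throwaway directory holding a module, mimicking a workspace folder.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "my_module.py"), "w") as f:
    f.write(
        "def my_function(spark):\n"
        "    # In Databricks this would call spark.table(...)\n"
        "    return type(spark).__name__\n"
    )

# Same steps as in the notebook: append the directory, then import.
sys.path.append(module_dir)
from my_module import my_function

class DummySpark:  # stand-in so the demo runs without a cluster
    pass

print(my_function(DummySpark()))  # prints "DummySpark"
```

In a Databricks notebook you would skip the tempfile setup, point sys.path at your workspace folder, and pass the real spark object instead of the dummy.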
On Databricks Runtime 14.0 and above, the current working directory defaults to the directory containing the notebook, so relative imports are even simpler.
During development, you can enable autoreload so changes to your modules are picked up without restarting the kernel:
%load_ext autoreload
%autoreload 2
Documentation: https://docs.databricks.com/en/files/workspace-modules.html
OPTION 3: USE dbutils.notebook.run() FOR ORCHESTRATION
If you need to run a notebook as a separate job (for example, with different parameters or in a workflow), use dbutils.notebook.run(). This launches the notebook as a new job run with its own Spark session and full access to tables.
result = dbutils.notebook.run(
    "/path/to/your_notebook",
    timeout_seconds=600,
    arguments={"param1": "value1"}
)
The called notebook can return a string result and create global temporary views to share data back.
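Since the return value is always a string, a common pattern is to exit the called notebook with a JSON payload and parse it in the caller. This is a sketch; the payload shape is made up, and the dbutils calls are shown only in comments so the parsing step runs without a cluster:

```python
import json

# In the called notebook, the final cell would be:
#   dbutils.notebook.exit(json.dumps({"status": "ok", "rows_written": 42}))
#
# dbutils.notebook.run(...) then returns that string to the caller. Here we
# simulate the returned value so the parsing step can be demonstrated:
result = json.dumps({"status": "ok", "rows_written": 42})

payload = json.loads(result)
print(payload["status"], payload["rows_written"])  # prints "ok 42"
```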
Documentation: https://docs.databricks.com/en/notebooks/notebook-workflows.html
OPTION 4: DATABRICKS CONNECT (FOR EXTERNAL SCRIPTS)
If you truly need to run standalone Python scripts (outside the notebook environment) that connect to Spark on a Databricks cluster, Databricks Connect is the right tool. It lets external Python processes establish a remote SparkSession.
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
df = spark.table("my_catalog.my_schema.my_table")
This is ideal if you have a large codebase developed locally or in a CI/CD pipeline and need cluster-backed Spark execution.
Documentation: https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html
QUICK SUMMARY
- Do not use %sh to run Python scripts that need Spark. The shell process cannot access the SparkSession.
- For notebook-to-notebook calls, use %run (shares the same session) or dbutils.notebook.run() (new session with parameters).
- For .py file reuse, import them as Python modules and pass the spark object explicitly.
- For external/standalone scripts, use Databricks Connect.
Hope this helps you get your existing Python code running with full Spark access. Let us know which approach works best for your use case.
* This reply was drafted with an agent system I built, which researches and drafts responses based on a wide set of documentation and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.