
Cluster sharing between different notebooks

johnp
New Contributor III

I have two Structured Streaming notebooks running continuously for anomaly detection. Both notebooks import the same Python module to mount Azure Blob Storage, but each uses its own container. Each notebook runs fine when it has its own cluster, but when I run both notebooks on the same cluster, errors occur. Debugging shows that both notebooks use the same variable name for the ADLS storage path:

    adls_path = 'abfss://%s@%s.dfs.core.windows.net/' % (container_name, account_name)
Since the container name is different for each notebook, the adls_path will be different too.
When I run both notebooks on the same cluster, how does Databricks set adls_path? Will there be a race condition? Is there any way to fix the issue without using different variable names?
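
A simplified sketch of the setup (the module and function names here are illustrative, not the real ones):

    # mount_utils.py -- hypothetical shared module imported by both notebooks
    def build_adls_path(container_name, account_name):
        # Each caller passes its own container, so each gets its own path
        return 'abfss://%s@%s.dfs.core.windows.net/' % (container_name, account_name)

    # In each notebook (each runs in its own Python interpreter):
    # from mount_utils import build_adls_path
    # adls_path = build_adls_path('my-container', 'my-account')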

szymon_dybczak
Contributor III

Hi @johnp ,

When running multiple notebooks on the same Databricks cluster, each notebook runs in its own isolated environment. This means that variable names and their values in one notebook should not interfere with those in another. In principle, this isolation prevents race conditions or conflicts caused by overlapping variable names.

Every notebook attached to a cluster has a pre-defined variable named spark that represents a SparkSession. SparkSession is the entry point for using Spark APIs as well as setting runtime configurations. 
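
For instance, runtime configuration goes through that same spark object (the key below is a standard Spark SQL setting, shown only as an example):

    # `spark` is pre-defined in every Databricks notebook; set and read a runtime conf
    spark.conf.set('spark.sql.shuffle.partitions', '64')
    print(spark.conf.get('spark.sql.shuffle.partitions'))  # -> '64'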

Spark session isolation is enabled by default.

But there is one caveat: if those notebooks are run from some kind of main notebook via the %run command, they will share the same session. So the question is, how do you run your notebooks? That may be where the problem lies.
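
To see the caveat concretely (the notebook names below are hypothetical):

    # In child_notebook:
    adls_path = 'abfss://container-a@myaccount.dfs.core.windows.net/'

    # In a main notebook, %run executes child_notebook inside the main
    # notebook's own session, so adls_path becomes visible here -- and a
    # second %run of another notebook assigning the same name overwrites it:
    #
    # %run ./child_notebook
    # print(adls_path)   # -> abfss://container-a@myaccount.dfs.core.windows.net/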

 

https://learn.microsoft.com/en-us/azure/databricks/notebooks/notebook-isolation

 

Rajani
Contributor II

Hi @szymon_dybczak,
You are right, we can create a temp view in the main notebook and use it in the same session.
But is there any option for creating a global variable which I can use across notebooks in the same session?

I tried creating a global variable in the main notebook and using it from another notebook in the same Spark session, but it gave me an error stating that no such variable exists.
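
For context, the temp-view approach that does work looks like this (the view and column names are made up):

    # In the main notebook: stage the value in a session-scoped temp view
    spark.createDataFrame(
        [('abfss://container-a@myaccount.dfs.core.windows.net/',)],
        ['adls_path']
    ).createOrReplaceTempView('shared_config')

    # In a notebook sharing the same Spark session (e.g. pulled in via %run):
    adls_path = spark.table('shared_config').first()['adls_path']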

johnp
New Contributor III

@szymon_dybczak Thanks for the help! In my case, the two notebooks are totally separate, but both import (not %run) the same Python module to get the variable adls_path. From your explanation, they should not interfere with each other. I will run the test again to confirm.
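
A quick isolation check before re-running the full pipelines (the variable name is made up):

    # In notebook A, attached to the shared cluster:
    probe = 'set-by-notebook-A'

    # In notebook B, attached to the same cluster:
    try:
        print(probe)  # NameError expected if the REPLs are truly isolated
    except NameError:
        print('isolated: notebook B cannot see variables from notebook A')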

Rishabh_Tiwari
Community Manager

Hi @johnp ,

Thank you for reaching out to our community! We're here to help you. 

To ensure we provide you with the best support, could you please take a moment to review the responses and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.

If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.

We appreciate your participation and are here to assist you further if you need it!

Thanks,

Rishabh
