Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

cluster sharing between different notebooks

johnp
New Contributor III

I have two structured streaming notebooks running continuously for anomaly detection. Both notebooks import the same Python module to mount the Azure blob storage, but each uses its own container. Each notebook runs well when it has its own cluster, but when I run both notebooks on the same cluster, an error occurs. Debugging shows that I am using the same variable name in both notebooks for the ADLS storage path:

    adls_path = 'abfss://%s@%s.dfs.core.windows.net/' % (container_name, account_name)
Since the container name is different for each notebook, the adls_path will be different too.
When I run both notebooks on the same cluster, how does Databricks set adls_path? Will there be a race condition? Is there any way to fix the issue without using different variable names?
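
For reference, here is roughly what the setup looks like in each notebook; storage_utils, the account name, and the container names below are placeholders, not the real values:

    # Both notebooks import the same helper module (hypothetical name) to
    # mount the Azure blob storage; only the container differs per notebook.
    from storage_utils import mount_container  # hypothetical module/function

    container_name = 'anomaly-a'        # the other notebook uses 'anomaly-b'
    account_name = 'mystorageaccount'   # assumed placeholder
    adls_path = 'abfss://%s@%s.dfs.core.windows.net/' % (container_name, account_name)
    mount_container(adls_path)          # hypothetical call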
4 REPLIES

Slash
New Contributor III

Hi @johnp ,

When running multiple notebooks on the same Databricks cluster, each notebook runs in its own isolated environment. This means that variable names and their values in one notebook should not interfere with those in another notebook. In theory, this isolation should prevent race conditions or conflicts from occurring due to variable name overlap.

Every notebook attached to a cluster has a pre-defined variable named spark that represents a SparkSession. SparkSession is the entry point for using Spark APIs as well as setting runtime configurations. 

Spark session isolation is enabled by default.

But there is one caveat: if those notebooks are run from some kind of main notebook via the %run command, they will share the same session. So the question is, how do you run your notebooks? That may be where the problem lies.
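
A quick way to check which situation you are in: set a marker in one notebook and try to read it from the other. The conf key below is an arbitrary example I made up, not a real Spark setting.

    # Run in notebook A. Confs set via spark.conf are session-scoped, so
    # under notebook isolation, notebook B will not see this value.
    spark.conf.set('myapp.session.marker', 'notebook_A')

    # Run in notebook B. Prints 'not set' under session isolation; prints
    # 'notebook_A' if both notebooks share one session (e.g. via %run).
    print(spark.conf.get('myapp.session.marker', 'not set'))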

https://learn.microsoft.com/en-us/azure/databricks/notebooks/notebook-isolation

Rajani
Contributor II

Hi @Slash,
You are right, we can create a temp view in the main notebook and use it in the same session.
But is there any option for creating a global variable which I can use across notebooks in the same session?

I tried creating a global variable in the main notebook and using it in a following notebook in the same Spark session, but it gave me an error stating no such variable exists.
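
The only workaround I have come across so far is a global temp view, which lives in the global_temp database and is visible across Spark sessions on the same cluster, though I would prefer a plain variable. A sketch (shared_config and the path literal are made-up examples):

    # Notebook 1: publish a value. Global temp views, unlike plain temp
    # views, are visible across Spark sessions on the same cluster.
    spark.sql(
        "SELECT 'abfss://containerA@acct.dfs.core.windows.net/' AS adls_path"
    ).createOrReplaceGlobalTempView('shared_config')  # hypothetical view name

    # Notebook 2 (same cluster): read the value back.
    adls_path = spark.table('global_temp.shared_config').first()['adls_path']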

johnp
New Contributor III

@Slash Thanks for the help! In my case, the two notebooks are totally separate, but both import (not %run) the same Python module to get the variable adls_path. From your explanation, they should not interfere with each other. I will run the test again to confirm.
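
To rule out any shared state regardless of how the sessions behave, I may also refactor the module to expose a function instead of a module-level adls_path (get_adls_path is just a sketch name):

    # storage_utils.py -- sketch of the refactor. With no module-level
    # adls_path, two importers can never observe each other's value, even
    # if they somehow shared a Python interpreter.
    def get_adls_path(container_name, account_name):
        """Return the ADLS Gen2 path for one container and account."""
        return 'abfss://%s@%s.dfs.core.windows.net/' % (container_name, account_name)

    # Each notebook then keeps its path in its own local variable:
    # adls_path = get_adls_path('anomaly-a', 'mystorageaccount')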

Rishabh_Tiwari
Community Manager

Hi @johnp ,

Thank you for reaching out to our community! We're here to help you. 

To ensure we provide you with the best support, could you please take a moment to review the responses and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.

If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.

We appreciate your participation and are here to assist you further if you need it!

Thanks,

Rishabh
