Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Using a cluster of type SINGLE_USER to run parallel python tasks in one job

oye
New Contributor

Hi, 

I have set up a job with multiple Spark Python tasks running in parallel. I have set up only one single-node job cluster, with data security mode SINGLE_USER, using Databricks Runtime 14.3.x-scala2.12.

These parallel Spark Python tasks share some similar variable names, but they are not technically global variables; everything is defined inside one main function per file.

Will the Python tasks somehow share these variables since they run on the same cluster? Can this ever happen on a Databricks cluster?

1 REPLY

Coffee77
Contributor III

I'm not sure I understand completely, but if you are running parallel tasks, each executed in its own notebook or file with the same variable names, the answer is no. Those variables are scoped to each task's own Spark session / notebook context (its Python process), not to the cluster.
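As a minimal local analogy (plain Python `multiprocessing`, not the actual Databricks task runner), two functions that each define a local variable with the same name, run in separate processes, never see each other's value:

```python
from multiprocessing import Process, Queue

def task_a(out):
    result = 41  # a local variable inside this task's function
    out.put(("task_a", result))

def task_b(out):
    result = 42  # same name, but a completely independent local variable
    out.put(("task_b", result))

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=task_a, args=(q,)),
             Process(target=task_b, args=(q,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Each task saw only its own "result"; nothing leaked between them.
    print(sorted(q.get() for _ in procs))  # [('task_a', 41), ('task_b', 42)]
```

Parallel job tasks on a cluster are isolated in the same spirit: a name defined inside one task's main function does not exist in another task's process.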

To share "data" at the cluster level, you can use cluster-scoped environment variables, global temporary views, Databricks secrets for confidential data, or even shared files.
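The shared-files option can be sketched in plain Python. On Databricks the path would be a DBFS or Volumes location (the `/dbfs/tmp/...` path below is hypothetical); here a local temp directory stands in for it:

```python
import json
import os
import tempfile

# On Databricks this might be e.g. "/dbfs/tmp/shared_state.json" (hypothetical
# path); locally, a temp directory stands in for the shared location.
shared_path = os.path.join(tempfile.gettempdir(), "shared_state.json")

def producer_task(path):
    # One task writes the values it wants to hand off to other tasks.
    with open(path, "w") as f:
        json.dump({"run_date": "2024-01-01", "row_count": 123}, f)

def consumer_task(path):
    # Another task reads them back from the shared location.
    with open(path) as f:
        return json.load(f)

producer_task(shared_path)
print(consumer_task(shared_path))  # {'run_date': '2024-01-01', 'row_count': 123}
```

Note this shares data deliberately through an external location; it is a different thing from local variables leaking between tasks, which does not happen.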

 


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData