Can I run multiple jobs (for example, 100+) in parallel that refer to the same notebook, supplying each job with a different parameter? If this is possible, what would be the impact (for example, on reliability, performance, troubleshooting, etc.)?
Example:
Notebook:
table_name = dbutils.widgets.get("table_name")
df = spark.read.format("parquet").load(f"s3://data_source_bucket_name/{table_name}/")
# <process the data>
df.write.mode("overwrite").saveAsTable(table_name)
Job 1 Parameters:
table_name = 'Table_1'
Job 2 Parameters:
table_name = 'Table_2'
.
.
.
.
Job 100 Parameters:
table_name = 'Table_100'
Explanation: Read the parquet files from each table's folder and, after processing, load them into a Delta table. The processing steps are the same for all tables.
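For context, here is a minimal sketch of how I could trigger the runs, assuming a single parameterized job invoked once per table through the Jobs 2.1 run-now REST API (the job ID, host, and token below are placeholders; creating 100 separate job definitions that point at the same notebook would be the alternative). I assume the job's max_concurrent_runs setting would also need to be raised above its default of 1 for the runs to actually overlap.

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]
job_id = 123                            # placeholder: ID of the parameterized notebook job

# Trigger one run per table; each run receives its own table_name widget value.
for i in range(1, 101):
    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": job_id, "notebook_params": {"table_name": f"Table_{i}"}},
    )
    resp.raise_for_status()
    print(resp.json()["run_id"])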