Hello,
In my use case, data is pushed to an ADLS Gen2 container called ingest.
After some processing on a Databricks cluster in the ingest workspace, I declare the resulting table in the external metastore attached to this workspace.
At the end of this processing (depending on certain criteria), I push the curated data (a simple copy) to other containers (lab/qal/prd; each container holds the data of one Databricks workspace),
and I want to declare the corresponding tables in the metastores of these three workspaces.
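To make the setup concrete, the declaration step in the ingest workspace looks roughly like this (a minimal sketch; the database name, table name, storage account and Delta format below are placeholders, not my real values):

# Register the curated data as an external table in the workspace's external metastore.
# Names and paths are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS curated_db.my_table
    USING DELTA
    LOCATION 'abfss://ingest@mystorageaccount.dfs.core.windows.net/curated/my_table'
""")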
One solution is to launch three tasks after this first task, each running on a cluster configured with the metastore of one of these Databricks workspaces (see the sketch after the list below). It works, but this solution is cumbersome:
- a cluster has to be started for each workspace;
- even if the table is already declared in the metastore, the cluster still has to start just to check;
- it slows down our data pipeline.
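For reference, here is roughly what this multi-task job looks like when built through the Jobs API (a sketch only: job name, notebook paths, cluster sizes, credentials and the qal/prd JDBC URLs are placeholders; the spark_conf keys are the usual external Hive metastore settings):

# Placeholder JDBC URLs, one per target workspace/environment.
ENVIRONMENTS = {
    "lab": "jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase",
    "qal": "jdbc:sqlserver://qal_env.database.windows.net:1433;database=qaldatabase",
    "prd": "jdbc:sqlserver://prd_env.database.windows.net:1433;database=prddatabase",
}

def declare_task(env, jdbc_url):
    # One task per workspace, each on its own cluster whose Spark config
    # points to that workspace's external metastore.
    return {
        "task_key": f"declare_{env}",
        "depends_on": [{"task_key": "process_ingest"}],
        "notebook_task": {"notebook_path": "/Pipelines/declare_tables"},
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 1,
            "spark_conf": {
                "spark.hadoop.javax.jdo.option.ConnectionURL": jdbc_url,
                "spark.hadoop.javax.jdo.option.ConnectionUserName": "<metastore-user>",
                "spark.hadoop.javax.jdo.option.ConnectionPassword": "<metastore-secret>",
                "spark.hadoop.javax.jdo.option.ConnectionDriverName":
                    "com.microsoft.sqlserver.jdbc.SQLServerDriver",
            },
        },
    }

job_payload = {
    "name": "ingest_and_declare",
    "tasks": [
        {
            "task_key": "process_ingest",
            "notebook_task": {"notebook_path": "/Pipelines/process_ingest"},
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        },
        *[declare_task(env, url) for env, url in ENVIRONMENTS.items()],
    ],
}
# job_payload is what gets POSTed to /api/2.1/jobs/create (authentication omitted).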
Another solution could be to update the configuration on the fly in the first task. I tried to modify the Spark session configuration with the following lines of code:
spark.sparkContext.getConf().set("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase")
Or
spark.conf.set("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase")
but it does not seem to work.
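For context, one way to see which metastore URL the running session actually uses is to read the Hadoop configuration directly (a quick sanity-check sketch; it goes through the private _jsc handle of the SparkContext):

# The spark.hadoop.* prefix is stripped when Spark builds the Hadoop
# configuration at cluster start, so the effective key is
# javax.jdo.option.ConnectionURL.
effective_url = (
    spark.sparkContext._jsc.hadoopConfiguration()
    .get("javax.jdo.option.ConnectionURL")
)
print(effective_url)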
My question is simple: is there a way to change this configuration from a notebook, or is it not possible at all?
Thank you in advance for your help.