How to update external metastore cluster configuration on the fly?

Oliver_Floyd
Contributor

Hello,

In my use case, data is pushed to an ADLS Gen2 container called ingest.

After some data processing on a Databricks cluster in the ingest workspace, I declare the associated table in the external metastore for this workspace.

At the end of this processing (according to certain criteria), I push the curated data (a simple copy) to other containers (lab/qal/prd; each container holds the data for one Databricks workspace), and I want to declare the table in the metastores of these 3 workspaces.

One solution is to launch 3 tasks after this first task, each running on a cluster configured with the metastore of one of the Databricks workspaces. It works, but this solution is cumbersome:

  • a cluster has to be started for each workspace
  • even if the table is already declared in the metastore, the cluster has to be started just to check
  • it slows down our data processing
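
As a rough illustration of this per-workspace setup (a sketch, not our exact configuration), each task's cluster would carry its own metastore connection settings. Only the lab URL below is real; the qal and prd hosts and database names are assumed to follow the same pattern, and `metastore_spark_conf` is a hypothetical helper. Credentials and metastore version settings are left out.

```python
# Hypothetical sketch: cluster-level Spark config for one workspace's
# external Hive metastore. Only the "lab" URL is taken from this post;
# the "qal" and "prd" entries are assumptions that mirror its pattern.
METASTORE_URLS = {
    "lab": "jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase",
    "qal": "jdbc:sqlserver://qal_env.database.windows.net:1433;database=qaldatabase",  # assumed
    "prd": "jdbc:sqlserver://prd_env.database.windows.net:1433;database=prddatabase",  # assumed
}


def metastore_spark_conf(env: str) -> dict:
    """Return the Spark config entries a cluster for `env` would be created with."""
    return {
        "spark.hadoop.javax.jdo.option.ConnectionURL": METASTORE_URLS[env],
        "spark.hadoop.javax.jdo.option.ConnectionDriverName":
            "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        # Credentials and spark.sql.hive.metastore.* settings omitted here.
    }


# One config per downstream task, hence one cluster start per workspace.
for env in METASTORE_URLS:
    conf = metastore_spark_conf(env)
    print(env, conf["spark.hadoop.javax.jdo.option.ConnectionURL"])
```

These values are fixed when each cluster is created, which is why this approach needs three separate clusters.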

Another solution could be to update the cluster configuration on the fly in the first task. I tried to modify the Spark session configuration with the following lines of code:

spark.sparkContext.getConf().set("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase")

Or

spark.conf.set("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase")

but neither seems to work.

My question is simple: do you know if there is a way to change this configuration from a notebook, or is it not possible at all?

Thank you in advance for your help.