How to update external metastore cluster configuration on the fly?

Oliver_Floyd
Contributor

Hello,

In my use case, my data is pushed to an ADLS Gen2 container called ingest.

After some data processing on a Databricks cluster of the ingest workspace, I declare the associated table in an external metastore for this workspace.

At the end of this processing (according to certain criteria), I push the curated data (a simple copy) to other containers (lab/qal/prd; each container holds the data for one Databricks workspace), and I want to declare the tables in the metastores of these three workspaces.

One solution is to launch three tasks after this first task, each running on a cluster configured with the metastore of its own Databricks workspace (see the sketch after this list). It works, but this solution is cumbersome:

  • a cluster has to be started for each workspace
  • even if the table is already declared in the metastore, the cluster still has to be started just to check
  • it slows down our data process
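
For illustration, here is a minimal sketch of what each per-workspace task cluster could carry in its Spark config so that it points at its own external Hive metastore. The JDBC URL, driver, secret scope and metastore version below are placeholders, not values from my actual setup:

# Sketch of the spark_conf block a job cluster could be created with (for
# example in a Jobs/Clusters API cluster spec) so that it connects to the
# "lab" workspace's external Hive metastore. All values are placeholders.
lab_metastore_spark_conf = {
    "spark.hadoop.javax.jdo.option.ConnectionURL":
        "jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName":
        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    # credentials referenced from a (hypothetical) Databricks secret scope
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "{{secrets/metastore/lab-user}}",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "{{secrets/metastore/lab-password}}",
    # Hive metastore version of the external metastore (placeholder value)
    "spark.sql.hive.metastore.version": "3.1.0",
    "spark.sql.hive.metastore.jars": "maven",
}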

Another solution could be to update the cluster configuration on the fly in the first task. I tried to modify the Spark session configuration with the following lines of code:

spark.sparkContext.getConf().set("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase")

Or

spark.conf.set("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase")

but it seems that it doesn't work.

My question is simple: do you know if there is a way to change this configuration in a notebook, or is it not possible at all?

Thank you in advance for your help.

3 REPLIES

Kaniz
Community Manager

Hi @Oliver Floyd! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise I will get back to you soon. Thanks.

Atanu
Esteemed Contributor

Hi @Oliver Floyd, as per our documentation this can only be achieved through:

  1. Spark config
  2. Init script

So I don't think it will work on the fly. But you could raise this as a feature request to our product team. Thanks.
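
For reference, a quick way to check which external metastore a running cluster is actually pointed at is to read back the Hadoop configuration it was started with. A minimal sketch, assuming a PySpark notebook where spark is the active session (it goes through the internal _jsc handle):

# Read back the metastore JDBC URL the cluster was started with. This is only
# a check, not a workaround: the value comes from the cluster-level Spark
# config (or an init script), so calling spark.conf.set later in a notebook
# does not change the metastore the cluster is connected to.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
print(hadoop_conf.get("javax.jdo.option.ConnectionURL"))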

Oliver_Floyd
Contributor

Hello @Atanu Sarkar,

Thank you for your answer. I have created a feature request. I hope it will be accepted soon ^^
