How to update external metastore cluster configuration on the fly?

Oliver_Floyd
Contributor

Hello,

In my use case, my data is pushed to an ADLS Gen2 container called ingest.

After some data processing on a Databricks cluster of the ingest workspace, I declare the associated table in an external metastore for this workspace.

At the end of this processing (according to certain criteria), I push the curated data (a simple copy) to other containers (lab/qal/prd; each container holds the data for one Databricks workspace), and I want to declare the tables in the metastores of these three workspaces.

One solution is to launch three tasks after this first task, each running on a cluster configured with the metastore of its own Databricks workspace (see the sketch after this list). It works, but this solution is cumbersome:

  • a cluster has to be started for each workspace
  • even if the table is already declared in the metastore, the cluster still has to be started just to check
  • it slows down our data process
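
For illustration, here is a minimal sketch of what each per-workspace task cluster could carry in its Spark config so that it points at its own external Hive metastore. The JDBC URL, driver, secret scope and metastore version below are placeholders, not values from my actual setup:

# Sketch of the spark_conf block a job cluster could be created with (for
# example in a Jobs/Clusters API cluster spec) so that it connects to the
# "lab" workspace's external Hive metastore. All values are placeholders.
lab_metastore_spark_conf = {
    "spark.hadoop.javax.jdo.option.ConnectionURL":
        "jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName":
        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    # credentials referenced from a (hypothetical) Databricks secret scope
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "{{secrets/metastore/lab-user}}",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "{{secrets/metastore/lab-password}}",
    # Hive metastore version of the external metastore (placeholder value)
    "spark.sql.hive.metastore.version": "3.1.0",
    "spark.sql.hive.metastore.jars": "maven",
}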

Another solution could be to update the cluster configuration on the fly in the first task. I tried to modify the Spark session configuration with the following lines of code:

spark.sparkContext.getConf().set("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase")

Or

spark.conf.set("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:sqlserver://lab_env.database.windows.net:1433;database=labdatabase")

but it seems that it doesn't work.

My question is simple: do you know if there is a way to change this configuration in a notebook, or is it not possible at all?

Thank you in advance for your help.

3 REPLIES

Kaniz
Community Manager

Hi @Oliver Floyd! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise I will get back to you soon. Thanks.

Atanu
Esteemed Contributor

Hi @Oliver Floyd, as per our documentation this can only be achieved through:

  1. Spark config
  2. Init script

So I don't think it will work on the fly. But you could raise this as a feature request to our product team. Thanks.
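
For reference, a quick way to check which external metastore a running cluster is actually pointed at is to read back the Hadoop configuration it was started with. A minimal sketch, assuming a PySpark notebook where spark is the active session (it goes through the internal _jsc handle):

# Read back the metastore JDBC URL the cluster was started with. This is only
# a check, not a workaround: the value comes from the cluster-level Spark
# config (or an init script), so calling spark.conf.set later in a notebook
# does not change the metastore the cluster is connected to.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
print(hadoop_conf.get("javax.jdo.option.ConnectionURL"))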

Oliver_Floyd
Contributor

Hello @Atanu Sarkar,

Thank you for your answer. I have created a feature request. I hope it will be accepted soon ^^
