Data Engineering

How do I get the current cluster id?

ammobear
New Contributor III

I am adding Application Insights telemetry to my Databricks jobs and would like to include the cluster ID of the job run. How can I access the cluster id at run time?

The requirement is that my job can programmatically retrieve the cluster id to insert into all telemetry. Retrieving the cluster ID through the UI will not be sufficient.

I don't see any dbutils commands that would be of use.

1 ACCEPTED SOLUTION

Accepted Solutions

Arti
New Contributor III

You can use a cluster node initialization script to read the environment variable DB_CLUSTER_ID. See https://docs.databricks.com/user-guide/clusters/init-scripts.html#environment-variables.

From a notebook, the following also works:

spark.conf.get("spark.databricks.clusterUsageTags.clusterName")

spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

The full list of Spark properties is available in the Spark UI, under the Environment tab.
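For the telemetry use case in the question, the two lookups above could be wrapped in a small helper. This is a minimal sketch, not an official API: the function name `cluster_telemetry_properties`, the dictionary keys, and the `"unknown"` fallback are all hypothetical. It assumes only that the object passed in has a `get(key, default)` method, which `spark.conf` provides in PySpark.

```python
# Property names taken from the answer above.
CLUSTER_ID_KEY = "spark.databricks.clusterUsageTags.clusterId"
CLUSTER_NAME_KEY = "spark.databricks.clusterUsageTags.clusterName"


def cluster_telemetry_properties(conf, default="unknown"):
    """Return cluster identifiers as a dict that can be attached to telemetry.

    `conf` is any object with a `get(key, default)` method; in a notebook
    or job this would be `spark.conf`. The default is returned when a
    property is not set, so the helper does not raise off-cluster.
    """
    return {
        "cluster_id": conf.get(CLUSTER_ID_KEY, default),
        "cluster_name": conf.get(CLUSTER_NAME_KEY, default),
    }


# In a Databricks notebook or job, where `spark` is predefined:
# props = cluster_telemetry_properties(spark.conf)
```

Passing the configuration object in, rather than reading a global `spark`, keeps the helper easy to test outside a cluster.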

Hope this helps!


11 REPLIES

tonyp
New Contributor II

In Databricks, click on your cluster in the Clusters tab and change the UI view to JSON. It will show all the details about your cluster.

ammobear
New Contributor III

Thank you for your answer. I have added more detail to my question. Unfortunately, the UI will not work as I need my job code to programmatically pull the cluster id.

Arti
New Contributor III

You can use a cluster node initialization script to read the environment variable DB_CLUSTER_ID. See https://docs.databricks.com/user-guide/clusters/init-scripts.html#environment-variables. From a notebook, the following also works:

spark.conf.get("spark.databricks.clusterUsageTags.clusterName")

spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

ammobear
New Contributor III

That's perfect! Thank you for your help Arti.

Do you know if the properties under clusterUsageTags are documented anywhere? I wonder what other useful properties I might be able to log.

Arti
New Contributor III

I am not sure if these are documented. Use the following code to get the entire list:

%scala

val configMap = spark.conf.getAll

configMap.foreach(println)
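A Python variant that narrows the listing to the `clusterUsageTags` properties might look like the sketch below. The helper name `properties_with_prefix` is hypothetical. The commented usage assumes that `spark.sparkContext.getConf().getAll()` is accessible on your cluster and that it returns the same properties as the Scala snippet; neither is confirmed in this thread.

```python
def properties_with_prefix(pairs, prefix="spark.databricks.clusterUsageTags."):
    """Return a sorted list of (key, value) pairs whose key starts with `prefix`.

    `pairs` is any iterable of (key, value) tuples, such as the list
    returned by SparkConf.getAll() in PySpark.
    """
    return sorted((key, value) for key, value in pairs if key.startswith(prefix))


# In a notebook (assumption: the SparkContext is accessible on this cluster):
# for key, value in properties_with_prefix(spark.sparkContext.getConf().getAll()):
#     print(key, "=", value)
```

Review the values before logging them, since configuration entries can contain information you may not want in telemetry.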

ammobear
New Contributor III

Thanks! Feel free to make your first comment a separate post and I'll mark it as the answer.

tonyp
New Contributor II

Thank you for your answer, @Arti.

EricBellet
New Contributor III

Hi, I'm trying to install a library using the init script option, but $DB_CLUSTER_ID is empty. Why? I also tried running %sh echo $DB_CLUSTER_ID in a notebook cell, and it is empty there too.

curl -X POST https://dbc-myid.cloud.databricks.com/api/2.0/libraries/install -H 'Cache-Control:no-cache' -H 'Content-Type:application/json' -H 'Authorization: Bearer mytoken' -d '{"cluster_id": "$DB_CLUSTER_ID", "libraries": [{"egg": "dbfs:/FileStore/jars/library-0.1-py3.7.egg"}]}'

Arti
New Contributor III

Are you using a cluster init script or a global init script?

DB_CLUSTER_ID: this environment variable is available only while the cluster init script is executing. I don't think you will be able to read the init script's environment variables from a notebook.

EricBellet
New Contributor III

I fixed it. It should be "'$DB_CLUSTER_ID'"
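The fix works because the shell does not expand variables inside single quotes. Writing "'$DB_CLUSTER_ID'" closes the single-quoted string, lets the shell expand the variable, then reopens the quotes. An alternative that avoids shell quoting is to build the request body with a JSON serializer. The sketch below is illustrative only: the function name `build_install_payload` is hypothetical, it only constructs the body shown in the curl command above, and it does not send the request.

```python
import json
import os


def build_install_payload(cluster_id, egg_path):
    """Serialize the request body for the libraries/install call shown above."""
    return json.dumps(
        {"cluster_id": cluster_id, "libraries": [{"egg": egg_path}]}
    )


if __name__ == "__main__":
    # DB_CLUSTER_ID is set while a cluster init script runs (per the thread);
    # elsewhere this falls back to an empty string.
    payload = build_install_payload(
        os.environ.get("DB_CLUSTER_ID", ""),
        "dbfs:/FileStore/jars/library-0.1-py3.7.egg",
    )
    print(payload)
```

Because `json.dumps` handles the quoting and escaping, the cluster ID is inserted as a value and never passes through shell interpolation.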
