Databricks Community

ammobear · ‎02-11-2019

I am adding Application Insights telemetry to my Databricks jobs and would like to include the cluster ID of the job run. How can I access the cluster id at run time?

The requirement is that my job can programmatically retrieve the cluster id to insert into all telemetry. Retrieving the cluster ID through the UI will not be sufficient.

I don't see any dbutils commands that would be of use.

Arti · ‎02-12-2019

You can use a cluster node initialization script to read the environment variable DB_CLUSTER_ID. See https://docs.databricks.com/user-guide/clusters/init-scripts.html#environment-variables.

From a notebook, the following also works:

spark.conf.get("spark.databricks.clusterUsageTags.clusterName")

spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

The entire list of Spark properties is available in the UI under Spark UI --> Environment tab.
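
For the telemetry use case in the question, the two lookups can be wrapped in a small helper. This is only a minimal sketch in Python, assuming it runs in a Databricks notebook or job where `spark` is already defined. The helper name `cluster_properties` and the "unknown" fallback are hypothetical choices, not part of any Databricks API.

```python
from typing import Callable, Dict

# Spark conf keys taken from the answer above.
CLUSTER_ID_KEY = "spark.databricks.clusterUsageTags.clusterId"
CLUSTER_NAME_KEY = "spark.databricks.clusterUsageTags.clusterName"


def cluster_properties(conf_get: Callable[[str, str], str]) -> Dict[str, str]:
    """Return the cluster id and name as a dict usable as telemetry properties.

    conf_get is any function taking (key, default), for example spark.conf.get.
    The "unknown" default is a hypothetical fallback for when a key is missing.
    """
    return {
        "cluster_id": conf_get(CLUSTER_ID_KEY, "unknown"),
        "cluster_name": conf_get(CLUSTER_NAME_KEY, "unknown"),
    }


# In a notebook or job (assumes `spark` exists):
# props = cluster_properties(spark.conf.get)
```

Passing the getter in, rather than calling `spark` directly, keeps the helper easy to exercise outside a cluster.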

Hope this helps!

tonyp · ‎02-11-2019

In Databricks, click on your cluster in the Clusters tab and change the UI view to JSON. It will show all the details of your cluster.

ammobear · ‎02-12-2019

Thank you for your answer. I have added more detail to my question. Unfortunately, the UI will not work as I need my job code to programmatically pull the cluster id.

Arti · ‎02-12-2019

You can use a cluster node initialization script to read the environment variable DB_CLUSTER_ID. See https://docs.databricks.com/user-guide/clusters/init-scripts.html#environment-variables. From a notebook, the following also works:

spark.conf.get("spark.databricks.clusterUsageTags.clusterName")

spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

ammobear · ‎02-12-2019

That's perfect! Thank you for your help Arti.

Do you know if the clusterUsageTags properties are documented anywhere? I wonder what other useful properties I might be able to log.

Arti · ‎02-12-2019

I am not sure if these are documented. Use the following code to get the entire list:

%scala

val configMap = spark.conf.getAll

configMap.foreach(println)
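
To narrow that list down to the clusterUsageTags entries only, something like the following Python sketch could work. It assumes `spark` is already defined and that these keys appear in `spark.sparkContext.getConf().getAll()`, which I have not verified on every runtime. The function name `usage_tags` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

PREFIX = "spark.databricks.clusterUsageTags."


def usage_tags(pairs: Iterable[Tuple[str, str]], prefix: str = PREFIX) -> Dict[str, str]:
    """Keep only entries whose key starts with prefix, with the prefix stripped."""
    return {key[len(prefix):]: value for key, value in pairs if key.startswith(prefix)}


# In a notebook (assumes `spark` exists):
# for name, value in sorted(usage_tags(spark.sparkContext.getConf().getAll()).items()):
#     print(name, value)
```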

ammobear · ‎02-12-2019

Thanks! Feel free to make your first comment a separate post and I'll mark it as the answer.

tonyp · ‎02-12-2019

Thank you for your answer, @Arti.

EricBellet · ‎10-03-2019

Hi, I'm trying to install a library using the init script option, but $DB_CLUSTER_ID is empty. Why? I also tried running %sh echo $DB_CLUSTER_ID in a notebook cell, and it is empty there too.

curl -X POST https://dbc-myid.cloud.databricks.com/api/2.0/libraries/install -H 'Cache-Control:no-cache' -H 'Content-Type:application/json' -H 'Authorization: Bearer mytoken' -d '{"cluster_id": "$DB_CLUSTER_ID", "libraries": [{"egg": "dbfs:/FileStore/jars/library-0.1-py3.7.egg"}]}'

Arti · ‎10-07-2019

Are you using a cluster init script or a global init script?

DB_CLUSTER_ID: this environment variable is available only during cluster init script execution. I don't think you will be able to read the init script's environment variables from a notebook.
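
One more thing worth checking, independent of where the variable is set: in the curl command above the JSON body is wrapped in single quotes, so the shell does not expand $DB_CLUSTER_ID and the API would receive that literal text. A rough sketch of building the same request body in Python instead, assuming the cluster ID is read from the environment of the process where it is defined. The function name `install_payload` is hypothetical, and the field names are copied from the curl command.

```python
import json


def install_payload(cluster_id: str, egg_path: str) -> str:
    """Build the JSON body for the libraries/install call shown in the curl command."""
    if not cluster_id:
        # Fail early instead of sending an empty cluster id to the API.
        raise ValueError("cluster id is empty; DB_CLUSTER_ID may not be set in this context")
    return json.dumps({"cluster_id": cluster_id, "libraries": [{"egg": egg_path}]})


# Example (reads the variable in the process where it is defined):
# import os
# body = install_payload(os.environ.get("DB_CLUSTER_ID", ""),
#                        "dbfs:/FileStore/jars/library-0.1-py3.7.egg")
```

Raising on an empty value makes the "variable is not set here" case visible right away, rather than surfacing later as an API error.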