02-11-2019 03:43 PM
I am adding Application Insights telemetry to my Databricks jobs and would like to include the cluster ID of the job run. How can I access the cluster id at run time?
The requirement is that my job can programmatically retrieve the cluster id to insert into all telemetry. Retrieving the cluster ID through the UI will not be sufficient.
I don't see any dbutils commands that would be of use.
02-12-2019 08:08 PM
You can use a cluster node initialization script to read the environment variable DB_CLUSTER_ID. See https://docs.databricks.com/user-guide/clusters/init-scripts.html#environment-variables.
From a notebook, the following also works:
spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
The full list of Spark properties is available in the Spark UI, under the Environment tab.
Hope this helps!
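For Python jobs, the two lookups above can be wrapped in a small helper so the values travel with every telemetry event. This is only a sketch: the function name, the dictionary keys, and the "unknown" fallback are made up for illustration, and it assumes the object passed in exposes get(key, default) the way spark.conf does.

```python
from typing import Dict, Optional

# Property names taken from the answer above.
CLUSTER_ID_KEY = "spark.databricks.clusterUsageTags.clusterId"
CLUSTER_NAME_KEY = "spark.databricks.clusterUsageTags.clusterName"


def cluster_telemetry_properties(conf, default: Optional[str] = "unknown") -> Dict[str, Optional[str]]:
    """Collect cluster identifiers into a dict of custom telemetry properties.

    `conf` is any object with a `get(key, default)` method, for example
    `spark.conf` in a notebook or job. The fallback value is hypothetical;
    it only keeps the job from failing when a property is not set.
    """
    return {
        "cluster_id": conf.get(CLUSTER_ID_KEY, default),
        "cluster_name": conf.get(CLUSTER_NAME_KEY, default),
    }
```

In a job this would be called once as cluster_telemetry_properties(spark.conf), and the resulting dict attached to each event as custom properties by whichever telemetry client you use.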
02-11-2019 08:31 PM
02-12-2019 08:33 AM
Thank you for your answer. I have added more detail to my question. Unfortunately, the UI will not work as I need my job code to programmatically pull the cluster id.
02-12-2019 10:23 AM
You can use a cluster node initialization script to read the environment variable DB_CLUSTER_ID. See https://docs.databricks.com/user-guide/clusters/init-scripts.html#environment-variables. From a notebook, the following also works:
spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
02-12-2019 10:54 AM
That's perfect! Thank you for your help Arti.
Do you know if the properties under clusterUsageTags are documented anywhere? I wonder what other useful properties I might be able to log.
02-12-2019 11:00 AM
I am not sure if these are documented. Use the following code to get the entire list:
%scala
val configMap = spark.conf.getAll
configMap.foreach(println)
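If the job is written in Python, a similar listing can be narrowed to just the clusterUsageTags entries. The sketch below is an illustration, not an official API: the helper name is hypothetical, and it assumes the input is a list of (key, value) pairs such as the one returned by spark.sparkContext.getConf().getAll().

```python
from typing import Dict, Iterable, Tuple

# Prefix shared by the properties mentioned earlier in this thread.
USAGE_TAG_PREFIX = "spark.databricks.clusterUsageTags."


def usage_tags(pairs: Iterable[Tuple[str, str]], prefix: str = USAGE_TAG_PREFIX) -> Dict[str, str]:
    """Keep only the entries whose key starts with `prefix`.

    The prefix is stripped from the returned keys so that, for example,
    'spark.databricks.clusterUsageTags.clusterId' becomes 'clusterId'.
    """
    return {key[len(prefix):]: value for key, value in pairs if key.startswith(prefix)}
```

In a notebook, usage_tags(spark.sparkContext.getConf().getAll()) would return the candidate properties for logging; which keys actually appear depends on the cluster.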
02-12-2019 11:31 AM
Thanks! Feel free to make your first comment a separate post and I'll mark it as the answer.
02-12-2019 08:20 PM
Thank you for your answer, @Arti.
10-03-2019 09:47 AM
Hi, I'm trying to install a library using the init script option, but $DB_CLUSTER_ID is empty. Why? I also tried running %sh echo $DB_CLUSTER_ID in a notebook cell, and it is empty there too.
curl -X POST https://dbc-myid.cloud.databricks.com/api/2.0/libraries/install -H 'Cache-Control:no-cache' -H 'Content-Type:application/json' -H 'Authorization: Bearer mytoken' -d '{"cluster_id": "$DB_CLUSTER_ID", "libraries": [{"egg": "dbfs:/FileStore/jars/library-0.1-py3.7.egg"}]}'
10-07-2019 04:21 PM
Are you using a cluster init script or a global init script?
DB_CLUSTER_ID is available only while the cluster init script is executing. I don't think you will be able to read the values of these init script environment variables from a notebook.
10-08-2019 09:06 AM
I fixed it: the value should be written as "'$DB_CLUSTER_ID'".
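The reason the fix works is that the JSON body in the curl command is wrapped in single quotes, so the shell passes $DB_CLUSTER_ID through literally; closing and reopening the single quotes around the variable lets the shell expand it. Another way to sidestep shell quoting is to build the request body in Python. The sketch below is untested against a real workspace: the function name and its parameters are hypothetical, and only the endpoint path and body layout come from the curl command above.

```python
import json
import urllib.request


def build_install_request(host: str, token: str, cluster_id: str, egg_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the libraries/install endpoint.

    json.dumps handles the quoting, so the cluster ID is inserted as a real
    value instead of a literal '$DB_CLUSTER_ID' string.
    """
    body = json.dumps(
        {"cluster_id": cluster_id, "libraries": [{"egg": egg_path}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/libraries/install",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Inside a cluster init script context the ID could be read with os.environ["DB_CLUSTER_ID"] and the request sent with urllib.request.urlopen(request).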