02-11-2019 03:43 PM
I am adding Application Insights telemetry to my Databricks jobs and would like to include the cluster ID of the job run. How can I access the cluster ID at run time?
The requirement is that my job can programmatically retrieve the cluster ID and insert it into all telemetry. Retrieving the cluster ID through the UI will not be sufficient.
I don't see any dbutils commands that would be of use.
Accepted Solutions
02-12-2019 08:08 PM
You can use the cluster node initialization script to grab the environment variable DB_CLUSTER_ID; see https://docs.databricks.com/user-guide/clusters/init-scripts.html#environment-variables.
From a notebook, the following also works:
spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
The full list of Spark properties is available in the Spark UI under the Environment tab.
Hope this helps!
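To keep job code independent of where it runs, the two lookups above can be combined in a small helper: try the environment variable first (available while init scripts run), then fall back to the Spark conf tag. This is a hedged sketch, not a Databricks API; the `conf_get` callable stands in for `spark.conf.get`, and the helper name is illustrative:

```python
import os

def get_cluster_id(conf_get):
    """Best-effort cluster ID lookup: DB_CLUSTER_ID env var first,
    then the clusterUsageTags Spark conf entry."""
    cluster_id = os.environ.get("DB_CLUSTER_ID")
    if cluster_id:
        return cluster_id
    try:
        return conf_get("spark.databricks.clusterUsageTags.clusterId")
    except Exception:
        # Not on a Databricks cluster, or the tag is unavailable.
        return None
```

In a notebook you would call `get_cluster_id(spark.conf.get)` and attach the result to every telemetry record.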
02-11-2019 08:31 PM
In Databricks, click your cluster on the Clusters tab and switch the UI view to JSON; it will show all the details about your cluster.
02-12-2019 08:33 AM
Thank you for your answer. I have added more detail to my question. Unfortunately, the UI will not work, as I need my job code to pull the cluster ID programmatically.
02-12-2019 10:23 AM
You can use the cluster node initialization script to grab the environment variable DB_CLUSTER_ID; see https://docs.databricks.com/user-guide/clusters/init-scripts.html#environment-variables. From a notebook, the following also works:
spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
02-12-2019 10:54 AM
That's perfect! Thank you for your help, Arti.
Do you know if the properties under clusterUsageTags are documented anywhere? I wonder what other useful properties I might be able to log.
02-12-2019 11:00 AM
I am not sure if these are documented. Use the following code to get the entire list:
%scala
val configMap = spark.conf.getAll
configMap.foreach(println)
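For telemetry it may be enough to filter that listing down to just the clusterUsageTags entries. A Python sketch of the same filtering, taking the (key, value) pairs as input (in PySpark those could come from spark.sparkContext.getConf().getAll(); the helper name is illustrative):

```python
def cluster_usage_tags(conf_pairs):
    """Return only the spark.databricks.clusterUsageTags.* entries
    from a sequence of (key, value) configuration pairs."""
    prefix = "spark.databricks.clusterUsageTags."
    return {k: v for k, v in conf_pairs if k.startswith(prefix)}
```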
02-12-2019 11:31 AM
Thanks! Feel free to make your first comment a separate post and I'll mark it as the answer.
02-12-2019 08:20 PM
Thank you for your answer, @Arti.
10-03-2019 09:47 AM
Hi, I'm trying to install a library using the init script option, but $DB_CLUSTER_ID is empty. Why? I also tried %sh echo $DB_CLUSTER_ID in a notebook cell, and it is empty there too.
curl -X POST https://dbc-myid.cloud.databricks.com/api/2.0/libraries/install -H 'Cache-Control:no-cache' -H 'Content-Type:application/json' -H 'Authorization: Bearer mytoken' -d '{"cluster_id": "$DB_CLUSTER_ID", "libraries": [{"egg": "dbfs:/FileStore/jars/library-0.1-py3.7.egg"}]}'
10-07-2019 04:21 PM
Are you using a cluster init script or a global init script?
DB_CLUSTER_ID is available only while the cluster init script executes, so I don't think you will be able to read these init script environment variables from a notebook.
10-08-2019 09:06 AM
Fixed it. Inside the single-quoted -d string it should be "'$DB_CLUSTER_ID'", so the shell can expand the variable.
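An alternative that sidesteps shell quoting entirely is to build the request body in Python with json.dumps and pass the expanded value directly. A minimal sketch (the helper name is illustrative, and the egg path is the one from the thread, not a working value):

```python
import json
import os

def install_library_payload(cluster_id, egg_path):
    """Build the JSON body for a Libraries API install call."""
    return json.dumps({
        "cluster_id": cluster_id,
        "libraries": [{"egg": egg_path}],
    })

# Where DB_CLUSTER_ID is set (e.g. during an init script), read it from
# the environment; elsewhere pass the cluster ID in directly.
payload = install_library_payload(
    os.environ.get("DB_CLUSTER_ID", "<cluster-id>"),
    "dbfs:/FileStore/jars/library-0.1-py3.7.egg",
)
```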

