
Databricks Python wheel tasks: how to access the JobID & RunID?

GGG_P
New Contributor III

I'm using Python (as Python wheel application) on Databricks.

I deploy & run my jobs using dbx.

I defined some Databricks Workflow using Python wheel tasks.

Everything is working fine, but I'm having trouble extracting "databricks_job_id" & "databricks_run_id" for logging/monitoring purposes.

I'm used to defining {{job_id}} & {{run_id}} as parameters in a "Notebook Task" or other task types, and it works fine there.

But with a Python wheel task I'm not able to define these.

With a Python wheel task, parameters are just an array of strings:

["/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/myparameter.yml","/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/conf"]

Adding "{{job_id}}" & "{{run_id}}" in this array doesn't seems to work ...

Do you have any ideas? I don't want to call the REST API during my workload just to extract these IDs.

I guess I cannot use dbutils / the notebook context to get those IDs, since I don't use any notebooks.
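
For reference, my wheel's entry point just receives whatever is in that parameters array via sys.argv; a simplified sketch (names made up, not my actual code):

import sys
 
def main():
    # dbx passes the task parameters as plain command-line arguments,
    # so they show up in sys.argv in the order they were declared
    conf_paths = sys.argv[1:]
    print(f"Received parameters: {conf_paths}")
 
if __name__ == "__main__":
    main()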

3 REPLIES

Anonymous
Not applicable

@Grégoire PORTIER:

You can retrieve the job ID and run ID from environment variables within your Python wheel application. Here's an example of how you can do this:

import os
 
# Get the Databricks job ID and run ID from the environment variables
# (os.environ.get returns None if a variable is not set on the cluster)
job_id = os.environ.get("DATABRICKS_JOB_ID")
run_id = os.environ.get("DATABRICKS_RUN_ID")
 
# Print the job ID and run ID for logging/monitoring purposes
print(f"Databricks Job ID: {job_id}")
print(f"Databricks Run ID: {run_id}")

You can then add this code to your Python wheel task to extract the job ID and run ID and use them for logging/monitoring purposes.

Note that the environment variables DATABRICKS_JOB_ID and DATABRICKS_RUN_ID are automatically set by Databricks when you run a job, so you don't need to pass them as parameters.
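
If those variables come back empty on your cluster, another commonly suggested approach is to read the IDs from the notebook context tags via dbutils, which is also available to wheel tasks running on a cluster. A sketch (the context JSON is an internal structure, so the tag names may vary across DBR versions):

from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils
import json
 
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)
 
# The notebook context carries job metadata as tags, even for wheel tasks
ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)
tags = ctx.get("tags", {})
print(f"Databricks Job ID: {tags.get('jobId')}")
print(f"Databricks Run ID: {tags.get('runId')}")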

GGG_P
New Contributor III

Hey Suteja,

Thank you for your response, but unfortunately it doesn't work with environment variables.

I get a null value for both variables.

Do you have any idea which DBR I should use?

Or any documentation about these environment variables?

Thank you

AndréSalvati
New Contributor III

Below is a complete template project with Databricks Asset Bundles and a Python wheel task. Please follow the instructions for deployment.

https://github.com/andre-salvati/databricks-template

In particular, take a look at the workflow definition here.
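
For illustration, with Asset Bundles you can pass the IDs to the wheel's entry point through dynamic value references in the task parameters, then parse them like ordinary CLI arguments. A sketch, not the template's exact code; it assumes {{job.id}} and {{job.run_id}} are substituted at run time:

import argparse
 
def main():
    # In the bundle's workflow definition, the python_wheel_task would declare:
    #   parameters: ["--job-id", "{{job.id}}", "--run-id", "{{job.run_id}}"]
    parser = argparse.ArgumentParser()
    parser.add_argument("--job-id", required=True)
    parser.add_argument("--run-id", required=True)
    args = parser.parse_args()
    print(f"Databricks Job ID: {args.job_id}")
    print(f"Databricks Run ID: {args.run_id}")
 
if __name__ == "__main__":
    main()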
