Data Engineering

Databricks Python wheel tasks: how to access the job ID & run ID?

GGG_P
New Contributor III

I'm using Python (as Python wheel application) on Databricks.

I deploy & run my jobs using dbx.

I defined some Databricks Workflow using Python wheel tasks.

Everything is working fine, but I'm having trouble extracting "databricks_job_id" & "databricks_run_id" for logging/monitoring purposes.

I usually define {{job_id}} & {{run_id}} as parameters in a "Notebook Task" or other task types, and it works fine.

But with a Python wheel task I'm not able to define these:

With a Python wheel task, the parameters are basically an array of strings:

["/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/myparameter.yml","/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/conf"]

Adding "{{job_id}}" & "{{run_id}}" to this array doesn't seem to work...

Do you have any ideas? I don't want to call any REST API during my workload just to extract these IDs...

I guess I cannot use dbutils / the notebook context to get those IDs, since I don't use any notebooks...
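(For context, the wheel entry point receives that parameters array through sys.argv. A simplified sketch of what I'm trying to do, if the {{job_id}} / {{run_id}} substitution were applied to the array — positions and names here are illustrative, not my real config:)

```python
import sys


def main(argv=None):
    """Entry point for the wheel task: Databricks passes the task's
    `parameters` array as command-line arguments."""
    args = sys.argv[1:] if argv is None else argv
    # With e.g. ["{{job_id}}", "{{run_id}}", "conf.yml"] in the parameters
    # array, the first two slots would hold the substituted IDs -- assuming
    # the substitution is applied for this task type.
    job_id, run_id, *rest = args
    print(f"job_id={job_id} run_id={run_id} remaining={rest}")
    return job_id, run_id, rest
```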

3 REPLIES

Anonymous
Not applicable

@Grégoire PORTIER:

You can read the job ID and run ID from environment variables within your Python wheel application. Here's an example of how you can do this:

import os
 
# Get the Databricks job ID and run ID from the environment variables
job_id = os.environ.get("DATABRICKS_JOB_ID")
run_id = os.environ.get("DATABRICKS_RUN_ID")
 
# Print the job ID and run ID for logging/monitoring purposes
print(f"Databricks Job ID: {job_id}")
print(f"Databricks Run ID: {run_id}")

You can then add this code to your Python wheel task to extract the job ID and run ID and use them for logging/monitoring purposes.

Note that the environment variables DATABRICKS_JOB_ID and DATABRICKS_RUN_ID are automatically set by Databricks when you run a job, so you don't need to pass them as parameters.
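Assuming those variables are actually set on your runtime (worth verifying for your DBR version), a defensive wrapper that tolerates them being absent, instead of failing the task, might look like this sketch:

```python
import os
from typing import Optional, Tuple


def get_job_context(env=os.environ) -> Tuple[Optional[str], Optional[str]]:
    """Read the Databricks job/run IDs from the environment, returning
    None for any variable that is missing rather than raising."""
    return env.get("DATABRICKS_JOB_ID"), env.get("DATABRICKS_RUN_ID")
```

Passing the environment mapping as an argument also makes the lookup easy to unit-test outside Databricks.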

GGG_P
New Contributor III

Hey Suteja,

Thank you for your response, but unfortunately it doesn't work with environment variables.

I got null values for both variables.

Do you have any idea which DBR I should use?

Or any documentation about these environment variables?

Thank you

AndréSalvati
New Contributor III

Below you can see a complete template project with Databricks Asset Bundles and a Python wheel task. Please follow the instructions for deployment.

https://github.com/andre-salvati/databricks-template

In particular, take a look at the workflow definition here.
