Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks Python wheel tasks: how to access the job ID & run ID?

GGG_P
New Contributor III

I'm using Python (as a Python wheel application) on Databricks.

I deploy & run my jobs using dbx.

I defined some Databricks Workflows using Python wheel tasks.

Everything is working fine, but I'm having trouble extracting "databricks_job_id" & "databricks_run_id" for logging/monitoring purposes.

I usually define {{job_id}} & {{run_id}} as parameters in a "Notebook Task" or other task types, and it works fine.

But with a Python wheel task I'm not able to define these:

With a Python wheel task, parameters are basically an array of strings:

["/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/myparameter.yml","/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/conf"]

Adding "{{job_id}}" & "{{run_id}}" to this array doesn't seem to work...

Do you have any ideas? I don't want to call any REST API during my workload just to extract these IDs...

I guess I cannot use dbutils / the notebook context to get those IDs, since I don't use any notebooks...
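For context: a Python wheel entry point receives the task's parameter array as plain strings via sys.argv, so if the {{job_id}}/{{run_id}} placeholders were substituted, they could be parsed in the entry point like this. A sketch only; the --job_id/--run_id flag names are my own convention, not a Databricks API:

```python
import sys

def parse_job_context(argv):
    """Pick out --job_id/--run_id style arguments from the task's
    parameter array (all parameters arrive as plain strings)."""
    ctx = {}
    args = iter(argv)
    for arg in args:
        if arg == "--job_id":
            ctx["job_id"] = next(args, None)
        elif arg == "--run_id":
            ctx["run_id"] = next(args, None)
    return ctx

if __name__ == "__main__":
    # e.g. parameters: ["--job_id", "{{job_id}}", "--run_id", "{{run_id}}", ...]
    print(parse_job_context(sys.argv[1:]))
```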

3 REPLIES

Anonymous
Not applicable

@Grégoire PORTIER:

You can read environment variables to retrieve the job ID and run ID from within your Python wheel application. Here's an example of how you can do this:

import os
 
# Read the Databricks job ID and run ID from environment variables
job_id = os.environ.get("DATABRICKS_JOB_ID")
run_id = os.environ.get("DATABRICKS_RUN_ID")
 
# Print the job ID and run ID for logging/monitoring purposes
print(f"Databricks Job ID: {job_id}")
print(f"Databricks Run ID: {run_id}")

You can then add this code to your Python wheel task to extract the job ID and run ID and use them for logging/monitoring purposes.

Note that if Databricks sets the environment variables DATABRICKS_JOB_ID and DATABRICKS_RUN_ID for your job run, you don't need to pass them as parameters; check whether they are actually populated on your cluster and DBR version.
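Since these variables may be unset on some cluster configurations, the lookup can be made a bit more defensive by wrapping it in a small helper with an explicit fallback. This is a sketch of that pattern, not an official Databricks API:

```python
import os

def get_databricks_ids(default="unknown"):
    """Return (job_id, run_id) from the environment, falling back to a
    default when Databricks has not populated the variables."""
    job_id = os.environ.get("DATABRICKS_JOB_ID", default)
    run_id = os.environ.get("DATABRICKS_RUN_ID", default)
    return job_id, run_id
```

That way your logging code never has to special-case None.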

GGG_P
New Contributor III

Hey Suteja,

Thank you for your response, but unfortunately it doesn't work with environment variables.

I got null values for both variables.

Do you have any idea which DBR version I should use?

Or any documentation about these environment variables?

Thank you

AndréSalvati
New Contributor III

Here you can see a complete template project with Databricks Asset Bundles and a Python wheel task. Please follow the instructions for deployment.

https://github.com/andre-salvati/databricks-template

In particular, take a look at the workflow definition here.
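For reference, an Asset Bundles task definition can forward job metadata into the wheel's parameter array using dynamic value references. A rough sketch of the shape (the job and package names are hypothetical; check the template repo and the Databricks docs for the exact layout and supported references):

```yaml
resources:
  jobs:
    my_job:                          # hypothetical job name
      tasks:
        - task_key: main
          python_wheel_task:
            package_name: myproject  # hypothetical package name
            entry_point: main
            parameters:
              - "--job_id"
              - "{{job.id}}"         # dynamic value reference
              - "--run_id"
              - "{{job.run_id}}"
```

The wheel's entry point then receives the resolved values as ordinary command-line strings.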
