04-28-2023 11:45 AM
I'm using Python (packaged as a Python wheel application) on Databricks.
I deploy and run my jobs using dbx.
I defined some Databricks Workflows using Python wheel tasks.
Everything is working fine, but I'm having trouble extracting "databricks_job_id" and "databricks_run_id" for logging/monitoring purposes.
I usually define {{job_id}} and {{run_id}} as parameters in a "Notebook Task" or other task types, and that works fine.
But with a Python wheel task I'm not able to define these:
With a Python wheel task, parameters are basically an array of strings:
["/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/myparameter.yml","/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/conf"]
Adding "{{job_id}}" and "{{run_id}}" to this array doesn't seem to work...
Do you have any ideas? I don't want to call a REST API during my workload just to extract these IDs...
I guess I cannot use dbutils / the notebook context to get those IDs, since I don't use any notebooks...
05-13-2023 09:53 AM
@Grégoire PORTIER :
You can read the job ID and run ID from environment variables within your Python wheel application. Here's an example of how you can do this:
from pyspark.sql import SparkSession
import os
# Get the current SparkSession
spark = SparkSession.builder.getOrCreate()
# Get the Databricks job ID and run ID from the environment variables
job_id = os.environ.get("DATABRICKS_JOB_ID")
run_id = os.environ.get("DATABRICKS_RUN_ID")
# Print the job ID and run ID for logging/monitoring purposes
print(f"Databricks Job ID: {job_id}")
print(f"Databricks Run ID: {run_id}")
You can then add this code to your Python wheel task to extract the job ID and run ID and use them for logging/monitoring purposes.
Note that the environment variables DATABRICKS_JOB_ID and DATABRICKS_RUN_ID are automatically set by Databricks when you run a job, so you don't need to pass them as parameters.
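Before relying on these variables, it may help to check whether they are actually present at runtime. The following is a minimal sketch, not a confirmed Databricks API: the variable names are taken from the answer above and are an assumption, and `read_run_ids` and `missing_ids` are hypothetical helper names.

```python
import os
from typing import Dict, List, Mapping, Optional


def read_run_ids(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Return the job and run IDs found in `env`, or None for each missing one."""
    # Assumption: these variable names come from the answer above; they are
    # not verified to be set on any particular Databricks runtime.
    return {
        "job_id": env.get("DATABRICKS_JOB_ID"),
        "run_id": env.get("DATABRICKS_RUN_ID"),
    }


def missing_ids(ids: Mapping[str, Optional[str]]) -> List[str]:
    """List the keys whose value is absent or empty."""
    return [key for key, value in ids.items() if not value]


if __name__ == "__main__":
    ids = read_run_ids()
    absent = missing_ids(ids)
    if absent:
        # Make the gap visible instead of silently logging None.
        print(f"Not set in this environment: {absent}")
    else:
        print(f"Databricks Job ID: {ids['job_id']}, Run ID: {ids['run_id']}")
```

If the list of missing keys is non-empty, the IDs have to come from somewhere else, for example from the task parameters.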
05-23-2023 06:32 AM
Hey Suteja,
Thank you for your response, but unfortunately it doesn't work with environment variables.
I got a null value for both variables.
Do you have any idea which DBR version I should use?
Or is there any documentation about these environment variables?
Thank you
03-06-2024 08:54 AM
Here is a complete template project that uses Databricks Asset Bundles and a Python wheel task. Please follow the instructions for deployment.
https://github.com/andre-salvati/databricks-template
In particular, take a look at the workflow definition here.
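On the wheel side, one way to receive the IDs is as named command-line flags in the task's parameter array. The sketch below assumes the workflow passes entries such as "--job-id={{job_id}}" and "--run-id={{run_id}}" and that Databricks substitutes the placeholders before the entry point runs; the flag names and the `parse_ids` and `is_substituted` helpers are hypothetical, and the substitution behaviour should be verified on your workspace.

```python
import argparse
from typing import List, Optional


def parse_ids(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Extract --job-id and --run-id, ignoring any other parameters."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag names; they must match what the task definition passes.
    parser.add_argument("--job-id", default=None)
    parser.add_argument("--run-id", default=None)
    # parse_known_args tolerates extra entries such as config file paths.
    args, _unknown = parser.parse_known_args(argv)
    return args


def is_substituted(value: Optional[str]) -> bool:
    """True if a value is present and is not a leftover '{{...}}' placeholder."""
    return bool(value) and not value.startswith("{{")


if __name__ == "__main__":
    ids = parse_ids()
    if is_substituted(ids.job_id) and is_substituted(ids.run_id):
        print(f"Databricks Job ID: {ids.job_id}, Run ID: {ids.run_id}")
    else:
        print("Job/run IDs were not passed or not substituted.")
```

Checking for a leftover "{{" prefix makes it obvious in the logs when the placeholder was passed through literally rather than replaced.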