Hello,
I am writing to bring to your attention an issue that we have encountered while working with Databricks, and to ask for your help in resolving it.
When we open a Workflows job that contains a "Run Job" task and click "View YAML/JSON," we see that the run_job_task references the child job by its job_id, and that job_id is what ends up in version control. This causes problems when we deploy to other environments, such as "stage" and "production." The definition carries the job_id from the laboratory environment, so creating the job in another environment with tools like Terraform or Databricks Asset Bundles fails. The referenced job may or may not exist there yet (it is created if it does not), but its job_id will always be different in each environment:
To perform exactly these actions, run the following command to apply:
terraform apply "prod.plan"
Error: cannot create job: Job 902577056531277 does not exist.
with module.databricks_workflow_job_module["job_father_one"].databricks_job.main,
on modules/databricks_workflow_job/main.tf line 7, in resource "databricks_job" "main":
Error: cannot create job: Job 1068053310953144 does not exist.
with module.databricks_workflow_job_module["job_father_two"].databricks_job.main,
on modules/databricks_workflow_job/main.tf line 7, in resource "databricks_job" "main":
##[error]Bash exited with code '1'.
In this case, jobs 902577056531277 and 1068053310953144 do not exist in the stage and production environments. As a result, for each layer of "Run Job" tasks we have to submit and merge a separate pull request, one after another, changing the job_id to the correct value for that job in each environment, which is not an optimal approach.
To address this issue, we propose an alternative approach. Instead of versioning and referencing jobs in the "Run Job" task using job_id, we suggest versioning based on the job_name:
{
  "name": "job_father_one",
  "email_notifications": {},
  ...
  "tasks": [
    {
      "task_key": "job_father_one",
      "run_if": "ALL_SUCCESS",
      "run_job_task": {
        "job_name": "job_child_one"
      },
      "timeout_seconds": 0,
      "email_notifications": {},
      "notification_settings": {}
    },
    {
      "task_key": "job_father_two",
      "run_if": "ALL_SUCCESS",
      "run_job_task": {
        "job_name": "job_child_two"
      },
      "timeout_seconds": 0,
      "email_notifications": {},
      "notification_settings": {}
    }
  ],
  "tags": {},
  "run_as": {
    "user_name": "test@test.com"
  }
}
Is that possible? That way, we would not need to worry about the job_id when deploying to the stage and production environments, because jobs would reference each other by name, which would make deployments across environments much smoother.
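To make the request concrete, the behaviour we are asking for can be sketched as a pre-deployment step. This is only a rough illustration under our own assumptions, not an existing Databricks feature: job_name inside run_job_task is the field we are proposing, resolve_run_job_names is a hypothetical helper, and the name-to-id mapping (shown here with made-up ids) would have to be built separately for each environment, for example by listing the jobs in the target workspace.

```python
import copy


def resolve_run_job_names(settings, name_to_id):
    """Return a copy of `settings` in which every run_job_task.job_name
    (our proposed, hypothetical field) is replaced by the job_id that
    the named job has in the target environment."""
    resolved = copy.deepcopy(settings)
    for task in resolved.get("tasks", []):
        run_job = task.get("run_job_task")
        if not run_job or "job_name" not in run_job:
            continue  # not a "Run Job" task, or it already uses job_id
        name = run_job.pop("job_name")
        if name not in name_to_id:
            raise KeyError(f"job {name!r} does not exist in this environment")
        run_job["job_id"] = name_to_id[name]
    return resolved


# Hypothetical mapping for one environment; the ids are made-up placeholders.
stage_ids = {"job_child_one": 111, "job_child_two": 222}

# Trimmed-down version of the job definition above.
job_father_one = {
    "name": "job_father_one",
    "tasks": [
        {"task_key": "job_father_one", "run_job_task": {"job_name": "job_child_one"}},
        {"task_key": "job_father_two", "run_job_task": {"job_name": "job_child_two"}},
    ],
}

stage_settings = resolve_run_job_names(job_father_one, stage_ids)
```

With something like this, the versioned definition would contain only names, and the environment-specific job_id would be filled in at deploy time. Ideally the platform would do this lookup itself.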
Thank you for your time and assistance.
Best regards,
Harlem Muniz.