Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Issue with Job Versioning with “Run Job” tasks and Deployments between environments

harlemmuniz
New Contributor II

Hello,

I am writing to bring to your attention an issue that we have encountered while working with Databricks and seek your assistance in resolving it.

When running a Workflow job that contains a "Run Job" task and clicking "View YAML/JSON," we have observed that the run_job_task parameter, specifically the job_id, is what gets versioned. However, we have run into difficulties when deploying to other environments, such as "stage" and "production." In those cases, the id still carries the value from the development (lab) workspace, which causes errors when creating the job in another environment with tools like Terraform or Databricks Asset Bundles: the referenced jobs may or may not already exist there (they are created if missing), but the job_id will always be different in each environment:

To perform exactly these actions, run the following command to apply:
    terraform apply "prod.plan"

Error: cannot create job: Job 902577056531277 does not exist.

  with module.databricks_workflow_job_module["job_father_one"].databricks_job.main,
       on modules/databricks_workflow_job/main.tf line 7, in resource "databricks_job" "main":

Error: cannot create job: Job 1068053310953144 does not exist.

  with module.databricks_workflow_job_module["job_father_two"].databricks_job.main,
       on modules/databricks_workflow_job/main.tf line 7, in resource "databricks_job" "main":

##[error]Bash exited with code '1'.

In this case, jobs 902577056531277 and 1068053310953144 do not exist in the stage and production environments. As a result, we have to submit and merge one sequential pull request per layer of "Run Job" tasks, changing each job_id to the correct value for that job in each environment, which is not an optimal approach.
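For reference, this is roughly how such a parent job has to be declared with the Terraform provider today (the resource and variable names below are illustrative, not our real module code): run_job_task only accepts a job_id, so the correct id has to be supplied per environment, for example through a variable:

# Illustrative sketch: run_job_task only takes a job_id, which differs per workspace,
# so the value has to be injected per environment (e.g. via *.tfvars files).
variable "job_child_one_id" {
  description = "job_id of job_child_one in the target environment"
  type        = string
}

resource "databricks_job" "job_father_one" {
  name = "job_father_one"

  task {
    task_key = "job_father_one"
    run_if   = "ALL_SUCCESS"

    run_job_task {
      job_id = var.job_child_one_id
    }
  }
}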

To address this issue, we propose an alternative approach. Instead of versioning and referencing jobs in the "Run Job" task using job_id, we suggest versioning based on the job_name:

{
  "name": "job_father_one",
  "email_notifications": {},
  ...
  "tasks": [
    {
      "task_key": "job_father_one",
      "run_if": "ALL_SUCCESS",
      "run_job_task": {
        "job_name": "job_child_one"
      },
      "timeout_seconds": 0,
      "email_notifications": {},
      "notification_settings": {}
    },
    {
      "task_key": "job_father_two",
      "run_if": "ALL_SUCCESS",
      "run_job_task": {
        "job_name": "job_child_two"
      },
      "timeout_seconds": 0,
      "email_notifications": {},
      "notification_settings": {}
    }
  ],
  "tags": {},
  "run_as": {
    "user_name": "test@test.com"
  }
}

Would that be possible? This way, we would not need to worry about the job_id when promoting to the stage and production environments, because the reference to the other jobs would be made by name, ensuring a smoother experience across different environments.

Thank you for your time and assistance.

Best regards,

Harlem Muniz.

8 REPLIES

harlemmuniz
New Contributor II

Hi @Retired_mod, thank you for your fast response.

However, the JSON or YAML versioned from the Job UI (and via Databricks Asset Bundles) would also have to include the job_name; otherwise we have to change it manually, replacing the job_id with the job_name. For this reason, I didn't open an issue on the Databricks Terraform provider GitHub. I genuinely believe Databricks should make the following changes:

  1. When clicking on View YAML/JSON, the parameter of the key run_job_task should be changed from job_id to job_name, allowing us to copy the JSON/YAML without needing manual adjustments.
  2. The Terraform provider should accept job_name as a reference to another job.
  3. Databricks Asset Bundle should also accept job_name as a reference to another job.

What do you think I should do? Should I open an issue on the terraform-provider-databricks GitHub? Is there anything else I need to do?

Let me know, and I'll take the necessary steps.

 

sid_001
New Contributor II

Hi, is there any update on the above issue?

pipelinebuilder
New Contributor II

Hi,

I have the same problem: I don't understand how I am supposed to use job_id properly when creating my Terraform files. Could you please provide an update, or at least a workaround?

saurabh18cs
Valued Contributor III

Hi, sorry if I don't understand your use case: are you trying to start/stop a Databricks job via Terraform? Is that why you want to hardcode the job_id?

sid_001
New Contributor II

Hi @saurabh18cs ,

In my case, we are generating Databricks jobs through Terraform and passing the job details as JSON files. We deploy the same JSON in different environments such as dev, sit, and uat.

But when a job has a run_job task, it requires the job_id of the target Databricks job, and in each environment the same job_name will have a different job_id, which is the issue.

Example:

{
  "name": "RUN_JOB_TEST",
  "email_notifications": {
    "no_alert_for_skipped_runs": false
  },
  "webhook_notifications": {},
  "timeout_seconds": 0,
  "max_concurrent_runs": 1,
  "tasks": [
    {
      "task_key": "RUN_JOB_TEST",
      "run_if": "ALL_SUCCESS",
      "run_job_task": {
        "job_id": 370187610293026
      },
      "timeout_seconds": 0,
      "email_notifications": {}
    }
  ],
  "queue": {
    "enabled": true
  },
  "run_as": {
    "user_name": "abc@xyz.com"
  }
}

Now, this configuration works fine in the dev environment, but when we deploy the same JSON in sit, it fails because the job_id value is incorrect.

 

saurabh18cs
Valued Contributor III

 

Hi @sid_001, why do you need to hardcode the job_id to run a task anyway? You shouldn't be specifying any job_id in your JSON files either. This should be done by job_name, and the job_id will be autogenerated.

sid_001
New Contributor II

Hi,
In Terraform there is no job_name attribute when the task type is run_job.

https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/job

run_job_task Configuration Block

  • job_id - (Required)(String) ID of the job
  • job_parameters - (Optional)(Map) Job parameters for the task

The JSON was extracted from the Databricks UI; our process is that a developer creates the workflow in the UI, we extract the JSON file from it, and then we deploy it to the higher environments.
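A possible workaround with the current provider, assuming the child job either already exists in the target workspace or is managed in the same Terraform configuration (the names below are illustrative; please double-check the exact data source arguments against the provider documentation for your version):

# Option 1: the child job is managed in the same Terraform configuration,
# so its id can be referenced directly and Terraform orders the creation.
resource "databricks_job" "child" {
  name = "job_child_one"
  # ... tasks of the child job ...
}

resource "databricks_job" "parent" {
  name = "job_father_one"

  task {
    task_key = "job_father_one"
    run_if   = "ALL_SUCCESS"

    run_job_task {
      job_id = databricks_job.child.id
    }
  }
}

# Option 2: the child job exists in the workspace but is not managed here,
# so its id is looked up by name at plan time (the name must be unique).
# data "databricks_job" "child" {
#   job_name = "job_child_one"
# }
# ...and then use job_id = data.databricks_job.child.id in run_job_task.

Either way, only the job names live in version control, and the ids are resolved per environment at deploy time.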




saurabh18cs
Valued Contributor III

You can handle this with the Databricks CLI and a null_resource in your Terraform.

Add the following to your DevOps pipeline:

# databricks cli is needed to run local-exec inside terraform
- script: |
    python3.6 -m pip install databricks-cli --user
  displayName: Install databricks cli
 
Add the following to your Terraform (adjust it to your Terraform guidelines):
# Define the job names
locals {
  job_names = ["job_name_1", "job_name_2"]
}
 
resource "databricks_token" "this" {
  comment  = "Terraform Provisioning"
}
 
resource "null_resource" "start_stop_existing_job" {
  for_each toset(local.job_names)

  provisioner "local-exec" {
    command = <<-EOT
      echo "Running command to stop job: $JOB_ID"
      run_id=$($HOME/.local/bin/databricks runs list --active-only --job-id $JOB_ID | cut -f1 -d' ')
      echo "Found run_id: $run_id"
      if [ -n "$run_id" ]; then
        echo "Cancelling run with run_id: $run_id"
        $HOME/.local/bin/databricks runs cancel --run-id $run_id
      else
        echo "No active runs found for job $JOB_ID"
      fi

      echo "Re-running job with job_id: $JOB_ID"
      EOF
      sleep 30
      $HOME/.local/bin/databricks jobs run-now --job-id $JOB_ID
      EOT
    interpreter = ["bash", "-c"]
    environment = {
      DATABRICKS_HOST   = "https://${data.azurerm_databricks_workspace.this.workspace_url}"
      DATABRICKS_TOKEN  = databricks_token.this.token_value
      JOB_ID            = databricks_job.this[each.key].id
    }
  }

  triggers = {
    # timestamp() changes on every apply, so this provisioner re-runs on every deployment
    always_run = timestamp()
  }

  depends_on = [
    databricks_job.this,
    databricks_token.this
  ]
}
 
Try it and let me know your results. Thanks!
