Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
Issue with Creating and Running Databricks Jobs with new databricks cli v0.214.0

Kishor
New Contributor II

Hi Databricks Support,

I'm encountering an issue with creating and running jobs on Databricks. Here are the details:

Problem Description:
When attempting to create and run a job using the old JSON file (which worked with Databricks CLI version 0.17.8), I encountered an error. Job creation succeeded, but running the job failed with: "Error: No task is specified."

Steps Taken:

Created a job using the old JSON file with the command:

    databricks jobs create --json @sample.json

Job creation was successful, but running the job resulted in an error.

Updated the JSON file based on a sample from the Databricks GitHub repository and tried creating and running the job again. This time, both job creation and job run commands worked fine.

However, I encountered another error when attempting to retrieve the run output:

    databricks jobs get-run-output 89359307425900

The error message received was: "Error: Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."
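For a multi-task job, the parent run ID can't be passed to `get-run-output` directly; the per-task run IDs have to be collected first from the `get-run` response, which (per the Jobs API 2.1 `runs/get` schema) contains a `tasks` array where each task carries its own `run_id`. A minimal sketch of that lookup, using an illustrative response with made-up IDs:

```python
# Trimmed, illustrative example of what `databricks jobs get-run <run_id>`
# returns for a multi-task job (Jobs API 2.1). The IDs are invented.
run = {
    "run_id": 89359307425900,
    "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
    "tasks": [
        {"task_key": "create-job-without-workers-cluster1", "run_id": 111},
        {"task_key": "another-task", "run_id": 222},
    ],
}

# get-run-output only accepts task-level run IDs, so map task_key -> run_id:
task_run_ids = {t["task_key"]: t["run_id"] for t in run.get("tasks", [])}
print(task_run_ids)

# Each of these IDs can then be passed individually to:
#   databricks jobs get-run-output <task_run_id>
```
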

JSON Details:

Initial JSON (used for job creation with old Databricks CLI v0.17.8):

    {
      "libraries": [],
      "name": "nabu-sparkbot-custom-code-arg-test",
      "max_concurrent_runs": 1,
      "timeout_seconds": 259200,
      "access_control_list": [],
      "notebook_task": {
        "notebook_path": "/dbfs/tmp/sample/sample.py"
      },
      "new_cluster": {
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "enable_elastic_disk": true,
        "num_workers": 2,
        "spark_conf": {
          "spark.dynamicAllocation.enabled": "false"
        },
        "runtime_engine": "STANDARD"
      }
    }
Updated JSON (used for successful job creation and run):

    {
      "name": "nabu-sparkbot-custom-code-arg-test",
      "tasks": [
        {
          "job_cluster_key": "create-job-without-workers-cluster",
          "task_key": "create-job-without-workers-cluster1",
          "libraries": [],
          "max_concurrent_runs": 1,
          "timeout_seconds": 259200,
          "notebook_task": {
            "notebook_path": "/dbfs/tmp/sample/sample.py"
          }
        }
      ],
      "job_clusters": [
        {
          "job_cluster_key": "create-job-without-workers-cluster",
          "new_cluster": {
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "enable_elastic_disk": true,
            "num_workers": 2,
            "spark_conf": {
              "spark.dynamicAllocation.enabled": "false"
            },
            "runtime_engine": "STANDARD"
          }
        }
      ]
    }
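The structural difference between the two JSON files is that the new CLI (backed by Jobs API 2.1) expects every job to be defined as a list of tasks, with clusters factored out into `job_clusters`. A minimal sketch of wrapping a legacy single-task spec into the new shape (the field lists and the derived `task_key`/`job_cluster_key` names are my own assumptions, not an official migration tool):

```python
def migrate_legacy_job(old: dict, task_key: str = "main") -> dict:
    """Wrap a legacy single-task job spec (CLI v0.17.x / Jobs API 2.0 style)
    into the multi-task shape required by the new CLI (Jobs API 2.1).
    The task_key is arbitrary; the field lists below are illustrative."""
    task_fields = ("notebook_task", "spark_python_task", "spark_jar_task",
                   "libraries", "timeout_seconds")
    job_fields = ("name", "max_concurrent_runs", "access_control_list")
    cluster_key = f"{task_key}-cluster"

    # Move task-level settings under a single task entry.
    task = {"task_key": task_key, "job_cluster_key": cluster_key}
    for f in task_fields:
        if f in old:
            task[f] = old[f]

    # Keep job-level settings at the top, and factor the cluster
    # out into job_clusters, referenced by job_cluster_key.
    new = {f: old[f] for f in job_fields if f in old}
    new["tasks"] = [task]
    new["job_clusters"] = [{
        "job_cluster_key": cluster_key,
        "new_cluster": old.get("new_cluster", {}),
    }]
    return new
```

Running this over the initial JSON above would produce a spec with the same overall shape as the updated JSON that worked.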
Request:
Could you please assist in resolving the issue with job creation and retrieval of run output? Additionally, any guidance on creating and running individual task runs would be greatly appreciated.

Thank you for your assistance.

Best regards,
kishor.chintanpalli@modak.com

1 REPLY

Kishor
New Contributor II

Hi @Retired_mod ,

Thanks for the reference links.

I found the solution in this GitHub discussion: https://github.com/databricks/databricks-sdk-go/discussions/384. Using the `get-run` API, I was able to retrieve the running status of my job along with a detailed description for each task.
