
Job runs on serverless even though job config has cluster definitions

AshMod
Visitor

Hi,

I am defining a job along with a job cluster specification using the Python SDK. But when the job runs, it uses serverless compute instead of the defined cluster. I can tell the job uses serverless from the job run log and also from the SKU recorded in the system.billing.usage table (a rough query for that check is included below the screenshots).

What am I doing wrong here? How do I create and use the job cluster the right way? Thanks for looking.

a) current cluster spec 

AshMod_0-1761230207138.png

b) I can see the job defined along with the cluster details, including the node types. 

AshMod_1-1761230258750.png

c) But when I run the job, I see that it is using Serverless compute.

AshMod_2-1761231900872.png
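
For reference, the billing check mentioned above looks roughly like this (run from a Databricks notebook; access to the system.billing.usage table is assumed and the job ID is a placeholder):

# Which SKU was each run of the job billed under?
# '123' below is a placeholder job ID.
df = spark.sql("""
    SELECT usage_date, sku_name, usage_quantity
    FROM system.billing.usage
    WHERE usage_metadata.job_id = '123'
    ORDER BY usage_date DESC
""")
display(df)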

 

 

2 REPLIES

ManojkMohan
Honored Contributor

@AshMod 

In your attached image, the compute environment is clearly labeled "Serverless", which confirms that the job is not using a dedicated or job cluster as intended.

Use the GET /api/2.0/clusters/get endpoint with your cluster_id to retrieve cluster details.

Inspect the "state" field in the response JSON, which indicates the cluster status such as "RUNNING", "TERMINATED", "PENDING", etc.

If the cluster state is anything other than "RUNNING", it implies the cluster is either stopped or provisioning, which might cause a fallback to serverless compute for jobs.
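
A minimal sketch of that check with the Python SDK, which calls the same clusters/get endpoint (the cluster ID below is a placeholder):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up auth from the environment or ~/.databrickscfg

# "0123-456789-abcdefgh" is a placeholder cluster ID
details = w.clusters.get(cluster_id="0123-456789-abcdefgh")
print(details.state)          # e.g. RUNNING, TERMINATED, PENDING
print(details.state_message)  # extra context when the cluster is not RUNNING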

ManojkMohan_0-1761239374263.png

If you see discrepancies in the system.billing.usage table (such as serverless compute being billed), that further suggests the job isn't properly associated with the cluster.

Set the Cluster as the Only Compute Option

If the issue persists, ensure no other conflicting compute resources are configured. For example, if you are using multiple compute options, double-check that only the cluster you intend to use is included, and confirm it in the job run logs.
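
One way to confirm which compute each task of a run actually used is to inspect the run with the Python SDK (a sketch; the run ID is a placeholder):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# 123456789 is a placeholder run ID
run = w.jobs.get_run(run_id=123456789)
for task in (run.tasks or []):
    # cluster_instance is populated when the task ran on a job or all-purpose
    # cluster; it is typically absent for serverless runs
    print(task.task_key, task.cluster_instance)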

AshMod
Visitor

Thanks for checking, @ManojkMohan. I found the issue in the job task definition. There is a job_clusters list in the job definition where I provide the cluster config details, but that alone is not sufficient for the task to use the cluster. The job_cluster_key has to be passed to the task as well.

Otherwise a cluster is defined but never used; you can see it marked *Unused* in the second screenshot.

import os

from databricks.sdk.service.jobs import Job, JobCluster

# job_cluster_key (str) and cluster_spec (the cluster definition) are defined earlier in the script
job = Job.from_dict(
    {
        "name": "copy_ingest_tpch",
        "tasks": [
            {
                "task_key": "copy_ingest",
                "notebook_task": {
                    "notebook_path": os.path.abspath("./transform/copy_ingest"),
                    "source": "WORKSPACE"
                },
                ## this matters: it assigns the job cluster to the task;
                ## without it the task falls back to serverless compute
                "job_cluster_key": job_cluster_key
            }
        ],
        "queue": {
            "enabled": True,
        },
        "job_clusters": [
            JobCluster.from_dict(
                {
                    "job_cluster_key": job_cluster_key,
                    "new_cluster": cluster_spec.as_dict()
                }
            ).as_dict()
        ],
    }
)
 
