Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Timeout for dbutils.jobs.taskValues.set(key, value)

novytskyi
New Contributor

I have a job that calls a notebook which uses the dbutils.jobs.taskValues.set(key, value) method to assign around 20 parameters.

When I run a single copy, it works.

But when I try to run 2 or more copies of the job with different parameters, it fails at different dbutils.jobs.taskValues.set(key, value) calls with this error:

```
An error occurred while calling o366.setJson. :
org.apache.http.conn.ConnectTimeoutException: Connect to us-central1.gcp.databricks.com:443 [us-central1.gcp.databricks.com/xx.xx.xx.xx] failed: connect timed out
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
    at com.databricks.common.client.RawDBHttpClient.$anonfun$httpRequestInternal$1(DBHttpClient.scala:1203)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:582)
    at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:685)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:703)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:435)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:216)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:433)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:427)
    at com.databricks.common.client.RawDBHttpClient.withAttributionContext(DBHttpClient.scala:603)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:481)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:464)
    at com.databricks.common.client.RawDBHttpClient.withAttributionTags(DBHttpClient.scala:603)
    at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:680)
    at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:591)
    at com.databricks.common.client.RawDBHttpClient.recordOperationWithResultTags(DBHttpClient.scala:603)
    at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:582)
    at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:551)
    at com.databricks.common.client.RawDBHttpClient.recordOperation(DBHttpClient.scala:603)
    at com.databricks.common.client.RawDBHttpClient.httpRequestInternal(DBHttpClient.scala:1189)
    at com.databricks.common.client.RawDBHttpClient.entityEnclosingRequestInternal(DBHttpClient.scala:1178)
    at com.databricks.common.client.RawDBHttpClient.postInternal(DBHttpClient.scala:1062)
    at com.databricks.common.client.RawDBHttpClient.postJson(DBHttpClient.scala:757)
    at com.databricks.common.client.DBHttpClient.postJson(DBHttpClient.scala:574)
    at com.databricks.workflow.SimpleJobsSessionClient.setTaskValue(JobsSessionClient.scala:244)
    at com.databricks.workflow.ReliableJobsSessionClient.$anonfun$setTaskValue$1(JobsSessionClient.scala:438)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.common.client.DBHttpClient$.retryWithDeadline(DBHttpClient.scala:375)
    at com.databricks.workflow.ReliableJobsSessionClient.withRetry(JobsSessionClient.scala:401)
    at com.databricks.workflow.ReliableJobsSessionClient.setTaskValue(JobsSessionClient.scala:438)
    at com.databricks.workflow.WorkflowDriver.setTaskValue(WorkflowDriver.scala:52)
    at com.databricks.dbutils_v1.impl.TaskValuesUtilsImpl.setJson(TaskValuesUtilsImpl.scala:49)
    at sun.reflect.GeneratedMethodAccessor230.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
    at py4j.Gateway.invoke(Gateway.java:306)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:613)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    ... 51 more
```

1 REPLY

mark_ott
Databricks Employee

The error you are encountering when running multiple simultaneous Databricks jobs that use dbutils.jobs.taskValues.set(key, value) indicates a connection timeout to the Databricks backend API (connect timed out at ...us-central1.gcp.databricks.com:443), not a problem with your code or parameters.

What This Error Means

  • The ConnectTimeoutException occurs when a network connection to the Databricks workspace API cannot be established within the allocated time.

  • When you launch several copies of the job at once (especially with many parameters), each job independently tries to communicate with the Databricks API. If there are too many simultaneous requests, they can overwhelm available network resources, Databricks API rate limits, or hit concurrency limits, leading to timeout errors.

Why Does It Work with One Job, But Not Many?

  • A single job doesn't stress your Databricks workspace's API/network resources.

  • Multiple jobs running in parallel, even if each sets only a few parameters, significantly increase the number of concurrent HTTP requests to Databricks, making timeouts more likely.

How To Fix & Troubleshoot

1. Stagger Job Launches

  • Instead of starting all job runs simultaneously, try launching them in batches with a slight delay, allowing resources and connections to recover between launches.
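A minimal sketch of batched launching. The helper and its `batch_size`/`delay_s` parameters are illustrative, not a Databricks API: `launch_fn` stands in for however you actually trigger a run (e.g., a wrapper around the Jobs run-now REST call).

```python
import time

def launch_staggered(launch_fn, param_sets, batch_size=2, delay_s=30):
    """Trigger job runs in small batches, pausing between batches so the
    number of concurrent API calls (including taskValues traffic from the
    started runs) stays bounded."""
    run_ids = []
    for i in range(0, len(param_sets), batch_size):
        batch = param_sets[i:i + batch_size]
        run_ids.extend(launch_fn(params) for params in batch)
        if i + batch_size < len(param_sets):  # no pause after the last batch
            time.sleep(delay_s)
    return run_ids
```

Tune `batch_size` and `delay_s` to your workspace: the goal is simply that all copies do not hammer the control plane at the same instant.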

2. Reduce API Calls

  • Limit the number of calls to dbutils.jobs.taskValues.set: combine related values into a single data structure (e.g., a dictionary) and pass them all at once, reducing overall API traffic.
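For example, ~20 set calls can collapse into one by JSON-encoding a dictionary. The helper below is a sketch: the `task_values` argument is `dbutils.jobs.taskValues` inside a Databricks notebook (passed in here so the logic can be exercised anywhere), and the `"job_params"` key name is an arbitrary choice.

```python
import json

def set_task_values_batched(task_values, params):
    """Serialize all parameters into one JSON payload and make a single
    taskValues call instead of one HTTP round-trip per key.

    In a notebook: set_task_values_batched(dbutils.jobs.taskValues, params)
    """
    task_values.set(key="job_params", value=json.dumps(params))
```

A downstream task would then recover the dictionary with `json.loads(...)` applied to the value read back via `dbutils.jobs.taskValues.get`.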

3. Resource and Quota Check

  • Check workspace resource quotas, API rate limits, and concurrent job run limits on your Databricks workspace. Databricks enforces limits per workspace; review your cluster and workspace quotas and request an increase if needed.

  • Ensure the cluster itself has enough network bandwidth.

4. Network Troubleshooting

  • Ensure no network bottlenecks exist between your cluster and the Databricks control plane. If running on a secure network, test public access, VPN latency, or firewall rules.

5. Increase Timeout

  • Where configurable, increase the connection/HTTP timeout settings, though note that Databricks default timeouts are intended to ensure stability.

6. Retry Logic

  • Implement robust retry logic for failed API calls. Some Databricks SDKs and APIs offer automatic retries for transient errors.
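A simple version of such retry logic, as a sketch (the `with_retry` helper and its parameters are illustrative, not part of any Databricks SDK): exponential backoff with jitter around any flaky call.

```python
import random
import time

def with_retry(fn, retries=5, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff plus
    jitter; re-raise the last error once the retry budget is exhausted."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            # 2**attempt growth, with up to 2x random jitter to avoid
            # all parallel jobs retrying in lockstep
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# e.g., inside a notebook:
# with_retry(lambda: dbutils.jobs.taskValues.set(key="k", value=v))
```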

7. Databricks Support/Docs

  • If this persists, collect all error logs and submit a case to Databricks support, as this may indicate a workspace-specific networking or control plane issue that code changes cannot solve.

Summary Table

| Potential Cause | Resolution Step |
| --- | --- |
| API concurrency/rate limits | Stagger jobs, batch parameters, check quotas |
| Network bottlenecks | Review cluster/network configuration |
| Workspace resource limits | Request workspace/cluster limits increase |
| Excessive API calls | Reduce/aggregate parameters per call |
| Transient/timeout error | Add retry logic, increase timeouts |

This problem is common when scaling up Databricks job orchestration and typically relates to workspace or network limitations, not the correctness of the underlying application code.