Timeout for dbutils.jobs.taskValues.set(key, value)
08-14-2024 07:24 AM
I have a job that calls a notebook, which uses dbutils.jobs.taskValues.set(key, value) to assign around 20 parameters.
When I run a single copy of the job, it works.
But when I run 2 or more copies of the job with different parameters, it fails with the error below, each time at a different dbutils.jobs.taskValues.set(key, value) call:
An error occurred while calling o366.setJson. : org.apache.http.conn.ConnectTimeoutException: Connect to us-central1.gcp.databricks.com:443 [us-central1.gcp.databricks.com/xx.xx.xx.xx] failed: connect timed out at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376) at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72) at com.databricks.common.client.RawDBHttpClient.$anonfun$httpRequestInternal$1(DBHttpClient.scala:1203) at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:582) at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:685) at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:703) at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:435) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:216) at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:433) at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:427) at com.databricks.common.client.RawDBHttpClient.withAttributionContext(DBHttpClient.scala:603) at 
com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:481) at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:464) at com.databricks.common.client.RawDBHttpClient.withAttributionTags(DBHttpClient.scala:603) at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:680) at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:591) at com.databricks.common.client.RawDBHttpClient.recordOperationWithResultTags(DBHttpClient.scala:603) at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:582) at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:551) at com.databricks.common.client.RawDBHttpClient.recordOperation(DBHttpClient.scala:603) at com.databricks.common.client.RawDBHttpClient.httpRequestInternal(DBHttpClient.scala:1189) at com.databricks.common.client.RawDBHttpClient.entityEnclosingRequestInternal(DBHttpClient.scala:1178) at com.databricks.common.client.RawDBHttpClient.postInternal(DBHttpClient.scala:1062) at com.databricks.common.client.RawDBHttpClient.postJson(DBHttpClient.scala:757) at com.databricks.common.client.DBHttpClient.postJson(DBHttpClient.scala:574) at com.databricks.workflow.SimpleJobsSessionClient.setTaskValue(JobsSessionClient.scala:244) at com.databricks.workflow.ReliableJobsSessionClient.$anonfun$setTaskValue$1(JobsSessionClient.scala:438) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.common.client.DBHttpClient$.retryWithDeadline(DBHttpClient.scala:375) at com.databricks.workflow.ReliableJobsSessionClient.withRetry(JobsSessionClient.scala:401) at com.databricks.workflow.ReliableJobsSessionClient.setTaskValue(JobsSessionClient.scala:438) at com.databricks.workflow.WorkflowDriver.setTaskValue(WorkflowDriver.scala:52) at com.databricks.dbutils_v1.impl.TaskValuesUtilsImpl.setJson(TaskValuesUtilsImpl.scala:49) at 
sun.reflect.GeneratedMethodAccessor230.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397) at py4j.Gateway.invoke(Gateway.java:306) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199) at py4j.ClientServerConnection.run(ClientServerConnection.java:119) at java.lang.Thread.run(Thread.java:750) Caused by: java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:613) at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368) at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142) ... 51 more
11-17-2025 03:38 AM
The error you see when running multiple Databricks jobs simultaneously with dbutils.jobs.taskValues.set(key, value) is a connection timeout to the Databricks backend API (Connect to us-central1.gcp.databricks.com:443 ... failed: connect timed out), not a problem with your code or parameters.
What This Error Means
- The ConnectTimeoutException occurs when a network connection to the Databricks workspace API cannot be established within the allotted time.
- When you launch several copies of the job at once (especially with many parameters), each job independently tries to communicate with the Databricks API. Too many simultaneous requests can exhaust available network resources or hit Databricks API rate or concurrency limits, leading to timeout errors.
Why Does It Work with One Job, But Not Many?
- A single job doesn't stress your Databricks workspace's API/network resources.
- Multiple jobs running in parallel, even if each sets only a few parameters, significantly increase the number of simultaneous HTTP requests to Databricks, making timeouts more likely.
How To Fix & Troubleshoot
1. Stagger Job Launches
- Instead of starting all job runs simultaneously, launch them in batches with a short delay between batches, so that resources and connections can recover between launches.
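The staggering idea can be sketched as below. This is only an illustration: `launch_in_batches` and the `launch` callable are hypothetical names, and `launch` stands in for whatever you already use to trigger a run (for example, a call to the Jobs API). The batch size and delay are assumptions to tune for your workspace.

```python
import time
from typing import Any, Callable, Dict, List, Sequence


def launch_in_batches(
    param_sets: Sequence[Dict[str, Any]],
    launch: Callable[[Dict[str, Any]], Any],  # hypothetical: your own "start one run" function
    batch_size: int = 2,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Start one run per parameter set, pausing between batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[Any] = []
    for start in range(0, len(param_sets), batch_size):
        if start > 0:
            # Pause before every batch except the first one.
            sleep(delay_seconds)
        for params in param_sets[start:start + batch_size]:
            results.append(launch(params))
    return results
```

The `sleep` parameter is injectable only so the function can be exercised without actually waiting.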
2. Reduce API Calls
- Limit the number of calls to dbutils.jobs.taskValues.set: combine related values into a single data structure (e.g., a dictionary) and pass them all at once. With around 20 parameters per run, that is one API call instead of 20.
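A minimal sketch of bundling the parameters into one task value follows. The helper name `set_bundled_task_values`, the key `"params"`, and the injected `set_value` callable are assumptions for illustration. The 48 KiB default reflects the size limit on a task value's JSON representation as I understand the Databricks documentation; verify it against the current docs.

```python
import json
from typing import Any, Callable, Dict


def set_bundled_task_values(
    params: Dict[str, Any],
    set_value: Callable[[str, Any], None],
    key: str = "params",          # hypothetical key name
    max_bytes: int = 48 * 1024,   # assumed limit; check the current documentation
) -> int:
    """Store all parameters under one key and return the payload size in bytes."""
    # json.dumps raises TypeError if a value is not JSON-serializable.
    payload = json.dumps(params)
    size = len(payload.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"payload is {size} bytes, above the {max_bytes}-byte limit")
    set_value(key, params)  # one call instead of one call per parameter
    return size


# In a notebook, where dbutils is available, this might be used as:
# set_bundled_task_values(
#     {"run_date": "2024-08-14", "region": "us"},
#     lambda k, v: dbutils.jobs.taskValues.set(key=k, value=v),
# )
```

A downstream task would then read the whole dictionary with a single dbutils.jobs.taskValues.get call on the same key.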
3. Resource and Quota Check
- Check workspace resource quotas, API rate limits, and concurrent job run limits on your Databricks workspace. Databricks enforces limits per workspace, so review your cluster and workspace quotas and request an increase if needed.
- Ensure the cluster itself has enough network bandwidth.
4. Network Troubleshooting
- Ensure no network bottlenecks exist between your cluster and the Databricks control plane. If running on a secure network, check public access, VPN latency, and firewall rules.
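Since the stack trace fails at the TCP connect step, one rough check is to time a plain TCP connection from the cluster to the workspace host. The sketch below uses only the Python standard library; `tcp_connect_seconds` is a hypothetical helper name, and the 5-second timeout is an arbitrary assumption.

```python
import socket
import time
from typing import Optional


def tcp_connect_seconds(host: str, port: int = 443, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return None


# Example, using the host from the error message:
# print(tcp_connect_seconds("us-central1.gcp.databricks.com"))
```

Running this in a loop while several jobs are active may show whether connect times degrade under load, although it does not distinguish a network problem from throttling on the service side.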
5. Increase Timeout
- If a connection/HTTP timeout setting is available in your setup, consider increasing it, though Databricks default timeouts are intended to ensure stability.
6. Retry Logic
- Implement robust retry logic for failed API calls. Some Databricks SDKs and APIs offer automatic retries for transient errors.
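The stack trace above already passes through retryWithDeadline, so the built-in retries appear to have been exhausted before the error surfaced. A notebook-level retry with exponential backoff and jitter may still help by spreading attempts over a longer window. The sketch below is generic: `call_with_retry` is a hypothetical helper, the delays are assumptions, and catching Exception is deliberately broad and should be narrowed to the error type you actually observe.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on failure with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The cap doubles each attempt: base_delay, 2x, 4x, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: a random wait in [0, cap] so parallel jobs do not retry in lockstep.
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")


# In a notebook, where dbutils is available, this might be used as:
# call_with_retry(lambda: dbutils.jobs.taskValues.set(key="params", value=params))
```

The jitter matters here because the failing jobs start at the same time; without it, their retries would tend to collide again.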
7. Databricks Support/Docs
- If this persists, collect all error logs and submit a case to Databricks support, as this may indicate a workspace-specific networking or control plane issue not solvable by code changes.
Summary Table
| Potential Cause | Resolution Step |
|---|---|
| API concurrency/rate limits | Stagger jobs, batch parameters, check quotas |
| Network bottlenecks | Review cluster/network configuration |
| Workspace resource limits | Request workspace/cluster limits increase |
| Excessive API calls | Reduce/aggregate parameters per call |
| Transient/timeout error | Add retry logic, increase timeouts |
This problem is common when scaling up Databricks job orchestration and typically relates to workspace or network limitations, not the correctness of the underlying application code.