You are seeing different behavior when running a long-running `requests.post()` call in Azure Databricks (Python) versus locally. Locally, the timeout behaves as expected, but in Databricks the client hangs indefinitely even after the server has finished post-processing and sent a response (204). Meanwhile, running curl as a subprocess in Databricks works as expected.
Key Observations
- The timeout parameter in requests.post() acts as both a connect and a read timeout. If the server sends no bytes for longer than the read timeout, a ReadTimeout should trigger.
- With the Azure Databricks Runtime, Python's networking stack can be subtly affected by the cluster's managed environment (network/NAT-level buffering, virtualized proxies, or custom firewall policies).
- Your alternative test with curl works, which confirms the network route and server aren't blocking the traffic.
- You see the expected behavior locally; only Databricks hangs indefinitely, even though the server completes successfully.
Potential Causes
- Databricks network virtualization: Databricks clusters often run in containers or on VMs behind network proxies, which can interfere with low-level socket timeout detection in Python requests.
- Requests library limitations: in some environments (especially with HTTP/1.1 keep-alives), the Python socket layer's timeout detection can be bypassed if the underlying TCP connection is managed by an intermediary.
- No data transfer during server post-processing: if the server sends no traffic (not even keep-alive headers or HTTP chunked responses) during its post-processing, and intermediaries or the OS network stack buffer the connection, the requests library may not detect that the server has been "silent" for longer than your timeout.
- Differences in HTTP stack between requests and curl: curl may handle TCP-level inactivity better and may not be affected by intermediate Databricks proxies the way Python's requests is.
How to Work Around It
1. Use curl via Subprocess
Since curl works reliably in your environment, consider making the HTTP request via Python's subprocess module, capturing the output as needed.
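A minimal sketch of that approach. The URL, payload, and timeout values are placeholders; a throwaway local server returning 204 stands in for your real endpoint so the snippet is self-contained:

```python
import subprocess
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for your real endpoint: a local server that answers 204.
class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):  # keep the demo's output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"  # swap in your real URL

# --connect-timeout bounds the TCP handshake; --max-time bounds the
# whole request, so curl exits (code 28) instead of hanging forever.
result = subprocess.run(
    ["curl", "-sS", "-X", "POST",
     "--connect-timeout", "10", "--max-time", "600",
     "-H", "Content-Type: application/json",
     "-d", '{"key": "value"}',
     "-o", "/dev/null", "-w", "%{http_code}",
     url],
    capture_output=True, text=True,
    timeout=660,  # belt and braces: subprocess.run enforces its own cap
)
print(result.stdout)  # HTTP status code written by -w, e.g. 204
server.shutdown()
```

Because `subprocess.run` itself takes a `timeout`, the call cannot hang past that bound even if curl misbehaves.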
2. Explicitly Set stream=True
Try setting stream=True in your requests.post(). Then, read the response manually with a controlled timeout using lower-level socket timeouts.
```python
response = requests.post(..., stream=True, timeout=(connect_timeout, read_timeout))
for chunk in response.iter_content(chunk_size=8192, decode_unicode=False):
    ...  # process each chunk as it arrives
```
However, if the server delays its first byte until post-processing is complete, this will not help either.
3. Use Lower-Level HTTP Client
Try using http.client (stdlib) for more customizable socket-level handling.
4. Test with Different Databricks Runtimes
If possible, test the same code on different runtime versions, or an ML cluster vs. a non-ML cluster.
5. Confirm Network Middleboxes
Check if Azure NSG rules or Databricks cluster network configuration involve proxies or load balancers. These might buffer idle connections differently between Python and system-level curl.
6. Change Server Behavior (if possible)
Ask the server owner to periodically send whitespace or HTTP/1.1 100 Continue interim responses during post-processing. You mentioned you can't control the server; if that's final, focus on the client-side workarounds above.
Why the Difference?
The most probable cause is that Databricks' network path or virtualization creates a condition in which Python's requests and its underlying sockets are never notified of a closed connection, or the network stack masks the silence. Curl's OS-level handling may bypass this issue, or may use different buffering or keepalive logic.
Summary Table
| Approach | Databricks Python | Local Python | Databricks curl | Local curl |
| --- | --- | --- | --- | --- |
| `requests.post(timeout=10)` | Hangs indefinitely | Behaves as expected | N/A | N/A |
| `subprocess.run(['curl'])` | Works | Works | Works | Works |
Recommendation
For robust production pipelines in Azure Databricks, invoke curl (or a similar tool) via subprocess if server silence and networking quirks keep tripping up Python's requests.