3 weeks ago
As one of the steps in my data engineering pipeline, I need to perform a POST request to an HTTP (not HTTPS) server.
This works fine, except in the situation described below, where the request hangs indefinitely.
Environment: Azure Databricks (Python); the same code behaves as expected when run locally.
Scenario:

```python
headers = {"Content-Type": f"{mime_type}"}
chunk_size = 1024 * 1024
response = requests.post(
    destination_repo_url,
    headers=headers,
    auth=auth,
    timeout=10,
    data=(chunk for chunk in iter(lambda: source_file.read(chunk_size), b"")))
response.raise_for_status()
```

(Obviously the timeout is to be chosen, but whatever we choose, the behavior is identical.)
```python
response = requests.post(
    destination_REST_URL_triggering_long_operation,
    auth=auth)
```

Expected behavior: the call returns once the server sends its response (or raises a ReadTimeout if a timeout is set and the server stays silent for longer than that).

Observed behavior: locally, this is exactly what happens; on Azure Databricks the call hangs indefinitely, even after the server has completed its post-processing and sent a 204 response.

Alternatives tried: invoking curl as a subprocess from the same Databricks notebook works as expected.
3 weeks ago
You are seeing different behavior for a long-running requests.post() call in Azure Databricks (Python) versus locally. Locally, the timeout behaves as expected, but in Databricks the client hangs indefinitely even after server post-processing has completed and a 204 response has been sent. However, alternatives such as running curl as a subprocess in Databricks work as expected.
The timeout parameter in requests.post() acts as both a connect and a read timeout. If the server sends no bytes for longer than the read timeout, a ReadTimeout should be raised.
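For reference, here is a minimal sketch of how those timeout semantics should behave, reusing the names from your question; the (connect, read) split and the exception handling are illustrative assumptions:

```python
import requests

try:
    response = requests.post(
        destination_REST_URL_triggering_long_operation,
        auth=auth,
        # (connect timeout, read timeout): the read timeout caps the silence
        # between bytes, not the total duration of the request
        timeout=(5, 10),
    )
    response.raise_for_status()
except requests.exceptions.ReadTimeout:
    # Should fire whenever the server sends nothing for more than 10 s
    print("Server sent no data within the read timeout")
```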
Under the Databricks runtime, Python's networking stack can be subtly affected by the cluster's managed environment (NAT-level buffering, virtualized proxies, or custom firewall policies).
Your alternative test with curl works, which confirms the network route and server aren’t blocking the traffic.
You see expected behavior running locally; only Databricks hangs indefinitely, even though the server completes successfully.
- **Databricks network virtualization:** Databricks clusters often run in containers or on VMs behind network proxies, which can interfere with low-level socket timeout detection by Python requests.
- **Requests library limitations:** In some environments (especially with HTTP/1.1 keep-alives), the Python socket layer's timeout detection can be bypassed if the underlying TCP connection is managed by an intermediary.
- **No data transfer during server post-processing:** If the server sends no traffic (not even keep-alive headers or HTTP chunked responses) during its post-processing, and intermediaries or the OS network stack buffer the connection, the requests library may not detect that the server has been "silent" for longer than your timeout.
- **Differences in HTTP stack between requests and curl:** curl may handle TCP-level inactivity differently and not be affected by an intermediate Databricks proxy the way Python is.
**curl via subprocess**
Since curl works reliably in your environment, consider making the HTTP request via Python's subprocess module, capturing the output as needed.
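A minimal sketch of that workaround, assuming a file upload like the one in your question; the credentials and payload path are illustrative placeholders:

```python
import subprocess

result = subprocess.run(
    [
        "curl",
        "--silent", "--show-error",
        "--fail",                          # non-2xx responses become a non-zero exit code
        "--max-time", "600",               # hard upper bound on the whole transfer
        "--user", f"{user}:{password}",    # hypothetical credentials
        "--header", f"Content-Type: {mime_type}",
        "--data-binary", "@/path/to/source_file",  # placeholder payload path
        destination_repo_url,
    ],
    capture_output=True,
    text=True,
    check=True,  # raises CalledProcessError if curl exits non-zero
)
print(result.stdout)
```

Because --max-time bounds the entire transfer at the curl level, this call cannot hang indefinitely regardless of what the intermediaries do.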
**stream=True**
Try setting stream=True in your requests.post() call, then read the response manually with a controlled timeout using lower-level socket timeouts.
```python
response = requests.post(..., stream=True, timeout=(connect_timeout, read_timeout))
for chunk in response.iter_content(chunk_size=8192, decode_unicode=False):
    ...  # process each chunk as it arrives
```
But if the first byte from the server is delayed until post-processing is complete, this will not help.
Try using http.client (stdlib) for more customizable socket-level handling.
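A minimal sketch with http.client, assuming a plain-HTTP endpoint; the host, path, and authorization header are placeholders:

```python
import http.client

# The timeout here is applied directly to the underlying socket
conn = http.client.HTTPConnection("destination.example.com", 80, timeout=10)
try:
    conn.request("POST", "/trigger-long-operation",
                 headers={"Authorization": "Basic ..."})  # placeholder credentials
    # getresponse() raises TimeoutError (socket.timeout) if the server
    # stays silent for longer than the socket timeout set above
    response = conn.getresponse()
    print(response.status, response.reason)
    body = response.read()
finally:
    conn.close()
```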
If possible, test the same code on different runtime versions, or an ML cluster vs. a non-ML cluster.
Check if Azure NSG rules or Databricks cluster network configuration involve proxies or load balancers. These might buffer idle connections differently between Python and system-level curl.
Ask the server owner to occasionally send whitespace or HTTP/1.1 100-continue interim responses. You mentioned you can't control the server; if that's final, focus on client workarounds above.
The most probable cause is that Databricks' network path or virtualization introduces a condition where Python's requests and its underlying sockets are not notified of a closed socket, or the network stack masks the silence. curl's handling at the OS level may bypass this issue or use different buffering and keepalive logic.
| Approach | Databricks Python | Local Python | Databricks curl | Local curl |
|---|---|---|---|---|
| requests.post(timeout=10) | Hangs indefinitely | Behaves as expected | N/A | N/A |
| subprocess.run(['curl']) | Works | Works | Works | Works |
For robust production pipelines in Azure Databricks, use curl (or a similar tool) via subprocess if server silence and networking quirks cause issues for Python requests.
3 weeks ago
Thanks for your quick and extensive reply.
Given that I don't have any administration rights on the Azure/Databricks environment and don't have the REST server under my control, some of the sensible suggestions are difficult to implement.
I will work with IT to check the Azure/Databricks settings.
In the meantime I will keep using the curl workaround.
2 weeks ago
Hello,
IMHO, having an HTTP-related task in a Spark cluster is an anti-pattern. This kind of code executes on the driver, runs synchronously, and adds overhead. This is one of the reasons DLT (or SDP, Spark Declarative Pipelines) does not have REST-based tasks.
Please review whether this task can be done outside Databricks, for example:
1) Event-based trigger: push the result from Databricks to cloud storage; this creates an event (Event Grid) for a listener such as an Azure Function or Logic App, which then performs the HTTP task.
2) Classic poller: an Azure Function App checks for the expected condition every 'n' minutes and, if it is met, executes the HTTP task (see the sketch below).
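For option 2, a minimal sketch of an Azure Functions timer trigger (Python v2 programming model); the schedule, URL, and condition check are assumptions:

```python
import azure.functions as func
import requests

app = func.FunctionApp()

# Runs every 5 minutes (NCRONTAB: second minute hour day month day-of-week)
@app.timer_trigger(schedule="0 */5 * * * *", arg_name="timer")
def poll_and_post(timer: func.TimerRequest) -> None:
    if not expectation_met():  # hypothetical condition check
        return
    response = requests.post(
        "http://destination.example.com/trigger-long-operation",  # placeholder URL
        timeout=10,
    )
    response.raise_for_status()

def expectation_met() -> bool:
    # Placeholder: e.g. check for a marker file in cloud storage
    return True
```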