Data Engineering

DatabricksConnect from Python/AKS environment calling Databricks Cluster: Spark Query Call Hangs

JTBS

I have a Python 3.12 pod in AKS using DatabricksConnect 18.1.1 to connect to a Databricks cluster on runtime 18.1.

Everything works well, and normally I see no issues running a series of Spark queries.

But once in a while, even with no load on our dedicated cluster, a query that normally completes in under 10 seconds never returns and keeps waiting on the client side in AKS, even after 30 minutes.

It looks like the client call is hanging without detecting any problem with gRPC, the network, or anything else in between. Cluster health seems to be OK.

It's not easily reproducible. Currently I have no timeouts set.
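A minimal sketch, not something suggested in this thread, for seeing where a hung call is stuck the next time it happens: Python's standard-library `faulthandler` can dump every thread's stack if a block runs longer than a chosen number of seconds. The helper name `dump_stacks_if_slow` and the threshold are illustrative assumptions; the watchdog only reports, it does not cancel or time out the query.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def dump_stacks_if_slow(seconds: float, file=None):
    """Dump all thread stacks to `file` if the body runs longer than `seconds`.

    `file` must have a real file descriptor (fileno()); defaults to sys.stderr.
    Only reports where threads are blocked; nothing is interrupted.
    """
    target = file if file is not None else sys.stderr
    # Arm a one-shot watchdog that fires once after `seconds`.
    faulthandler.dump_traceback_later(seconds, repeat=False, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog when the body finishes, normally or with an error.
        faulthandler.cancel_dump_traceback_later()
```

Usage would look like `with dump_stacks_if_slow(60): rows = df.collect()`, where `df` is a hypothetical DataFrame. If the dump showed the main thread blocked inside gRPC's response iteration, that would suggest the transport rather than the query itself. If `sys.stderr` has been replaced by a log-capture object without a file descriptor, pass an open file instead.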

One suggestion is to set "databricks_http_timeout_seconds", because there seems to be no default timeout: network errors are never surfaced and the client call simply keeps waiting. With this timeout set, I hope to at least get a failure in a reasonable time so that I can retry.
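Whatever "databricks_http_timeout_seconds" turns out to cover (I have not confirmed it applies to the Spark gRPC path), a client-side deadline with retry can be sketched with the standard library alone. This is a minimal sketch under assumptions: the names `run_with_deadline` and `on_timeout` are hypothetical, and it retries only on timeouts. Python cannot kill a thread, so a timed-out call keeps running in a background daemon thread; the deadline only frees the caller.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def _run_once(action: Callable[[], T], timeout_s: float) -> T:
    """Run `action` in a daemon thread and wait at most `timeout_s` seconds."""
    outcome: dict = {}

    def target() -> None:
        try:
            outcome["value"] = action()
        except BaseException as exc:  # hand any error back to the caller
            outcome["error"] = exc

    # daemon=True so a call that never returns cannot block interpreter exit.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The stuck call keeps running in the background; we just stop waiting.
        raise TimeoutError(f"no result after {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def run_with_deadline(
    action: Callable[[], T],
    timeout_s: float,
    retries: int = 2,
    backoff_s: float = 1.0,
    on_timeout: Optional[Callable[[], None]] = None,
) -> T:
    """Retry `action` on TimeoutError; any other exception propagates immediately."""
    for attempt in range(retries + 1):
        try:
            return _run_once(action, timeout_s)
        except TimeoutError:
            if on_timeout is not None:
                on_timeout()  # e.g. try to cancel the stuck operation server-side
            if attempt == retries:
                raise
            time.sleep(backoff_s * (attempt + 1))  # linear backoff between attempts
    raise AssertionError("unreachable")
```

A call might look like `run_with_deadline(lambda: df.collect(), timeout_s=120, on_timeout=spark.interruptAll)`, with `df` and `spark` hypothetical. PySpark's Spark Connect `SparkSession` has an `interruptAll()` method (Spark 3.5+); whether DatabricksConnect 18.1.1 honours it while the stream is stuck is an assumption. Retrying re-submits the query, which is fine for reads but needs care for writes.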

There were also suggestions to set gRPC keepalive, which might fix these network-specific issues (Ref: https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/td-p/1...).
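For reference, this is roughly what gRPC keepalive looks like as standard grpcio channel arguments. Keepalive pings let the client notice a silently dead connection (for example, a half-open TCP connection dropped by something in between) and fail the in-flight call instead of waiting forever. I have not confirmed whether DatabricksConnect 18.1.1 exposes a way to pass custom channel options; the helper name and the default values here are illustrative assumptions.

```python
from typing import List, Tuple


def keepalive_channel_options(
    time_ms: int = 30_000, timeout_ms: int = 10_000
) -> List[Tuple[str, int]]:
    """Standard grpcio channel arguments that enable HTTP/2 keepalive pings.

    time_ms:    how often the client pings the server on a quiet connection.
    timeout_ms: how long to wait for the ping acknowledgement before treating
                the connection as dead, which fails calls still in flight.
    """
    return [
        ("grpc.keepalive_time_ms", time_ms),
        ("grpc.keepalive_timeout_ms", timeout_ms),
        # Keep pinging even when no RPC is active on the connection.
        ("grpc.keepalive_permit_without_calls", 1),
        # 0 = no limit on pings sent without data frames in between.
        ("grpc.http2.max_pings_without_data", 0),
    ]


# On a plain grpcio channel these would be passed as, for example:
#   channel = grpc.secure_channel(target, credentials, options=keepalive_channel_options())
```

One design caveat: gRPC servers can enforce a minimum ping interval and close connections that ping too often, so very small `time_ms` values can make things worse rather than better.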

Has anyone else seen this issue, and will a timeout, mainly "databricks_http_timeout_seconds", fix it? Or are there other suggestions that might help?
